osCommerce

The e-commerce.

Session IDs in google! Have I done the right things?!


Becki

Posted

Hi,

 

I have just searched Google and the session IDs are shown appended to the URL. I presume this must have happened before I turned Prevent Spider Sessions on in admin, so at the moment I have:

 

Force Cookie Use False

Check SSL Session ID False

Check User Agent False

Check IP Address False

Prevent Spider Sessions True

Recreate Session False

 

I have updated spiders.txt with the latest version from stevel. I have also created a robots.txt file listing the login pages, checkout, etc.

 

I have also implemented this from Chemo:

 

 

A common scenario is for store owners that were not aware of the "Prevent Spider Sessions" option to have several URLs indexed by spiders with the session ID appended. This situation is troublesome and there are a few options to handle referrals sent through the "wild" session ID URL.

 

However, the true solution to the problem is to REMOVE THE SESSION IDs from the search engine index! So, how hard is it? Pretty easy!

 

In includes/application_top.php find this code:

 

// include the language translations
 require(DIR_WS_LANGUAGES . $language . '.php');

 

Under that paste this code:

 

 if ( isset($spider_flag) && $spider_flag == true ) {
   // stripos() replaces eregi(), which was removed in PHP 7
   if ( stripos($_SERVER['REQUEST_URI'], tep_session_name()) !== false ) {
     $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
     header("HTTP/1.0 301 Moved Permanently");
     header("Location: $location"); // redirect...bye bye
     exit; // stop here so no page body is sent after the redirect
   }
 }

 

This code will redirect the spider to the non-SID URL with a 301 header and over time will remove the session appended URL from the index.
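To make the redirect target concrete, here is a minimal standalone sketch of the same idea outside osCommerce. The helper name strip_session_param is hypothetical and not part of osCommerce (the snippet above relies on tep_href_link and tep_get_all_get_params instead). The default parameter name osCsid is taken from the URLs reported later in this thread, and PHP 7.1 or later is assumed.

```php
<?php
// Hypothetical helper, NOT part of osCommerce: returns the URL without the
// named session parameter, or null when that parameter is not present.
function strip_session_param(string $url, string $sessionName = 'osCsid'): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return null; // nothing to strip
    }

    parse_str($parts['query'], $params);
    if (!array_key_exists($sessionName, $params)) {
        return null; // no session ID in this URL
    }

    unset($params[$sessionName]);
    $path  = $parts['path'] ?? '/';
    $query = http_build_query($params);

    // Rebuild the URL; drop the "?" when no parameters remain.
    return $query === '' ? $path : $path . '?' . $query;
}
```

A spider requesting a URL for which this returns a non-null value would be sent to that value with a 301; any other request would be served as usual.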

 

So I have done what I think will remove the session IDs from Google's index. How long is this likely to take, though?

 

1) Does Chemo's code stop the session ID being added at all for spiders, so preventing them from ever getting into Google? I presume having Prevent Spider Sessions on and an updated spiders.txt should already do that.

 

2) I didn't have a robots.txt before, so at the moment there is a link in Google to my login.php page with a session ID attached. Now that I have put login.php in robots.txt, does that mean Chemo's code will not be able to remove it from Google, because Google can no longer go there? If so, should I remove login.php from robots.txt for a while?

 

3) What issues are there with people following links from Google with the session ID attached? Can I delete the session ID from the database? There seems to be only one session ID in Google's listings, so it should be easy enough.

 

Many thanks

Becki

Posted

As you said yourself, you only set "prevent spider sessions" to true recently. If you roughly remember the date, you can google for your pages and click on the "cache" link under an entry with a session ID. At the top of the page Google tells you when the page was indexed. So you only need to worry if you find entries with session IDs that were indexed after you changed your settings in osC.

 

The session settings are something where you need to find the ideal combination for your shop. I had some problems with customers using AOL to create an account and then log in, because AOL switches the IP addresses assigned to a customer while they are surfing. Here are the settings in my shop, which seem to work (meaning no AOL customer has complained and the last one could log in properly):

 

Force Cookie Use True

Check SSL Session ID False

Check User Agent False

Check IP Address False

Prevent Spider Sessions True

Recreate Session True

 

Now for your questions.

 

1) Chemo's code doesn't prevent session IDs from being added for spiders. The "prevent spider sessions" setting in combination with your spiders.txt does that.

 

2) The spiders.txt file should only contain strings used by spiders and bots in their user agent string, so that they can be identified and osC doesn't add a session ID. Remove anything you added there like login.php etc. Chemo's code doesn't delete anything from Google. All it does is this: if a bot or spider (identified through a string in spiders.txt) accesses a page with an appended session ID, it is redirected to the same page without the session ID in the URL. Over time search engines will replace indexed pages that have session IDs with the proper URLs.

 

3) I wouldn't delete the session ids from the database. You might mess up more than you fix. As far as I know the "recreate session" setting takes care of this. So if someone follows such a link from google, they might find their shopping cart already filled but once they login or create an account, osC assigns them a new session id protecting their personal information.
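The identification step described in point 2 can be sketched roughly as follows. This is only an illustration of substring matching against a spiders.txt-style list, not the actual osCommerce code; the function name is_spider_user_agent and the three-entry sample list are assumptions made for the example.

```php
<?php
// Hypothetical helper illustrating the idea behind "Prevent Spider Sessions":
// a case-insensitive substring match of the visitor's user agent against
// each non-empty line of a spiders.txt-style list.
function is_spider_user_agent(string $userAgent, array $spiderStrings): bool
{
    $userAgent = strtolower($userAgent);
    if ($userAgent === '') {
        return false; // no user agent, nothing to match against
    }

    foreach ($spiderStrings as $line) {
        $needle = strtolower(trim($line));
        if ($needle !== '' && strpos($userAgent, $needle) !== false) {
            return true; // treated as a spider: no session ID is added
        }
    }

    return false;
}

// Sample list (assumption); a real spiders.txt holds many more entries,
// one user agent fragment per line.
$spiders = ["googlebot\n", "slurp\n", "\n"];
```

This is also why file names such as login.php do not belong in spiders.txt: each line is compared with the user agent string, never with the requested page.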

 

I hope this helps,

 

abra

The First Law of E-Commerce: If the user can't find the product, the user can't buy the product.

 

Feedback and suggestions on my shop welcome.

 

Note: My advice is based on my own experience or on something I read in these forums. No guarantee it'll work for you! Make sure that you always BACKUP the database and the files you are going to change so that you can rollback to a working version if things go wrong.

Posted

 

Thanks abra,

 

My spiders file does only contain strings, as you say it should; I just used the most recent one in the contribs section. The Disallow: login.php is in my robots.txt file. I just wondered whether, now that the spiders can't get to login.php, the listing in Google for login.php-ocsid-85464..... will never be corrected by Chemo's code, as the spider flag will never be true for that page. So will that listing always stay in Google?

 

Thanks

Becki

Posted

Sorry, my mistake. I got spiders.txt and robots.txt mixed up...

 

Actually, there is no real harm if you allow spiders to access login.php as they will not be able to submit to the form. Like you have it now, pages which return a 404 error will sooner or later drop from the index anyway.

 

abra


Posted

What are the potential risks of people coming to the site from the Google link with the same session ID? OK, hopefully it will be corrected with time, but what if someone comes right now? Would someone browsing see things added to their cart if someone else using the same session ID added something to theirs? What happens when they try to check out? I have turned Recreate Session on, but it didn't change the session ID when I came to the site from the Google link. I presume it only recreates the session when someone actually logs on; is that correct? Basically I just want to know if I should be worried about the session IDs being appended to the links in Google's index.

 

Thanks

Becki

Posted
Quote (abra): As far as I know the "recreate session" setting takes care of this. So if someone follows such a link from google, they might find their shopping cart already filled but once they login or create an account, osC assigns them a new session id protecting their personal information.

No, it doesn't. The ID stays the same. This contribution does what you're describing:

http://www.oscommerce.com/community/contributions,4112

Posted

 

If I haven't got register globals installed, do I just leave out:

 

 

// >>> BEGIN REGISTER_GLOBALS
  // Work-around to allow disabling of register_globals - map all defined
  // session variables
  if (count($_SESSION)) {
    $session_keys = array_keys($_SESSION);
    foreach($session_keys as $variable) {
      link_session_variable($variable, true);
    }
  }
// <<< END REGISTER_GLOBALS

 

in cat/inc/functions/sessions.php

and make sure to use the change you have already put in your install instructions for catalog\includes\classes\navigation_history.php?

 

Thanks

 

Becki

 


Posted
Quote: What does the 'recreate sessions' setting in the admin panel actually do then?

It does what it says: it recreates the session, but with the same ID. It may look strange, but that's what it does.

 

If you use the contribution, it will always differentiate customers once they log in. It will not differentiate visitors who arrive with the same session ID, because it does not take effect until they log in. This means the cart contents before login can still be mixed up. So it will not protect the cart contents, but it will protect against mixing up customer info. You should check the support thread of the contribution for details.

Posted

 

OK, thanks for the description; I understand now. Was my previous post correct about using the contribution without register globals installed, i.e. that code can just be left out?

 

Thanks

Becki

Posted

Well, it explains it in the readme file, doesn't it?

Posted

 

Yes it does, I'll accept that one! My mistake. I thought your 'NOTE' in the readme was talking only about the code in nav_history. I must learn to skim read better :)

 

Sorry.

 

Becki

  • 3 weeks later...
Posted

OK, I still have the problem that Google still has links with SIDs in them, all from one day back in January. I have noticed one link to index.php without a SID that was cached this month. Isn't it strange that Google only got my index.php page this time? Has it got something to do with Chemo's code or my spiders file etc.?

 

The real problem is that anyone who comes from a SID link in Google gets the same cart as everyone else! People are not going to sign up (or know they need to) just to be able to use their cart properly (it works correctly after they log in, as I have installed the contrib noted above). So is there a way of removing the SID automatically when someone enters the site? And what implications might this have, if any?

 

Thanks all

 

becki

Posted

Hi,

 

I just looked at my access log, I have:

 

 

66.249.65.161 - - [02/Apr/2007:04:28:53 +0100] "GET /products_new.php?amp;language=en&page=11&action=buy_now&products_id=206 HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.161 - - [02/Apr/2007:04:29:04 +0100] "GET /index.php?manufacturers_id=3&page=1&sort=4a&osCsid=e008df1bccb72bc331b9e201cf4103d7 HTTP/1.1" 301 41992 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

 

 

Does the presence of the 301 mean that the code I'm using is working and that these URLs will be removed from the SE index? There are still lots of instances of these osCsids in Google though.
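One way to check this across a whole log rather than by eye is a small script like the sketch below. It assumes the Apache "combined" log format shown above. The function names parse_access_line and is_redirected_sid_hit are made up for this example, and the session parameter name osCsid is taken from the log line above.

```php
<?php
// Hypothetical helper: extracts path, status and user agent from one line
// in Apache "combined" log format; returns null if the line does not match.
function parse_access_line(string $line): ?array
{
    $pattern = '/"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return ['path' => $m[1], 'status' => (int) $m[2], 'agent' => $m[3]];
}

// True when a Googlebot request for a URL carrying the session parameter
// was answered with a 301 status.
function is_redirected_sid_hit(string $line, string $sessionName = 'osCsid'): bool
{
    $hit = parse_access_line($line);
    return $hit !== null
        && $hit['status'] === 301
        && stripos($hit['agent'], 'googlebot') !== false
        && strpos($hit['path'], $sessionName . '=') !== false;
}
```

Run over each line of the access log, this reports how many session-ID URLs Googlebot requested and was redirected away from. It only confirms that the redirect is being served, not when Google will update its index.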

 

Should this be the way it is:

products_new.php?amp;language=en

i.e. the 'amp' part? I read somewhere ages ago about a fix for Ultimate SEO URLs regarding the &amp; - I haven't implemented it, so I wondered if this is correct. Should it not be &amp;?

 

Thanks

 

Becki

Archived

This topic is now archived and is closed to further replies.
