Becki Posted March 9, 2007

Hi,

I have just searched Google, and the session IDs are shown appended to the URLs. I presume this happened before I turned "Prevent Spider Sessions" on in admin. At the moment I have:

Force Cookie Use: False
Check SSL Session ID: False
Check User Agent: False
Check IP Address: False
Prevent Spider Sessions: True
Recreate Session: False

I have updated spiders.txt with the latest version from stevel. I have created a robots.txt file listing the login pages, checkout, etc. I have also implemented this from Chemo:

> A common scenario is for store owners that were not aware of the "Prevent Spider Sessions" option to have several URLs indexed by spiders with the session ID appended. This situation is troublesome and there are a few options to handle referrals sent through the "wild" session ID URL. However, the true solution to the problem is to REMOVE THE SESSION ID's from the search engine index! So, how hard is it? Pretty easy!
>
> In includes/application_top.php find this code:
>
>     // include the language translations
>     require(DIR_WS_LANGUAGES . $language . '.php');
>
> Under that paste this code:
>
>     if ( $spider_flag == true ){
>         if ( eregi(tep_session_name(), $_SERVER['REQUEST_URI']) ){
>             $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
>             header("HTTP/1.0 301 Moved Permanently");
>             header("Location: $location"); // redirect...bye bye
>         }
>     }
>
> This code will redirect the spider to the non-SID URL with a 301 header and over time will remove the session appended URL from the index.

So I have done what I think will remove the session IDs from Google's index. How long is this likely to take, though?

1) Does Chemo's code stop the session ID being added at all for spiders, so preventing them from ever getting into Google? I presume having Prevent Spider Sessions on and an updated spiders.txt should do that.

2) Before I created a robots.txt I didn't have one, so at the moment there is a link in Google to my login.php page with a session ID attached. Now that I have put login.php in robots.txt, does that mean Chemo's code will not be able to delete it from Google, because Google can no longer go there? If so, should I remove login.php from robots.txt for a while?

3) What issues are there with people following links from Google with the session ID attached? Can I delete the session ID from the database? There seems to be only one session ID in Google's listings, so it should be easy enough.

Many thanks,
Becki
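A note for anyone reading this on a newer PHP version: eregi() was removed in PHP 7.0, and the quoted snippet does not call exit after sending the Location header, so the rest of the page is still generated. Below is a minimal sketch of the same redirect idea using only built-in PHP functions. It is not Chemo's code; strip_session_param() and spider_redirect_target() are hypothetical helper names, and the session parameter is assumed to be called osCsid.

```php
<?php
// Sketch only: both functions are hypothetical helpers, not part of osCommerce.

// Remove the session parameter (assumed name: "osCsid") from a request URI.
function strip_session_param(string $requestUri, string $sessionName = 'osCsid'): string
{
    $parts = parse_url($requestUri);
    if ($parts === false) {
        return $requestUri; // leave badly malformed URIs untouched
    }
    $path = $parts['path'] ?? '/';
    $params = [];
    parse_str($parts['query'] ?? '', $params);
    unset($params[$sessionName]);
    $query = http_build_query($params);
    return $query === '' ? $path : $path . '?' . $query;
}

// Return the clean URI a spider should be redirected to,
// or null when no redirect is needed.
function spider_redirect_target(bool $isSpider, string $requestUri, string $sessionName = 'osCsid'): ?string
{
    $params = [];
    parse_str((string) parse_url($requestUri, PHP_URL_QUERY), $params);
    if (!$isSpider || !array_key_exists($sessionName, $params)) {
        return null;
    }
    return strip_session_param($requestUri, $sessionName);
}

// Possible use in a request handler (not executed here):
//   $target = spider_redirect_target($spider_flag, $_SERVER['REQUEST_URI']);
//   if ($target !== null) {
//       header('Location: ' . $target, true, 301);
//       exit; // stop before any page output
//   }
```

The redirect decision is kept in a pure function so it can be checked without a web server. In a real shop the target would normally be built with the store's own link helper so that it comes out as an absolute URL.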
abra123cadabra Posted March 9, 2007

As you said yourself, you only set "Prevent Spider Sessions" to true recently. If you roughly remember the date, you can google for your pages and click on the "cache" link under an entry with a session ID. At the top of the page Google tells you when the page was indexed. So you only need to worry if you find entries with session IDs that were indexed after you changed your settings in osC.

The session settings are something where you need to find the ideal combination for your shop. I had some problems with customers using AOL to create an account and then log in, because AOL switches the IPs assigned to a customer while they are surfing. Here are the settings in my shop, which seem to work (meaning no AOL-using customer has complained and the last one could log in properly):

Force Cookie Use: True
Check SSL Session ID: False
Check User Agent: False
Check IP Address: False
Prevent Spider Sessions: True
Recreate Session: True

Now for your questions.

1) Chemo's code doesn't prevent session IDs from being added for spiders. The "Prevent Spider Sessions" setting in combination with your spiders.txt does that.

2) The spiders.txt file should only contain strings used by spiders and bots in their user agent string, so that they can be identified by them and osC doesn't add a session ID. Remove anything you added there like login.php etc. Chemo's code doesn't delete anything from Google. All it does is this: if a bot or spider (identified through a string in spiders.txt) accesses a page with an appended session ID, it is redirected to the same page without the session ID in the URL. Over time search engines will replace indexed pages that carry session IDs with the proper URLs.

3) I wouldn't delete the session IDs from the database. You might mess up more than you fix. As far as I know the "Recreate Session" setting takes care of this. So if someone follows such a link from Google, they might find their shopping cart already filled, but once they log in or create an account, osC assigns them a new session ID, protecting their personal information.

I hope this helps,
abra

The First Law of E-Commerce: If the user can't find the product, the user can't buy the product.
Feedback and suggestions on my shop welcome.
Note: My advice is based on my own experience or on something I read in these forums. No guarantee it'll work for you! Make sure that you always BACKUP the database and the files you are going to change so that you can roll back to a working version if things go wrong.
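To make the spiders.txt point concrete: the check described above amounts to looking for each listed string inside the visitor's User-Agent header. The sketch below shows roughly that idea. osCommerce's actual code may differ; is_spider() is a hypothetical helper name and the example list entries are illustrative assumptions, not the contents of a real spiders.txt.

```php
<?php
// Sketch only: is_spider() is a hypothetical helper, not an osCommerce function.
// Each non-empty entry is treated as a substring to look for in the User-Agent.
function is_spider(string $userAgent, array $spiderStrings): bool
{
    $agent = strtolower($userAgent);
    foreach ($spiderStrings as $needle) {
        $needle = strtolower(trim($needle));
        if ($needle !== '' && strpos($agent, $needle) !== false) {
            return true; // first match is enough
        }
    }
    return false;
}

// Illustrative entries only; a real list would be read from the file, e.g.
//   $spiders = file('spiders.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$exampleSpiders = ["googlebot\n", "slurp\n", "\n"];
```

Because the match is a plain substring test on the User-Agent, file names such as login.php have no place in this list; URL paths belong in robots.txt.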
Becki (Author) Posted March 9, 2007

> 2) The spiders.txt file should only contain strings used by spiders and bots in there user agent string so that they can be identified by this and osC doesn't add a session id. Remove anything you added there like login.php etc. Chemos code doesn't delete anything from google. All it does is, that if a bot or spider (identified through a string in spiders.txt) accesses a page with an appended session id, it is redirected to the same page without the session id in the url. Over time search engines will replace indexed pages with session ids with the proper urls. I hope this helps, abra

Thanks abra,

My spiders file does only contain strings, as you say it should. I just used the most recent one in the contribs section. The Disallow: login.php is in my robots.txt file.

What I wondered is this: now that the spiders can't get to login.php, the listing in Google for login.php-ocsid-85464..... will not be corrected by Chemo's code, because the spider will never request the page and the redirect will never run. So will that listing always stay in Google?

Thanks
Becki
abra123cadabra Posted March 9, 2007

Sorry, my mistake. I got spiders.txt and robots.txt mixed up...

Actually, there is no real harm in allowing spiders to access login.php, as they will not be able to submit the form. The way you have it now, pages which return a 404 error will sooner or later drop from the index anyway.

abra
Becki (Author) Posted March 9, 2007

What are the potential risks of people coming to the site from the Google link with the same session ID? OK, hopefully it will be corrected with time, but what if someone comes right now? Would someone browsing see things added to their cart if someone else using the same session ID added something to theirs? What happens when they try to check out?

I have turned Recreate Session on. It didn't change the session ID when I came from the Google link to the site. I presume it only recreates when someone actually logs on, is that correct?

Basically I just want to know if I should be worried about the session IDs being appended to the links in Google's index.

Thanks
Becki
Guest Posted March 9, 2007

> As far as I know the "recreate session" setting takes care of this. So if someone follows such a link from google, they might find their shopping cart already filled but once they login or create an account, osC assigns them a new session id protecting their personal information.

No, it doesn't. The ID stays the same. This one does what you're saying:
http://www.oscommerce.com/community/contributions,4112
Becki (Author) Posted March 9, 2007

> No it doesn't. The ID is the same. This one does what you're saying. http://www.oscommerce.com/community/contributions,4112

Hi,

What does "Recreate Session" in the admin panel actually do, then? Should I install your contribution above if there is any chance at all of someone coming to my site via a URL with a session ID in it?

Thanks
Becki
Becki (Author) Posted March 9, 2007

> No it doesn't. The ID is the same. This one does what you're saying. http://www.oscommerce.com/community/contributions,4112

If I haven't got register globals installed, do I just not include:

    // >>> BEGIN REGISTER_GLOBALS
    // Work-around to allow disabling of register_globals - map all defined
    // session variables
    if (count($_SESSION)) {
      $session_keys = array_keys($_SESSION);
      foreach($session_keys as $variable) {
        link_session_variable($variable, true);
      }
    }
    // <<< END REGISTER_GLOBALS

in cat/inc/functions/sessions.php, and make use of the change you have already put in your install instructions for catalog\includes\classes\navigation_history.php?

Thanks
Becki
Guest Posted March 9, 2007

> What does the 'recreate sessions' via the admin panel actually do then?

It does what it says: it recreates the session, but with the same ID. It may look strange, but that's what it does.

If you use the contribution, it will always differentiate customers once they log in. It will not differentiate visitors who arrive with the same session ID, because it does not take effect until they log in. This means the cart contents before login can still be mixed up. So it will not protect the cart contents, but it will protect against mixing up customer info. You should check the support thread of the contribution for details.
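For illustration, the general technique of issuing a fresh session ID at login can be sketched with PHP's built-in session functions. This is not the code of the contribution linked above, whose internals are not shown in this thread; rotate_session_id_on_login() is a hypothetical name, and the sketch assumes PHP's native session handling rather than osCommerce's own session layer.

```php
<?php
// Sketch only: a hypothetical helper, meant to be called right after a
// customer's credentials have been verified.
function rotate_session_id_on_login(): array
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    $oldId = session_id();

    // Issue a new ID for the current session. Passing true also deletes
    // the data stored under the old ID, so a URL that still carries the
    // old ID no longer leads to this customer's session.
    session_regenerate_id(true);

    return ['old' => $oldId, 'new' => session_id()];
}
```

As noted above, a step like this only runs at login, so visitors who have not logged in and arrive through the same indexed URL still share one session until then.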
Becki (Author) Posted March 9, 2007

> Does what it says. It recreates the session. But with the same ID. It may look strange but it's what it does. If you use the contribution it will always differentiate customers once they login. But not visitors who come with the same session ID because until they login this does not take effect. So this means the cart contents before login can still be mixed up. So it will not protect for the cart contents. But it will protect against mixing customer info. You should check the support thread of the contribution for details.

OK, thanks for the description. I understand now.

Was my previous post correct about using the contribution without register globals installed, i.e. that the code can just be left out?

Thanks
Becki
Becki (Author) Posted March 9, 2007

> well it explains it in the readme file doesn't it?

Yes it does, I'll accept that one! My mistake. I thought your 'NOTE' in the readme was only talking about the code in nav_history. I must learn to skim read better :) Sorry.

Becki
Becki (Author) Posted March 28, 2007

OK, I still have the problem that Google has links with SIDs in them, all from one day back in January. I have noticed one link to index.php without a SID that was cached this month. Isn't it strange that Google only got my index.php page this time? Has it got something to do with Chemo's code or my spiders file?

The real problem is that if someone comes from a SID link in Google, they get the same cart as everyone else! People are not going to sign up (or know that they need to) just to be able to use their cart properly. It works correctly after they log in, as I have installed the contrib noted above.

So is there a way of removing the SID automatically when someone enters the site? And what implications might this have, if any?

Thanks all
Becki
Becki (Author) Posted April 3, 2007

Hi,

I just looked at my access log. I have:

    66.249.65.161 - - [02/Apr/2007:04:28:53 +0100] "GET /products_new.php?amp;language=en&page=11&action=buy_now&products_id=206 HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.65.161 - - [02/Apr/2007:04:29:04 +0100] "GET /index.php?manufacturers_id=3&page=1&sort=4a&osCsid=e008df1bccb72bc331b9e201cf4103d7 HTTP/1.1" 301 41992 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Does the presence of the 301 mean that the code I'm using is working and that these URLs will be removed from the search engine index? There are still lots of instances of these osCsid values in Google, though.

Is products_new.php?amp;language=en supposed to look like that, i.e. the 'amp' part? I read somewhere ages ago about a fix for Ultimate SEO URLs regarding the &. I haven't implemented it, so I wondered whether this is correct. Should it not be &?

Thanks
Becki
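One way to check entries like these in bulk is to parse each access-log line and look at the status code together with whether the requested URL carried a session ID. The sketch below assumes the Apache "combined" log format seen in the two lines above; parse_access_line() is a hypothetical helper name, and the parameter name osCsid is taken from the log excerpt.

```php
<?php
// Sketch only: a hypothetical helper for Apache "combined" format log lines.
// Returns null when the line does not match the expected layout.
function parse_access_line(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+) "[^"]*" "([^"]*)"$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return [
        'ip'      => $m[1],
        'time'    => $m[2],
        'method'  => $m[3],
        'uri'     => $m[4],
        'status'  => (int) $m[5],
        'agent'   => $m[7],
        // true when the requested URL still carries a session ID
        'has_sid' => strpos($m[4], 'osCsid=') !== false,
    ];
}
```

Filtering the parsed lines for a Googlebot user agent, has_sid set and status 301 would list the SID URLs that were answered with a permanent redirect. Requests with has_sid set and any other status would be the ones worth a closer look.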
Archived
This topic is now archived and is closed to further replies.