M@rcel Posted October 25, 2002
It seems that doing a referer check works. But does it also work when the link is followed by a spider? :?:
M@rcel
Greetings from Marcel
Ian (Author) Posted October 25, 2002
Yes. If a spider follows a link from allprods.php, there is no session_id and it gets the product page in the appropriate language.
Trust me, I'm an Accountant.
Big_Daddy Posted October 26, 2002
Ian,
Believe it or not, I got spidered again by Google. There is still no index, nothing at all when I search Google for my site, but that's beside the point. I noticed that while they were re-spidering me, they lost the session IDs only when viewing the product_info pages. When the bot hit a category page and transferred to the cpath, it picked up the sid again. Granted, there is no info in my cpath area that would make good content for an index, but it caused them to loop around again. Just some input as to what is going on. Like I said though, they are losing the sids on product_info.php, and as far as I can tell this is the only page they lose them on.
Brandon Sweet
Ian (Author) Posted October 26, 2002
Brandon,
I'm no Google expert, but from reading all these topics I seem to remember someone saying you have to wait 2-3 days before a spider session makes any noticeable impact on the Google index. The categories sid issue is interesting. I am going to spend some time just wandering round a stock osC site with cookies off, to get a feel for the issues this may throw up. As a matter of interest, are you using any other Google/spider fixes, e.g. allprods, robots.txt etc.? I'll also wander around your site (oh, such hard work I give myself) to see if it has any special issues.
Ian (Author) Posted October 26, 2002
Brandon,
Just had a thought: do you use the buy now button in your product listings? If you do, this is almost certainly the cause of the reappearance of the sid. My code is built to work for a customer who has cookies turned off. If this customer adds something to the cart, they need sids to retain the cart. This is fine for the add to cart button, as this is a form action which bots can't do. However, the buy now button is a straightforward link that spiders will follow. My code will see that something has been added to the cart and turn sids back on. It's a major failing that I've mentioned before. I have yet to come up with a simple fix (the non-simple fix is to recode the product listing to make buy now a form action).
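The non-simple fix mentioned above, turning Buy Now into a form action, could look roughly like the sketch below. It is only an illustration: buy_now_form() and the products_id field name are assumptions made for this example, not code from the contribution or from any osCommerce snapshot, and the receiving page would have to read the product id from the POST data instead of the query string.

```php
<?php
// Hypothetical helper: renders "Buy Now" as a POST form instead of a plain
// link, so a spider that only follows href links never adds to the cart.
function buy_now_form(string $actionUrl, int $productId): string
{
    // Escape the URL so it is safe inside an HTML attribute.
    $action = htmlspecialchars($actionUrl, ENT_QUOTES);

    return '<form method="post" action="' . $action . '">'
         . '<input type="hidden" name="products_id" value="' . $productId . '">'
         . '<input type="submit" value="Buy Now">'
         . '</form>';
}

// Example: one button for product 42 on a listing page.
echo buy_now_form('index.php?action=buy_now', 42), "\n";
```

Because nothing reaches the cart through a followed link, a crawler browsing the listing would keep an empty cart and the sid would stay switched off.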
xaraya Posted October 27, 2002
What about using the Search-Engine Safe URLs option to strip the sid out of the pages? It seems to work fine... no sid in the code that I can see. http://www.mountainwatersspa.com/catalog
Anyone know when this feature was implemented? I found it by surprise in a 101402 snapshot. I was so excited to see that it worked.
Ian-San Posted October 27, 2002
Gregory,
Don't get too excited - the SID disappears because you have a cookie set instead. It will come back for the search engines :(
Also, note that with Search Engine Safe URLs, you cannot add to basket or log on if cookies are disabled. :( :(
This feature is still under development, I think.
Ian-san Flawlessnet
xaraya Posted October 27, 2002
Thanks for bringing me up to date. This really helps :)
I suppose I can live with the trade-off of functionality for search engine indexing for a little while. It's very important to get our new catalog indexed. Our web site is already well indexed on google.com. I'll use Ian Wilson's fix until a better solution is implemented.
Warm Regards,
Greg
xaraya Posted October 27, 2002
I never checked our cookie output before. We use PostNuke for content management and I see it generates a session number in cookies. Is this what's throwing off Google bots in OSC? The PostNuke session ID never seemed to affect our indexing status with google.com in the past. I just added Ian's session remover code to be on the safe side. Since we use PayPal to process all orders, cookies have to be enabled anyway, so the search engine safe URL option will not affect us. I need to post a note on the sign-in page asking customers to have cookies enabled.
Here's what our cookie file looks like now with both osCommerce (Ian's code added) and PostNuke data:
    www.mountainwatersspa.com FALSE /catalog FALSE 1040913873 email_address greg%40mountainwatersspa.com
    www.mountainwatersspa.com FALSE /catalog FALSE 1040913875 first_name Gregory
    .www.mountainwatersspa.com TRUE / FALSE 1038926728 POSTNUKESID 088c8254e59c0586633edfbe38fd0515
Ian-San Posted October 27, 2002
Gregory,
I am not an expert on cookies, but if osCommerce manages to store a cookie on the customer's computer, it turns off the SIDs in the URL as they are not required anymore. But when Google comes, it clearly does not accept cookies, so the SID comes back. There is an enormous amount of stuff in these forums on cookies, Google, session IDs, SEFUs etc. Like a cold, there are so many cures because none of them really works 100%.
Here is the cookie you stored on my computer:
    POSTNUKESID 672c1113616f512a432f541ba329f063 www.mountainwatersspa.com/ 1536 309190016 29524804 2981840080 29523395 *
clearly with SID. And here is one of my own:
    email_address XXXXXXXX(DELETED)XXXXXXX www.nowsayit.com/catalog_en 1024 1454922752 29529413 797726816 29523379 *
    first_name Ian www.nowsayit.com/catalog_en 1024 1454922752 29529413 797726816 29523379 *
I am not sure, but it doesn't look like a SID to me. I have Ian's mod added as well.
winterradio Posted November 5, 2002
Thanks for the contribution, Ian. I've had it installed for a week or more, and I notice that once the customer adds something to their cart and then clicks on, for instance, contact_us.php or any other product page, the ocid is dropped and therefore the shopping cart appears empty. This is probably part of the optimization for spiders, but if a customer clicks on another page and sees that their basket is empty, they are more than likely not going to complete the transaction. Have there been any updates in your code since you last posted?
Thanks,
H
Ian (Author) Posted November 5, 2002
This is not the way it is supposed to work, and certainly not how it works on my test system. The sid should be carried on after adding a product to the cart and should not be dropped unless the customer empties the cart. Are you sure you have the latest version?
winterradio Posted November 6, 2002
Hi Ian,
I'm using the latest version through CVS and the latest version of your code gathered from this post.
    //================================================================
    if ( ($HTTP_GET_VARS['currency']) ) {
      tep_session_register('kill_sid');
      $kill_sid=false;
    }
    if ( ($HTTP_GET_VARS['language']) ) {
      tep_session_register('kill_sid');
      $kill_sid = false;
    }
    if (basename($_SERVER['HTTP_REFERER']) == 'allprods.php' ) $kill_sid = true;
    if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && (!tep_session_is_registered('kill_sid') ) ) $kill_sid = false;
    if (basename($PHP_SELF) == FILENAME_LOGIN ) $kill_sid = false;
    //================================================================
I changed the second line from $kill_sid = true; to $kill_sid = false; It works now, and I actually had Google crawl all throughout my site from product_info.php. I'm guessing that this function stops killing a customer sid once a session is registered, so if the bot/spider crawls from product_info.php the sid is killed from there. I do not have buy now functions enabled. I was enjoying pretty decent rankings before implementing this, so I'm anxious to see if the spider will crawl every product page now.
Henry
Ian (Author) Posted November 6, 2002
Let's just look at what that line of code should be doing. If a customer has logged in or if there is something in the basket, we want sids to be produced. So the line in my code is:
    if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && (!tep_session_is_registered('kill_sid') ) ) $kill_sid = true;
N.B. In my code, notice it sets $kill_sid = true if the crawler/customer is not logged in and has nothing in the cart - kill the sid. We don't need it. If you change that to false, then a customer who does not have cookies enabled won't be able to add items to the cart.
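For anyone who wants to check the truth table of that condition outside a shop, here is a minimal stand-alone sketch. should_kill_sid() is a hypothetical name used only for this example; the real mod reads the three values from the osCommerce session and cart rather than taking them as arguments.

```php
<?php
// Hypothetical stand-alone version of the condition quoted above.
//   $loggedIn           : customer_id is registered in the session
//   $cartItems          : what $cart->count_contents() would return
//   $killSidRegistered  : 'kill_sid' was registered after a language or
//                         currency switch (meaning: keep the sid)
function should_kill_sid(bool $loggedIn, int $cartItems, bool $killSidRegistered): bool
{
    // Drop the sid only for an anonymous visitor with an empty cart who
    // has not switched language or currency.
    return !$loggedIn && $cartItems === 0 && !$killSidRegistered;
}

// Print the result for a crawler and for a cookieless customer with a cart.
var_dump(should_kill_sid(false, 0, false)); // anonymous, empty cart
var_dump(should_kill_sid(false, 2, false)); // anonymous, two items in cart
```

As soon as any one of the three inputs says the visitor has state worth keeping, the function returns false and the sid stays on the links.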
Ian (Author) Posted November 6, 2002
Just to make a couple of points clear.
First, all of the code presented in this thread which was written to stop osCommerce producing a sid when Google crawls was NOT written specifically to help you get a better page rank. Its intention is to stop the spider getting trapped and producing thousands of hits on the site.
Second, the attachment of sids to a URL is not what causes Google to get trapped per se. Normally, if a user visits your site and wanders around (with cookies disabled), they will do so with a consistent sid. However, Googlebot does not appear to behave like a normal user following link to link. If it did, the sid would be consistent across its visit. The problem appears to be that when it visits a new link, it also generates a completely new sid. Why it does this is still a mystery to me. I'm currently working through a number of sites/forums to see why this is.
Ian-San Posted November 6, 2002
Quote: "If it did, the sid would be consistent across its visit. The problem appears to be that when it visits a new link, it also generates a completely new sid. Why it does this is still a mystery to me. I'm currently working through a number of sites/forums to see why this is."
I think what happens is that the Google spider initially hits the site with all its existing stored links and so collects many successful hits, each with its own sid. Google then rationalises the list, but as many of the hits have a unique sid, they seem to be different links to Google. So, when the Google indexer comes back a couple of days later, it has many 'unique' links to try to index. As all these indexers are wandering around at the same time, it just looks random.
With this mod, the initial hit should result in no sids being returned, so no duplication of links for the indexer. Keeping Google from wandering around the site at random is the task of the robots.txt file.
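The duplication described above can be illustrated with a small sketch: two URLs that differ only in their session parameter collapse to one once that parameter is removed. The function name strip_session_param(), the example.com URLs and the parameter name osCsid are assumptions for the example; substitute whatever session name your own shop puts in its links.

```php
<?php
// Hypothetical helper: remove one query parameter (the session id) from a URL.
// Any #fragment after the query string is dropped, which is fine for this demo.
function strip_session_param(string $url, string $param = 'osCsid'): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // nothing to strip
    }

    parse_str($parts['query'], $query); // query string -> array
    unset($query[$param]);              // drop the session id

    $base = explode('?', $url, 2)[0];
    return $query ? $base . '?' . http_build_query($query) : $base;
}

// Two hits on the same product page, each handed a different sid.
$crawled = [
    'http://example.com/product_info.php?products_id=5&osCsid=aaa111',
    'http://example.com/product_info.php?products_id=5&osCsid=bbb222',
];

// With the sid they look like two pages; without it there is only one.
$unique = array_unique(array_map('strip_session_param', $crawled));
echo count($crawled), ' crawled, ', count($unique), " unique\n";
```

This only demonstrates why the indexer sees "different" links; it is not part of the mod, which avoids the problem by never emitting the sid in the first place.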
winterradio Posted November 6, 2002
I've noticed many osCommerce users complaining about the hundreds of requests attributed to the sid. I've never had such a problem, and I have over 100 result pages appearing high in Google. Judging by my log files, Google follows the all_prod.php links perfectly. So I'm not convinced this problem can be attributed entirely to the sid. Google seems to follow links throughout my site without discrimination, and I'm not quite sure why so many problems have been reported. I know Google has improved, and is continuing to improve, its indexing of dynamic pages and would actually prefer raw output rather than search engine friendly URLs. I included this add-on to see whether more pages would be indexed, in case spiders/bots time out through category paths because of sids (although I have not seen this myself).
I've changed the previous code back to "true" and it seems to be functioning OK now. I think the problem was related to html_output.
Thanks for the feedback
Henry
Guest Posted November 11, 2002
I have a snapshot from June 4th and cannot find the line
    if ( isset($sid) ) { $link .= $separator . $sid; }
in the html_output.php file to change. Has anyone else implemented this with a snapshot from around this date who could give me a hand? I'm getting ready for Google since I should be spidered soon...
CC Posted November 11, 2002
I got the same problem, dude. I think the code we need to alter would be this:
    // Append the session id string to the URL
    if ($sess) {
      $sess = $separator . $sess;
    }
    $link .= $sess;
    return $link;
    }
But I altered the code, uploaded it, and then it buggered up the loading of my site, so I didn't leave it in. If anyone can help, give us a shout, please.
CC.
sefu Posted November 12, 2002
Yeah, I can't find that piece of code either. Can anyone help with adding it?
Ian (Author) Posted November 12, 2002
Are you trying to add my sid killer mod? If so, try the following. Find
    // Append the session id string to the URL
    if ($sess) {
      $sess = $separator . $sess;
    }
    $link .= $sess;
    return $link;
    }
and change this to
    // Append the session id string to the URL
    if ($sess) {
      $sess = $separator . $sess;
    }
    if (!$kill_sid) $link .= $sess;
    return $link;
    }
Not having the full code for your snapshot, I can't be 100% certain, but it looks right to me. You will of course have to add the application_top.php code as well.
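To see the effect of that one-line change in isolation, here is a minimal sketch of the same tail logic as a stand-alone function. append_session() is a made-up name and the flag is passed as an argument so the snippet runs without osCommerce; in the actual mod the tail sits inside tep_href_link() and $kill_sid comes from application_top.php.

```php
<?php
// Hypothetical stand-in for the last few lines of the link builder.
function append_session(string $link, string $sess, bool $killSid): string
{
    if ($sess === '' || $killSid) {
        return $link; // no session string, or sid suppressed for this visitor
    }

    // Use '?' for the first parameter and '&' for any further one.
    $separator = (strpos($link, '?') === false) ? '?' : '&';
    return $link . $separator . $sess;
}

// Same link, with and without the kill flag ('osCsid=abc123' is a placeholder).
echo append_session('product_info.php?products_id=5', 'osCsid=abc123', false), "\n";
echo append_session('product_info.php?products_id=5', 'osCsid=abc123', true), "\n";
```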
CC Posted November 12, 2002
Ian, can yours be run alongside this:
    // start session ID removal
    if (eregi("Googlebot",getenv("HTTP_USER_AGENT")) || eregi("googlebot",getenv("HTTP_USER_AGENT"))) {
      $sess = NULL;
    }
    if (eregi("WebCrawler",getenv("HTTP_USER_AGENT")) || eregi("InternetSeer",getenv("HTTP_USER_AGENT"))) {
      $sess = NULL;
    }
Or will it cause problems?
Cheers
CC.
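A note for anyone reading this on a current PHP: eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, and because it is case-insensitive the separate "googlebot" test above duplicates the "Googlebot" one. A rough equivalent using stripos() might look like the sketch below; is_known_spider() is a name invented for this example, and the list holds only the three agents named in the post.

```php
<?php
// Hypothetical helper: case-insensitive substring match of the user agent
// against a list of spider names.
function is_known_spider(string $userAgent, array $needles = ['Googlebot', 'WebCrawler', 'InternetSeer']): bool
{
    foreach ($needles as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// getenv() returns false when the variable is not set, so cast to string.
$agent = (string) getenv('HTTP_USER_AGENT');
if (is_known_spider($agent)) {
    $sess = null; // same effect as the snippet above: no session id on links
}
```

The list still has to be kept up to date by hand, so it remains tied to recognising who is visiting.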
Ian (Author) Posted November 12, 2002
Well, possibly. My code was intended to do away with testing user_agent, IP address etc., as this could be an ever-moving target. E.g. Alta Vista are supposed to be about to redo their whole search engine experience. If that causes problems, you then have to add another spider test. If Google upgrade their network and change their spider name and IP, you're f*.
I'm not saying my code is the definitive solution. It still has one or two problems. However, its advantage is that it's not tied to trying to recognise who is browsing your site.
CC Posted November 13, 2002
So Ian,
Are you saying this mod will only work if there are no people on my site and there is nothing in any carts? And if so, is there any way to test this to make sure it works with my snapshot?
Also, for the part that goes in application_top, you said it goes after the first line, but my code all sits on one line, so... Do you mean like this:
    function tep_href_link($page = '', $parameters = '', $connection = 'NONSSL', $add_session_id = true, $search_engine_safe = true) {
    global $kill_sid;
Or like this:
    function tep_href_link($page = '', $parameters = '', $connection = 'NONSSL', $add_session_id = true, global $kill_sid; search_engine_safe = true) {
Cheers
CC.
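On the placement question: in PHP a global declaration is a statement, so it can only go inside the function body, after the opening brace. It cannot appear in the parameter list, where it is a parse error. The sketch below shows that layout with a cut-down, hypothetical demo_href_link(); the parameter list is copied from the post above, but the body and the osCsid=demo session string are simplified placeholders rather than the real osCommerce code.

```php
<?php
// Set by application_top.php in the real mod; hard-coded here for the demo.
$kill_sid = true;

// Hypothetical, cut-down stand-in for tep_href_link().
function demo_href_link($page = '', $parameters = '', $connection = 'NONSSL', $add_session_id = true, $search_engine_safe = true) {
    global $kill_sid; // first statement inside the body

    $link = $page . ($parameters !== '' ? '?' . $parameters : '');
    if ($add_session_id && !$kill_sid) {
        // 'osCsid=demo' is a placeholder session string.
        $link .= ($parameters !== '' ? '&' : '?') . 'osCsid=demo';
    }
    return $link;
}

echo demo_href_link('index.php', 'cPath=1'), "\n";
```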
winterradio Posted November 19, 2002
Hi,
I noticed that this topic might finally have been addressed in the latest CVS commit. Does this update to html_output.php make this contribution obsolete/unnecessary? Hopefully it does, so that the behaviors in this contribution are performed by default.
Thanks
Henry