YSC (posted July 26, 2004): For some reason, the "prevent known spiders from creating sessions" option is not working. I have it set to false in the control panel. Here is the text from my spiders.txt file:

$Id: spiders.txt,v 1.2 2003/05/05 17:58:17 dgw_ Exp $
almaden.ibm.com
appie 1.1
architext
ask jeeves
asterias2.0
augurfind
baiduspider
bannana_bot
bdcindexer
crawler
crawler@fast
docomo
fast-webcrawler
fluffy the spider
frooglebot
geobot
googlebot
gulliver
henrythemiragorobot
ia_archiver
infoseek
kit_fireball
lachesis
lycos_spider
mantraagent
mercator
moget/1.0
muscatferret
nationaldirectory-webspider
naverrobot
ncsa beta
netresearchserver
ng/1.0
osis-project
polybot
pompos
scooter
seventwentyfour
sidewinder
sleek spider
slurp/si
[email protected]
steeler/1.3
szukacz
t-h-u-n-d-e-r-s-t-o-n-e
teoma
turnitinbot
ultraseek
vagabondo
voilabot
w3c_validator
Yahooseeker
YahooSeeker/1.1
zao/0
zyborg/1.0

I am using osCommerce MS2 on this site. Come to think of it, I don't think it has ever worked correctly. Any help or direction would be greatly appreciated.

Best, Rob
YSC (posted July 26, 2004): I forgot to ask whether anyone could tell me where to set session_block_spiders manually. I checked a number of files but can't seem to locate it. Thanks again, Rob
koie (posted July 26, 2004): Set it to true and check whether it works with a spider simulator: http://www.webconfs.com/search-engine-spider-simulator.php I believe it identifies itself as Google.
YSC (posted July 26, 2004): That one worked okay, but for some reason other ones that I use do not, for example http://www.gritechnologies.com/tools/spider.go and the Xenu Link Sleuth tool, among others. However, I went back and turned off the prevent known spiders option, and the tool you sent me to did show session IDs, so the module must be working. Perhaps I just need to update my spiders.txt file? I know that when MS2 was first released, Yahoo was still getting its results from Google and Inktomi. Does anyone have an updated spiders.txt they would like to share? Maybe we should post the updated definitions as a contribution. Thoughts?
koie (posted July 26, 2004): I believe that if you add "Poodle predictor 1.0" to your spiders.txt, the session IDs will disappear. P.S. My spiders.txt is about the same as yours. It should work for most of the popular spiders.
YSC (posted July 26, 2004): I couldn't get it to block the Poodle Predictor; however, I am more satisfied that it is working than when I started this post. I wonder, is there a way to see how a spider is identifying itself? Anyway, thank you for all your help in this matter. I will have to wait for my pages to be indexed by the engines to see whether it is working.
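One way to see how a spider identifies itself is to pull the User-Agent strings out of the web server's access log. The following is only a minimal sketch, assuming the log is in the common Apache "combined" format, where the User-Agent is the last quoted field on each line. The function name `user_agents` and the sample log lines (including the IP addresses) are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, which ends with "referer" "user-agent".
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')

def user_agents(log_lines):
    """Count how often each User-Agent string appears in the given log lines."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines; a real run would read the server's access log instead.
sample = [
    '192.0.2.10 - - [26/Jul/2004:10:00:00 +0000] "GET /index.php HTTP/1.0" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [26/Jul/2004:10:00:05 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [26/Jul/2004:10:00:09 +0000] "GET /login.php HTTP/1.0" 200 300 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

for agent, hits in user_agents(sample).most_common():
    print(hits, agent)
```

Any agent in that listing that is clearly a crawler but has no matching entry in spiders.txt is a candidate to add.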
koie (posted July 26, 2004): Have you noticed that the URLs of the pages in your shop end with something like osCsid=5e086ff3e5e98c4a48fe3463ad2f4fd0? This is a session ID that osCommerce uses to track customers, for example which products they have added to the cart. You don't need this for spiders. There is a whole lot of information on this forum about spiders. It is worthwhile reading if you want your site to be well indexed by the search engines.
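To check a list of links (for example, the output of a spider simulator) for session IDs, something like the sketch below could be used. It assumes the session ID always travels in a query parameter named osCsid, as in the URL quoted above; the helper names and the example.com URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SESSION_PARAM = "osCsid"  # parameter name taken from the URL quoted above

def has_session_id(url):
    """Return True if the URL's query string carries an osCsid parameter."""
    query = urlsplit(url).query
    return any(name == SESSION_PARAM
               for name, _ in parse_qsl(query, keep_blank_values=True))

def strip_session_id(url):
    """Return the URL with the osCsid parameter removed, other parameters kept."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name != SESSION_PARAM]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Hypothetical shop URL, for illustration only.
url = "http://www.example.com/product_info.php?products_id=12&osCsid=5e086ff3e5e98c4a48fe3463ad2f4fd0"
print(has_session_id(url))
print(strip_session_id(url))
```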
breedingexotics (posted July 26, 2004): Just a quick question on the spiders. I have mine set to true, but the text file shows a bunch of them. Do I delete them? I had to set it to false, and I just reset my spiders setting to true today.
koie (posted July 26, 2004): No, this is a list of known spiders. There are a whole bunch of spiders out there, but these are the most common ones. If a spider visits your site and its identification is in this list, the session IDs are left out. If you happen to come across a spider that is not in the list, you can add it. For example, I recently added the Dutch Wiseguys spider to my spiders.txt. Look at your log files and search for googlebot, for example. You will be able to see how it identifies itself. This name should also be found in spiders.txt.
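The check described above can be sketched roughly as follows. This is not the osCommerce code itself, just an illustration that assumes each line of spiders.txt is treated as a substring to look for in the visitor's User-Agent, ignoring case. The function names and the three-entry sample list are made up; skipping the `$Id:` header line is a choice made for this sketch.

```python
def load_spiders(text):
    """Turn the contents of a spiders.txt file into a list of lowercase entries."""
    entries = []
    for line in text.splitlines():
        entry = line.strip().lower()
        # Skip blank lines and the version-control header line.
        if entry and not entry.startswith("$id:"):
            entries.append(entry)
    return entries

def is_known_spider(user_agent, spiders):
    """Return True if any spiders.txt entry occurs inside the User-Agent string."""
    agent = user_agent.lower()
    return any(entry in agent for entry in spiders)

# Shortened sample list, for illustration only.
spiders = load_spiders("$Id: spiders.txt $\ngooglebot\nslurp/si\nYahooSeeker/1.1\n")

print(is_known_spider("Googlebot/2.1 (+http://www.google.com/bot.html)", spiders))
print(is_known_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", spiders))
```

Note that the sketch lowercases both the entry and the User-Agent. If a real implementation lowercased only one side, an entry containing capital letters could never match, so that is worth ruling out when an added entry appears to have no effect. Short entries such as "crawler" will also match any User-Agent that contains that word.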
jodo (posted July 28, 2004): I just added "Poodle predictor 1.0" to my list, but it still gets the session code. I'm curious as to why and how. Does anyone know?
koie (posted July 28, 2004): Curious indeed; you are right. I have done this with a couple of other spider simulators and they all worked fine. I wonder why this one doesn't.
John (posted July 29, 2004): My spiders.txt file contains a lot of entries for search engine crawlers. Everything is working fine except Inktomi. Whenever I view the "who is online" page, I see that Inktomi is indexing my site, but with session IDs. How can I overcome this session ID problem with Inktomi (Yahoo Slurp)? I think I have the proper entry for this search engine in my spiders.txt file.
John (posted August 4, 2004): Please help me solve this problem. I have set the "safe URL for search engines" value to true. The site is being indexed properly by Google, MSN, and other crawler-based search engines; only Yahoo shows session IDs. I have run the search engine simulator, and it shows the URLs without session IDs, but the actual Yahoo results show session IDs. Please help me.
koie (posted August 4, 2004): Perhaps the SID Killer contribution would help here. I haven't tried it (yet), but from its description it should do the trick.