coremaster Posted November 22, 2003

I have osCommerce 2.2 running with Prevent Spider Sessions turned on, and I have added YahooSeeker and YahooSeeker/1.0 to the spiders.txt file, but I am still getting SIDs from "Agent: YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)".

So far this month Yahoo has pulled over 3 GB of downloads from my website, and I need to stop this ASAP. I would still like my items to show up in their shopping site (it's a lot like Froogle), so I don't want to block them in robots.txt... Thanx

Here is a copy of my spiders.txt file:
----------------------------------------------------------------------------------------------------
$Id: spiders.txt,v 1.2 2003/05/05 17:58:17 dgw_ Exp $
almaden.ibm.com
appie 1.1
architext
ask jeeves
asterias2.0
augurfind
baiduspider
bannana_bot
bdcindexer
crawler
crawler@fast
docomo
fast-webcrawler
fluffy the spider
frooglebot
geobot
googlebot
gulliver
henrythemiragorobot
ia_archiver
infoseek
Inktomi
inktomi Slurp
kit_fireball
lachesis
lycos_spider
mantraagent
mercator
moget/1.0
muscatferret
nationaldirectory-webspider
naverrobot
ncsa beta
netresearchserver
ng/1.0
osis-project
polybot
pompos
scooter
seventwentyfour
sidewinder
sleek spider
Slurp
slurp/si
[email protected]
steeler/1.3
szukacz
t-h-u-n-d-e-r-s-t-o-n-e
teoma
turnitinbot
ultraseek
YahooSeeker
YahooSeeker/1.0
vagabondo
voilabot
w3c_validator
zao/0
zyborg/1.0
----------------------------------------------------------------------------------------------------
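One thing worth checking: in the stock 2.2 code the spider check lowercases the user agent before comparing it against spiders.txt entries as plain substrings, so if your store does the same, a mixed-case entry like "YahooSeeker" can never match. A rough Python sketch of that kind of matching (not the actual osCommerce code, just an illustration of the lowercasing assumption):

```python
# Rough sketch of the substring match a spiders.txt-style check performs.
# Assumption: the user agent is lowercased before comparing, so mixed-case
# entries such as "YahooSeeker" would never match.

def is_spider(user_agent, entries):
    ua = user_agent.lower()
    return any(e.strip() and e.strip() in ua for e in entries)

# In practice the entries would come from: open("includes/spiders.txt").readlines()
ua = ("YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; "
      "http://help.yahoo.com/help/us/shop/merchant/)")

print(is_spider(ua, ["yahooseeker"]))   # True  - lowercase entry matches
print(is_spider(ua, ["YahooSeeker"]))   # False - capitalized entry never matches the lowercased agent
```

If that is what is happening, adding a lowercase "yahooseeker" entry would be the quick fix.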
wizardsandwars Posted November 22, 2003

This may be a result of URLs this bot gathered previously (old links that already contain a session ID). You can download the Firebird web browser and use it to mask your user agent, which lets you test whether your spider SID killer is working.
-------------------------------------------------------------------------------------------------------------------------
NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit. If you have a question about any of my posts here, your best bet is to contact me through either email or PM in my profile, and I'll be happy to help.
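If you'd rather not install another browser just to change the user agent, you can run the same test from a script. A small Python check along these lines (the store URL is a placeholder; point it at your own catalog) fetches a page with a spoofed spider user agent and looks for an osCsid in the returned HTML:

```python
# Fetch a catalog page while pretending to be YahooSeeker, then check whether
# any osCsid shows up in the HTML. The URL below is a placeholder - point it
# at your own store. Uses only the standard library.
import urllib.request

url = "http://www.example.com/catalog/index.php"  # placeholder store URL
ua = ("YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; "
      "http://help.yahoo.com/help/us/shop/merchant/)")

req = urllib.request.Request(url, headers={"User-Agent": ua})
html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")

if "osCsid" in html:
    print("Session IDs are still being handed to this user agent.")
else:
    print("No osCsid found - the spider check appears to be working.")
```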
HornInc Posted June 21, 2006

I too am experiencing a similar problem. I have Prevent Spider Sessions set to true and have updated spiders.txt with the latest contribution. However, most spiders (other than Google) still get session IDs. After updating to the latest spiders.txt I parsed my web server access logs and found that Yahoo and MSN (among others) are still getting an osCsid. Any help is greatly appreciated.

Here is the spiders.txt that I am using (Contribution version 2006-05-18 - please read the readme before editing):

crawl
slurp
spider
seek
ebot
obot
abot
dbot
hbot
kbot
mbot
nbot
pbot
rbot
sbot
tbot
ybot
zbot
bot.
bot/
_bot
accoona
appie
architext
asterias
atlocal
atomz
augurfind
bannana_bot
bdfetch
blo.
blog
boitho
booch
ccubee
cfetch
csci
digger
ditto
dmoz
docomo
dtaagent
ebingbong
ejupiter
falcon
findlinks
gazz
genieknows
goforit
grub
gulliver
harvest
helix
heritrix
holmes
homer
htdig
ia_archiver
ichiro
iconsurf
iltrovatore
indexer
ingrid
inktomisearch.com
ivia
jakarta
java/
jetbot
kit_fireball
knowledge
lachesis
larbin
libwww
linkwalker
lwp
mantraagent
mapoftheinternet
mediapartners
mercator
metacarta
miva
mj12
mnogo
moget/
multitext
muscatferret
myweb
najdi
nameprotect
ncsa beta
netmechanic
netresearchserver
ng/
npbot
noyona
nutch
objectssearch
omni
osis-project
pear.
poirot
pompos
poppelsdorf
rambler
salty
sbider
scooter
scrubby
Sensis Web Crawler
shopwiki
sidewinder
silk
smartwit
sna-
sohu
sphider
spinner
spyder
steeler/
sygol
szukacz
tarantula
t-h-u-n-d-e-r-s-t-o-n-e
/teoma
theophrastus
tutorgig
twiceler
updated
vagabondo
volcano
voyager/
w3c_validator
wavefire
websitepulse
wget
wire
worldlight
worm
zao/
xenu
xirq
zippp
zyborg
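For anyone else going through their logs the same way, a quick check is to scan the access log for requests carrying an osCsid and tally the user agents that made them. A rough sketch, assuming an Apache combined-format log and a log path you would adjust for your own server:

```python
# Tally which user agents are still receiving osCsid-carrying URLs,
# by scanning an Apache combined-format access log. The log path is a
# placeholder; adjust it for your server.
import re
from collections import Counter

LOG_FILE = "access.log"  # placeholder path
# combined format: ... "GET /path?osCsid=... HTTP/1.1" status bytes "referer" "user-agent"
line_re = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"')

agents = Counter()
with open(LOG_FILE) as f:
    for line in f:
        m = line_re.search(line)
        if m and "osCsid" in m.group("path"):
            agents[m.group("agent")] += 1

for agent, hits in agents.most_common(20):
    print(f"{hits:6d}  {agent}")
```

Any agent that shows up here after the spiders.txt update is one the session check is not catching.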