networkdad
Posted December 14, 2002

I've got this piece of code in my html_output.php file to strip the session ID from any spiders. It appears to work well, with the exception of the Almaden IBM spider.

// Add more Spiders as you find them. MAKE SURE THEY ARE LOWER CASE!
$spiders = array("googlebot", "teomaagent", "zyborg", "gulliver", "architext",
    "fast-WebCrawler", "slurp", "ask jeeves", "ia_archiver", "scooter", "mercator",
    "crawler@fast", "crawler", "infoseek sidewinder", "lycos_spider",
    "fluffy the spider", "ultraseek", "mantraagent", "moget",
    "t-h-u-n-d-e-r-s-t-o-n-e", "muscatferret", "voilabot", "sleek spider",
    "kit_fireball", "webcrawler", "http://www.almaden.ibm.com/cs/crawler");

// get useragent and force to lowercase just once
$useragent = strtolower(getenv("HTTP_USER_AGENT"));

foreach($spiders as $Val) {
    if (!(strpos($Val, $useragent) === false)) {
        // found a spider, kill the sid/sess
        // Edit out one of these as necessary depending upon your version of html_output.php
        $sess = NULL;
        // $sid = NULL;
        break;
    }
}

Can anyone help me? I'm guessing I'm somehow misnaming the Almaden IBM spider. Here is a direct listing from my log file for this spider:

66.147.154.3 - - [08/Dec/2002:13:52:35 -0600] "GET /robots.txt HTTP/1.0" 200 195 "-" "http://www.almaden.ibm.com/cs/crawler [c01]"

How should I list this spider? Or am I going about this all wrong? I say I *think* this works because the Google spider no longer has a session ID when it comes to my site.
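One likely culprit is the argument order in the strpos() call. PHP's strpos() takes the haystack first and the needle second, so strpos($Val, $useragent) asks whether the whole user agent appears inside the spider name. That only matches when the user agent is identical to (or shorter than) a list entry; the Almaden agent in the log has " [c01]" appended, so it is longer than "http://www.almaden.ibm.com/cs/crawler" and never matches. Separately, "fast-WebCrawler" contains capital letters, so it can never match a lower-cased user agent. Below is a minimal sketch with the arguments swapped, assuming a standalone helper; the name is_spider() is hypothetical (not part of html_output.php), and the spider list is shortened for brevity:

```php
<?php
// Sketch only: is_spider() is a hypothetical helper, not an osCommerce function.
function is_spider($useragent, array $spiders)
{
    // getenv() returns false when the header is missing; cast to an empty string.
    $useragent = strtolower((string) $useragent);
    if ($useragent === '') {
        return false; // no User-Agent header sent
    }
    foreach ($spiders as $spider) {
        // strpos(haystack, needle): look for the spider name INSIDE the user agent.
        // Compare with !== false because a match at offset 0 returns 0.
        if (strpos($useragent, strtolower($spider)) !== false) {
            return true;
        }
    }
    return false;
}

// Shortened list for illustration; mixed case is tolerated by strtolower() above.
$spiders = array("googlebot", "slurp", "ia_archiver", "fast-WebCrawler",
                 "http://www.almaden.ibm.com/cs/crawler");

if (is_spider(getenv("HTTP_USER_AGENT"), $spiders)) {
    $sess = NULL; // or $sid = NULL, depending on your version of html_output.php
}
```

Lower-casing each list entry inside the loop means a mixed-case entry no longer fails silently. Note also that once the order is fixed, the generic "crawler" entry in the original list would already catch the Almaden agent, so the full URL entry becomes redundant.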