rudolfl
Posted September 11, 2007

Hi all,

The option "do not start session for spiders" works, but spider detection is not reliable. The Yahoo and MSN spiders are missed, and some others too.

Here is a suggestion (I implemented it on my site, but want to test it a bit more):

1. Do a reverse DNS lookup on the IP.
2. If the hostname contains the word "livebot" (MSN) or "crawl", it is very likely to be a spider.

I have modified my whois contribution as well as my application_top.php file to use this improved spider detection algorithm together with the original one, which is based on getting the USER_AGENT string and looking it up in spiders.txt. It seems to work OK so far.

The only drawback I can see in this approach is the time taken by the reverse lookup.

Here is a code snapshot (from application_top.php):

$ip = $_SERVER['REMOTE_ADDR'];
$hostname = gethostbyaddr($ip);
if ((strstr(strtolower($hostname), "crawl") !== false) ||
    (strstr(strtolower($hostname), "livebot") !== false)) {
  $spider_flag = true;
  break; // only valid inside an enclosing loop or switch
}

Any comments?

Thanks,
Rudolf
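
For anyone who wants to try the idea outside application_top.php, the two steps above can be sketched as a pair of small functions. This is only a sketch under assumptions: the function names hostname_looks_like_spider() and detect_spider_by_ip() are hypothetical and not part of osCommerce, and the keyword list is just the two words from the post. Skipping the DNS lookup when the USER_AGENT check has already flagged the visitor is one possible way to limit the lookup time mentioned as the drawback.

```php
<?php
// Hypothetical helper names for illustration; not part of osCommerce.

// Returns true if the reverse-DNS hostname contains one of the keywords.
function hostname_looks_like_spider($hostname, $keywords = array('crawl', 'livebot')) {
    $hostname = strtolower($hostname);
    foreach ($keywords as $keyword) {
        // strpos() returns 0 for a match at position 0, so compare with !== false.
        if (strpos($hostname, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

// Runs the reverse lookup only when the cheaper USER_AGENT check found nothing.
function detect_spider_by_ip($ip, $already_flagged) {
    if ($already_flagged) {
        return true; // already flagged via spiders.txt, skip the slow lookup
    }
    $hostname = gethostbyaddr($ip);
    // gethostbyaddr() returns the unmodified IP when the lookup fails
    // and false on malformed input; neither can match a keyword reliably.
    if ($hostname === false || $hostname === $ip) {
        return false;
    }
    return hostname_looks_like_spider($hostname);
}
```

In application_top.php this could be called as $spider_flag = detect_spider_by_ip($_SERVER['REMOTE_ADDR'], $spider_flag); after the existing spiders.txt check has run.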