
Detecting spiders





The "do not start session for spiders" option works, but spider detection is not reliable: the Yahoo and MSN spiders are missed, along with some others.


Here is a suggestion (I implemented it on my site, but want to test it a bit more).


1. Do a reverse DNS lookup on the visitor's IP address.

2. If the hostname contains the word "livebot" (MSN) or "crawl", it is very likely to be a spider.
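
The two steps above could be sketched roughly like this. It is only an illustration, not the actual modification: the function names is_spider_hostname and is_spider_ip are made up for this example. It relies on PHP's built-in gethostbyaddr(), which returns the IP address unchanged when no reverse record is found and false when the input is malformed.

```php
<?php
// Hypothetical helper: true if a reverse-DNS hostname looks like a crawler (step 2).
function is_spider_hostname($hostname) {
  $hostname = strtolower($hostname);
  foreach (array('crawl', 'livebot') as $word) {
    if (strpos($hostname, $word) !== false) {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: reverse lookup (step 1) followed by the keyword check (step 2).
function is_spider_ip($ip) {
  $hostname = gethostbyaddr($ip);
  if ($hostname === false || $hostname === $ip) {
    return false; // malformed IP or no reverse record, so nothing to inspect
  }
  return is_spider_hostname($hostname);
}
```

In application_top.php this might be used as $spider_flag = is_spider_ip($_SERVER['REMOTE_ADDR']); assuming REMOTE_ADDR holds the real client address on your server.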


I have modified my whois contribution as well as my application_top.php file to use this improved spider detection together with the original method, which takes the USER_AGENT string and looks it up in spiders.txt.

Seems to work OK so far.


The only drawback I can see in this approach is the time taken by the reverse DNS lookup.
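
One possible way to limit that cost, offered only as a sketch: run the cheap USER_AGENT check first and fall back to the reverse lookup only when it finds nothing. The names user_agent_is_spider and detect_spider are hypothetical, and $spiders is assumed to be an array of lowercase substrings loaded from spiders.txt. The $lookup parameter is also an assumption, added so the DNS step can be swapped out for testing.

```php
<?php
// Hypothetical helper: true if the user agent contains any entry from the spider list.
function user_agent_is_spider($user_agent, $spiders) {
  $user_agent = strtolower($user_agent);
  foreach ($spiders as $spider) {
    $spider = trim($spider);
    if ($spider !== '' && strpos($user_agent, $spider) !== false) {
      return true;
    }
  }
  return false;
}

// Hypothetical combined check. $lookup is any callable mapping an IP to a hostname
// (gethostbyaddr by default).
function detect_spider($user_agent, $ip, $spiders, $lookup = 'gethostbyaddr') {
  if (user_agent_is_spider($user_agent, $spiders)) {
    return true; // cheap check matched, no DNS lookup needed
  }
  $hostname = call_user_func($lookup, $ip);
  if (!is_string($hostname) || $hostname === $ip) {
    return false; // lookup failed or no reverse record
  }
  $hostname = strtolower($hostname);
  return strpos($hostname, 'crawl') !== false || strpos($hostname, 'livebot') !== false;
}
```

With this ordering, spiders already listed in spiders.txt never trigger a lookup. Ordinary visitors still pay for one, so the result would need to be remembered somewhere if that turns out to be too slow.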


Here is a code snippet (from application_top.php):




// $hostname holds the result of the reverse DNS lookup on the visitor's IP
$hostname_lc = strtolower($hostname);
if (strpos($hostname_lc, 'crawl') !== false || strpos($hostname_lc, 'livebot') !== false) {
  $spider_flag = true;
}




Any comments?




