osCommerce

The e-commerce.

Detecting spiders


rudolfl

Hi all,

 

The "do not start session for spiders" option works, but spider detection is not reliable. The Yahoo and MSN spiders are missed, along with some others.

 

Here is a suggestion (I implemented it on my site, but want to test it a bit more).

 

1. Do a reverse DNS lookup on the visitor's IP address.

2. If the hostname contains "livebot" (MSN) or "crawl", it is very likely a spider.

 

I have modified my whois contribution as well as my application_top.php file to use this improved spider detection algorithm together with the original one, which takes the USER_AGENT string and looks it up in spiders.txt.

It seems to work OK so far.

 

The only drawback I can see in this approach is the time taken by the reverse DNS lookup.
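One way to limit that cost is to run the cheap USER_AGENT check first and fall back to the reverse lookup only when it finds nothing. The following is only a sketch under assumptions: is_probable_spider() and its parameters are hypothetical names, not part of osCommerce, and the lookup is passed in as a callable (defaulting to gethostbyaddr) purely so the logic can be tried without real DNS queries.

```php
<?php
// Hypothetical helper, not osCommerce code: name and signature are assumptions.
// $spiders is a list of user-agent fragments, e.g. the lines of spiders.txt.
// $lookup  is any callable that maps an IP address to a hostname.
function is_probable_spider($userAgent, $ip, array $spiders, $lookup = 'gethostbyaddr')
{
    $userAgent = strtolower((string) $userAgent);

    // Cheap path: substring match of each known fragment against the user agent.
    foreach ($spiders as $spider) {
        $spider = strtolower(trim($spider));
        if ($spider !== '' && strpos($userAgent, $spider) !== false) {
            return true; // matched, so the DNS lookup is skipped entirely
        }
    }

    // Slow path: reverse DNS lookup, reached only when the user agent did not match.
    $hostname = strtolower((string) call_user_func($lookup, $ip));

    return strpos($hostname, 'crawl') !== false
        || strpos($hostname, 'livebot') !== false;
}
```

With this ordering, spiders already listed in spiders.txt never trigger a lookup. Ordinary visitors still pay for one lookup on each request that reaches this code, so the result would need to be remembered somewhere if that turns out to be too slow.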

 

Here is a code snippet (from application_top.php):

 

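$ip = $_SERVER['REMOTE_ADDR'];
// gethostbyaddr() returns the unmodified IP if the lookup fails
$hostname = strtolower((string) gethostbyaddr($ip));

// strpos() is the usual substring test; compare strictly, since a match at offset 0 returns 0
if (strpos($hostname, 'crawl') !== false ||
    strpos($hostname, 'livebot') !== false)
{
  $spider_flag = true;
  break; // only valid inside an enclosing loop, such as the spiders.txt loop
}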

 

Any comments?

Thanks,

Rudolf


Archived

This topic is now archived and is closed to further replies.
