Prevent Spider Sessions Not Working


pstrid

Posted

Not sure why, but whenever Inktomi's bot crawls our site it still gets session IDs. Any ideas or help would be much appreciated. Here are my details...

 

Current settings:

 

Sessions

Force Cookie Use False

Check SSL Session ID False

Check User Agent False

Check IP Address False

Prevent Spider Sessions True

Recreate Session True

 

and here is my current spiders.txt file:

$Id: spiders.txt,v 1.2 2003/05/05 17:58:17 dgw_ Exp $
ask jeeves
crawler
crawler@fast
docomo
fast-webcrawler
frooglebot
geobot
googlebot
infoseek
lycos_spider
ncsa beta
polybot
scooter
slurp/si
[email protected]
teoma
voilabot
w3c_validator
Yahooseeker
YahooSeeker/1.1
inktomisearch.com
lj1029.inktomisearch.com
inktomisearch

 

When I check my tracking I see visits from inktomisearch.com with session IDs attached to each page it crawls, for example:

/contact_us.php?osCsid=47a78760759c755ac3b3c376a29a4cd0

 

Am I missing something?

Thanks.

Posted

Hi,

 

What is the exact user agent name ?

 

Can you post one line from your web server logs, showing the full details from inktomi ?

 

Peter

Posted

Peter,

Is this what you are looking for?

 

lj1159.inktomisearch.com seems to access my robots.txt file, and then one of the following does the crawl:

 

lj1038.inktomisearch.com

lj1015.inktomisearch.com

lj1236.inktomisearch.com

lj1220.inktomisearch.com

Posted

Or something more like this:

5 1 0 0 0 18169971 0 0 18169971 0 0 15262 lj1165.inktomisearch.com
Posted

Hi,

 

Just noticed you have some uppercase characters (Yahoo... ) in your spiders.txt file. You should change them to ALL lowercase, because of this:

 

$user_agent = strtolower(getenv('HTTP_USER_AGENT'));

 

The variable $user_agent is then compared to the values in the file 'spiders.txt', so an entry containing uppercase characters cannot match the lowercased user agent.
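
To illustrate why the case of the entries matters, here is a minimal Python sketch of that comparison. It is not the osCommerce code itself: the function name `is_spider`, the substring-style match, and the sample user agent string are all assumptions made for illustration.

```python
def is_spider(user_agent, spider_entries):
    """Return True if any non-empty entry occurs inside the lowercased user agent.

    Assumption: entries are matched as plain substrings, one entry per line.
    """
    agent = (user_agent or "").lower()  # mirrors strtolower() on the user agent
    if not agent:
        return False
    for entry in spider_entries:
        entry = entry.strip()  # drop the trailing newline and stray spaces
        if entry and entry in agent:
            return True
    return False


# Hypothetical user agent string, for illustration only.
agent = "ExampleSeeker/1.1 (compatible; YahooSeeker/1.1)"

print(is_spider(agent, ["YahooSeeker/1.1"]))  # False: the entry has uppercase letters
print(is_spider(agent, ["yahooseeker/1.1"]))  # True: the lowercase entry matches
```

Because only the user agent is lowercased, an entry such as YahooSeeker/1.1 can never be found in it, while the all-lowercase form can.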

 

Can you paste the complete line from your web server logs, like this:

 

64.68.82.159 - - [30/Jun/2004:10:18:05 -0400] "GET /contact_us.php HTTP/1.0" 200 23537 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

 

Thanks,

 

Peter

Posted

Hi,

 

Just noticed my 'awstats' shows "Inktomi Slurp", and we have a "slurp" entry in 'spiders.txt'. I see you don't have 'slurp' on its own, so it might be wise to add this:

 

slurp

 

...... to spiders.txt

 

Peter

Posted

Changed all to lowercase and added 'slurp'.

 

Here are two different lines from my log file:

 

66.196.90.175 - - [18/Jul/2004:10:06:44 -0500] "GET /robots.txt HTTP/1.0" 404 3152

and

66.196.90.54 - - [18/Jul/2004:07:29:12 -0500] "GET /product_info.php?products_id=73&osCsid=49ea1123434915cefd633388ae2655af HTTP/1.0" 200 4412
Posted

Hi,

 

You can get rid of the "404" messages for robots.txt by placing this in your webroot path (usually called 'public_html').

 

User-agent: *
Disallow: /images/
Disallow: /includes/

 

and call the file robots.txt of course. :)
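
If you want to check what those rules allow before uploading the file, a small Python sketch using the standard library's `urllib.robotparser` can do it. The agent name `examplebot` and the helper `allowed` are made up for illustration; the rules are the three lines above.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /includes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="examplebot"):
    """Return True if the rules let the (hypothetical) agent fetch the path."""
    return parser.can_fetch(agent, path)


print(allowed("/contact_us.php"))        # True: not under a disallowed directory
print(allowed("/includes/spiders.txt"))  # False: /includes/ is disallowed
```

Note that robots.txt only asks crawlers to stay out of those directories; it does not affect whether a crawler is given a session ID.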

 

Your web server log files are not showing the additional portions, the referrer and user agent. See this one:

 

66.196.90.54 - - [18/Jul/2004:07:29:12 -0500] "GET /product_info.php?products_id=73&osCsid=49ea1123434915cefd633388ae2655af HTTP/1.0" 200 4412

 

It should have the referrer and user agent appended to it. Do you have a control panel, like cPanel?
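
As a rough way to see the difference between the two log layouts, this Python sketch parses a line and reports the user agent when it is present. The pattern and the name `parse_log_line` are assumptions written for this thread, not part of any log tool; the two sample lines are the ones quoted above.

```python
import re

# Host, identity, user, timestamp, request, status and size, optionally
# followed by the quoted referrer and user agent.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?\s*$'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


with_agent = (
    '64.68.82.159 - - [30/Jun/2004:10:18:05 -0400] '
    '"GET /contact_us.php HTTP/1.0" 200 23537 "-" '
    '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)
without_agent = (
    '66.196.90.54 - - [18/Jul/2004:07:29:12 -0500] '
    '"GET /product_info.php?products_id=73'
    '&osCsid=49ea1123434915cefd633388ae2655af HTTP/1.0" 200 4412'
)

print(parse_log_line(with_agent)["agent"])     # the Googlebot user agent string
print(parse_log_line(without_agent)["agent"])  # None: no user agent was logged
```

Without the user agent field in the log there is no way to tell which string should go into spiders.txt, which is why the full line is needed.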

 

The only other thing I noticed is you have:

 

Recreate Session True

 

I'm 99% certain we usually leave that set to false, but honestly, I can't remember what it does (doh !! ).

 

Peter

Archived

This topic is now archived and is closed to further replies.
