osCommerce

msnbot


sam6

Try replacing

 

msnbot/0.11

 

With just

 

msnbot

Mark Evans

osCommerce Monkey & Lead Guitarist for "Sparky + the Monkeys" (Album on sale in all good record shops)

 

---------------------------------------

Software is like sex: It's better when it's free. (Linus Torvalds)

Hi,

 

Try replacing

 

msnbot/0.11

 

With just

 

msnbot

 

It wouldn't make any difference, because if you have both:

msnbot

msnbot/0.11

in /catalog/includes/spiders.txt, the following code (in application_top.php), which performs the check:

 

if (is_integer(strpos($user_agent, trim($spiders[$i])))) {

 

would evaluate to true whether the "HTTP_USER_AGENT" was either:

 

msnbot (+http://search.msn.com/msnbot.htm)

 

OR

 

msnbot/0.11 (+http://search.msn.com/msnbot.htm)
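
To make the point concrete, here is a minimal sketch of that substring check pulled out into a standalone function. The function name find_spider() and the $spiders array are only stand-ins for illustration (the array plays the role of the lines read from spiders.txt); this is not the actual application_top.php code, just the same strpos() test wrapped so it can be run on its own.

```php
<?php
// Hypothetical helper, for illustration only: applies the same
// is_integer(strpos(...)) test quoted above to each spiders.txt entry.
function find_spider(string $rawUserAgent, array $spiders): ?string
{
    $userAgent = strtolower($rawUserAgent);

    foreach ($spiders as $entry) {
        $needle = trim($entry);
        if ($needle === '') {
            continue; // ignore blank lines in the list
        }
        // strpos() returns an integer offset (possibly 0) on a match and
        // false otherwise, so is_integer() is true for any match.
        if (is_integer(strpos($userAgent, $needle))) {
            return $needle;
        }
    }

    return null; // no entry matched
}

// Both entries present, as in the post above (lines as read from a file).
$spiders = ["msnbot\n", "msnbot/0.11\n"];

var_dump(find_spider('msnbot (+http://search.msn.com/msnbot.htm)', $spiders));
var_dump(find_spider('msnbot/0.11 (+http://search.msn.com/msnbot.htm)', $spiders));
```

In this sketch both calls match on the shorter entry, because "msnbot" is a substring of either user agent string. That is why listing "msnbot/0.11" as well as "msnbot" should not change whether the bot is detected.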

 

I have tested this quite a bit this afternoon, and the problem does not seem to be the values in 'spiders.txt' or anything else in osC, but something that "msnbot" itself is doing. For example, I forced the same user agent string that appears in my web logs like this:

 

$user_agent = strtolower("msnbot/0.11 (+http://search.msn.com/msnbot.htm)");

 

and the osC code resulted in:

 

Spider - Yes

use Sessions - No

spider name - msnbot/0.11

 

Peter

The URLs were gathered by the bot before you got the spider session ID killer in place. It'll take the bot a little while to realize that it can't get back to that URL again (with the same session ID), and when it does, those URLs will drop off its list of URLs to parse.

 

This will probably take a couple of weeks, maybe even a couple of months.

-------------------------------------------------------------------------------------------------------------------------

NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit.

If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.

Hi,

 

I'm having the same problem. Make sure that "FORCE COOKIE USAGE" is set to false; the Wiki said it had to be false for the session IDs to be turned off for spiders.

 

Peter

If 'Force Cookie Usage' is set to TRUE, then SIDs are not assigned at all, spiders or not.
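
One way to read the two statements above together is as a simple decision order. The sketch below is an assumption about how the settings interact, not the actual application_top.php logic; should_start_session() and all of its parameter names are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical simplification of the session decision, for illustration only.
//   $forceCookieUse : the "Force Cookie Usage" setting
//   $blockSpiders   : the setting that stops sessions for known spiders
//   $hasCookie      : whether the client sent back a test cookie
//   $isSpider       : whether the user agent matched an entry in spiders.txt
function should_start_session(
    bool $forceCookieUse,
    bool $blockSpiders,
    bool $hasCookie,
    bool $isSpider
): bool {
    if ($forceCookieUse) {
        // Assumed behaviour: only clients that return the cookie get a
        // session, so no session ID has to be carried in the URL at all.
        return $hasCookie;
    }
    if ($blockSpiders) {
        // Cookies are not forced: fall back to the spider check.
        return !$isSpider;
    }
    return true; // neither setting applies, every visitor gets a session
}

// A spider that does not accept cookies, with cookies forced:
var_dump(should_start_session(true, true, false, true));
// The same spider with cookies not forced but spider blocking on:
var_dump(should_start_session(false, true, false, true));
```

Under these assumptions a cookie-less spider gets no session in either configuration, which would fit both remarks: the spider check only matters when cookies are not forced.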

'slurp' should already be there, and that's all you need.

Hi Chris,

 

The URLs were gathered by the bot before you got the spider session ID killer in place. It'll take the bot a little while to realize that it can't get back to that URL again (with the same session ID), and when it does, those URLs will drop off its list of URLs to parse.

 

This will probably take a couple of weeks, maybe even a couple of months.

 

Thanks for the tip. I will make a note of the osCsid that 'msnbot' has used and see whether it uses the same one on the next crawl.

 

Thanks, :)

 

Peter

Well, that's a good thought, but I think you'll find that the bot has been assigned several dozen, if not hundreds or even thousands, of osCsids.
