sam6 Posted February 12, 2004
OK, I put msnbot and msnbot/0.11 in spiders.txt, but the log files still show msnbot with osCsids. Please help.
peterr Posted March 11, 2004
Hi, I'm having the same problem. Make sure that "FORCE COOKIE USAGE" is set to false; the Wiki said it had to be false for the session IDs to be turned off for spiders. Peter
Mark Evans Posted March 11, 2004
Try replacing msnbot/0.11 with just msnbot.
Mark Evans osCommerce Monkey & Lead Guitarist for "Sparky + the Monkeys" (Album on sale in all good record shops) --------------------------------------- Software is like sex: It's better when it's free. (Linus Torvalds)
peterr Posted March 11, 2004
Hi,

Quoting Mark: "Try replacing msnbot/0.11 with just msnbot"

It wouldn't make any difference, because if you have both:

msnbot
msnbot/0.11

in /catalog/includes/spiders.txt, the following code (in application_top.php), used for the check:

if (is_integer(strpos($user_agent, trim($spiders[$i])))) {

would result in true whether the "HTTP_USER_AGENT" was:

msnbot (+http://search.msn.com/msnbot.htm)

or:

msnbot/0.11 (+http://search.msn.com/msnbot.htm)

I have tested this quite a bit this afternoon, and it is not an issue of the values in spiders.txt, or anything else in osC (it seems), but something that msnbot itself is doing. For example, I forced the same user agent string that appears in my web logs like this:

$user_agent = strtolower("msnbot/0.11 (+http://search.msn.com/msnbot.htm)");

and the osC code resulted in:

Spider - Yes
use Sessions - No
spider name - msnbot/0.11

Peter
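[Editor's note] The substring check Peter quotes explains why the msnbot/0.11 entry is redundant. A minimal sketch of that matching logic, written here in Python for easy standalone testing (the real check is the PHP strpos line quoted above):

```python
# Illustrative re-implementation of the spiders.txt check from
# application_top.php: a visitor counts as a spider if ANY entry from
# spiders.txt appears as a substring of the lowercased user-agent string.
spiders_txt = ["msnbot", "msnbot/0.11"]  # hypothetical spiders.txt contents

def is_spider(user_agent: str) -> bool:
    ua = user_agent.lower()
    # Mirrors: is_integer(strpos($user_agent, trim($spiders[$i])))
    return any(entry.strip() in ua for entry in spiders_txt if entry.strip())

# Both logged user-agent strings match either way, because "msnbot" is
# itself a substring of "msnbot/0.11".
print(is_spider("msnbot (+http://search.msn.com/msnbot.htm)"))       # True
print(is_spider("msnbot/0.11 (+http://search.msn.com/msnbot.htm)"))  # True
```

So trimming the list to just msnbot changes nothing about which requests are detected, which is consistent with Peter's test results.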
wizardsandwars Posted March 11, 2004
The URLs were gathered by the bot before you got the spider session ID killer in place. It'll take the bot a little while to realize that it can't get back to that URL again (with the same session ID), and when it does, those URLs will drop off its list of URLs to parse. This will probably take a couple of weeks, maybe even a couple of months.
------------------------------------------------------------------------------------------------------------------------- NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit. If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.
wizardsandwars Posted March 11, 2004
Quoting Peter: "Make sure that "FORCE COOKIE USAGE" is set to false; the Wiki said it had to be false for the session IDs to be turned off for spiders."

If 'Force Cookie Usage' is set to TRUE, then SIDs are not assigned at all, spiders or no.
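[Editor's note] The interaction between the two settings, as described in the posts above, can be sketched as follows. This is a hypothetical illustration in Python, not the actual osCommerce source; the function name and parameters are invented for clarity:

```python
# Hedged sketch of the behavior described in this thread: with Force Cookie
# Usage on, no osCsid is ever appended to URLs (sessions live in cookies
# only); with it off, the osCsid is suppressed only for recognized spiders.
def append_sid(url: str, sid: str, force_cookie_usage: bool, is_spider: bool) -> str:
    if force_cookie_usage or is_spider:
        return url  # no osCsid in the URL
    return f"{url}?osCsid={sid}"

print(append_sid("product_info.php", "abc123", force_cookie_usage=False, is_spider=True))
# product_info.php
print(append_sid("product_info.php", "abc123", force_cookie_usage=False, is_spider=False))
# product_info.php?osCsid=abc123
```

This is why the two settings aren't combined: Force Cookie Usage already covers spiders as a side effect, making the spiders.txt suppression redundant when it is on.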
peterr Posted March 11, 2004
Hi, yep, you can't have both; it's either one or the other. :D Peter
fishy Posted March 11, 2004
What about filtering the Yahoo bot: Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp
Would putting Yahoo! Slurp in spiders.txt get it done? Best.
wizardsandwars Posted March 11, 2004
'slurp' should already be there, and that's all you need.
fishy Posted March 11, 2004
10-4. Cheers.
peterr Posted March 11, 2004
Hi Chris,
Quoting: "The URLs were gathered by the bot before you got the spider session ID killer in place. It'll take the bot a little while to realize that it can't get back to that URL again (with the same session ID), and when it does, those URLs will drop off its list of URLs to parse. This will probably take a couple of weeks, maybe even a couple of months."

Thanks for that tip. I will take a note of the osCsid that msnbot has used, and see if it uses the same one on the next crawl. Thanks, :) Peter
wizardsandwars Posted March 12, 2004
Well, that's a good thought, but I think you'll find that the bot has been assigned several dozen, if not hundreds, and possibly even thousands of osCsids.
This topic is now archived and is closed to further replies.