Guest Posted February 20, 2004
I've seen the other threads about html_output.php, Inktomi, session IDs, etc., but I'm still stumped. I have "Prevent Spider Sessions" turned on in the admin, but inktomisearch is still getting SIDs. It comes to my site, adds thousands of products to its shopping cart, and stays for hours and hours. Can someone provide me with some advice on ridding myself of this nuisance?
Thanks in advance, Bob
1quicksi Posted February 20, 2004
Dunno if you saw this... http://www.oscommerce.com/forums/index.php?showtopic=39566
Guest Posted February 21, 2004
Yes, that was one of the threads I referred to in my initial post, although not with a specific link. But I gather that this was for MS1, as MS2 has the spider killer built in. So my question is this: MS2 has an option to prevent spider sessions, which I've enabled. Nonetheless, inktomisearch is given a session ID, gets stuck for hours and hours on my site, and adds thousands of dollars' worth of merchandise to its shopping cart. Does anyone know how to add inktomisearch to the list of "known spiders" that MS2 uses with its built-in "Prevent Spider Sessions" option?
Thanks, Bob
burt Posted February 21, 2004
Find the correct UserAgent for the troublesome spider and manually add it to the text file called spiders.txt. Should work fine.
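For readers wondering what adding an entry to spiders.txt actually does, here is a minimal sketch of the kind of check involved. It assumes the shop treats each line of spiders.txt as a fragment and looks for it inside the lowercased User-Agent header; that is my recollection of MS2's behaviour, not something confirmed in this thread. The function names and the sample list are hypothetical, and the code is Python for illustration only, not osCommerce code.

```python
def load_spiders(text):
    """Parse spiders.txt-style content: one user-agent fragment per line."""
    # Lowercasing the entries here is a choice made for this sketch. If the
    # shop lowercases only the incoming user agent, entries written with
    # capitals (for example "Slurp") may never match, so lowercase entries
    # are the safer thing to put in the file.
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_spider(user_agent, spiders):
    """Return True if any listed fragment occurs inside the user agent."""
    ua = (user_agent or "").lower()
    if not ua:
        return False
    return any(fragment in ua for fragment in spiders)


# Hypothetical file content, not the spiders.txt that ships with osCommerce.
SAMPLE_SPIDERS_TXT = """googlebot
slurp
yahooseeker
"""

spiders = load_spiders(SAMPLE_SPIDERS_TXT)
slurp_ua = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
print(is_spider(slurp_ua, spiders))
print(is_spider("YahooSeeker/1.1", spiders))
print(is_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", spiders))
```

Because the match is on fragments, a short distinctive token such as "slurp" is enough. A generic token such as "compatible" would also match the ordinary browser string in the last line, which would deny sessions to real customers.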
Guest Posted February 21, 2004
What settings are recommended for sessions in the osC MS2 admin for a site that doesn't have an SSL server (https)? I don't fully understand the processes, but I guess I don't want to set all options to true?
TIA
Pouli Posted February 29, 2004
I encountered the exact same problem starting a few days ago. Inktomi has been visiting me for several months without any problem and no session ID. Then, all of a sudden, it is getting a session ID when I look at its visits through the user tracking contribution. I am also tracking spider visits through an app called robotstats, and it tells me that the session ID is not always passed as an explicit parameter. In this latter case, are the pages indexed correctly?
Quote: "Find the correct UserAgent for the troublesome spider and manually add it to the text file called spiders.txt"
I searched the web for an update of the UserAgent name for Inktomi and did not find anything. Is there an easy way to find out the UserAgent, like a reverse IP lookup or so?
Thanks for any help. Michel
MickiCheers Posted March 2, 2004
I have the same problem and went in search of an answer. At http://www.searchenginejournal.com/index.php?p=289 I found this:

Quote:
2/17/2004 Yahoo Intros New Search Robot - Yahoo! Slurp [ Search Engine News ]
Yahoo just got a step closer to dropping the Google search results from its search function and replacing them with Yahoo's own Inktomi search engine, which will be a bit of a blow to Google, and a sign of potential dominance by Yahoo. Yahoo has just unleashed a new site indexing robot to crawl the web with - Yahoo! Slurp. Yahoo's new robot keeps a similar name to the Inktomi Slurp crawler and some features listed on Yahoo include:

The page goes on a bit, but I figured this was the explanation for the sudden change. On some other page I found:

Quote:
Yahoo! renames Slurp
We didn't really see this coming, but hey, it had to happen. Yahoo! have just renamed their new Inktomi's web spider from "slurp" to "Yahoo! Slurp". They also modified their User-agent (a user identification code used mainly by webmasters to track robots, users and spammers). Yahoo!'s new Slurp user-agent is: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Now I know I can't put in "compatible" as the user agent. Can I put "Yahoo! Slurp" in spiders.txt?
MickiCheers Posted March 2, 2004
Ok, I kinda feel like an idiot. But not really.
Quote: "Yahoo! Slurp will obey the first entry in the robots.txt file with a User-Agent containing "Slurp". If there is no such record, it will obey the first entry with a User-Agent of "*"."
So I'm just going to enter "Slurp" in addition to the other slurps I already have on my list.
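The record-selection rule quoted above can be sketched as follows. This only illustrates the quoted rule, not Yahoo's actual implementation: the (user-agent, rules) pairs are a hypothetical pre-parsed form of robots.txt, and matching "Slurp" case-insensitively is an assumption of mine. Note that the rule concerns robots.txt, which tells the crawler what it may fetch; spiders.txt is a separate file read by the shop itself.

```python
def select_record(records, token="Slurp"):
    """Pick the robots.txt record a crawler would obey under the quoted rule.

    records: list of (user_agent, rules) pairs in file order (hypothetical
    pre-parsed form; a real robots.txt would have to be parsed first).
    """
    # First choice: the first record whose User-Agent contains the token.
    for agent, rules in records:
        if token.lower() in agent.lower():  # case-insensitive: an assumption
            return agent, rules
    # Fallback: the first record with a User-Agent of "*".
    for agent, rules in records:
        if agent == "*":
            return agent, rules
    return None


# Made-up records for illustration only.
records = [
    ("Googlebot", ["Disallow: /shopping_cart.php"]),
    ("*", ["Disallow: /admin/"]),
    ("Slurp", ["Disallow: /shopping_cart.php", "Disallow: /login.php"]),
]

print(select_record(records))
print(select_record(records[:2]))
```

With all three records present the "Slurp" record is chosen even though the "*" record comes first in the file; with the "Slurp" record removed, the "*" record is used.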
sam6 Posted March 2, 2004
I am having a problem with inktomisearch. It says the user agent is yahooseeker/1.1. I have this in my spiders.txt and I'm still getting an osCsid. I need help also.
Pouli Posted March 4, 2004
As suggested by MickiCheers, I tried adding "Slurp" to my spiders.txt, but the robot is still getting a session ID... Any other hint?
MickiCheers Posted March 8, 2004
Just so you know, I haven't found anything that works either, and Inktomi is driving me stone crazy! Inktomi is wearing me out. Most of the sessions in my user_tracking are Inktomi, which makes it so hard to find the real visits. If anyone has any idea what to do, please share.
wizardsandwars Posted March 8, 2004
If you've fixed it so that Inktomi is not being assigned any NEW SIDs, then you'll just have to wait it out. It can take several months for Inktomi to stop trying to parse old URLs with SIDs that it collected previously. You can, however, filter your user tracking and Who's Online scripts so that they don't show Inktomi anymore.
-----
NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit. If you have a question about any of my posts here, your best bet is to contact me through either Email or PM in my profile, and I'll be happy to help.
Guest Posted March 19, 2004
My logs show that Inktomi is using a user-agent of "YahooSeeker/1.1".
-jared
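On the earlier question of how to find a spider's user agent: the web server's access log usually records it. The sketch below is one way to pull it out, assuming the log is in the common Apache "combined" format and that the session parameter is named osCsid; both are assumptions about your setup. The sample lines are invented for illustration and are not taken from anyone's real logs.

```python
import re
from collections import Counter

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def agents_with_session_ids(lines, param="osCsid"):
    """Count requests carrying a session parameter, grouped by user agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and (param + "=") in match.group("request"):
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines (documentation IP addresses, made-up session IDs).
SAMPLE_LOG = [
    '192.0.2.10 - - [19/Mar/2004:10:00:00 +0000] "GET /product_info.php?products_id=5&osCsid=abc123 HTTP/1.0" 200 5120 "-" "YahooSeeker/1.1"',
    '192.0.2.10 - - [19/Mar/2004:10:00:05 +0000] "GET /index.php?cPath=3&osCsid=abc123 HTTP/1.0" 200 4096 "-" "YahooSeeker/1.1"',
    '192.0.2.77 - - [19/Mar/2004:10:01:00 +0000] "GET /index.php HTTP/1.1" 200 4096 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for agent, hits in agents_with_session_ids(SAMPLE_LOG).most_common():
    print(hits, agent)
```

Agents that show up with many session-carrying requests are candidates for spiders.txt. Keep in mind that real customers browsing without cookies also carry the parameter in their URLs, so check each agent string before adding it.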
Guest Posted March 19, 2004
OK, I feel dumb. I see that someone already posted the user-agent. Can anyone confirm that this indeed stops the SIDs for that agent? From other threads, it seems that the SIDs may still continue for weeks, gradually dying down. Is this understanding correct?
Thanks!
sam6 Posted March 19, 2004
Quote: "If you've fixed it so that Inktomi is not being assigned any NEW SIDs, then you'll just have to wait it out. It can take several months for Inktomi to stop trying to parse old URLs with SIDs that it collected previously."
I have the same problem. I temporarily blocked Yahoo from my site because of this. I did indeed use yahooseeker/1.1 and it stopped producing new osCsids, but the old ones kept popping up. I put yahooseeker/1.1 in my browser as the user agent and did not get any SIDs. When I get back from my vacation I will allow Yahoo to come back and see if it produces SIDs again. I believe this quote to be true.
Pouli Posted March 22, 2004
I don't know what Yahoo recently changed in its spider behavior, but WITHOUT my trying the new user agent that was recently suggested in the thread (YahooSeeker), the Inktomi spider no longer gets a SID... Back to normal. Of course, I still have the spider checking URLs containing SIDs that it received earlier in February, but even in this case, it appears in the user tracking contrib without a SID (as expected). I hope this is the same for you.
sam6 Posted April 2, 2004
Hello, I just started looking at my log files and I noticed yahooseeker is still going to those old osCsids it got before I found the user agent. So, according to my logs, it is still able to go to those pages with the osCsid? I get a code 200 for those pages with the osCsid. I thought it would not be able to access them at this point?
♥yesudo Posted April 2, 2004
I am continually getting problems with the Inktomi spider(s).
wizardsandwars Posted April 2, 2004
Quote: "Hello, I just started looking at my log files and I noticed yahooseeker is still going to those old osCsids it got before I found the user agent. So, according to my logs, it is still able to go to those pages with the osCsid? I get a code 200 for those pages with the osCsid. I thought it would not be able to access them at this point?"
It can still 'attempt' to get there several months afterwards. Just make sure it doesn't collect any new URLs with SIDs and you should be fine.
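One rough way to check that a spider is not collecting new SIDs is to compare the session IDs in the URLs it requested before and after the spiders.txt change. The sketch below assumes you have already pulled the spider's requested URLs out of your logs and split them at the date of the fix, and that the parameter is named osCsid; the function names and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def session_ids(urls, param="osCsid"):
    """Collect the distinct session IDs that appear in a list of request URLs."""
    ids = set()
    for url in urls:
        query = urlsplit(url).query
        ids.update(parse_qs(query).get(param, []))
    return ids


def new_session_ids(urls_before_fix, urls_after_fix, param="osCsid"):
    """Session IDs requested after the fix that were never seen before it."""
    return session_ids(urls_after_fix, param) - session_ids(urls_before_fix, param)


# Made-up request URLs for one spider, split at the date of the fix.
before = [
    "/index.php?cPath=1&osCsid=aaa111",
    "/product_info.php?products_id=2&osCsid=bbb222",
]
after = [
    "/index.php?cPath=1&osCsid=aaa111",  # old URL revisited: expected
    "/product_info.php?products_id=9",   # no SID: the fix is working
]

print(new_session_ids(before, after))
```

An empty result suggests the spider is only replaying URLs it collected earlier. Any ID in the result means it was handed a fresh session after the change, so the user agent entry is probably still not matching.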
Archived
This topic is now archived and is closed to further replies.