tlelliott77 Posted June 28, 2004
MSNBot has been crawling my site regularly recently. Even though I have added msnbot to my spiders.txt file, it continues to append osCsids to the end of the page names. It always seems to have the same osCsid on every visit, as far back as I can go in my user tracking. Has this happened to anyone else? I thought maybe it was happening because it was following links it had created itself before I added it to spiders.txt. If this is the case, will it correct itself in the end and drop the SIDs, or will it just continue to spider the site with this SID? Anyone know how I can fix this?
Thanks, Tim
tlelliott77 Posted June 29, 2004
Bump! Anybody else have this problem? Can I stop a specific osCsid from being used at all?
Thanks, Tim
pamperyourpuppy Posted July 8, 2004
I have been having the same problem. I put msnbot in my spiders.txt and checked this morning; it's still pulling osCsids. I just added msnbot/0.11 and I'll see if that fixes it. Looking in my logs today, the user agent is "msnbot/0.11" and not "msnbot", so I'm not sure how specific the spiders.txt entry has to be.
Guest Posted July 8, 2004
I've had the same problem with msnbot. I've added it to spiders.txt and still get the same result. spiders.txt only has to contain the first few letters of the spider's name, and it SHOULD catch every user agent that contains that text. However, MSN and Inktomi both seem to get osCsids MOST of the time. If you find a solution, please pass it along.
Best Regards - John
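For anyone wondering how specific a spiders.txt entry has to be, here is a rough sketch of the kind of check John describes. It assumes the match is a case-insensitive substring test against the user agent; the real osCommerce code may differ in detail. The function name `is_spider`, the pattern list, and the Yahoo and browser user agent strings are made up for illustration; only "msnbot/0.11" comes from the logs mentioned above.

```python
def is_spider(user_agent, spider_patterns):
    """Return True if any non-empty pattern occurs anywhere in the user agent.

    Assumption: matching is a plain, case-insensitive substring test.
    """
    agent = user_agent.lower()
    for pattern in spider_patterns:
        pattern = pattern.strip().lower()
        if pattern and pattern in agent:
            return True
    return False


# Hypothetical spiders.txt contents, one lowercase fragment per line.
patterns = ["msnbot", "slurp", "googlebot"]

# "msnbot" is a substring of "msnbot/0.11", so the short entry is enough.
print(is_spider("msnbot/0.11 (+http://search.msn.com/msnbot.htm)", patterns))

# Illustrative Yahoo user agent; "slurp" matches after lowercasing.
print(is_spider("Mozilla/5.0 (compatible; Yahoo! Slurp)", patterns))

# An ordinary browser contains none of the fragments.
print(is_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", patterns))
```

Under that assumption, "msnbot" and "msnbot/0.11" would both match the same crawler, so the shorter entry is the safer one if the version number ever changes.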
pamperyourpuppy Posted July 11, 2004
I just checked and msnbot is spidering properly now. It's been on my site all day with no osCsids. Looks like adding "msnbot" worked fine.
John
tlelliott77 Posted July 11, 2004
On my site msnbot is still occasionally using an osCsid that it had previously used. I guess it is revisiting the same links that it previously created. Yahoo Slurp (Inktomi) is doing the same thing. I'm hoping this will stop in the end, but if anyone has any tried and tested ways of getting rid of these osCsids I'd appreciate hearing them.
Thanks, Tim
stevel Posted July 11, 2004
I see the same with msnbot and Yahoo, but the msn problem seems to have disappeared. Eventually I guess the Yahoo problem will too. I do find it helpful to put disallow clauses in robots.txt for pages I never want spiders to visit, such as shopping cart, login, my account, etc.
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
stevel Posted July 18, 2004
Guess I wrote too soon. msnbot is well-behaved, but Yahoo Slurp keeps collecting SIDs. I can't figure it out. I have "slurp" in spiders.txt, and a test shows that it is correctly picked up and no session is issued, yet Yahoo Slurp continues to rack up sessions - or so it seems, anyway...
I did have one customer place an order starting from a Yahoo link that included an SID. I deleted that session so that it couldn't be reused, along with a few others Yahoo had created, but I'd like to find a way to have such sessions deleted automatically. I looked at the "check user agent" setting, but that redirects to the login page and is probably not what I want in the long run. For now I've enabled recording the user agent in the sessions and will monitor whether I am getting new Yahoo sessions.
Steve
211655 Posted July 18, 2004
How do you restrict some pages in robots.txt? Any sample?
211655 SEO Optimization Export Orders into CSV file
stevel Posted July 18, 2004
User-agent: *
Disallow: /shopping_cart.php
Disallow: /advanced_search.php
Disallow: /login.php
Disallow: /checkout_shipping.php
Disallow: /account.php
Disallow: /create_account.php
Disallow: /password_forgotten.php
Steve
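If you want to sanity-check rules like these before uploading them, one option is Python's standard `urllib.robotparser` module. The sketch below is only a local check of how a parser following the basic prefix-matching rules reads the file; it does not prove how msnbot or Slurp will actually behave. The helper name `allowed` and the sample osCsid value are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, pasted in as a string for a local test.
ROBOTS_TXT = """\
User-agent: *
Disallow: /shopping_cart.php
Disallow: /advanced_search.php
Disallow: /login.php
Disallow: /checkout_shipping.php
Disallow: /account.php
Disallow: /create_account.php
Disallow: /password_forgotten.php
"""


def allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


# Disallow rules are prefix matches, so a URL with an osCsid appended
# to a blocked page should be blocked as well.
print(allowed("/shopping_cart.php?osCsid=abc123", "msnbot"))
print(allowed("/login.php", "Slurp"))
print(allowed("/index.php", "msnbot"))
```

Keep in mind that robots.txt is advisory: it only helps with crawlers that choose to honor it, and it does nothing about URLs a search engine has already stored.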
John Posted August 6, 2004
I too have msnbot and Yahoo! Slurp listed in my spiders.txt, in lower case, but Yahoo is still getting SIDs for my site. MSN was OK for some time, but it is again showing SIDs in the URLs. Any idea how to get rid of the SIDs for just these two search engines? No other search engine shows this behavior. I have also put a proper robots.txt file on my site.
stevel Posted August 6, 2004
What I find is that Yahoo is revisiting URLs it has saved in its index - it is not getting new SIDs. Not much you can do about that other than see if the sessions are still in the database and remove them.
Steve
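One way to tell a revisit from a genuinely new session is to pull the osCsid out of each crawler request in your logs and note the first date each value appears. The sketch below assumes you have already extracted (date, URL) pairs for the crawler from your access log or user tracking, with dates as ISO strings; the function names, the sample requests, and the SID values are all invented for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def first_seen_sids(requests):
    """Map each osCsid to the date of the first request that carried it.

    `requests` is an iterable of (iso_date, url) pairs sorted by date.
    """
    first_seen = {}
    for date, url in requests:
        for sid in parse_qs(urlsplit(url).query).get("osCsid", []):
            # setdefault keeps the earliest date because input is sorted.
            first_seen.setdefault(sid, date)
    return first_seen


def new_sids_since(requests, cutoff):
    """Return the SIDs whose first appearance is on or after `cutoff`."""
    seen = first_seen_sids(requests)
    return sorted(sid for sid, date in seen.items() if date >= cutoff)


# Hypothetical crawler requests, oldest first.
sample = [
    ("2004-06-20", "/index.php?osCsid=aaa111"),
    ("2004-07-18", "/product_info.php?products_id=5&osCsid=aaa111"),
    ("2004-08-05", "/index.php?cPath=2&osCsid=aaa111"),
    ("2004-08-06", "/index.php"),
]

# An empty list means every SID seen since the cutoff is an old one
# being revisited, not a newly issued session.
print(new_sids_since(sample, "2004-08-01"))
```

If the list stays empty after the spiders.txt entry went in, the crawler is replaying stored URLs, which fits what Steve describes; any SID that does show up would be worth checking against the sessions in the database.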
This topic is now archived and is closed to further replies.