Guest Posted March 4, 2003

Hello all! I installed osCommerce on my site a few months back, and it sent Googlebot chasing its own tail because of the multiple session IDs it generated. As a result, my 30-page site was indexed by Google as having 65,000 pages!

Well, I dived into the code and corrected it so that session IDs are not generated for robot visits (thanks to the likes of ICW, whose help is much appreciated). But this time Google is coming back only to look up those 65,000 pages (the 404 log in AWStats shows that) and is still going around in circles, hogging my bandwidth at 5 GB every 6 hours!

Could someone please help me out with this? Any help will be most appreciated. Thanks for your time and attention.

JanSomen
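The fix described above (no session ID for robot visits) can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual osCommerce change: the spider list, the function names, and the `sid` parameter name are all assumptions made for the example.

```python
# Illustrative crawler user-agent substrings (an assumption, not osCommerce's list).
KNOWN_SPIDERS = ("googlebot", "slurp", "msnbot")


def is_spider(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_SPIDERS)


def build_link(page: str, session_id: str, user_agent: str) -> str:
    """Build an internal link, appending the session ID only for non-spiders.

    The 'sid' parameter name is hypothetical.
    """
    if is_spider(user_agent):
        # Crawlers always get the same clean URL, so each page is seen once.
        return page
    separator = "&" if "?" in page else "?"
    return f"{page}{separator}sid={session_id}"
```

With a check like this, every crawler visit sees identical URLs, so the same page is not re-queued under a new session ID each time.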
Guest Posted March 4, 2003

I am so tired of this .... sue Google... I MEAN it
gdfwilliams Posted March 4, 2003

Johnson - sue Google? For what?? It's not their fault if their robots got stuck in an infinite loop the first time around. Should they detect when they're stuck in an infinite loop? Perhaps, but it's not like it's difficult to disallow all bots until you've made sure a site is search engine safe. The primary responsibility falls on the developer, NOT Google.

JanSomen - So, now that they're updating their files, they'll see that some of the 65,000 pages are generating 404s and remove them from their queue. A quick fix to get them to stay off at least some of the files would be to make sure your robots.txt file excludes bots from pages that aren't product or content related:

# Robots.txt file
#
User-agent: *
Disallow: /address_book_process.php
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_edit_process.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /address_book.php
Disallow: /checkout_process.php
Disallow: /advanced_search.php
Disallow: /advanced_search_result.php
Disallow: /checkout_address.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment.php
Disallow: /checkout_success.php
Disallow: /conditions.php
Disallow: /contact_us.php
Disallow: /create_account.php
Disallow: /create_account_process.php
Disallow: /create_account_success.php
Disallow: /download.php
Disallow: /info_shopping_cart.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /popup_search_help.php
Disallow: /privacy.php
Disallow: /products_new.php
Disallow: /product_notifications.php
Disallow: /product_reviews.php
Disallow: /product_reviews_info.php
Disallow: /product_reviews_write.php
Disallow: /redirect.php
Disallow: /reviews.php
Disallow: /shipping.php
Disallow: /shopping_cart.php
Disallow: /specials.php
Disallow: /tell_a_friend.php
Disallow: /disclaimer.php
Disallow: /download/
Disallow: /images/
Disallow: /includes/
Disallow: /pub/
Disallow: /cgi-bin/

That way, Google will at least have received instructions from you not to touch those pages. Perhaps some of their other indexed pages will generate 404s, but it will surely be fewer than 65,000 hits!

Regardless, make absolutely sure that you're not appending a SID anywhere. There's a thread called 'Useful Tool' that should help.

- Greg
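One way to sanity-check rules like these before uploading the file is Python's standard `urllib.robotparser` module. The sketch below uses a shortened excerpt of the list above; the helper name `is_allowed` and the `/product_info.php` test path are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# A shortened excerpt of the rules posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.php
Disallow: /shopping_cart.php
Disallow: /includes/
"""


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if the excerpted rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/login.php", "/includes/header.php", "/product_info.php"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Disallow rules are prefix matches, so a directory entry such as /includes/ covers everything beneath it, and a page entry such as /login.php also covers the same page with a query string appended.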
Guest Posted March 4, 2003

The code you mentioned above to "disallow" the Google spider from certain pages, where do I insert this code? Thanks!
gdfwilliams Posted March 4, 2003

Just create a file called robots.txt and put it in your root web directory. For more on robots.txt (a very important file), see http://www.searchengineworld.com/

There are some very useful tools there, including everything you'll need to know about robots.txt.
gdfwilliams Posted March 4, 2003

http://www.searchengineworld.com/cgi-bin/r.../robotcheck.cgi

That's a quicker link into their robots.txt section...
Guest Posted March 5, 2003

OK... the problem was that all non-existent pages were still being served by the 404.shtml page, so I have now removed that page from my server for the time being. I have also visited this URL: http://services.google.com:8882/urlconsole/controller to ask them to remove the references to the non-existent files/folders from the Google index altogether.

Hoping that I have the solution now. Thanks everyone for the input.
Ramesh Posted March 6, 2003

I don't want Google to visit my store at all. If I create a robots.txt file with the correct code, will this mean Google will not visit my site and use up my bandwidth (with session IDs and the like)? Or could I ask Google just to visit my index page and not FOLLOW on from there?

Appreciate any help people, thanks!

Special Effects / 3d + Flash
Guest Posted March 6, 2003

Just create a robots.txt file in your root directory with:

User-agent: *
Disallow: /
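Note that the rule above applies to every crawler that honors robots.txt, not only Google. As a rough sketch (using Python's standard `urllib.robotparser`; the helper name and the "OtherBot" agent are made up for the example), the two variants below show the difference between blocking all bots and naming Googlebot alone in the User-agent line.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above: applies to all user agents.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A variant that names only Google's crawler.
BLOCK_GOOGLE_ONLY = "User-agent: Googlebot\nDisallow: /\n"


def may_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `robots_txt` permits `agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, rules in (("all", BLOCK_ALL), ("google-only", BLOCK_GOOGLE_ONLY)):
        for agent in ("Googlebot", "OtherBot"):
            print(name, agent, may_crawl(rules, agent, "/index.php"))
```

Either way, robots.txt only asks crawlers to stay away; it relies on the crawler choosing to obey it.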