
osCommerce

The e-commerce.

Googlebot's Bandwidth Rampage. Please help!


Guest


Hello all!

I installed osCommerce on my site a few months back, and the multiple session IDs it generated sent Googlebot chasing its own tail. As a result, Google indexed my 30-page site as having 65,000 pages!

So I dived into the code and changed it so that session IDs are no longer generated for robot visits (thanks to the likes of ICW, whose help is much appreciated).
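For anyone wanting to do the same, here is the idea sketched in Python rather than osCommerce's PHP: compare the visitor's User-Agent against a list of known crawler substrings and leave the session ID out of links when it matches. The substring list, the `product_link()` helper and the `osCsid` parameter name are assumptions for illustration (as far as I know, osCommerce 2.2's own "Prevent Spider Sessions" option works along these lines, reading its list from `includes/spiders.txt`).

```python
from urllib.parse import urlencode

# Hypothetical excerpt of a crawler list; a real list would be much longer.
SPIDER_SUBSTRINGS = ("googlebot", "slurp", "msnbot", "ia_archiver")


def is_spider(user_agent):
    """True if the User-Agent contains any known crawler substring (case-insensitive)."""
    ua = user_agent.lower()
    return any(s in ua for s in SPIDER_SUBSTRINGS)


def product_link(products_id, session_id, user_agent):
    """Build a product URL, appending the session ID only for non-spider visitors.

    'osCsid' is assumed here as the session parameter name.
    """
    params = {"products_id": products_id}
    if not is_spider(user_agent):
        params["osCsid"] = session_id
    return "/product_info.php?" + urlencode(params)


print(product_link(42, "abc123", "Googlebot/2.1 (+http://www.google.com/bot.html)"))
# -> /product_info.php?products_id=42
print(product_link(42, "abc123", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
# -> /product_info.php?products_id=42&osCsid=abc123
```

Matching on the User-Agent only catches crawlers that identify themselves, which Googlebot does; the point is that every bot sees exactly one URL per page.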

But now Google is coming back just to look up those 65,000 pages (the 404 entries in my AWStats reports show this) and is still going around in circles, eating 5 GB of my bandwidth every 6 hours!
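To see exactly which URLs Googlebot keeps requesting, and how many bytes those 404s cost, the raw access log can be tallied directly. A rough Python sketch, assuming Apache's "combined" log format and identifying Googlebot by its User-Agent string; the sample log lines and the `access_log` file name are made up.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[A-Z]+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Return (hits, bytes_sent, Counter of paths) for Googlebot requests answered with 404."""
    hits = 0
    bytes_sent = 0
    paths = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("status") != "404":
            continue
        if "googlebot" not in m.group("agent").lower():
            continue
        hits += 1
        if m.group("bytes") != "-":
            bytes_sent += int(m.group("bytes"))
        paths[m.group("path")] += 1
    return hits, bytes_sent, paths


# Made-up sample lines; in practice, iterate over open("access_log").
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2004:13:55:36 -0700] "GET /product_info.php?osCsid=abc HTTP/1.1" 404 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.5 - - [10/Oct/2004:13:56:01 -0700] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '66.249.66.1 - - [10/Oct/2004:13:57:12 -0700] "GET /product_info.php?osCsid=abc HTTP/1.1" 404 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2004:13:58:40 -0700] "GET /index.php HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

hits, bytes_sent, paths = googlebot_404s(SAMPLE_LOG)
print(hits, bytes_sent)  # 2 4652
for path, count in paths.most_common(10):
    print(count, path)
```

For scale, if the 5 GB means 5 × 10^9 bytes, then 5,000,000,000 / 21,600 s is about 231,000 bytes/s, roughly 1.85 Mbit/s sustained around the clock.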

Could someone please help me out with this?

Any help will be most appreciated.

Thanks for your time and attention.

JanSomen


Johnson -

Sue Google? For what? It's not their fault their robots got stuck in an infinite loop the first time around. Should they detect when they're stuck in one? Perhaps, but it isn't difficult to disallow all bots until you've made sure a site is search-engine safe. The primary responsibility falls on the developer, NOT Google.

JanSomen -

So now that Google is updating its index, it will see that some of those 65,000 pages return 404s and remove them from its queue.

A quick fix to keep them away from at least some of the files is to make sure your robots.txt file excludes bots from pages that aren't product or content pages:

# Robots.txt file
#
User-agent: *
Disallow: /address_book_process.php
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_edit_process.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /address_book.php
Disallow: /checkout_process.php
Disallow: /advanced_search.php
Disallow: /advanced_search_result.php
Disallow: /checkout_address.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment.php
Disallow: /checkout_success.php
Disallow: /conditions.php
Disallow: /contact_us.php
Disallow: /create_account.php
Disallow: /create_account_process.php
Disallow: /create_account_success.php
Disallow: /download.php
Disallow: /info_shopping_cart.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /popup_search_help.php
Disallow: /privacy.php
Disallow: /products_new.php
Disallow: /product_notifications.php
Disallow: /product_reviews.php
Disallow: /product_reviews_info.php
Disallow: /product_reviews_write.php
Disallow: /redirect.php
Disallow: /reviews.php
Disallow: /shipping.php
Disallow: /shopping_cart.php
Disallow: /specials.php
Disallow: /tell_a_friend.php
Disallow: /disclaimer.php
Disallow: /download/
Disallow: /images/
Disallow: /includes/
Disallow: /pub/
Disallow: /cgi-bin/

That way, Google will at least have received instructions from you not to touch those pages. Some of its other indexed pages may still generate 404s, but surely far fewer than 65,000 hits!
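A quick way to sanity-check rules like these is Python's standard `urllib.robotparser`, which applies them the way a well-behaved crawler would. The sketch uses only an excerpt of the list above and made-up paths. It also shows why the Disallow lines should sit directly under `User-agent:` with no blank lines in between: under the original robots.txt convention a blank line ends a record, and this parser then ignores the orphaned Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the rules above, one directive per line, no blank lines inside the record.
RULES = """\
User-agent: *
Disallow: /login.php
Disallow: /shopping_cart.php
Disallow: /checkout_payment.php
Disallow: /images/
"""


def allowed(robots_txt, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(RULES, "/login.php"))                       # False
print(allowed(RULES, "/images/logo.gif"))                 # False
print(allowed(RULES, "/product_info.php?products_id=1"))  # True

# Same rule, but with a blank line after "User-agent: *" (double-spaced copy):
# the record ends at the blank line, so nothing is blocked.
SPACED = "User-agent: *\n\nDisallow: /login.php\n"
print(allowed(SPACED, "/login.php"))                      # True
```

Some crawlers may be more forgiving about blank lines than this parser, but there is no reason to rely on that.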

Regardless, make absolutely sure that you're not appending a SID (session ID) to URLs anywhere. There's a thread called 'Useful Tool' that should help.
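One way to run that check yourself is to save a page as a bot would receive it and scan every link for a session parameter. A minimal Python sketch using the standard `html.parser`; the parameter name `osCsid` and the sample HTML are assumptions, so adjust them to what your store actually emits.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class SidLinkFinder(HTMLParser):
    """Collect <a href> values whose query string carries a session-ID parameter."""

    def __init__(self, sid_name="osCsid"):
        super().__init__()
        self.sid_name = sid_name
        self.offending = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # HTMLParser has already decoded entities such as &amp; in attribute values.
            if name == "href" and value and self.sid_name in parse_qs(urlparse(value).query):
                self.offending.append(value)


def find_sid_links(html, sid_name="osCsid"):
    """Return every link in `html` that carries the session parameter."""
    finder = SidLinkFinder(sid_name)
    finder.feed(html)
    finder.close()
    return finder.offending


page = (
    '<a href="/index.php">Home</a>'
    '<a href="/product_info.php?products_id=1&amp;osCsid=abc123">Widget</a>'
)
print(find_sid_links(page))  # ['/product_info.php?products_id=1&osCsid=abc123']
```

An empty list for every page a spider can reach is what you want to see.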

- Greg


Where do I put the code you mentioned above for disallowing the Google spider from certain pages?

Thanks!
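For reference: crawlers only request robots.txt from the root of the host, so it goes in your web root as a plain text file named `robots.txt`, not in a subfolder and not inside any PHP file. If the store lives in a subdirectory such as `/catalog/` (an assumed example), the Disallow paths also need that prefix, because rules match from the start of the URL path. A small Python sketch of both points, with made-up URLs:

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url):
    """The only robots.txt a crawler consults for page_url: /robots.txt at the host root."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def blocks(robots_txt, path, agent="Googlebot"):
    """True if the robots.txt text forbids `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


print(robots_txt_url("http://www.example.com/catalog/login.php"))
# http://www.example.com/robots.txt

# A rule written for a store at the web root does not cover /catalog/login.php...
print(blocks("User-agent: *\nDisallow: /login.php", "/catalog/login.php"))          # False
# ...while a rule that includes the subdirectory does.
print(blocks("User-agent: *\nDisallow: /catalog/login.php", "/catalog/login.php"))  # True
```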


OK... the problem was that all non-existent pages were still being served by the 404.shtml page, so I have removed that page from my server for the time being. I have also visited this URL:

http://services.google.com:8882/urlconsole/controller

to ask Google to remove the non-existent files and folders from its index altogether.

I hope this solves it.

Thanks, everyone, for the input.
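Deleting the custom error page helps, but what matters is that URLs which no longer exist keep answering with a real 404 status (that is what lets Google drop them, as described earlier in the thread) while each hit costs as few bytes as possible. The idea, sketched as a tiny Python WSGI app purely for illustration (the store itself runs PHP on Apache, where a short `ErrorDocument 404` page serves the same purpose); the set of known pages is hypothetical:

```python
# Hypothetical list of real pages on a small store.
KNOWN_PAGES = {"/", "/index.php", "/product_info.php"}


def app(environ, start_response):
    """WSGI app: serve known pages, answer everything else with a bare 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        body = b"<html><body>Catalog page</body></html>"
        status = "200 OK"
        content_type = "text/html"
    else:
        # Real 404 status (so crawlers can drop the URL) with a tiny body.
        body = b"Not Found"
        status = "404 Not Found"
        content_type = "text/plain"
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]


def request(path):
    """Call the app directly with a minimal environ and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body


print(request("/index.php")[0])                # 200 OK
print(request("/product_info.php/osCsid/abc"))  # ('404 Not Found', b'Not Found')
```

For a local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough.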


I don't want Google to visit my store at all.

If I produce a robots.txt file with the correct rules, will that stop Google from visiting my site and using up my bandwidth (with session IDs and the like)?

Or could I ask Google to visit just my index page and not FOLLOW any links from there?

Appreciate any help, people. Thanks!
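For the first question: a robots.txt record addressed to Googlebot with `Disallow: /` asks it not to fetch any page (it will still request robots.txt itself now and then), while other crawlers are unaffected. robots.txt cannot express "index this page but don't follow its links"; that is what the robots meta tag `<meta name="robots" content="nofollow">` on the index page is for. A quick check of the robots.txt part with Python's standard `urllib.robotparser`; the bot names and paths are just examples:

```python
from urllib.robotparser import RobotFileParser

# Block Googlebot from the whole site; there is no record for anyone else, so others are allowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "msnbot"):
    for path in ("/", "/index.php", "/product_info.php?products_id=1"):
        print(agent, path, parser.can_fetch(agent, path))
```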

Special Effects / 3d + Flash


Archived

This topic is now archived and is closed to further replies.
