
osCommerce

The e-commerce.

Help: spider buying products :)


hpqnet


Posted

Hi all, I seem to have a spider on my site that is filling up the shopping cart. Can spiders do this? The IP is 66.77.73.32, and the entry from access_log is:

66.77.73.32 - - [22/Apr/2004:06:14:45 -0500] "GET /product_info.php?products_id=657 HTTP/1.0" 200 29773 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"

It is building up one heck of a shopping cart (since I have roughly 800 products online)... How can I stop this?

Posted

But the shopping cart is going to be a monster, and I want to make sure it's not going to break anything else.

Posted
Like what? It shouldn't cause any problems: it won't check out with those items, and when it's finished the session will just expire.


Posted

Yeah, but right now they are up to 5,000 hits and over 170,000 KB of bandwidth.

I have sessions stored in the database, so I guess I am okay.
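To see where hit and bandwidth figures like these come from, you can total them per client IP straight from access_log. This is a minimal Python sketch, assuming Apache's "combined" log format (the format of the log line quoted at the top of the thread); the function name `tally` is just for illustration and is not part of osCommerce.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Return two Counters keyed by client IP: number of hits and bytes sent."""
    hits, sent = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        ip = m["host"]
        hits[ip] += 1
        if m["bytes"] != "-":  # Apache logs "-" when no response body was sent
            sent[ip] += int(m["bytes"])
    return hits, sent
```

Usage: `hits, sent = tally(open("access_log"))` (use your own log path), then `hits.most_common(10)` lists the busiest clients and `sent[ip] / 1024` gives the kilobytes sent to each.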

Posted

Spiders can gobble up your bandwidth, though, and you don't want them in parts of the site they don't need to spider. Set up a text file called robots.txt and place it in your root folder. Inside it, use something like this:

User-agent: *
# Change /admin to whatever your admin section is called
Disallow: /admin
Disallow: /includes
Disallow: /cart.php
Disallow: /checkout.php

Add any files you don't want it to spider.

 

This won't stop the spiders used by spammers and hackers, but the legit spiders will look for the file and follow its directives.

 

Hope this helps - Vger
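You can check rules like these before uploading them with Python's standard urllib.robotparser module. The sketch below is a minimal example; the helper name `build_parser` and the user-agent string are only for illustration. In this parser a Disallow rule is a plain prefix match against the URL path, so a rule written without the leading slash (for example `Disallow: cart.php`) never matches `/cart.php`.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


WITH_SLASH = """\
User-agent: *
Disallow: /admin
Disallow: /includes
Disallow: /cart.php
Disallow: /checkout.php
"""

# Same rules, but the last two are missing the leading slash.
NO_SLASH = """\
User-agent: *
Disallow: /admin
Disallow: /includes
Disallow: cart.php
Disallow: checkout.php
"""

AGENT = "Yahoo-VerticalCrawler-FormerWebCrawler/3.9"  # any name falls under "User-agent: *"

good, bad = build_parser(WITH_SLASH), build_parser(NO_SLASH)
for path in ("/product_info.php?products_id=657", "/includes/configure.php", "/cart.php"):
    # Columns: path, allowed with slashes, allowed without slashes
    print(path, good.can_fetch(AGENT, path), bad.can_fetch(AGENT, path))
```

Product pages stay crawlable in both versions, but only the version with slashes actually keeps spiders out of the cart page.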

Posted

This is my current robots.txt file. Does this look okay, or is it too much? Also, according to alltheweb.com the user-agent is called FAST... Should I add "fast" to spiders.txt?

User-agent: *
Disallow: /admin/
Disallow: /test/
Disallow: /catalog/
Disallow: /download/
Disallow: /download_files/
Disallow: /pub/
Disallow: /temp/
# Block files that are secure or login oriented
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_edit_process.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_newsletters.php
Disallow: /account_notifications.php
Disallow: /account_password.php
Disallow: /address_book.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /advanced_search_result.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment.php
Disallow: /checkout_payment_address.php
Disallow: /checkout_process.php
Disallow: /checkout_shipping.php
Disallow: /checkout_shipping_address.php
Disallow: /checkout_success.php
Disallow: /create_account.php
Disallow: /create_account_process.php
Disallow: /create_account_success.php
Disallow: /disclaimer.php
Disallow: /download.php
Disallow: /info_shopping_cart.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /popup_search_help.php
Disallow: /product_notifications.php
Disallow: /product_reviews_write.php
Disallow: /redirect.php
Disallow: /shopping_cart.php
Disallow: /shopping_cart_help.php
Disallow: /tell_a_friend.php
Disallow: /contact_us.php
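On the spiders.txt question: assuming each line of includes/spiders.txt is checked as a substring of the visitor's User-Agent (my understanding of how the MS2 "Prevent Spider Sessions" option works; check your own includes/application_top.php), an entry only helps if that text actually appears in the agent string. The Python sketch below mimics that check; `is_spider` is a hypothetical name and the case-insensitive comparison is an assumption. With the agent from the log in the first post, "fast" would not match, while an entry such as "alltheweb" would.

```python
def is_spider(user_agent: str, spider_entries: list[str]) -> bool:
    """Return True if any non-blank entry occurs in the User-Agent (case-insensitive)."""
    agent = user_agent.lower()
    for entry in spider_entries:
        entry = entry.strip().lower()
        if entry and entry in agent:
            return True
    return False


UA = ("Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; "
      "http://www.alltheweb.com/help/webmaster/crawler")

print(is_spider(UA, ["fast"]))       # False: "fast" never appears in this agent string
print(is_spider(UA, ["alltheweb"]))  # True
```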

Posted

Spiders are how search engines see what's on your site. They're good, in that they allow people to see what you've got and find the stuff you're trying to sell; they can be bad, however, in that they don't really know how much bandwidth they're using or, in this case, that they're messing up your cart.

 

There's also a band of evil spiders running rampant; their job is to snarf email addresses so people can get you to go to their porn site or buy something to make your penis bigger. The only real way to foil them is to make sure your email addresses are nowhere near your website, or to cloak them somehow. Good luck.
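One common way to cloak an address is to write each character as an HTML character reference: browsers display it normally, but a harvester that only scans the raw HTML for `name@domain` patterns won't see it. This is a minimal Python sketch; the function names and the example address are hypothetical, and it only deters simple harvesters, not ones that decode entities first.

```python
import html


def cloak_email(address: str) -> str:
    """Encode every character of the address as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


def mailto_link(address: str) -> str:
    """Build a mailto link whose address part and visible text are both entity-encoded."""
    cloaked = cloak_email(address)
    return f'<a href="mailto:{cloaked}">{cloaked}</a>'


example = cloak_email("sales@example.com")  # hypothetical address
print(mailto_link("sales@example.com"))
print(html.unescape(example))  # sales@example.com (what the browser shows)
```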

Posted

Hi, about your robots.txt file: if your site operates in a directory called 'catalog' and you've blocked that directory in the robots file, then none of your pages will get spidered. It's best to allow that directory but block all of the other directories within it, and, as you've done, any files related to the shopping cart side of things. That way your content pages will get spidered, but not the catalog's subdirectories, images, etc.

 

Hope this helps - Vger
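The difference is easy to demonstrate with Python's urllib.robotparser. In this minimal sketch the store is assumed to live under /catalog/, and the subdirectory names in the second rule set are examples only; adjust them to your own install.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Blocking the whole store directory hides every product page.
BLOCK_STORE = """\
User-agent: *
Disallow: /catalog/
"""

# Allowing /catalog/ itself but blocking what sits inside it (example names).
BLOCK_INSIDE = """\
User-agent: *
Disallow: /catalog/admin/
Disallow: /catalog/includes/
Disallow: /catalog/images/
Disallow: /catalog/shopping_cart.php
"""

PRODUCT = "/catalog/product_info.php?products_id=657"
print(allowed(BLOCK_STORE, PRODUCT))   # False
print(allowed(BLOCK_INSIDE, PRODUCT))  # True
```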

Posted
Spiders are good; don't worry.

I have had one that used 8% (and increasing) of the total site traffic for months and filled carts now and then (on MS2). Finally I decided to block the IP, and the number of visitors kept increasing just as fast after that, so I guess this wasn't a useful bot after all.

Change the buy now buttons on the product listing to be forms instead of links; then the spider can't do that.

Aren't the buy now buttons already forms in MS2?
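The idea behind the forms suggestion is that well-behaved crawlers follow plain links (GET requests) but generally don't submit forms, so the cart should only change on a POST. Here is a minimal Python sketch of that rule; `handle_buy_now` is a hypothetical stand-in for the store's buy-now action, not osCommerce's actual code.

```python
from collections import Counter


def handle_buy_now(method: str, products_id: int, cart: Counter) -> bool:
    """Add one unit of products_id to the cart, but only for a POST request."""
    if method.upper() != "POST":
        # A crawler following a plain "buy now" link sends GET: leave the cart alone.
        return False
    cart[products_id] += 1
    return True


cart = Counter()
handle_buy_now("GET", 657, cart)   # what a spider following a link would send
handle_buy_now("POST", 657, cart)  # a shopper submitting the buy-now form
print(dict(cart))  # {657: 1}
```

This follows the general HTTP convention that GET requests should be safe, meaning they should not change state on the server; anything that modifies the cart belongs behind a POST.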

Posted
Actually, my site runs in the root directory instead of the /catalog directory.

Agh! So why have you got /catalog/ blocked in your robots.txt file? There's no need unless you actually have a catalog directory.

 

Vger

Archived

This topic is now archived and is closed to further replies.
