hpqnet Posted April 22, 2004

Hi all,

I seem to have a spider on my site that is filling up the shopping cart. Can spiders do this? The IP is 66.77.73.32, and the entry from access_log is:

66.77.73.32 - - [22/Apr/2004:06:14:45 -0500] "GET /product_info.php?products_id=657 HTTP/1.0" 200 29773 "-" "Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler"

It is building up one heck of a shopping cart (being that I have roughly 800 products online) ... How can I stop this?
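Before blocking anything, it can help to see how much traffic each client is really responsible for. Below is a minimal sketch, assuming the access_log is in the combined log format shown in the entry above. The names `LOG_PATTERN` and `summarize` are made up for illustration, and the sample list holds only the one line quoted in the post.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Return (hits, volume) counters keyed by (ip, user agent)."""
    hits, volume = Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined log format
        key = (match["ip"], match["agent"])
        hits[key] += 1
        size = match["size"]
        # A size of "-" means no response body was sent.
        volume[key] += int(size) if size.isdigit() else 0
    return hits, volume

# The single access_log entry quoted in the post above.
sample = [
    '66.77.73.32 - - [22/Apr/2004:06:14:45 -0500] '
    '"GET /product_info.php?products_id=657 HTTP/1.0" 200 29773 "-" '
    '"Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture dot com; '
    'http://www.alltheweb.com/help/webmaster/crawler"',
]

hits, volume = summarize(sample)
for (ip, agent), count in hits.most_common(5):
    print(ip, count, volume[(ip, agent)], agent[:40])
```

To run it on a real log, pass an open file object to `summarize` instead of `sample`; the heaviest clients come first in `hits.most_common()`.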
♥yesudo Posted April 22, 2004
Spiders are good - don't worry.
Your online success is Paramount.
hpqnet (Author) Posted April 22, 2004
But the shopping cart is going to be a monster, and I want to make sure it's not going to break anything else.
♥yesudo Posted April 22, 2004
> but the shopping cart is going to be a monster, and I want to make sure its not going to break anything else

Like what? It shouldn't cause any probs - the spider won't check out with them, and when it's finished the session will just expire.
Your online success is Paramount.
hpqnet (Author) Posted April 22, 2004
Yeah, but right now they are up to 5000 hits and over 170000 kbytes of bandwidth. I have sessions stored in the db, so I guess I am okay.
♥Vger Posted April 22, 2004
Spiders can gobble up your bandwidth though, and you don't want them in parts of the site they don't need to spider. Set up a text file called robots.txt, and place it in your root folder. Inside it use something like this:

User-agent: *
Disallow: /admin
Disallow: /includes
Disallow: /cart.php
Disallow: /checkout.php

Use whatever your admin section is called in place of /admin, and add any files you don't want it to spider. Each Disallow path should start with a slash. This won't stop the spiders used by spammers and hackers, but the legit spiders will look for the file and follow its directives.

Hope this helps - Vger
hpqnet (Author) Posted April 22, 2004
This is my current robots.txt file... does this look okay, or is it too much? Also, according to alltheweb.com the user-agent is called fast ... Should I add fast to spiders.txt?

User-agent: *
Disallow: /admin/
Disallow: /test/
Disallow: /catalog/
Disallow: /download/
Disallow: /download_files/
Disallow: /pub/
Disallow: /temp/
# Block files that are secure or login oriented
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_edit_process.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_newsletters.php
Disallow: /account_notifications.php
Disallow: /account_password.php
Disallow: /address_book.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /advanced_search_result.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment.php
Disallow: /checkout_payment_address.php
Disallow: /checkout_process.php
Disallow: /checkout_shipping.php
Disallow: /checkout_shipping_address.php
Disallow: /checkout_success.php
Disallow: /create_account.php
Disallow: /create_account_process.php
Disallow: /create_account_success.php
Disallow: /disclaimer.php
Disallow: /download.php
Disallow: /info_shopping_cart.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /popup_search_help.php
Disallow: /product_notifications.php
Disallow: /product_reviews_write.php
Disallow: /redirect.php
Disallow: /shopping_cart.php
Disallow: /shopping_cart_help.php
Disallow: /tell_a_friend.php
Disallow: /contact_us.php
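On the spiders.txt question, here is a rough sketch of how such a list might be checked. It assumes the cart treats spiders.txt as one signature per line and does a case-insensitive substring match against the visitor's user agent. That is an assumption about the cart software, not something confirmed in this thread, and `is_spider` is a made-up helper name.

```python
def is_spider(user_agent, signatures):
    """True if any non-empty signature occurs in the user agent, ignoring case.

    Assumption: signatures are matched as plain substrings, one per line
    of a spiders.txt-style file.
    """
    agent = user_agent.lower()
    return any(sig.strip().lower() in agent for sig in signatures if sig.strip())

# User agent copied from the access_log entry in the first post.
AGENT = ("Yahoo-VerticalCrawler-FormerWebCrawler/3.9 crawler at trd dot overture "
         "dot com; http://www.alltheweb.com/help/webmaster/crawler")

print(is_spider(AGENT, ["fast"]))     # False
print(is_spider(AGENT, ["crawler"]))  # True
```

If the match really works this way, adding "fast" alone would not catch the crawler from the log entry, because that string never appears in its user agent; a signature such as "crawler" would.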
211655 Posted April 22, 2004
What are spiders for, anyway? How are they good and how are they bad?
Guest Posted April 22, 2004
Spiders are how search engines see what's on your site. They're good, in that they allow people to see what you've got and find the stuff you're trying to sell; they can be bad, however, in that they don't really know how much bandwidth they're using or, in this case, that they're messing up your cart.

There's also a band of evil spiders running rampant; their job is to snarf email addresses so people can get you to go to their porn site or buy something to make your penis bigger. The only real way to foil them is to make sure your email addresses are noplace near your website, or cloak them somehow...good luck.
♥Vger Posted April 23, 2004
> This is my current robots.txt file... this look okay or too much? [...]

Hi,

About your robots.txt file: if your site operates in a directory called 'catalog', and you've blocked that directory in the robots file, then none of your pages will get spidered. Best to allow that directory, but block all of the other directories within it and, as you've done, any files related to the shopping cart side of things. That way your content pages will get spidered, but not your catalog, images etc.

Hope this helps - Vger
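The effect of a directory rule can be checked offline with Python's standard `urllib.robotparser`. This is only a sketch: the three rules are taken from the robots.txt posted above, `allowed` is a made-up helper, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, path, agent="*"):
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)

# A short excerpt of the rules posted earlier in the thread.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /catalog/
Disallow: /shopping_cart.php
"""

# Store installed under /catalog/: every product page is blocked.
print(allowed(RULES, "/catalog/product_info.php?products_id=657"))  # False
# Store installed in the site root: product pages stay crawlable.
print(allowed(RULES, "/product_info.php?products_id=657"))          # True
# Cart page is blocked either way.
print(allowed(RULES, "/shopping_cart.php"))                         # False
```

Disallow rules are prefix matches on the URL path, which is why blocking /catalog/ hides everything for a store that lives in that directory but has no effect on pages served from the root.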
user99999999 Posted April 23, 2004
Change the buy now buttons on the products listing to be forms instead of links; then the spider can't do that.
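The idea behind this suggestion is that crawlers follow links (GET requests) but generally do not submit forms (POST requests). A minimal sketch of the server-side half, in Python for illustration only: the function name, the `action` value, and the dict-based cart are all hypothetical and not taken from the cart's actual code.

```python
def handle_cart_request(method, params, cart):
    """Add a product to `cart` only when the request arrived as a POST."""
    if params.get("action") != "buy_now":  # hypothetical action name
        return "ignored"
    if method.upper() != "POST":
        return "rejected"  # a followed link (GET) must not change the cart
    product_id = params["products_id"]
    cart[product_id] = cart.get(product_id, 0) + 1
    return "added"

cart = {}
request = {"action": "buy_now", "products_id": "657"}
print(handle_cart_request("GET", request, cart))   # rejected
print(handle_cart_request("POST", request, cart))  # added
print(cart)                                        # {'657': 1}
```

Switching the button to a form only helps if the server also refuses to change the cart on a plain GET, which is what the method check above stands for.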
paulm2003 Posted April 23, 2004
> Spiders are good - don't worry.

I have had one that used 8% (and increasing) of the total site traffic for months, filling carts now and then (on MS2). Finally I decided to block the IP, and the number of visitors increased just as fast after that, so I guess this wasn't a useful bot after all.

> Change the buy now buttons on the products listing to be forms instead of links then the spider cant do that.

Aren't the buy now buttons already forms in MS2?
hpqnet (Author) Posted April 23, 2004
Actually, my site runs in the root directory instead of the /catalog directory.
♥Vger Posted April 23, 2004
> actually my site runs in the root directory instead of the /catalog directory.

Agh! So why have you got /catalog/ blocked in your robots.txt file? No need unless you have a catalog directory.

Vger
hpqnet (Author) Posted April 23, 2004
I've still got some original, unmodified PHP files within the /catalog directory.