osCommerce

The e-commerce.

OT: robots.txt


Guest

Posted

I have no idea why the spiders (mainly fast) are not listening to my robots.txt.

 

User-agent: *
Disallow: /catalog/images/
Disallow: /catalog/news/
Disallow: /catalog/newsletter/
Disallow: /catalog/firstsite/
Disallow: /catalog/includes
Disallow: /admin
Disallow: /catalog/address_book_process.php
Disallow: /catalog/account.php
Disallow: /catalog/account_edit.php
Disallow: /catalog/account_edit_process.php
Disallow: /catalog/account_history.php
Disallow: /catalog/account_history_info.php
Disallow: /catalog/address_book.php
Disallow: /catalog/checkout_process.php
Disallow: /catalog/advanced_search.php
Disallow: /catalog/advanced_search_result.php
Disallow: /catalog/checkout_address.php
Disallow: /catalog/checkout_confirmation.php
Disallow: /catalog/checkout_payment.php
Disallow: /catalog/checkout_success.php
Disallow: /catalog/conditions.php
Disallow: /catalog/contact_us.php
Disallow: /catalog/create_account.php
Disallow: /catalog/create_account_process.php
Disallow: /catalog/create_account_success.php
Disallow: /catalog/download.php
Disallow: /catalog/info_shopping_cart.php
Disallow: /catalog/login.php
Disallow: /catalog/logoff.php
Disallow: /catalog/password_forgotten.php
Disallow: /catalog/popup_image.php
Disallow: /catalog/popup_search_help.php
Disallow: /catalog/privacy.php
Disallow: /catalog/products_new.php
Disallow: /catalog/product_notifications.php
Disallow: /catalog/product_reviews.php
Disallow: /catalog/product_reviews_info.php
Disallow: /catalog/product_reviews_write.php
Disallow: /catalog/produktberatung.php
Disallow: /catalog/redirect.php
Disallow: /catalog/reviews.php
Disallow: /catalog/shipping.php
Disallow: /catalog/shopping_cart.php
Disallow: /catalog/specials.php
Disallow: /catalog/tell_a_friend.php
Disallow: /catalog/disclaimer.php
Disallow: /catalog/admin/
Disallow: /catalog/download/
Disallow: /catalog/images/
Disallow: /catalog/includes/
Disallow: /catalog/pub/

 

The spiders are still browsing my images and, for example, this page:

 

Disallow: /catalog/produktberatung.php

 

Any ideas???

 

Torsten
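
One thing that may be worth checking first is how a parser actually reads the file. The original robots.txt convention separates records with blank lines, so if the file on the server really has a blank line after every directive (it may just be the forum formatting here), a strict parser could end up with no rules under "User-agent: *". The sketch below uses Python's standard urllib.robotparser to compare both layouts with a few rules from the post. "ExampleBot" and the test paths are made up for illustration, and other crawlers may well treat blank lines differently.

```python
from urllib.robotparser import RobotFileParser

# A few rules copied from the post, written without blank lines.
COMPACT = """\
User-agent: *
Disallow: /catalog/images/
Disallow: /catalog/produktberatung.php
"""

# The same rules with a blank line after every directive.
SPACED = COMPACT.replace("\n", "\n\n")


def is_allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` (a hypothetical bot name) may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    paths = (
        "/catalog/produktberatung.php",
        "/catalog/images/logo.gif",  # hypothetical image name
        "/catalog/index.php",
    )
    for label, text in (("compact", COMPACT), ("spaced", SPACED)):
        for path in paths:
            verdict = "allowed" if is_allowed(text, path) else "blocked"
            print(f"{label:8} {path:32} {verdict}")
```

With this particular parser the compact layout blocks the first two paths, while the spaced layout blocks nothing, because the blank line after "User-agent: *" ends the record before any Disallow line is read. That only shows how one strict parser behaves; it does not prove this is what the fast spider does.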

Posted

Why do you think they should? There is no law forcing them to obey your robots.txt. Some spiders do, some don't. Suggestion: take note of the spiders that ignore robots.txt, find out where they come from, and look there for policies that define what the spider will and will not fetch. Some spiders look at META tags with robot rules, some look at robots.txt, and some look at HTML comments.
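
Taking note of the offenders can be done from the web server's access log. The following is only a rough sketch: it assumes the Apache "combined" log format, uses made-up sample entries (documentation IP range, hypothetical bot names), and reuses a few rules from the robots.txt above. It counts requests for disallowed paths per IP address and user agent.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# A few rules taken from the robots.txt in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/images/
Disallow: /catalog/produktberatung.php
"""

# Assumed Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample entries, not real traffic.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2003:10:00:00 +0100] "GET /catalog/produktberatung.php HTTP/1.0" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2003:10:00:05 +0100] "GET /catalog/images/logo.gif HTTP/1.0" 200 2048 "-" "ExampleBot/1.0"',
    '192.0.2.20 - - [01/Jan/2003:10:01:00 +0100] "GET /catalog/index.php HTTP/1.0" 200 9000 "-" "PoliteBot/2.1"',
]


def find_violations(log_lines, robots_txt=ROBOTS_TXT):
    """Count requests for disallowed paths, keyed by (ip, user agent)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if not parser.can_fetch(match["agent"], match["path"]):
            violations[(match["ip"], match["agent"])] += 1
    return violations


if __name__ == "__main__":
    for (ip, agent), hits in find_violations(SAMPLE_LOG).most_common():
        print(f"{hits:5d}  {ip:15}  {agent}")
```

Ordinary browsers request the same pages and images, so the list will contain human visitors too. The user-agent string is what separates the spiders, and the IP address is the starting point for finding out who runs them.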

 

HTH

You can't have everything. That's why trains have difficulty crossing oceans, and hippos did not adapt to fly. -- from the OpenBSD mailinglist.

Posted

Jan,

thanks for the info. It looks like the fast spider loves my site. After 2000 visits on Sunday and 1500 visits yesterday, guess what: the fast spider is my guest today...

 

Torsten

Posted

Who is the "fast spider"? What IP address is it coming from? Which search engine is it working for?

Posted

OK, this is something Jan may be able to help with...

 

I have been looking for a site that lists the bot names and IP addresses.

I have found a couple but they are not very good.

 

Does anyone know of a decent site that offers this information?

I just want to know the user-agent name, the IP address, and the name of the bot, if possible.

 

Cheers guys.

I must say I have never heard of fastsearch, though.

 

CC.

Archived

This topic is now archived and is closed to further replies.
