
Should I block this address?


WiseWombat


As you can see below, the IP address 65.54.188.107 reads the robots.txt file in the webroot but doesn't follow its rules. Should I block it?

It requests files that I don't want spiders indexing.

I also have updated spiders.txt and robots.txt files installed. Any ideas?

Thanks

 

65.54.188.107 - - [23/Aug/2005:20:27:09 +1000] "GET /robots.txt HTTP/1.0" 200 2753
65.54.188.107 - - [23/Aug/2005:20:27:09 +1000] "GET /forum/ HTTP/1.0" 200 24378
65.54.188.107 - - [23/Aug/2005:20:33:23 +1000] "GET /forum/index.php?sid=aeb25e23d957e4aee08702ef37eda31c HTTP/1.0" 200 24378
65.54.188.107 - - [23/Aug/2005:22:10:38 +1000] "GET /OurShop/account.php HTTP/1.0" 302 -
65.54.188.107 - - [23/Aug/2005:22:15:08 +1000] "GET /RoughAsGuts/contact_us.php HTTP/1.0" 404 321
65.54.188.107 - - [23/Aug/2005:23:23:20 +1000] "GET /OurShop/checkout_shipping.php HTTP/1.0" 302 -
65.54.188.107 - - [23/Aug/2005:23:52:01 +1000] "GET /TackleShop/shopping_cart.php?language=en HTTP/1.0" 200 22050
65.54.188.107 - - [24/Aug/2005:01:15:21 +1000] "GET /TackleShop/checkout_shipping.php HTTP/1.0" 302 -
65.54.188.107 - - [24/Aug/2005:01:26:20 +1000] "GET /TackleShop/account.php HTTP/1.0" 302 -

( WARNING )

I think I know what Im talking about.

BACK UP BACK UP BACK UP BACK UP


That's MSN Bot.  I wouldn't block it if I were you.

 

Vger

Thanks Vger

I can't understand why it reads the robots.txt file and then ignores the Disallow: rules, when the other spiders follow them.

I also notice that it keeps looking for files that no longer exist.

Example:

65.54.188.107 - - [23/Aug/2005:22:15:08 +1000] "GET /RoughAsGuts/contact_us.php HTTP/1.0" 404 321
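
Before concluding that a crawler is ignoring Disallow: rules, it can help to check which logged requests the rules actually cover. The following is only a rough sketch using Python's standard urllib.robotparser. The rules in ROBOTS_TXT are hypothetical (the thread never shows the real file), and the names ROBOTS_TXT, LOGGED_PATHS and disallowed_paths are made up for illustration. The paths are copied from the access log in the first post.

```python
from urllib.robotparser import RobotFileParser

# HYPOTHETICAL rules: the real robots.txt is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /OurShop/account.php
Disallow: /OurShop/checkout_shipping.php
Disallow: /TackleShop/
"""

# Request paths taken from the access log in the first post.
LOGGED_PATHS = [
    "/forum/",
    "/OurShop/account.php",
    "/TackleShop/shopping_cart.php?language=en",
]


def disallowed_paths(robots_txt, paths, user_agent="msnbot"):
    """Return the paths that the given rules forbid for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    for path in disallowed_paths(ROBOTS_TXT, LOGGED_PATHS):
        print("fetched despite Disallow:", path)
```

Any path this reports was fetched even though the rules forbid it. A path it does not report was never covered by the rules in the first place.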


I agree with the others - this seems to be msnbot. I notice that you left out the user agent from the access log - what's there? Did you also manually edit out the osCsid?
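
As a rough illustration of the user-agent question: log lines in the common format, like those in the first post, end after the status and size fields, while the combined format appends a quoted referer and user agent. The sketch below pulls the user agent out when it is present. The pattern LOG_RE, the function user_agent and the "msnbot/1.0" sample string are assumptions for illustration, not taken from the poster's server.

```python
import re

# Common Log Format, optionally followed by the quoted referer and
# user agent that the combined format appends.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)


def user_agent(line):
    """Return the user agent recorded on a log line, or None if absent."""
    match = LOG_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised log line")
    return match.group("agent")


if __name__ == "__main__":
    # Line copied from the first post: no user agent was logged.
    common = ('65.54.188.107 - - [23/Aug/2005:20:27:09 +1000] '
              '"GET /robots.txt HTTP/1.0" 200 2753')
    # HYPOTHETICAL combined-format version of the same request.
    combined = common + ' "-" "msnbot/1.0"'
    print(user_agent(common))
    print(user_agent(combined))
```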


Thanks. I didn't edit out the osCsid; the session IDs are rewritten out by the server.

I can't remember whether I picked this up from the osCommerce contributions or from WebmasterWorld; I then added it to the .htaccess file.

It seems to work fine. This is the first problem I've had with spiders in the past 6 months.

Example:

 

RewriteEngine on
RewriteBase /
#
# Skip the next two RewriteRules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|crawl|googlebot) [NC]
RewriteRule .* - [s=2]
#
# Case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# Case: leading-only, trailing-only or no additional parameters
# (%1 holds leading parameters, %2 trailing ones; only one of them is set)
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1%2 [R=301,L]

 

It prevents spiders from picking up session IDs by redirecting them to the same URL without the osCsid parameter. Just add spiders to the list as needed.
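
To see what the rules are meant to do to a query string, here is a small Python sketch that approximates their net effect. It is not a mod_rewrite emulator. The names SPIDER_RE, OSCSID_RE and strip_oscsid are hypothetical, and it removes every osCsid parameter in one pass, whereas the rules above remove one per redirect.

```python
import re

# Same spider list and session-id pattern as the rewrite conditions.
SPIDER_RE = re.compile(r"msnbot|slurp|crawl|googlebot", re.IGNORECASE)
OSCSID_RE = re.compile(r"^osCsid=[0-9a-z]+$", re.IGNORECASE)


def strip_oscsid(query_string, user_agent):
    """Return the query string a spider would be redirected to.

    Returns None when no redirect would happen: the client is not a
    listed spider, or the query string carries no osCsid parameter.
    """
    if not SPIDER_RE.search(user_agent):
        return None
    params = query_string.split("&") if query_string else []
    kept = [p for p in params if not OSCSID_RE.match(p)]
    if len(kept) == len(params):
        return None
    return "&".join(kept)


if __name__ == "__main__":
    # User-agent strings here are illustrative only.
    print(strip_oscsid("language=en&osCsid=abc123&page=2", "msnbot/1.0"))
    print(strip_oscsid("osCsid=abc123&page=2", "Mozilla/5.0"))
```

Ordinary browsers fall through untouched, so customers keep their sessions and only the listed spiders are redirected to session-free URLs.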
