WiseWombat Posted August 24, 2005

As you can see, the IP address 65.54.188.107 reads the robots.txt file in the webroot, but it doesn't follow the rules. Should I block it? It reads the files that I don't want spiders indexing. I also have the updated spiders.txt and robots.txt files installed. Any ideas? Thanks.

65.54.188.107 - - [23/Aug/2005:20:27:09 +1000] "GET /robots.txt HTTP/1.0" 200 2753
65.54.188.107 - - [23/Aug/2005:20:27:09 +1000] "GET /forum/ HTTP/1.0" 200 24378
65.54.188.107 - - [23/Aug/2005:20:33:23 +1000] "GET /forum/index.php?sid=aeb25e23d957e4aee08702ef37eda31c HTTP/1.0" 200 24378
65.54.188.107 - - [23/Aug/2005:22:10:38 +1000] "GET /OurShop/account.php HTTP/1.0" 302 -
65.54.188.107 - - [23/Aug/2005:22:15:08 +1000] "GET /RoughAsGuts/contact_us.php HTTP/1.0" 404 321
65.54.188.107 - - [23/Aug/2005:23:23:20 +1000] "GET /OurShop/checkout_shipping.php HTTP/1.0" 302 -
65.54.188.107 - - [23/Aug/2005:23:52:01 +1000] "GET /TackleShop/shopping_cart.php?language=en HTTP/1.0" 200 22050
65.54.188.107 - - [24/Aug/2005:01:15:21 +1000] "GET /TackleShop/checkout_shipping.php HTTP/1.0" 302 -
65.54.188.107 - - [24/Aug/2005:01:26:20 +1000] "GET /TackleShop/account.php HTTP/1.0" 302 -

( WARNING ) I think I know what I'm talking about. BACK UP BACK UP BACK UP BACK UP
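One way to see whether the bot really requested disallowed paths is to run the URLs from the access log against the robots.txt rules. Below is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content shown here is hypothetical (the real 2753-byte file is not included in this thread), and `disallowed_hits` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; substitute the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /OurShop/account.php
Disallow: /TackleShop/checkout_shipping.php
"""

# A few of the paths taken from the access log lines in the post above.
REQUESTED = [
    "/forum/",
    "/OurShop/account.php",
    "/TackleShop/shopping_cart.php?language=en",
    "/TackleShop/checkout_shipping.php",
]


def disallowed_hits(robots_txt, user_agent, paths):
    """Return the requested paths that the rules forbid for this user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    for path in disallowed_hits(ROBOTS_TXT, "msnbot", REQUESTED):
        print("requested despite Disallow:", path)
```

If the list comes back empty for the real robots.txt, the Disallow lines may simply not cover the paths the bot fetched, which is worth ruling out before blocking an IP.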
♥Vger Posted August 24, 2005

That's MSN Bot. I wouldn't block it if I were you.

Vger
WiseWombat (Author) Posted August 24, 2005

Vger wrote: That's MSN Bot. I wouldn't block it if I were you.

Thanks Vger. I can't understand why it reads the robots.txt file and then ignores the Disallow: rules, as the other spiders follow them. I also notice that it keeps looking for files that no longer exist. Example:

65.54.188.107 - - [23/Aug/2005:22:15:08 +1000] "GET /RoughAsGuts/contact_us.php HTTP/1.0" 404 321
user99999999 Posted August 24, 2005

It's checking links that are already indexed. After a few 404 errors, the link will be deleted from the index. The same goes for Disallow.
stevel Posted August 24, 2005

I agree with the others - this seems to be msnbot. I notice that you left out the user agent from the access log - what's there? Did you also manually edit out the osCsid?

Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
WiseWombat (Author) Posted August 24, 2005

stevel wrote: I agree with the others - this seems to be msnbot. I notice that you left out the user agent from the access log - what's there? Did you also manually edit out the osCsid?

Thanks. I didn't edit out the osCsid; the session IDs are rewritten by the server. I can't remember whether I picked this up from the osCommerce contributions or from WebmasterWorld, but I added it to the .htaccess file. It seems to work fine. This is the first problem I've had with spiders in the past 6 months. Example:

RewriteEngine on
RewriteBase /
#
# Skip the next two RewriteRules if NOT a spider
RewriteCond %{HTTP_USER_AGENT} !(msnbot|slurp|crawl|googlebot) [NC]
RewriteRule .* - [S=2]
#
# case: leading and trailing parameters
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+&(.+)$ [NC]
RewriteRule (.*) $1?%1&%2 [R=301,L]
#
# case: leading-only, trailing-only or no additional parameters
# (only one alternative matches, so one of %1 and %2 is always empty)
RewriteCond %{QUERY_STRING} ^(.+)&osCsid=[0-9a-z]+$|^osCsid=[0-9a-z]+&?(.*)$ [NC]
RewriteRule (.*) $1?%1%2 [R=301,L]

It prevents spiders from creating session IDs. Just add spiders to the list as needed.
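For anyone who wants to check what the rewrite rules above are meant to do before touching a live .htaccess, here is a rough Python sketch of the same idea: if the user agent looks like a spider and the query string carries an osCsid parameter, return the URL to redirect to with that parameter removed. This is not a port of mod_rewrite; `strip_session_id` and `SPIDER_RE` are illustrative names, and the user agent strings in the usage lines are made up.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Same spider list as the RewriteCond above; extend as needed.
SPIDER_RE = re.compile(r"msnbot|slurp|crawl|googlebot", re.IGNORECASE)


def strip_session_id(url, user_agent):
    """Return the redirect target without osCsid, or None if no redirect applies."""
    if not SPIDER_RE.search(user_agent):
        return None  # ordinary visitors keep their session ID
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Case-insensitive match on the parameter name, like the [NC] flag.
    kept = [(key, value) for key, value in pairs if key.lower() != "oscsid"]
    if len(kept) == len(pairs):
        return None  # nothing to strip
    # Note: urlencode may percent-encode values differently from the original URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Hypothetical user agent strings, for illustration only.
    print(strip_session_id("/shop/index.php?cPath=1&osCsid=abc123&page=2", "msnbot/1.0"))
    print(strip_session_id("/shop/index.php?osCsid=abc123", "Mozilla/5.0"))
```

Parsing the query string into pairs avoids the separate leading, trailing and middle cases that the regex version has to spell out.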