
Bot crawling for 3 days straight...


bobsi18


Hi all... Starting to wonder about this one. I've had a bot named 'FAST' crawling my site for about 3 days; the last session lasted over 18 hours, and each one has been a similar length. I've been searching but can't find any info. It's using the same IP address each time. Is it a problem that it's been crawling for so long? What can I do about it? I recently installed the 'Ban IP' contribution, so I could potentially (I think) kick it off. Would that be the suggested approach?

 

Any help appreciated; I just have no idea what it means :)

 

~bobsi18~


For bad bots, robots.txt rarely ever works; they simply ignore it.

Ban them via .htaccess by user agent instead.

 

Here's an example of mine:

   SetEnvIfNoCase User-Agent "charlotte/" bad_bot
  SetEnvIfNoCase User-Agent "ia_archiver" bad_bot
  SetEnvIfNoCase User-Agent "irlbot/" bad_bot
  SetEnvIfNoCase User-Agent "lmcrawler" bad_bot
  SetEnvIfNoCase User-Agent "java/" bad_bot
  SetEnvIfNoCase User-Agent "libwww-perl/" bad_bot
  SetEnvIfNoCase User-Agent "lwp::simple/" bad_bot
  SetEnvIfNoCase User-Agent "mothra/netscan" bad_bot
  SetEnvIfNoCase User-Agent "snapbot/1.0" bad_bot
  SetEnvIfNoCase User-Agent "sna-0.0.1" bad_bot
  SetEnvIfNoCase User-Agent "wget/" bad_bot
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot

(All of them are bandwidth suckers, scrapers, or harvesters, so copying this into your own .htaccess would probably do you good.) :)
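Note that the Order/Allow/Deny lines at the end are Apache 2.2 syntax. If your host runs Apache 2.4 or later, the same deny-by-environment-variable rule is written with Require instead; a minimal sketch, assuming mod_authz_core:

  <RequireAll>
    Require all granted
    Require not env bad_bot
  </RequireAll>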

 

Keep in mind that before you ban a user agent, you should do some minor research to confirm that it is indeed a bad bot.

 

And make sure it's the userAGENT you ban, not the bot name.
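For example (a hypothetical illustration; the full header shown is a guess at what such a bot sends): a bot may appear in your stats under a short name like 'Snapbot', while the User-Agent header Apache actually sees is a longer string, and it's that header the rule has to match:

  # Bot name as displayed:  Snapbot
  # User-Agent header sent: e.g. "Snapbot/1.0 (Snap.com beta crawler)"
  SetEnvIfNoCase User-Agent "snapbot/1.0" bad_bot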



Thanks for your info. In the end I didn't try to get rid of them. I researched it a fair amount and couldn't find anyone complaining about them, and they seemed to be respecting my robots.txt file, in that I didn't see them try to access my admin or other 'banned' pages (I'm new to all this; the only way I know to tell what a bot is doing is to watch the 'Who's Online' page). They were on the site for about 4 and a half days, for about 16 or so hours each time, but now they've gone. We'll see if they come back...
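For reference, the kind of robots.txt rule being respected here is just a disallow line; a generic sketch, assuming the default /admin/ path rather than a renamed one:

  User-agent: *
  Disallow: /admin/

Well-behaved bots read this file and skip the listed paths; bad bots ignore it, which is why the .htaccess approach above exists.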


Share the user agent (if you have it handy) so that others searching for this bot can use the info at hand to decide whether or not it's a bannable bot :)

 

Which version of Who's Online do you use? I use WOE to determine which bots are bannable and which are not, as well :)


Perhaps you should have an updated spiders.txt file.

 

Have a look at this contribution; it does a very fine job.

 

And this is the support thread in the forum; Steve, the maintainer, is great and answers most of the questions, if he can.

 

Perhaps you should ask him whether he knows the bot that bothers you.

If the bot is not a bad bot, the above-mentioned contribution will help you by keeping the session ID (SID) out of the crawled URLs.
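For context, spiders.txt is just a plain list of lowercase substrings, one per line; when one of them appears in a visitor's user agent, osCommerce treats that visitor as a spider and leaves the osCsid session ID out of the URLs it generates. A sketch with typical entries:

  googlebot
  msnbot
  slurp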

 

The long crawling time is not a problem (not from my point of view); I've seen similar crawling times with Googlebot.

 

Regards

Andreas



Whoops, I thought I had included the name of the bot :). It was labelled as 'FAST', and after some searching I discovered it's 'FAST MetaWeb Crawler (helpdesk at fastsearch dot com)'. I'm using Who's Online Enhancement v2 (the latest)... addictive!
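For anyone searching this bot later: with the full string in hand, the .htaccess approach from earlier in the thread would match it like so (a sketch only, not a recommendation, since this bot appears to be well behaved):

  SetEnvIfNoCase User-Agent "fast metaweb crawler" bad_bot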

 


Hi Andreas, yes, I am using the updated spiders.txt, very helpful. I'll ask Steve if he knows of this bot.

 

thanks again all :)


  • 2 months later...

 

Hey out there!

 

Ran across this post, and just to let you know, 'Fast' is used by the French yellow pages (Pages Jaunes) and can be helpful if you have content of interest to European customers. They do crawl all day when they find interesting content, and they do crawl each product.


Archived

This topic is now archived and is closed to further replies.
