bobsi18 Posted August 17, 2006
Hi all... Starting to wonder about this one - I've had a bot named 'FAST' crawling my site for about 3 days now. The last session lasted over 18 hours, and each one has been a similar length. I've been searching and can't find any info on it. It's using the same IP address each time. Is it a problem that it's crawling for so long, and what can I do about it? I recently installed 'Ban IP', so I could potentially (I think) kick it off - would that be suggested? Any help appreciated, I just have no idea what it means :) ~bobsi18~
Guest Posted August 17, 2006
You can probably also add that bot to your robots.txt file as a disallow.
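If the bot does honor robots.txt, the disallow is just a couple of lines - a minimal sketch, assuming the crawler identifies itself with the token 'FAST' (the exact useragent name it matches on is an assumption, so check the bot's documentation):

User-agent: FAST
Disallow: /

That blocks the whole site for that one bot; a Disallow with a path such as /admin/ would block just that directory instead.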
Guest Posted August 18, 2006
For bad bots, robots.txt rarely ever works. Ban them via their useragent in .htaccess. Here's an example of mine:

SetEnvIfNoCase User-Agent "charlotte/" bad_bot
SetEnvIfNoCase User-Agent "ia_archiver" bad_bot
SetEnvIfNoCase User-Agent "irlbot/" bad_bot
SetEnvIfNoCase User-Agent "lmcrawler" bad_bot
SetEnvIfNoCase User-Agent "java/" bad_bot
SetEnvIfNoCase User-Agent "libwww-perl/" bad_bot
SetEnvIfNoCase User-Agent "lwp::simple/" bad_bot
SetEnvIfNoCase User-Agent "mothra/netscan" bad_bot
SetEnvIfNoCase User-Agent "snapbot/1.0" bad_bot
SetEnvIfNoCase User-Agent "sna-0.0.1" bad_bot
SetEnvIfNoCase User-Agent "wget/" bad_bot

Order Allow,Deny
Allow from all
Deny from env=bad_bot

(All of them are either bandwidth suckers, scrapers or harvesters, so copying this into your own .htaccess would probably do you good.) :) Keep in mind that before you ban a useragent you should do some minor research to see whether that useragent is indeed a bad bot. And make sure it's the userAGENT you ban, not the bot name.
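One quick way to confirm a rule set like this is working - a minimal sketch, assuming you have curl on hand and substituting your own domain for example.com - is to request a page with a banned useragent and check that the server answers 403:

# a useragent matching the "wget/" pattern should be refused with 403 Forbidden
curl -I -A "Wget/1.10.2" http://www.example.com/

# an ordinary browser useragent should still get 200 OK
curl -I -A "Mozilla/5.0" http://www.example.com/

Because the rules use SetEnvIfNoCase, the match is case-insensitive, so "Wget/1.10.2" still trips the lowercase "wget/" pattern.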
bobsi18 Posted August 19, 2006 Author
for bad bots robots.txt rarely ever works. ban them via htaccess useragent...
Thanks for your info - in the end I didn't try to get rid of them. I researched it a fair amount and couldn't find anyone complaining about them, and they seemed to be respecting my robots file, in that I didn't see them try to access my admin or other 'banned' pages (I'm new to all this - the only way I know how to tell what a bot is doing is to watch the 'Who's Online' page). They were on the site for about 4 and a half days, for about 16 or so hours each time, but now they've gone. We'll see if they come back...
Guest Posted August 19, 2006
Share the useragent (if you have it handy) so that others searching for this bot can use the info at hand as to whether or not it's a bannable bot :) Which version of Who's Online do you use? I use WOE to determine which bots are bannable and which are not, as well :)
Andreas2003 Posted August 19, 2006
Perhaps you should have an updated spiders.txt file. Have a look at this contribution, it is doing very fine work. And this is the support thread in the forum; Steve, the maintainer, is great and answers most of the questions if he can. Perhaps you should ask him if he knows the bot which bothers you. If the bot is not a bad bot, the above-mentioned contribution will help you, so that the SID is not included in the crawling results. The long crawling time is not a problem (not in my point of view); I saw similar crawling times with googlebot. Regards Andreas
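For context on how spiders.txt does that: in osCommerce it is just a plain text list of lowercase useragent substrings, one per line, and when a visitor's useragent contains one of them the shop flags the visit as a spider and leaves the session ID (osCsid) out of the links it generates. A minimal sketch of what the entries look like - the exact tokens, including whether FAST's is 'fast', are assumptions, so check the contribution's own file:

googlebot
slurp
msnbot
fast

If the FAST crawler matches a line in that file, it should stop picking up SIDs in the URLs it indexes.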
bobsi18 Posted August 19, 2006 Author
share the useragent (if you have it handy)...
whoops, thought I had included the name of the bot :). It was labelled as 'FAST', and after some searching I discovered it's 'FAST MetaWeb Crawler (helpdesk at fastsearch dot com)'. I'm using Who's Online Enhancement v2 (the latest)... addictive!
Perhaps you should have an updated spiders.txt file...
Hi Andreas, yes, I am using the updated spiders.txt, very helpful. I'll ask Steve if he knows of this bot. Thanks again all :)
Barbie Posted November 11, 2006
whoops, thought I had included the name of the bot...
Hey out there! Ran across this post, and just to let you know, 'FAST' is used by the French yellow pages (Pages Jaunes) and can be helpful if you have content interesting for European customers. Just so you know, they do crawl all day when they find interesting content, and they do each product.