kenkja Posted March 17, 2013

:( I just don't know what to do about bots (thanks, D. Springfield).

Bots seem to be playing havoc with my site, usually leaving it with very slow loading times. I know some of the delay is down to the way I have designed the site, and to the fact that my host is not the fastest and I am on a shared server. But whenever I check Admin -> Who's Online, it is always some bot or other that is all over the database.

Previously I checked the raw access data in cPanel and saw that even when Admin -> Who's Online showed, say, 30 identical IPs, the raw access log showed maybe 300 requests from that same IP.

I added this to the robots.txt file:

Crawl-delay: 2
Request-rate: 1/2
Visit-time: 0100-0600

but I still notice that msnbot (and others) is all over my site during UK daylight hours. Do I need to specify a time zone? This also touches on something I have seen in other posts about Admin -> Who's Online and its time frame: my site is UK-based but the server is in the USA, so the times I see are not my local times.

Then there is the question of bots written in Mandarin (or similar) that refuse to follow the instructions in robots.txt. I ended up using .htaccess to ban the main culprit, but even while doing it I was not sure whether the bot really was a culprit or just one wanting to give millions of Chinese people a chance to see my site, hahaha :wacko: :wacko: The mind boggles.

What is a good bot, and what is a bad bot? Discuss, please.

ken

Os-commerce v2.3.3 Security Pro v11 Site Monitor IP Trap htaccess Protection Bad Behaviour Block Year Make Model Document Manager X Sell Star Product Modular Front Page Modular Header Tags
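If those three lines sit in robots.txt with no User-agent line in front of them, a parser may skip them altogether, because robots.txt directives belong to a group that starts with a User-agent line. The sketch below uses Python's standard urllib.robotparser purely as a stand-in for a crawler's parser (each search engine has its own, and they differ on which of these non-standard extensions they honor) to compare the lines as posted with the same lines under "User-agent: *". The names AS_POSTED, GROUPED and limits_for are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The three lines as posted, with no User-agent line in front of them.
AS_POSTED = """\
Crawl-delay: 2
Request-rate: 1/2
Visit-time: 0100-0600
"""

# The same lines inside a group that applies to every crawler.
GROUPED = """\
User-agent: *
Crawl-delay: 2
Request-rate: 1/2
Visit-time: 0100-0600
"""


def limits_for(robots_txt, agent):
    """Return (crawl_delay, request_rate) as urllib.robotparser reads them for agent.

    Python's parser has no support for Visit-time, so that line is silently ignored.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.crawl_delay(agent), rp.request_rate(agent)


if __name__ == "__main__":
    print(limits_for(AS_POSTED, "msnbot"))  # (None, None): no group, so the lines are skipped
    print(limits_for(GROUPED, "msnbot"))    # (2, RequestRate(requests=1, seconds=2))
```

Whatever a given crawler does with Visit-time, putting the rules under a User-agent line at least gives parsers that follow the usual grouping a chance to apply them.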
MrPhil Posted March 17, 2013

For ill-mannered bots such as Baidu's, about all you can do is ban (deny) them in /.htaccess if they ignore robots.txt. As for the time setting, it should be GMT (UTC). Note that Google, among others, did not obey Visit-time as of a few years ago. Maybe they do now (or maybe not). There's a lot of discussion about how to set up robots.txt to deal with various bots; I was just reading http://searchengineland.com/a-deeper-look-at-robotstxt-17573.
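Following the point that Visit-time is read as GMT/UTC, a UK night-time window has to be shifted by an hour while British Summer Time is in force. A minimal sketch, assuming the current UK rule (BST runs from the last Sunday in March to the last Sunday in October) and treating the changeover at whole-day granularity; last_sunday, uk_offset_hours and uk_window_to_utc are hypothetical helper names.

```python
from datetime import date, timedelta


def last_sunday(year, month):
    """Date of the last Sunday in the given month."""
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)


def uk_offset_hours(day):
    """1 while British Summer Time applies on `day`, else 0 (GMT == UTC)."""
    # Day-level approximation: the real switch happens at 01:00 UTC on those Sundays.
    return 1 if last_sunday(day.year, 3) <= day < last_sunday(day.year, 10) else 0


def uk_window_to_utc(day, start, end):
    """Convert a UK local 'HHMM' window into the UTC 'HHMM-HHMM' form used by Visit-time."""
    shift = uk_offset_hours(day) * 60

    def to_utc(hhmm):
        minutes = (int(hhmm[:2]) * 60 + int(hhmm[2:]) - shift) % (24 * 60)
        return f"{minutes // 60:02d}{minutes % 60:02d}"

    return f"{to_utc(start)}-{to_utc(end)}"


if __name__ == "__main__":
    print(uk_window_to_utc(date(2013, 3, 17), "0100", "0600"))  # winter: unchanged
    print(uk_window_to_utc(date(2013, 4, 3), "0100", "0600"))   # summer: one hour earlier
```

Because robots.txt is a static file, a window written in winter drifts by an hour in summer unless it is updated; crawlers that ignore Visit-time are unaffected either way.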
kenkja (Author) Posted March 17, 2013

@@MrPhil Thanks. I have already banned Baidu via .htaccess, as it seems to be very aggressive. So far I have not seen Googlebot outside my time frame, but that may just be coincidence; msnbot appears to be the worst culprit so far.

Then again, while we don't want bots using up all our database resources, surely we need them in order to get listed. It all seems a bit of a black art to me. I will have a good read of the link tomorrow.

Regards, ken
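To put numbers on which bot is the worst culprit, the raw access log can be tallied per IP and per User-Agent. A sketch assuming cPanel's raw log uses Apache's "combined" format (IP, identity, user, [time], "request", status, bytes, "referer", "user-agent"); the bot test is a crude substring heuristic, not a real classifier, and open_log and tally are illustrative names. It reads .gz files directly through Python's gzip module, so the archive does not need unpacking first.

```python
import gzip
import re
import sys
from collections import Counter

# Assumed Apache "combined" format:
# IP ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Crude heuristic: most self-declared crawlers use one of these words.
BOT_HINTS = ("bot", "spider", "crawl", "slurp")


def open_log(path):
    """Open a raw access log, reading .gz archives directly."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, encoding="utf-8", errors="replace")


def tally(lines):
    """Count requests per IP and per User-Agent; return (total, bot_like, per_ip, per_ua)."""
    per_ip, per_ua = Counter(), Counter()
    total = bot_like = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        total += 1
        per_ip[m["ip"]] += 1
        per_ua[m["ua"]] += 1
        if any(hint in m["ua"].lower() for hint in BOT_HINTS):
            bot_like += 1
    return total, bot_like, per_ip, per_ua


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_log(sys.argv[1]) as log:
        total, bot_like, per_ip, per_ua = tally(log)
    print(f"{total} requests, {bot_like} from bot-like User-Agents")
    for ua, count in per_ua.most_common(10):
        print(f"{count:7d}  {ua}")
    for ip, count in per_ip.most_common(10):
        print(f"{count:7d}  {ip}")
```

Run it with the downloaded log file as its only argument. A high count for a single IP with a browser-like User-Agent is a hint, not proof, of a disguised crawler.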
kenkja (Author) Posted March 21, 2013

Hi all, this bot issue is really getting me down. I downloaded the raw access file the other day: only 25,000+ lines of info, once I finally managed to get the .gz file open, which was a battle. Anyway, my guess is that 90% of these are bots or spiders. Should this code in the root .htaccess work?

RewriteEngine on
# $Id$
#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
#   AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)

# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers

#<IfModule mod_setenvif.c>
#  <IfDefine SSL>
#    SetEnvIf User-Agent ".*MSIE.*" \
#             nokeepalive ssl-unclean-shutdown \
#             downgrade-1.0 force-response-1.0
#  </IfDefine>
#</IfModule>

# If Search Engine Friendly URLs do not work, try enabling the
# following Apache configuration parameter

# AcceptPathInfo On

# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)

# php_value session.use_trans_sid 0
# php_value register_globals 1

# Unanchored patterns: crawlers' User-Agent headers contain these names
# inside a longer string, so ^Name$ would never match them
SetEnvIfNoCase User-Agent Baiduspider bad_bot
SetEnvIfNoCase User-Agent 360Spider bad_bot
SetEnvIfNoCase User-Agent Yandex bad_bot
order Allow,Deny
Allow from all
Deny from env=bad_bot

RewriteCond %{HTTP_REFERER} !^http://mysite.co.uk/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://mysite.co.uk$ [NC]
RewriteCond %{HTTP_REFERER} !^http://mysub.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://mysub.com$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.co.uk/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.co.uk$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysub.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysub.com$ [NC]
RewriteCond %{HTTP_REFERER} !^https://mysite.co.uk/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://mysite.co.uk$ [NC]
RewriteCond %{HTTP_REFERER} !^https://www.mysite.co.uk/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^https://www.mysite.co.uk$ [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]

#Instead of showing access denied redirect to index.php
#ErrorDocument 403 /access_error.php?id=403
#Like so
ErrorDocument 403 /index.php?id=403

#Below add (use your renamed admin)
RewriteRule ^myadmin\/?$ - [F]

########## BAD BEHAVIOR BLOCK rules to ban exploits
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*iframe.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} ^(.*)cPath=http://(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)/self/(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ bad_conduct/ban.php [L]
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
RewriteRule .* - [F]
RewriteRule setup\.php$ bad_conduct/ban.php [NC,L]
RewriteRule file_manager\.php$ bad_conduct/ban.php [NC,L]
########### BAD BEHAVIOR BLOCK rules to ban exploits
########### IMPORTANT!
########### Add one blank line at the very end of the .htaccess file

<Files 403.shtml>
order allow,deny
allow from all
</Files>

deny from 173.199.114.163
deny from 204.12.226.2
deny from 208.115.111.68
deny from 5.63.145.73
deny from 69.30.238.26
deny from 65.55.213.72
deny from 82.192.66.250
deny from 180.76.5.136
deny from 86.55.210.53
deny from 212.35.10.79
deny from 218.214.2.6
deny from 83.103.119.239
deny from 188.138.16.60
deny from 94.102.7.154
deny from 193.27.246.178
deny from 195.157.124.186
deny from 87.108.66.195
deny from 79.143.179.17
deny from 94.127.67.120
deny from 82.204.11.106
deny from 195.172.186.81
deny from 195.116.150.20
deny from 178.172.60.6
deny from 180.210.58.51
deny from 117.103.223.26
deny from 118.107.163.182
deny from 218.29.115.152
deny from 168.62.166.202
deny from 218.108.236.107
deny from 122.224.5.122
deny from 168.61.17.198
deny from 180.131.3.12
deny from 122.224.6.43
deny from 123.103.15.55
deny from 219.255.134.96
deny from 50.192.249.201
deny from 46.137.207.128
deny from 121.189.62.84
deny from 223.85.245.54
deny from 31.222.164.138
deny from 50.57.127.55
deny from 211.167.68.21
deny from 147.156.42.201
deny from 60.29.10.18
deny from 46.21.157.143
deny from 60.191.232.53
deny from 208.109.104.119
deny from 60.199.223.196
deny from 62.202.18.26
deny from 99.177.96.73

Any help would be gratefully appreciated.

NB: I do have a blank line at the bottom of the file, before all the denied IPs that Bad Behaviour Block has added; the forum's code snippet just doesn't show it. I have also been manually adding IPs to the list from the raw access log, which, to be fair, is a waste of time, as they change them all the time.

ken
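SetEnvIfNoCase tests its regular expression against the whole User-Agent header, case-insensitively, and crawlers put their name inside a longer string. So an anchored pattern such as ^Baiduspider$ only matches a header that is exactly that one word, and ^Yandex*$ means "Yande" followed by any number of x's. The sketch uses Python's re as a stand-in for Apache's regex engine (the anchors behave the same for these simple patterns); the sample User-Agent strings are representative examples, not taken from this site's logs, and flagged is an illustrative name.

```python
import re

# Representative User-Agent headers (assumed examples, not taken from this site's logs).
SAMPLE_AGENTS = {
    "baidu": "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "yandex": "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
    "firefox": "Mozilla/5.0 (Windows NT 6.1; rv:19.0) Gecko/20100101 Firefox/19.0",
}

# Anchored patterns of the ^Name$ kind, and unanchored equivalents.
ANCHORED = [r"^Baiduspider$", r"^360spider$", r"^Yandex*$"]
UNANCHORED = [r"Baiduspider", r"360Spider", r"Yandex"]


def flagged(user_agent, patterns):
    """Mimic SetEnvIfNoCase: case-insensitive regex search anywhere in the header."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    for name, ua in SAMPLE_AGENTS.items():
        print(name, "anchored:", flagged(ua, ANCHORED), "unanchored:", flagged(ua, UNANCHORED))
```

Unanchored names catch the crawlers whichever version number or URL they append, at the small risk of matching any other agent that happens to contain the same word.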
kenkja (Author) Posted March 21, 2013

And while we're on the subject, has anyone got any experience applying the approach at the URL below to osC?

http://perishablepress.com/blackhole-bad-bots/
Jack_mcs Posted March 21, 2013

That's just an IP trap package. There is an addon that does the same thing. If you want to install one, the one written for osCommerce would probably be the better choice. I'll be releasing a new addon in a week or two that will allow controlling these bots from admin. It has a trap built into it.

Support Links: For Hire: Contact me for anything you need help with for your shop: upgrading, hosting, repairs, code written, etc. All of My Addons: Get the latest versions of my addons. Recommended SEO Addons
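The idea behind a bot trap of this kind: robots.txt disallows a trap URL, the site links to it somewhere human visitors will not click, and any client that requests it anyway has ignored robots.txt and gets banned. This is a minimal in-memory sketch of that logic only, not the code of the IP Trap addon or the Blackhole script; TRAP_PATH and IpTrap are hypothetical names, and a real version would need to persist the ban list, for example as the .htaccess "deny from" lines that deny_lines renders.

```python
from dataclasses import dataclass, field

# Hypothetical trap URL. robots.txt would carry:
#   User-agent: *
#   Disallow: /bot-trap/
# and a page would link to it somewhere human visitors will not click.
TRAP_PATH = "/bot-trap/"


@dataclass
class IpTrap:
    trap_path: str = TRAP_PATH
    banned: set = field(default_factory=set)

    def handle(self, ip, path):
        """Return the HTTP status for a request, banning any IP that hits the trap."""
        if ip in self.banned:
            return 403
        if path.startswith(self.trap_path):
            # A client reaching this URL ignored the Disallow rule
            # (or followed a link no visitor should see).
            self.banned.add(ip)
            return 403
        return 200

    def deny_lines(self):
        """Render the ban list as Apache 2.2-style 'deny from' lines."""
        return [f"deny from {ip}" for ip in sorted(self.banned)]


if __name__ == "__main__":
    trap = IpTrap()
    for ip, path in [("1.2.3.4", "/index.php"), ("1.2.3.4", "/bot-trap/"), ("1.2.3.4", "/")]:
        print(ip, path, trap.handle(ip, path))
    print("\n".join(trap.deny_lines()))
```

The appeal over hand-editing the deny list is that the trap catches new addresses as they appear, instead of chasing them through the raw access log afterwards.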
kenkja (Author) Posted March 21, 2013

@@Jack_mcs Sorry, I got ahead of myself and didn't read the link until a while after posting. It just seems unnecessarily complex compared with the IP Trap add-on.

ken
kenkja (Author) Posted March 24, 2013

Hi all, me and those bots again. The changes seem to have slowed them down, but that's about all. Anyway, I found this code block in my host's knowledge base:

##begin code
##start blocking potentially unwanted bots.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
##end code.

Bai bots. It actually had a great long list of bots, which it banned; the above shows just the beginning and end of the code. Gonna give it a go and see what happens.
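A long [OR] chain like the host's is easy to break when editing by hand, since every condition except the last needs [OR]. A hypothetical helper that builds the block from a list of names, escaping each with re.escape so that a space becomes "\ " as in the ^Bot\ mailto line. The caret anchors each name at the start of the User-Agent, as in the host's examples, so these rules only catch agents whose header begins with that name.

```python
import re


def bot_block_rules(names, nocase=False):
    """Build a mod_rewrite block returning 403 when the User-Agent starts with any name."""
    if not names:
        return []
    lines = ["RewriteEngine On"]
    last = len(names) - 1
    for i, name in enumerate(names):
        flags = ["NC"] if nocase else []
        if i < last:
            flags.append("OR")  # OR each condition with the next; the last one has no OR
        # re.escape turns spaces into "\ " so Apache reads the pattern as one argument
        cond = "RewriteCond %{HTTP_USER_AGENT} ^" + re.escape(name)
        lines.append(f"{cond} [{','.join(flags)}]" if flags else cond)
    lines.append("RewriteRule ^.* - [F,L]")
    return lines


if __name__ == "__main__":
    print("\n".join(bot_block_rules(["BlackWidow", "Zeus"])))
```

Keeping the names in one list also makes it easy to add or drop a bot without touching the surrounding rules.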
kenkja (Author) Posted March 26, 2013

Unless the bots were having a night off, that last code block does ban the likes of Baiduspider, TwengaBot, etc.
kenkja (Author) Posted April 3, 2013

Small update to the last post: I am still not seeing Baiduspider or TwengaBot, so I guess it works for them. However, nothing I do seems to get rid of AhrefsBot. Has anyone got any clues?

Regards, Ken
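Assuming AhrefsBot honors robots.txt (Ahrefs says it does), a dedicated group that disallows everything for it should stop it without affecting other crawlers; an unanchored SetEnvIfNoCase pattern on "AhrefsBot" would be the server-side fallback. The sketch checks such a robots.txt with Python's urllib.robotparser as a stand-in for the bot's own parser; ROBOTS_TXT and allowed are illustrative names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut AhrefsBot out completely, leave other crawlers alone.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Ask urllib.robotparser whether agent may fetch path under robots_txt."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Python takes the product token before the first "/" (e.g. "AhrefsBot"
    # from "AhrefsBot/5.0") and matches it case-insensitively against each group.
    return rp.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("AhrefsBot/5.0", "bingbot/2.0", "Googlebot"):
        print(agent, allowed(agent, "/index.php"))
```

The empty "Disallow:" in the catch-all group means "allow everything", so only the named bot is shut out.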
kenkja (Author) Posted April 3, 2013

Time for a rant. I've just been on the Ahrefs website; they are an SEO organisation, as I guess many of the other bots' owners are, so the inference is that they are somehow trying to help promote websites.

It seems blatantly obvious to me that anyone clever enough to build such a program can have absolutely no doubt that it is malicious: they all know full well that it will create bandwidth issues, and if they conducted such research in the real world, they would be in big trouble.

Let's say a small independent store on a high street has 300 product lines supplied by 3 major wholesalers. The wholesalers decide to check on the store (is it using the correct RRP, and so on), so in second 1, 3 mystery shoppers appear, then in second 2, 3 more; in just under 2 minutes there are 300. In the UK this would be called a public order offence, possibly even a riot!

Rant over.

Ken
Archived
This topic is now archived and is closed to further replies.