osCommerce

Good Bot Bad Bot


kenkja

Posted

:( Just don't know what to do about bots (thanks D. Springfield)

Bots seem to be playing havoc with my site, usually ending up with very slow loading times. I know that some of this delay is down to the way I have designed the site, and to the fact that my host is not the fastest and I am on a shared server, but usually when I check admin->who's online, it's always some bot or other that is all over the database.

Previously I have checked the raw access data from cPanel and seen that even though admin->who's online might show, say, 30 entries for the same IP, the raw access log shows maybe 300 hits from that IP.

I added this code to the robots.txt file

Crawl-delay: 2
Request-rate: 1/2
Visit-time: 0100-0600

but I still notice that msnbot (and others) is all over my site in UK daylight hours. Do I need to specify a time zone? This also touches on a subject I have seen in posts about admin->who's online, in respect of the time frame: whilst my site is in the UK, the server is in the USA, so the times I see are not those of the time zone I am in.
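
A side note on the robots.txt lines above: as posted, they have no User-agent line in front of them. Directives in robots.txt are read in groups that start with a User-agent line, so if these lines really sit in the file without one, a parser may simply skip them. The sketch below uses Python's standard-library `urllib.robotparser` purely as a stand-in for a crawler. Real bots have their own parsers, and Crawl-delay, Request-rate and Visit-time are non-standard extensions that each bot is free to ignore. The `*` group and the `read_limits` helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The three lines exactly as posted, with no User-agent line above them.
AS_POSTED = """\
Crawl-delay: 2
Request-rate: 1/2
Visit-time: 0100-0600
"""

# The same lines placed in a group that applies to every bot (an assumption
# about what was intended).
GROUPED = "User-agent: *\n" + AS_POSTED


def read_limits(robots_txt, agent="msnbot"):
    """Return (crawl delay, request rate) that this parser sees for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent), parser.request_rate(agent)


print(read_limits(AS_POSTED))
print(read_limits(GROUPED))
```

With a recent Python 3, this parser reports no limits at all for the lines as posted and a delay of 2 for the grouped version. It has no notion of Visit-time in either case, which fits the point that support for these directives varies from bot to bot.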

 

Then there is the question of bots written in Mandarin (or similar) which refuse to understand the instructions in the robots.txt file. I ended up using .htaccess to ban the main culprit, but even when doing it I was not sure if the bot was actually a culprit or one just wanting to allow millions of Chinese people a chance to see my site.

hahaha :wacko: :wacko: , the mind boggles

What is a good bot, what is a bad bot?

Discuss, please

ken

Os-commerce v2.3.3

Security Pro v11

Site Monitor

IP Trap

htaccess Protection

Bad Behaviour Block

Year Make Model

Document Manager

X Sell

Star Product

Modular Front Page

Modular Header Tags

Posted

For ill-mannered bots such as Baidu, about all you can do is ban (deny) them in /.htaccess if they're ignoring robots.txt.

As for the time setting, that should be GMT (UTC). Note that Google, among others, did not obey visit-time as of a few years ago. Maybe they do now (or maybe not).

There's a lot of discussion about how to set up robots.txt to deal with various bots. I was just reading http://searchengineland.com/a-deeper-look-at-robotstxt-17573.
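
To check the GMT (UTC) point against an actual log entry, the timestamp can be converted to UTC before comparing it with the Visit-time window. A minimal sketch, assuming the raw access log uses the usual Apache timestamp layout with a numeric offset (for example `-0500` for a server in the USA). The function name and the sample timestamps are invented for illustration, and a window that crosses midnight is not handled.

```python
from datetime import datetime, timezone

# Assumed Apache access-log timestamp layout, e.g. 10/Oct/2013:21:30:00 -0500
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def in_visit_window(log_timestamp, window="0100-0600"):
    """True if the log timestamp, converted to UTC, falls inside the window."""
    start, end = window.split("-")
    stamp = datetime.strptime(log_timestamp, LOG_TIME_FORMAT)
    utc_hhmm = stamp.astimezone(timezone.utc).strftime("%H%M")
    # Zero-padded HHMM strings compare correctly as plain text.
    return start <= utc_hhmm <= end


print(in_visit_window("10/Oct/2013:21:30:00 -0500"))  # 02:30 UTC the next day
print(in_visit_window("10/Oct/2013:03:00:00 -0500"))  # 08:00 UTC
```

The second hit looks as if it falls inside 0100-0600 on the server's clock, but it is 08:00 UTC, so a bot that honours Visit-time in GMT would treat it as outside the window.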

Posted

@MrPhil

Thanks, I have already banned Baidu via .htaccess, as it seems to be very aggressive.

As yet I have not found Googlebot outside my time frame, but that may just be coincidence. msnbot appears to be the worst culprit so far.

Then again, whilst we don't want bots using up all our database resources, surely we need them in order to get listed. It all seems a bit of a black art to me.

Will have a good read of the link tomorrow.

regards

ken

Posted

Hi all,

This bot issue is really getting me down. I downloaded the raw access file the other day: only 25,000+ lines of info, once I finally managed to get the .gz file open, which was a battle.

Anyway, my guess is 90% of these are bots or spiders.
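
For what it's worth, a guess like that can be tested without unpacking the .gz by hand. The sketch below reads the compressed log directly, tallies hits per IP and counts the requests whose user agent contains a bot-like word. It assumes the cPanel raw access log is in Apache "combined" format (IP first, user agent as the last quoted field). The `BOT_HINTS` word list and the function names are made up for illustration, and a scraper that fakes a browser user agent will not be counted.

```python
import gzip
import re
from collections import Counter

# Apache "combined" format is assumed: IP first, user agent in the last quotes.
LINE_RE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

# Hypothetical hint words; extend to suit your own log.
BOT_HINTS = ("bot", "spider", "crawl", "slurp")


def summarise(lines):
    """Return (hits per IP, number of bot-like hits, total parsed hits)."""
    per_ip = Counter()
    bot_hits = 0
    total = 0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, agent = match.groups()
        per_ip[ip] += 1
        total += 1
        if any(hint in agent.lower() for hint in BOT_HINTS):
            bot_hits += 1
    return per_ip, bot_hits, total


def summarise_gz(path):
    """Same as summarise(), reading a gzip-compressed log without unpacking it."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return summarise(handle)
```

Calling `summarise_gz("access.log.gz")` (a made-up file name) returns the per-IP counter, so `per_ip.most_common(10)` lists the ten busiest addresses, and `bot_hits / total` gives the rough bot share to set against the 90% guess.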

 

Should this code in the root .htaccess work?

 

RewriteEngine on
# $Id$
#
# This is used with Apache WebServers
#
# For this to work, you must include the parameter 'Options' to
# the AllowOverride configuration
#
# Example:
#
# <Directory "/usr/local/apache/htdocs">
# AllowOverride Options
# </Directory>
#
# 'All' will also work. (This configuration is in the
# apache/conf/httpd.conf file)
# The following makes adjustments to the SSL protocol for Internet
# Explorer browsers
#<IfModule mod_setenvif.c>
# <IfDefine SSL>
# SetEnvIf User-Agent ".*MSIE.*" \
#			 nokeepalive ssl-unclean-shutdown \
#			 downgrade-1.0 force-response-1.0
# </IfDefine>
#</IfModule>
# If Search Engine Friendly URLs do not work, try enabling the
# following Apache configuration parameter
# AcceptPathInfo On
# Fix certain PHP values
# (commented out by default to prevent errors occurring on certain
# servers)
# php_value session.use_trans_sid 0
# php_value register_globals 1
SetEnvIfNoCase User-Agent ^Baiduspider$ bad_bot
SetEnvIfNoCase User-Agent ^360spider$ bad_bot
SetEnvIfNoCase User-Agent ^Yandex*$ bad_bot
order Allow,Deny
Allow from all
Deny from env=bad_bot

RewriteCond %{HTTP_REFERER} !^http://mysite.co.uk/.*$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://mysite.co.uk$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://mysub.com/.*$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://mysub.com$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.co.uk/.*$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysite.co.uk$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysub.com/.*$	 [NC]
RewriteCond %{HTTP_REFERER} !^http://www.mysub.com$	 [NC]
RewriteCond %{HTTP_REFERER} !^https://mysite.co.uk/.*$	 [NC]
RewriteCond %{HTTP_REFERER} !^https://mysite.co.uk$	 [NC]
RewriteCond %{HTTP_REFERER} !^https://www.mysite.co.uk/.*$	 [NC]
RewriteCond %{HTTP_REFERER} !^https://www.mysite.co.uk$	 [NC]
RewriteRule .*\.(jpg|jpeg|gif|png|bmp)$ - [F,NC]
#Instead of showing access denied redirect to index.php
#ErrorDocument 403 /access_error.php?id=403
#Like so
ErrorDocument 403 /index.php?id=403
#Below add (use your renamed admin)
RewriteRule ^myadmin\/?$ - [F]
########## BAD BEHAVIOR BLOCK rules to ban exploits
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*iframe.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} ^(.*)cPath=http://(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} ^(.*)/self/(.*)$ [NC,OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ bad_conduct/ban.php [L]
RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
RewriteRule .* - [F]
RewriteRule setup\.php$ bad_conduct/ban.php [NC,L]
RewriteRule file_manager\.php$ bad_conduct/ban.php [NC,L]
########### BAD BEHAVIOR BLOCK rules to ban exploits
########### IMPORTANT!
########### Add one blank line at the very end of the .htaccess file
<Files 403.shtml>
order allow,deny
allow from all
</Files>
deny from 173.199.114.163
deny from 204.12.226.2
deny from 208.115.111.68
deny from 5.63.145.73
deny from 69.30.238.26
deny from 65.55.213.72
deny from 82.192.66.250
deny from 180.76.5.136
deny from 86.55.210.53
deny from 212.35.10.79
deny from 218.214.2.6
deny from 83.103.119.239
deny from 188.138.16.60
deny from 94.102.7.154
deny from 193.27.246.178
deny from 195.157.124.186
deny from 87.108.66.195
deny from 79.143.179.17
deny from 94.127.67.120
deny from 82.204.11.106
deny from 195.172.186.81
deny from 195.116.150.20
deny from 178.172.60.6
deny from 180.210.58.51
deny from 117.103.223.26
deny from 118.107.163.182
deny from 218.29.115.152
deny from 168.62.166.202
deny from 218.108.236.107
deny from 122.224.5.122
deny from 168.61.17.198
deny from 180.131.3.12
deny from 122.224.6.43
deny from 123.103.15.55
deny from 219.255.134.96
deny from 50.192.249.201
deny from 46.137.207.128
deny from 121.189.62.84
deny from 223.85.245.54
deny from 31.222.164.138
deny from 50.57.127.55
deny from 211.167.68.21
deny from 147.156.42.201
deny from 60.29.10.18
deny from 46.21.157.143
deny from 60.191.232.53
deny from 208.109.104.119
deny from 60.199.223.196
deny from 62.202.18.26
deny from 99.177.96.73

 

Any help would be gratefully appreciated.

NB: I do have a blank line at the bottom of the file, before all the denied IPs that Bad Behaviour Block has added; it just doesn't show up in the code snippet. I have also been manually adding IPs to the list from the raw access log, which is, to be fair, a waste of time, as they change them all the time.

ken
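
A likely reason the SetEnvIfNoCase lines in the posted .htaccess would not catch anything: `^` and `$` anchor the pattern to the start and end of the whole User-Agent header, and these crawlers normally send a longer string with the bot name somewhere in the middle. The sketch below uses Python's `re` module as a stand-in for Apache's regex matching (the two agree on anchors and on `*`). The sample user-agent strings are typical published forms, quoted as illustrations and not taken from the poster's log.

```python
import re

# Illustrative user-agent strings; check your own access log for the real ones.
SAMPLE_AGENTS = {
    "Baiduspider": "Mozilla/5.0 (compatible; Baiduspider/2.0; "
                   "+http://www.baidu.com/search/spider.html)",
    "Yandex": "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)",
}


def matches(pattern, agent):
    """Case-insensitive search, like SetEnvIfNoCase applied to User-Agent."""
    return re.search(pattern, agent, re.IGNORECASE) is not None


for name, agent in SAMPLE_AGENTS.items():
    anchored = "^" + name + "$"  # the style used in the posted .htaccess
    print(name, "anchored:", matches(anchored, agent),
          "unanchored:", matches(name, agent))

# "^Yandex*$" means "Yande" followed by zero or more "x", and nothing else.
print(matches("^Yandex*$", "Yandexxx"), matches("^Yandex*$", "YandexBot/3.0"))
```

If that is what is happening, dropping the anchors (for example `SetEnvIfNoCase User-Agent Baiduspider bad_bot`) would be the thing to try. Test it on your own server before relying on it.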

Posted

And whilst we're on the subject, has anyone got any experience of applying the approach at the URL below to osC?

http://perishablepress.com/blackhole-bad-bots/

Posted

That's just an IP trap package. There is an addon that does the same thing. If you want to install one, the one written for osCommerce would probably be the better choice. I'll be releasing a new addon in a week or two that will allow controlling these bots from admin. It has a trap built into it.

Support Links:

For Hire: Contact me for anything you need help with for your shop: upgrading, hosting, repairs, code written, etc.

All of My Addons

Get the latest versions of my addons

Recommended SEO Addons

Posted

@Jack_mcs

Sorry, I got in flunk and didn't read the link till a while after. It just seems unnecessarily complex when compared to the IP Trap add-on.

ken

Posted

Hi all,

Me and those bots again. I seem to have slowed them down, but that's about all.

Anyway, I found this code block on my host's knowledge base:

 

##begin code
##start blocking potentially unwanted bots.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
##end code. bai bots.

 

Actually it had a great long list of bots, which it banned, but the above shows the beginning and end of the code.

Gonna give it a go and see what happens.

Posted

Unless the bots were having a night off, that last code block does ban the likes of Baiduspider, TwengaBot, etc.
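
One way to rule out the night-off theory is the raw access log: a banned bot that is still calling should keep appearing there, but with status 403 instead of 200. A rough sketch, assuming Apache "combined" log format. The function name is made up, and the bot names passed in are simply whichever ones you want to check.

```python
import re
from collections import Counter, defaultdict

# Assumed "combined" log layout: ... "request" status size "referer" "agent"
LINE_RE = re.compile(r'" (\d{3}) \S+ "[^"]*" "([^"]*)"\s*$')


def status_by_bot(lines, bot_names):
    """Count HTTP status codes per bot name found in the user agent."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not in the assumed format
        status, agent = match.groups()
        for name in bot_names:
            # Substring match, ignoring case, anywhere in the user agent.
            if name.lower() in agent.lower():
                counts[name][status] += 1
    return counts
```

Fed the lines of a log, `status_by_bot(lines, ["Baiduspider", "TwengaBot"])` gives a tally such as so many 403s and so many 200s per bot. Nothing at all for a bot means it did not call during that log period, which is different from being blocked.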

  • 2 weeks later...
Posted

Small update to the last post: I am still not seeing Baiduspider or TwengaBot, so I guess it works for them. However, nothing I do seems to get rid of AhrefsBot. Has anyone got any clues?

regards

Ken

Posted

Time for a rant. I've just been on the Ahrefs website. They are an SEO organisation, much as I guess the owners of many of the other bots are, so the inference is that they are somehow trying to help promote websites.

It seems blatantly obvious to me that anyone clever enough to build such a program can have absolutely no doubt that the program is malicious. They all know full well that it will create bandwidth issues, and if they conducted such research in the real world, they would be in big trouble.

Let's say a small independent store on a high street has 300 product lines supplied by 3 major wholesalers. The wholesalers decide to check on the store (are they using the correct RRP, etc.), and so in second 1, 3 mystery shoppers appear, then in second 2, 3 more; in just under 2 minutes there are 300. In the UK this would be called a public order offence, possibly even a riot!

Rant over

Ken

Archived

This topic is now archived and is closed to further replies.
