WiseWombat Posted May 5, 2005
Hi, overnight my site was hit 3800 times, and the hits are all showing up in the visitor stats :x The IP is 144.139.9.13. My question: I want to add this spider to spiders.txt, but I need to find out the spider's name. Could someone verify that this is the spider's netname: TELSTRAINTERNET32-AU? Thanks, Martin
( WARNING ) I think I know what I'm talking about. BACK UP BACK UP BACK UP BACK UP
boxtel Posted May 5, 2005
Look at the user agent string in your server access logs.
Treasurer MFC
WiseWombat (Author) Posted May 5, 2005
Hi Amanda, here is one line from the server log, but I don't understand what you mean by user agent.
144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -
It was also bouncing back and forth between products_new.php and product_info.php. What I did was delete the session from the database, which seemed to fix the problem. Thanks, Martin
TCwho Posted May 5, 2005
It will look something like this in your server logs:
207.46.98.73 - - [04/May/2005:00:24:11 -0400] "GET /catalog/product_info.php?products_id=163 HTTP/1.0" 200 40100 www.YourWebsiteNameHere.com "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" "-"
The quoted string near the end, here msnbot/1.0 (+http://search.msn.com/msnbot.htm), is the user agent.
Drop_Shadow How Did You Hear About Us Email HTML Order Link ---- GMT -5:00
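For readers unsure where the user agent sits in a log line, here is a minimal PHP sketch that pulls it out. It assumes the layout of TCwho's sample line, where the quoted fields are the request, the referer, and then the user agent; other servers may log these in a different order or not at all. The function name extract_user_agent is hypothetical and is not part of osCommerce.

```php
<?php
// Hypothetical helper (not part of osCommerce): returns the user agent from
// an access-log line, or null when the line carries no such field.
// Assumption: quoted fields appear in the order request, referer, user agent,
// as in the sample line posted by TCwho.
function extract_user_agent(string $line): ?string
{
    // Collect the contents of every double-quoted field on the line.
    preg_match_all('/"([^"]*)"/', $line, $matches);
    $fields = $matches[1];

    // Fewer than three quoted fields: this log format records no user agent.
    if (count($fields) < 3) {
        return null;
    }

    // An empty value or "-" means the client sent no user agent.
    $agent = $fields[2];
    return ($agent === '' || $agent === '-') ? null : $agent;
}

// The two log lines quoted in this thread.
$withAgent = '207.46.98.73 - - [04/May/2005:00:24:11 -0400] "GET /catalog/product_info.php?products_id=163 HTTP/1.0" 200 40100 www.YourWebsiteNameHere.com "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" "-"';
$noAgent = '144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -';

echo extract_user_agent($withAgent) ?? '(none)', "\n";
echo extract_user_agent($noAgent) ?? '(none)', "\n";
```

Martin's line stops after the status and size, with no quoted fields following the request. That may mean his server's log format does not record the user agent at all, rather than that the visitor sent none.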
boxtel Posted May 5, 2005
WiseWombat wrote: "What I did was delete the session from the database, which seemed to fix the problem."
Believe me, a spider doesn't give a hoot about you deleting session IDs. It might have been a spider, but if you have no spider user agent string you cannot add it to spiders.txt, so block the IP address if you want.
TCwho Posted May 5, 2005
Oh yeah, another thing it could be, and it can technically happen, is that someone is using a program to copy your site to their hard drive for later viewing. Some of those programs can act like robots.
WiseWombat (Author) Posted May 6, 2005
Today it's back for a second time, looking for the exact same files, but now with HEAD in place of GET. Example:
144.139.9.13 - - [05/May/2005:11:29:25 +1000] "HEAD /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -
instead of
144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -
Any ideas why this might be? I have added 2 more files to robots.txt: product_reviews.php and shopping_cart.php. It seems this one reads the robots.txt file but then requests the files anyway. I have added it to the blocked list, spider or no spider. Thanks, Martin
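To see how one address splits its traffic between GET and HEAD, the access log can be tallied per method. The PHP sketch below assumes every line starts the way the samples in this thread do (IP, two dashes, a bracketed timestamp, then the quoted request). The function name count_methods_for_ip is hypothetical, and the sample lines are shortened copies of the ones quoted above.

```php
<?php
// Hypothetical helper: counts requests per HTTP method for one IP address.
// Assumption: each line starts like the samples in this thread:
//   IP - - [timestamp] "METHOD /path HTTP/1.x" status size
function count_methods_for_ip(array $lines, string $ip): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Capture the client address and the request method.
        if (!preg_match('/^(\S+) \S+ \S+ \[[^\]]*\] "([A-Z]+) /', $line, $m)) {
            continue; // line does not match the assumed layout
        }
        if ($m[1] !== $ip) {
            continue; // request came from a different address
        }
        $counts[$m[2]] = ($counts[$m[2]] ?? 0) + 1;
    }
    return $counts;
}

// Shortened versions of the log lines quoted in this thread.
$sample = [
    '144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?action=buy_now&products_id=363 HTTP/1.1" 302 -',
    '144.139.9.13 - - [05/May/2005:11:29:25 +1000] "HEAD /catalog/index.php?action=buy_now&products_id=363 HTTP/1.1" 302 -',
    '207.46.98.73 - - [04/May/2005:00:24:11 -0400] "GET /catalog/product_info.php?products_id=163 HTTP/1.0" 200 40100',
];

print_r(count_methods_for_ip($sample, '144.139.9.13'));
```

On a real log the lines would come from something like file() on the access log path, which varies by host.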
boxtel Posted May 6, 2005
But this one does not appear to have a user agent string at all. What I sometimes do is use this, which reads the spiders.txt file:
    // only test visitors that send a user agent but no browser language
    if ( (tep_not_null($user_agent)) and ($browser_language == '') ) {
      $spiders = file('z:/spiders/' . 'spiders.txt');
      for ($i=0, $n=sizeof($spiders); $i<$n; $i++) {
        if (tep_not_null($spiders[$i])) {
          // case-insensitive match of each spiders.txt entry against the user agent
          if (stristr(strtolower($user_agent), trim(strtolower($spiders[$i])))) {
            $spider_flag = true;
            break;
          }
        }
      }
    }
I also added this:
    if(!$spider_flag) {
      $spider_ips = array(
        '209.240.253.203', // spider simulator
        '66.135.38.137', // spider simulator
        '61.111.254.59', // Korean no agent no language
        '66.249.65.77', // IBP?
        '195.92.95.94', // netcraft
        '66.249.66.172', // Mediapartners using javascript from adsense ?
        '64.62.168.25', // gigabot using en language
        '198.65.147.172', // goforit
        '61.135.145.212', // baidu spider using language
        '202.108.250.223', // baidu spider using language
        '61.135.146.208', // baidu spider using language
        '66.36.241.140', // NutchCVS/0.06-dev
        // '220.135.121.91', // me for testing
        '129.241.104.168', // boitho.com
        '203.160.252.178' // ?
      );
      if (in_array($browser_ip, $spider_ips)) $spider_flag = true;
    }
This way I can activate the $spider_flag based on IP address even if the visitor is not covered by the spiders.txt file.
WiseWombat (Author) Posted May 6, 2005
Thanks Amanda, but what file do I add it to? I can't understand why this IP address actually shows up as a customer in my visitor stats. I thought spiders were excluded from the visitor stats? >_<
user99999999 Posted May 6, 2005
I would just block the IP. Add this to your .htaccess file:
    deny from 144.139.9.13
I don't know why anyone would do that as a DoS attack or a hack on your site. The URL there only increases the cart quantity by 1 and sends back a redirect URL. A HEAD request basically does nothing but return the response headers, such as the page timestamp, without the page itself; a client typically uses it to decide whether it should GET the whole page or use its cache. Maybe they want to order 5000 but their 5 key is broken.
WiseWombat (Author) Posted May 6, 2005
Thanks for that, I have already blocked it. :thumbsup:
Guest Posted May 6, 2005
:thumbsup: check 2 quote status - sorry for testing ha
user99999999 Posted May 6, 2005
You see, people do weird stuff all the time and you'll never figure out why!
boxtel Posted May 6, 2005
WiseWombat wrote: "Thanks Amanda, but what file do I add it to? I thought spiders were excluded from the visitor stats?"
In application_top.php, $spider_flag is set by reading the spiders.txt file. This happens only if the request has a user agent string, because that is what it compares against. It does that with this code:
    if (tep_not_null($user_agent)) {
      $spiders = file(DIR_WS_INCLUDES . 'spiders.txt');
      for ($i=0, $n=sizeof($spiders); $i<$n; $i++) {
        if (tep_not_null($spiders[$i])) {
          // flag the visitor when a spiders.txt entry occurs in the user agent
          if (is_integer(strpos($user_agent, trim($spiders[$i])))) {
            $spider_flag = true;
            break;
          }
        }
      }
    }
Right after that you add the additional code:
    if(!$spider_flag) {
      $spider_ips = array(
        '209.240.253.203', // spider simulator
        '66.135.38.137', // spider simulator
        '61.111.254.59', // Korean no agent no language
        '66.249.65.77', // IBP?
        '195.92.95.94', // netcraft
        '66.249.66.172', // Mediapartners using javascript from adsense ?
        '64.62.168.25', // gigabot using en language
        '198.65.147.172', // goforit
        '61.135.145.212', // baidu spider using language
        '202.108.250.223', // baidu spider using language
        '61.135.146.208', // baidu spider using language
        '66.36.241.140', // NutchCVS/0.06-dev
        // '220.135.121.91', // me for testing
        '129.241.104.168', // boitho.com
        '203.160.252.178' // ?
      );
      if (in_array($browser_ip, $spider_ips)) $spider_flag = true;
    }
And yes, spiders are included in the stats, at least in my version, so you could add a check on $spider_flag to decide whether to record them or not. I have.
boxtel Posted May 6, 2005
Before I forget, $browser_ip should be set before all this with:
    $browser_ip = tep_get_ip_address();
boxtel Posted May 6, 2005
An additional benefit is that you can set yourself to be a spider, so you can see what they see for testing.
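The two checks boxtel describes, user agent against spiders.txt entries and then IP address against a hand-kept list, can be sketched as one standalone function. This is only an illustration and not the osCommerce code itself: the function name is_spider and its parameters are hypothetical. The sample patterns and addresses are taken from this thread.

```php
<?php
// Hypothetical helper (not part of osCommerce) combining both checks:
// 1. case-insensitive substring match of the user agent against known
//    spider names (the role spiders.txt plays), and
// 2. exact match of the client IP against a manually maintained list.
function is_spider(string $userAgent, string $ip, array $agentPatterns, array $spiderIps): bool
{
    $userAgent = strtolower($userAgent);

    // The name check only makes sense when a user agent was actually sent.
    if ($userAgent !== '') {
        foreach ($agentPatterns as $pattern) {
            $pattern = strtolower(trim($pattern));
            if ($pattern !== '' && strpos($userAgent, $pattern) !== false) {
                return true;
            }
        }
    }

    // Fall back to the IP list, which also catches visitors with no user agent.
    return in_array($ip, $spiderIps, true);
}

// Sample values taken from this thread.
$patterns = ['msnbot', 'gigabot'];
$ips = ['64.62.168.25', '144.139.9.13'];

var_dump(is_spider('msnbot/1.0 (+http://search.msn.com/msnbot.htm)', '207.46.98.73', $patterns, $ips));
var_dump(is_spider('', '144.139.9.13', $patterns, $ips));
var_dump(is_spider('Mozilla/5.0', '207.46.98.73', $patterns, $ips));
```

The same boolean could then decide whether a visit is written to the visitor stats, as boxtel suggests, and adding your own address to the IP list lets you browse the shop as a spider would see it.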
WiseWombat (Author) Posted May 6, 2005
Guest wrote: "check 2 quote status - sorry for testing ha"
:thumbsup: And yes, you are blocked, 144.139.9.13.
WiseWombat (Author) Posted May 6, 2005
Thanks Amanda, I will go through it, take a look, and do some testing of my own. :D
Archived
This topic is now archived and is closed to further replies.