osCommerce

Spider verification required


WiseWombat

Posted

Hi, overnight my site was hit 3800 times, and all the hits are showing up in the visitor stats :x

This is the IP: 144.139.9.13

My question: I want to add this spider to spiders.txt, but I need to find out the spider name.

Could someone verify whether this is a spider?

netname: TELSTRAINTERNET32-AU

Thanks:Martin

( WARNING )

I think I know what Im talking about.

BACK UP BACK UP BACK UP BACK UP

Posted
Look at the user agent string in your server access logs.
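
For reference, a minimal sketch of pulling the user agent out of an access log line. It assumes the server writes Apache's combined log format, where the last two quoted fields are the referer and the user agent. The function name `extract_user_agent` and the sample line are made up for illustration and do not come from anyone's real log. A line that ends right after the status and size, with no quoted fields following, is in the common log format and does not record the user agent at all.

```php
<?php
// Sketch only: assumes Apache "combined" log format (PHP 7.1+ for the ?string return type).
function extract_user_agent(string $line): ?string
{
    // host ident user [time] "request" status size "referer" "user-agent"
    $pattern = '/^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$/';
    if (preg_match($pattern, trim($line), $matches) === 1) {
        return $matches[1];
    }
    return null; // not combined format, so no user agent was logged
}

// Made-up sample line in combined format (192.0.2.10 is a documentation address).
$combined = '192.0.2.10 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"';

// Same shape as a common-format line: nothing after status and size.
$common = '192.0.2.10 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php HTTP/1.1" 302 -';
```

If the function returns null for every line, the log format itself would need to be switched to combined before a spider name can be read from it.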

Treasurer MFC

Posted
Hi Amanda, here is one line from the server log, but I don't understand what you mean by user agent.

144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -

It was also bouncing back and forth between product_new.php and products_info.php.

What I did was delete the session from the database, which seemed to fix the problem.

Thanks :Martin

Posted
Believe me, a spider doesn't give a hooch about you deleting session IDs.

It might have been a spider, but if you have no spider user agent string you cannot add it to spiders.txt, so block the IP address if you want.

Treasurer MFC

Posted
It has come back a second time today, looking for the exact same files, but now with HEAD instead of GET. Example:

144.139.9.13 - - [05/May/2005:11:29:25 +1000] "HEAD /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -

INSTEAD OF

144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -

Any ideas why this might be??

 

I have added two more files to robots.txt: product_reviews.php and shopping_cart.php. It seems this one reads robots.txt but then goes through the files anyway.

I have added this one to the blocked list, spider or no spider.

Thanks :Martin

Posted
But this one does not appear to have a user agent string at all.

What I sometimes do is add an extra check after this code:

 

if ((tep_not_null($user_agent)) and ($browser_language == '')) {
  $spiders = file('z:/spiders/' . 'spiders.txt');
  for ($i=0, $n=sizeof($spiders); $i<$n; $i++) {
    if (tep_not_null($spiders[$i])) {
      if (stristr(strtolower($user_agent), trim(strtolower($spiders[$i])))) {
        $spider_flag = true;
        break;
      }
    }
  }
}

 

which reads the spiders.txt file. After it, I added this:

 

if (!$spider_flag) {
  $spider_ips = array(
    '209.240.253.203', // spider simulator
    '66.135.38.137', // spider simulator
    '61.111.254.59', // Korean no agent no language
    '66.249.65.77', // IBP?
    '195.92.95.94', // netcraft
    '66.249.66.172', // Mediapartners using javascript from adsense ?
    '64.62.168.25', // gigabot using en language
    '198.65.147.172', // goforit
    '61.135.145.212', // baidu spider using language
    '202.108.250.223', // baidu spider using language
    '61.135.146.208', // baidu spider using language
    '66.36.241.140', // NutchCVS/0.06-dev
    // '220.135.121.91', // me for testing
    '129.241.104.168', // boitho.com
    '203.160.252.178' // ?
  );
  if (in_array($browser_ip, $spider_ips)) $spider_flag = true;
}

 

This way I can set $spider_flag based on the IP address, even when the spider is not covered by the spiders.txt file.

Treasurer MFC

Posted
Thanks Amanda, but which file do I add it to?

I can't understand why this IP address actually shows up as a customer in my visitor stats. I thought spiders were excluded from the visitor stats?? >_<

Posted

I would just block the IP. Add this to your .htaccess file:

 

deny from 144.139.9.13

 

I don't know why anyone would do that as a DoS attack or a hack on your site. The URL there only increases the cart quantity by 1 and sends back a redirect URL.

 

The HEAD request basically does nothing but return the response headers (such as the page timestamp) without the page body, and is typically used by a client to determine whether it should GET the whole page or use its cache.
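
To see how often one address is hitting the shop, and with which method, the access log can be tallied. This is a rough sketch, assuming Apache-style log lines like the ones quoted in this thread. `tally_requests` is a made-up helper name, not part of osCommerce.

```php
<?php
// Sketch: count requests per client IP and HTTP method from access log lines.
function tally_requests(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Capture the client IP (first field) and the method (first word of the quoted request).
        if (preg_match('/^(\S+) \S+ \S+ \[[^\]]+\] "([A-Z]+) /', $line, $m) !== 1) {
            continue; // skip lines that do not look like access log entries
        }
        [, $ip, $method] = $m;
        $counts[$ip][$method] = ($counts[$ip][$method] ?? 0) + 1;
    }
    return $counts;
}

// The two log lines quoted in this thread.
$lines = [
    '144.139.9.13 - - [05/May/2005:11:29:25 +1000] "GET /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -',
    '144.139.9.13 - - [05/May/2005:11:29:25 +1000] "HEAD /catalog/index.php?cPath=72&page=1&sort=2a&language=en&action=buy_now&products_id=363&osCsid=7a37851ea22e4b8bf70c68805d22cf12 HTTP/1.1" 302 -',
];

$counts = tally_requests($lines);
```

For a whole log, the lines could be loaded with `file($path, FILE_IGNORE_NEW_LINES)`, where `$path` is wherever your host keeps the access log.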

 

Maybe they want to order 5000 but their 5 key is broken.

Posted
Thanks for that I have already blocked it. :thumbsup:

Posted

:thumbsup:

check 2 quote status - sorry for testing ha

Posted
You see, people do weird stuff all the time and you'll never figure out why!

Posted
In application_top.php, $spider_flag is set by reading the spiders.txt file.

This happens only if the request has a user agent string, because that is what the entries are compared against.

It does that with this code:

 

if (tep_not_null($user_agent)) {
  $spiders = file(DIR_WS_INCLUDES . 'spiders.txt');

  for ($i=0, $n=sizeof($spiders); $i<$n; $i++) {
    if (tep_not_null($spiders[$i])) {
      if (is_integer(strpos($user_agent, trim($spiders[$i])))) {
        $spider_flag = true;
        break;
      }
    }
  }
}

 

Right after that, you add the additional code:

 

if (!$spider_flag) {
  $spider_ips = array(
    '209.240.253.203', // spider simulator
    '66.135.38.137', // spider simulator
    '61.111.254.59', // Korean no agent no language
    '66.249.65.77', // IBP?
    '195.92.95.94', // netcraft
    '66.249.66.172', // Mediapartners using javascript from adsense ?
    '64.62.168.25', // gigabot using en language
    '198.65.147.172', // goforit
    '61.135.145.212', // baidu spider using language
    '202.108.250.223', // baidu spider using language
    '61.135.146.208', // baidu spider using language
    '66.36.241.140', // NutchCVS/0.06-dev
    // '220.135.121.91', // me for testing
    '129.241.104.168', // boitho.com
    '203.160.252.178' // ?
  );
  if (in_array($browser_ip, $spider_ips)) $spider_flag = true;
}

 

 

And yes, spiders are included in the stats, at least in my version, so you could add a check on $spider_flag to decide whether to record them or not. I have.
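
As a rough illustration of the idea, here is a standalone sketch that folds the user agent check and the IP fallback into one function and then skips recording for spiders. It is not osCommerce's actual visitor stats code: `is_spider`, `$visits`, the sample user agents and the 192.0.2.x addresses are all assumptions made for the example, and the match is case-insensitive, which differs from the stock `strpos` check.

```php
<?php
// Sketch: both checks in one testable function.
// Both lists are passed in; nothing here reads osCommerce globals.
function is_spider(string $user_agent, string $ip, array $agent_patterns, array $spider_ips): bool
{
    if ($user_agent !== '') {
        foreach ($agent_patterns as $pattern) {
            $pattern = trim($pattern);
            // Case-insensitive substring match, skipping blank lines from the file.
            if ($pattern !== '' && stripos($user_agent, $pattern) !== false) {
                return true;
            }
        }
    }
    // Fall back to the IP list, which also covers requests with no user agent.
    return in_array($ip, $spider_ips, true);
}

// Same shape as the output of file('spiders.txt'): one entry per line, newline included.
$patterns = ["googlebot\n", "\n", "gigabot\n"];
$ips      = ['64.62.168.25', '144.139.9.13'];

// Hypothetical requests as [user agent, IP] pairs.
$requests = [
    ['Mozilla/5.0 (compatible; Googlebot/2.1)', '192.0.2.1'],
    ['', '144.139.9.13'],
    ['Mozilla/5.0', '192.0.2.2'],
];

$visits = [];
foreach ($requests as [$ua, $ip]) {
    if (!is_spider($ua, $ip, $patterns, $ips)) {
        $visits[] = $ip; // only non-spider requests are recorded
    }
}
```

In a real shop, the recording step would be whatever your stats contribution does, wrapped in the same `if` on the spider flag.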

Treasurer MFC

Posted
Before I forget, $browser_ip should be set before all this with:

 

$browser_ip = tep_get_ip_address();

Treasurer MFC

Posted
An additional benefit is that you can set yourself to be a spider, so you can see what they see for testing.

Treasurer MFC

Posted
Thanks Amanda, I will go through it, take a look, and do some testing of my own.

:D

Archived

This topic is now archived and is closed to further replies.
