Ggrego Posted December 24, 2005
Over the last three days, two IP addresses from Japan, 133.9.238.91 and 139.9.238.91, have been adding items to the shopping cart. The carts held over 200 items, totalling as much as $50,000.00. I said to myself, this person is buying just about anything. I presumed that someone was sitting behind a desk just clicking the "Buy Now" button.
After tracing the IP addresses, it appears that this was a search engine "bot" called "e-SocietyRobot." I have now blocked the IP. Its sole purpose is to gather information. It's not crawling your site to include it in a search index; it's just gathering information, and for what I do not have a clue. It uses up a great deal of bandwidth.
There are a number of ways to block the IP. I first had my web hosting company block it, then later I went to the root directory, edited my .htaccess file and blocked it that way. That was much easier.
I do not know the purpose of this "bot", other than that it was on my site for days and just robbing me of bandwidth. I am putting this on the forum for information purposes only and to make others aware. Here is some additional information on the bot:
e-SocietyRobot http://www.yama.info.waseda.ac.jp/~yamana/es/index_eng.htm
If anyone knows what this bot is for, what it does, or why, let me know.
Ggrego.. :-"
stevel Posted December 24, 2005
If you were using Prevent Spider Sessions, then this probably would not have been an issue for you. There are lots of search engine bots out there. It is a good idea to arrange your store to allow these bots to peruse the store but not follow links that require a session.
Steve
Contributions: Country-State Selector, Login Page a la Amazon, Protection of Configuration, Updated spiders.txt, Embed Links with SID in Description
Guest Posted December 24, 2005
Does this bot identify itself as a bot? If it uses a standard user agent (like any other random user), wouldn't this problem of it adding things to its cart still occur?
stevel Posted December 24, 2005
Its user agent string includes the string "robot", which would cause it to be detected by the Prevent Spider Sessions feature, enabled through Admin > Configuration > Sessions. It's also important to keep the includes/spiders.txt list up to date - I maintain one as a contribution. With this option enabled, bots identified as spiders don't get a session, so any attempt to "buy now" brings them to the "cookie usage" page.
Note that this tends not to be an issue with "add to cart", since that is a form, and bots don't submit forms. They will, however, follow Buy Now links, so it's best to add code to your site that does not display these links unless a session has been started. I also recommend not displaying the product list column sort links to bots - this will cut down your bot traffic considerably.
Steve
Guest Posted December 24, 2005
Quoting stevel: I also recommend not displaying the product list column sort links to bots - this will cut down your bot traffic considerably.
How do you disable this? I've had enigma1 helping me with it, but I spotted Yahoo sorting a single-item category last week. (I cannot find any links on my website that would link to such a thing.) Since I made his most recent changes, Yahoo hasn't tried to hit it yet, so I am not 100% sure whether it works or not. I have spider_flag set to true in application_top.php (the second instance).
stevel Posted December 24, 2005
Easiest way - in includes/functions/general.php, function tep_create_sort_heading, change:
    if ($sortby) {
to
    if ($sortby && $session_started) {
For the test to see the session flag, $session_started also has to be listed in that function's global declaration, since PHP functions do not see global variables otherwise.
Steve
Ggrego Posted December 24, 2005 (Author)
Quoting stevel: If you were using Prevent Spider Sessions, then this probably would not have been an issue for you. There are lots of search engine bots out there. It is a good idea to arrange your store to allow these bots to peruse the store but not follow links that require a session.
I did have Prevent Spider Sessions enabled in the osCommerce Admin section. It had no effect.
Ggrego...
Ggrego Posted December 24, 2005 (Author)
Quoting Guest: does this bot identify itself as a bot? if it uses a standard user agent (like any other random user), wouldn't this problem of it adding things to its cart still occur?
No, the system did not identify it as a bot. It identified it as a customer.
Ggrego Posted December 24, 2005 (Author)
Quoting stevel: Its user agent string includes the string "robot", which would cause it to be detected by the Prevent Spider Sessions feature, enabled through Admin > Configuration > Sessions. It's also important to keep the includes/spiders.txt list up to date - I maintain one as a contribution. With this option enabled, bots identified as spiders don't get a session, so any attempt to "buy now" brings them to the "cookie usage" page. Note that this tends not to be an issue with "add to cart", since that is a form, and bots don't submit forms. They will, however, follow Buy Now links, so it's best to add code to your site that does not display these links unless a session has been started. I also recommend not displaying the product list column sort links to bots - this will cut down your bot traffic considerably.
That was the problem I had with this bot. It does not obey the rules for spider bots: I put e-SocietyRobot in my robots.txt and it just ignored it.
stevel Posted December 24, 2005
What is the user agent string you see in the log file? The web reports I have on this robot show that it has the string "robot" in it.
Steve
Ggrego Posted December 28, 2005 (Author)
Quoting stevel: What is the user agent string you see in the log file? The web reports I have on this robot show that it has the string "robot" in it.
Sorry, I have been out of town. If you click on this link, "http://www.yama.info.waseda.ac.jp/~yamana/es/index_eng.htm", it will tell you how to stop the bot from spidering your website. The string that you insert into the robots.txt file does not work. I believe that it is ignoring its own rule. The string is:
    User-agent: e-SocietyRobot
    Disallow: /
In the end I just blocked the IP.
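For anyone who wants to confirm that the rule itself is well formed, here is a minimal sketch in Python using the standard library's robots.txt parser. It only shows what a crawler that honours robots.txt would conclude from those two lines; it cannot tell you whether a given bot actually honours them, which is the problem reported in the post above. The function name is_allowed and the www.example.com store URL are placeholders for illustration, not anything from the thread.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted in the post above.
ROBOTS_TXT = """\
User-agent: e-SocietyRobot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return what a robots.txt-compliant crawler would conclude for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder store URL (an assumption).
    store_page = "http://www.example.com/index.php"
    print("e-SocietyRobot allowed:", is_allowed("e-SocietyRobot", store_page))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", store_page))
```

If the rule parses as a full disallow and the bot still shows up in the logs, the rule is being ignored rather than misread, and a server-side block such as the IP block mentioned above is the remaining option.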
stevel Posted December 30, 2005
Please tell me the user agent string from your log file. I find no references to this robot in my own logs over the past several months. According to the references I can find, this should be "e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)" and this will be detected by the test for "obot".
Steve
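To make the "obot" test concrete, here is a rough sketch in Python. It assumes the spider check is a case-insensitive substring match of each spiders.txt entry against the user agent; that is an assumption about how the feature behaves, not a copy of the osCommerce code. The pattern list is a made-up excerpt, and the browser user agent is just a typical example of a string that should not match.

```python
from typing import Iterable


def is_spider(user_agent: str, patterns: Iterable[str]) -> bool:
    """True if any non-empty pattern occurs in the lowercased user agent."""
    ua = user_agent.lower()
    return any(p.strip().lower() in ua for p in patterns if p.strip())


# Hypothetical excerpt of a spiders.txt-style list, one substring per entry.
SAMPLE_PATTERNS = ["crawl", "obot", "slurp", "spider"]

# User agent string as given in the post above.
E_SOCIETY_UA = "e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)"
# An ordinary browser string, included only for contrast (an assumption).
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

if __name__ == "__main__":
    print("e-SocietyRobot detected:", is_spider(E_SOCIETY_UA, SAMPLE_PATTERNS))
    print("browser detected:", is_spider(BROWSER_UA, SAMPLE_PATTERNS))
```

A check like this depends entirely on the string the bot really sends, which is why the exact user agent from the server log matters: if the logged string differs from the published one, no substring list will catch it.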
Guest Posted April 10, 2006
Once bots have already duplicated your pages by indexing "sorted" pages, is there any way to 404 the sorted pages? I don't make use of sorting ANYWHERE on my site.
boxtel Posted April 10, 2006
Quoting Guest: once bots have already duplicated your pages by indexing "sorted" pages, is there any way to 404 the sorted pages? i don't make use of sorting ANYWHERE on my site.
Never use a 404; use a 301 permanent redirect to the same page with the same parameters except the sort parameter.
I had to do the same thing with a page parameter on the product pages when I removed the category browser box, which was a paged box. So now I have to redirect all product_info pages with a page parameter if they are requested by a spider, because that parameter does not do anything anymore, and as such the spiders would get 20 pages with different URLs but the same content, never mind the impact on the cache files.
Treasurer MFC
Guest Posted April 11, 2006
I'm a bit wary of 301s after my experience last fall. They caused my entire site to be basically purged from Google and rebuilt from scratch over the next 6 months, and I still haven't fully recovered.
I would much rather take the blunt hit with a 404 and then have Yahoo come back later. After all, Yahoo never seems to leave my site :D
None of the other search engines seem to have indexed sorted URLs, which I find odd.
boxtel Posted April 11, 2006
Quoting Guest: i'm a bit sketchy with 301's after my experience last fall. they caused my entire site to be basically purged from google and be rebuilt from scratch over the next 6 months and i still haven't fully recovered. i much rather take the blunt hit with a 404 and then have yahoo come back later. after all, yahoo never seems to leave my site :D none of the other search engine's seem to have indexed sorted url's, which i find odd
As long as you accompany the 301 with a valid new location it should be fine, so don't redirect them all to your index. You will see in your server logs that when the spider gets the 301 with the new location, it will immediately fetch that new page. I am not sure what they do with 404s.
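The "same page, same parameters, minus the sort parameter" idea can be sketched as follows. This is Python rather than the store's PHP, purely to illustrate how the redirect target is computed; it assumes the parameter is literally named sort, and redirect_target and the example URLs are hypothetical names, not part of osCommerce.

```python
from typing import Optional, Sequence
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_target(url: str, dropped: Sequence[str] = ("sort",)) -> Optional[str]:
    """Return the URL to 301 to, or None if no dropped parameter is present."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key not in dropped]
    if len(kept) == len(pairs):
        return None  # nothing to strip, so no redirect is needed
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    # Placeholder URLs (assumptions), shaped like a category listing.
    print(redirect_target("http://www.example.com/index.php?cPath=21&sort=2a"))
    print(redirect_target("http://www.example.com/index.php?cPath=21"))
```

Returning None for URLs without the parameter keeps normal pages from being redirected at all, and every redirect points at a real page with the remaining parameters intact rather than at the index.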
Guest Posted April 11, 2006
I had always assumed that a 404 response queues that particular URL for deletion from the result index and/or drops its rank until deletion, so people do not find it in searches.
Google Images does not seem to abide by this. I'm getting hits on images I haven't had on my site in 2 years :rolleyes:
stevel Posted April 11, 2006
Yahoo and ask.com keep looking for pages I haven't had for three years. Google picked up on my 301s right away.
Steve
This topic is now archived and is closed to further replies.