osCommerce

e-SocietyRobot


Ggrego

Within the last three days I have been seeing two IP addresses from Japan, 133.9.238.91 and 139.9.238.91, adding items to the shopping cart. The carts would total as much as $50,000.00, with over 200 items. I said to myself, this person is buying just about anything. I presumed that someone was sitting behind a desk just clicking the "Buy Now" button. After tracing the IP addresses, it appears that this was a search engine "bot" called "e-SocietyRobot."

I have now blocked the IP. Its sole purpose is to gather information. It's not crawling your site to put it in its search index; it's just gathering information, and for what I do not have a clue. It uses a great deal of bandwidth.

There are a number of ways to block the IP. I first had my web hosting company block it, then later I went to the root directory and blocked it myself by editing my .htaccess file, which was much easier. I do not know the purpose of this "bot" other than that it was on my site for days, just robbing me of bandwidth. I am putting this on the forum for information purposes only and to make others aware. Here is some additional information on the bot:

 

e-SocietyRobot

http://www.yama.info.waseda.ac.jp/~yamana/es/index_eng.htm

 

If anyone knows what this bot is for, what it does, or why, let me know.

Ggrego.. :-"


If you were using Prevent Spider Sessions, then this probably would not have been an issue for you. There are lots of search engine bots out there. It is a good idea to arrange your store to allow these bots to peruse the store but not follow links that require a session.


Does this bot identify itself as a bot?

If it uses a standard user agent (like any other random user), wouldn't this problem of it adding things to its cart still occur?


Its user agent string includes the string "robot", which would cause it to be detected by the Prevent Spider Sessions feature, enabled through Admin..Configuration..Sessions. It's also important to keep the includes/spiders.txt list up to date - I maintain one as a contribution.

 

With this option enabled, bots identified as spiders don't get a session, so any attempt to "buy now" brings them to the "cookie usage" page. Note that this tends to not be an issue with "add to cart", since that is a form, and bots don't submit forms. They will however follow Buy Now links, so it's best if you add code to your site to not display these links unless a session has been started. I also recommend not displaying the product list column sort links to bots - this will cut down your bot traffic considerably.
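The idea in this post can be sketched in a few lines of Python. This is only an illustration, not osCommerce's own code: it assumes detection is a case-insensitive substring match against tokens like those kept in includes/spiders.txt, and the token list, function names and URL shapes below are hypothetical.

```python
# Hypothetical token list; the real includes/spiders.txt has many more entries.
SPIDER_TOKENS = ["obot", "crawl", "spider"]


def is_spider(user_agent, tokens=SPIDER_TOKENS):
    """Return True if any token occurs in the lowercased user agent string."""
    ua = user_agent.lower()
    return any(token in ua for token in tokens)


def product_links(product_id, user_agent, session_started):
    """Build the links to show for one product in a listing.

    The Buy Now link is only included when a session exists and the
    visitor is not a spider, so a bot has no cart link to follow.
    """
    # Illustrative URL shapes, not copied from a real store.
    links = [f"product_info.php?products_id={product_id}"]
    if session_started and not is_spider(user_agent):
        links.append(f"index.php?action=buy_now&products_id={product_id}")
    return links
```

The same check could be used to leave out the column sort links for spiders.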

How do you disable this? I've had enigma1 helping me with it, but I spotted Yahoo sorting a single-item category last week. (I cannot find any links on my website that would link to such a thing.)

Since I made his most recent changes, Yahoo hasn't tried to hit it yet, so I am not 100% sure whether it works.

I have spider_flag set to true in application_top.php (the second instance).

I did have Prevent Spider Sessions enabled in the osCommerce Admin section. It had no effect.

Ggrego...

As for whether it identifies itself as a bot: no, the system did not identify it as a bot. It identified it as a customer.

That was the problem I had with this bot. It does not obey the rules for spider bots: I put e-SocietyRobot in robots.txt and it just ignored it.


What is the user agent string you see in the log file? The web reports I have on this robot show that it has the string "robot" in it.

Sorry, I've been out of town. The page at http://www.yama.info.waseda.ac.jp/~yamana/es/index_eng.htm tells you how to stop the bot from spidering your website. The string that you insert into the robots.txt file does not work. I believe the bot is ignoring its own rule. The string is:

 

User-agent: e-SocietyRobot
Disallow: /

 

In the end I just blocked the IP.
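One way to rule out a typo in the rule itself is to feed it to a parser. The sketch below uses Python's standard urllib.robotparser; the helper name `allowed` and the example path are made up for illustration. It only shows how a compliant crawler would read the rule. robots.txt is advisory, so a bot that ignores it still has to be blocked by IP or user agent on the server.

```python
from urllib.robotparser import RobotFileParser

# The two-line record quoted above, kept together with no blank line between.
ROBOTS_TXT = """\
User-agent: e-SocietyRobot
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Here `allowed("e-SocietyRobot", "/index.php")` is False, while any agent not named in the file is still allowed.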


Please tell me the user agent string from your log file. I find no references from this robot in my own logs over the past several months. According to the references I can find, this should be "e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)" and this will be detected by the test for "obot".
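For anyone unsure how to pull the user agent out of the log, here is a rough Python sketch. It assumes the server writes the Apache "combined" log format, where the user agent is the last quoted field on each line. The function name is hypothetical, and other log formats would need a different pattern.

```python
import re

# Assumes the "combined" format: ... "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$')


def user_agents_for_ip(log_lines, ip):
    """Return the distinct user agent strings logged for one client IP."""
    agents = []
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("ip") == ip:
            agent = match.group("agent")
            if agent not in agents:
                agents.append(agent)
    return agents
```

Each string returned can then be checked with something like `"obot" in agent.lower()` to see whether the spider test would catch it.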


  • 3 months later...

Once bots have already duplicated your pages by indexing "sorted" pages, is there any way to 404 the sorted pages?

I don't make use of sorting ANYWHERE on my site.

Never use a 404; use a 301 permanent redirect to the same page with the same parameters, minus the sort parameter.

I had to do the same thing with a page parameter on the product pages when I removed the category browser box, which was a paged box. So now I have to redirect all product_info pages with a page parameter when they are requested by a spider, because that parameter no longer does anything. Otherwise the spiders would get 20 pages with different URLs but the same content, never mind the impact on the cache files.
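As a rough sketch of the redirect described here, in Python rather than the store's own code: the function below rebuilds the requested URL without the unwanted parameter and returns it as the 301 target. The parameter names `sort` and `page` are assumptions about how the store names them, and the function name is made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_without(url, dropped=("sort",)):
    """Return (301, new_url) if url carries a dropped parameter, else None."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key not in dropped]
    if len(kept) == len(pairs):
        return None  # nothing to strip, so serve the page normally
    # Same path and remaining parameters, only the dropped ones removed.
    new_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return 301, new_url
```

With this, a request for `/index.php?cPath=3&sort=2a` would be answered with a 301 pointing at `/index.php?cPath=3`. Passing `dropped=("page",)` handles the product page case the same way.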

Treasurer MFC


I'm a bit wary of 301s after my experience last fall. They caused my entire site to be basically purged from Google and rebuilt from scratch over the next 6 months, and I still haven't fully recovered.

I'd much rather take the blunt hit with a 404 and then have Yahoo come back later. After all, Yahoo never seems to leave my site :D

None of the other search engines seem to have indexed sorted URLs, which I find odd.

As long as you accompany the 301 with a valid new location it should be fine, so don't redirect them all to your index.

You will see in your server logs that when the spider gets the 301 with the new location, it immediately fetches that new page. I am not sure what they do with 404s.

Treasurer MFC


I had always assumed a 404 response basically queues that particular URL for deletion from the result index and/or drops its rank until deletion, so people stop finding it in searches. Google Images does not seem to abide by this. I'm getting hits on images I haven't had on my site in 2 years :rolleyes:


Archived

This topic is now archived and is closed to further replies.
