

This blows my mind


bobg7

Recommended Posts

Posted

I just got a bandwidth warning saying I have used up 86.05% of my available bandwidth; thanks to my host for the heads up.

 

I pulled my logs and noticed that on Sept 22, 05 I went from an average of about 8 MB per day to 1.04 gig.

 

The only thing I have changed in the last 48 hours is getting listed on Google Adwords. Would that explain the sudden jump in bandwidth, or is something else going on I can't see?

 

I have already opened a support ticket with my host to see if they can see something on their side; hope to hear back one way or the other.

 

Also, I just checked the 'Who's Online' page and noticed 68.142.230.187; a tracert shows it to be the yahoo crawler. They have been on for a little over 3 hours and have between 1 and 24 each of my 450-plus items in the shopping cart, with a subtotal of $65,252.52 racked up.

 

Hope they complete their order and pay up - :lol:

 

Has anyone else seen such a jump in bandwidth? Could Google Adwords be causing it?

 

Thanks in advance,

Installed Contributions: CCGV, Close Popup, Dynamic Meta Tags, Easy Populate, Froogle Data Feeder, Google Position, Infobox Header Entire Row, Live Support for OSC, PayPal Seal with CC images, Report_m Sales, Shop by Price Revised, SQL Updater, Who's Online Enhancement, Footer, GNA EP Assistant and still going.

Posted

:lol: i see this problem a lot on invisionboard sites too. the best trick (not foolproof) is to use meta tags to keep them out of useless sections of the site (such as the shopping cart, member panel, etc). they may be hitting a bunch of different links leading to places you need to log in, and it may be looping somewhere? i suppose it depends where people post your stuff on the web (maybe somebody linked directly to a member-only page somewhere or something).

 

googlebot has been known to do this a few times too

 

i've never had a problem with any search bots hammering my site, but i've seen many people complain about it
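
As a rough sketch of the meta tag idea (assuming an osC-style header include; the page names here are examples, match them to your own files), something like this in the <head> output would flag the useless sections:

<?php
// Sketch: emit a "noindex,nofollow" robots meta tag on pages spiders
// have no business crawling. The page list is an assumption -- adjust
// it to your own shop's filenames.
$no_crawl = array('shopping_cart.php', 'checkout_shipping.php', 'account.php', 'login.php');
if (in_array(basename($_SERVER['PHP_SELF']), $no_crawl)) {
    echo '<meta name="robots" content="noindex,nofollow">' . "\n";
}
?>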

Posted

Hello,

 

Another way to tell spiders what NOT to crawl without using meta-tags (which Google claims their bot doesn't bother to read) is to put a robots.txt file in your root directory with something like this in it (a list of every folder you don't want them messin' around in):

 

User-agent: *
Disallow: /conf/
Disallow: /cp/
Disallow: /banner/
Disallow: /banner2/
Disallow: /gallery/
Disallow: /fonts/
Disallow: /images/
Disallow: /lang/
Disallow: /logo/
Disallow: /offer/
Disallow: /upload/
Disallow: /template/

 

 

As I understand it, search engine bots will always look to see if a robots.txt file is in the root when they arrive at your site to crawl.

 

--OSCnewbie.

Posted

if you're using osc out of the box, that most likely won't work, as from what i understand robots.txt only works with folders, not pages.

 

so you can't tell spiders not to touch checkout.php or anything. i may be wrong though, haven't checked up on it lately :)

Posted
if you're using osc out of the box, that most likely won't work, as from what i understand robots.txt only works with folders, not pages.

 

that would be interesting, but I don't think that is a problem for osc at all

 

search the board for robots and spiders, there are even contribs that contain consistently updated files, thanks to their 'providers'

 

but I am always willing to learn, so please correct me if I am wrong

 

dahui

Posted

i have been proven wrong:

Disallow:

 

    The second part of a record consists of Disallow: directive lines. These lines specify files and/or directories. For example, the following line instructs spiders that they cannot download email.htm:

 

Disallow: email.htm

 

    You may also specify directories:

 

Disallow: /cgi-bin/

 

    Which would block spiders from your cgi-bin directory.

 

    There is a wildcard nature to the Disallow directive. The standard dictates that /bob would disallow /bob.html and /bob/index.html (both the file bob and files in the bob directory will not be indexed).

 

    If you leave the Disallow line blank, it indicates that ALL files may be retrieved. At least one disallow line must be present for each User-agent directive to be correct. A completely empty Robots.txt file is the same as if it were not present.

 

my bad :)

but this is very useful information, as i have always been under the assumption it only works with directories... i will be updating my robots.txt shortly :D
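
Putting the two together, a robots.txt for a stock osC install might look something like this (a sketch only; it assumes the shop lives under /catalog/ and leans on the prefix matching described above, e.g. /catalog/checkout_ covers every checkout_*.php page):

User-agent: *
Disallow: /catalog/shopping_cart.php
Disallow: /catalog/checkout_
Disallow: /catalog/account
Disallow: /catalog/login.php
Disallow: /catalog/admin/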

Posted

I have recently been crawling this topic here on the board and was convinced that with a proper spiders and robots file, plus some modifications to the contact us and tell a friend forms (search the board for vulnerabilities of those forms), I would be quite safe now, so I was a bit bothered ;)

 

dahui

Posted

sorry about that :D i seem to have outdated info in my brain, i've been visiting the wrong forums lately :D

i haven't hit the topics about contact_us.php yet, though i've noticed they're quite popular.. i'll have to visit them to get my site up to par :)

Posted

I'm almost sure it's a yahoo bot that's killing me. I sent a note to my host and requested they bounce the apache server to try to kick them out, but being a shared server, they may not do it.

Installed Contributions: CCGV, Close Popup, Dynamic Meta Tags, Easy Populate, Froogle Data Feeder, Google Position, Infobox Header Entire Row, Live Support for OSC, PayPal Seal with CC images, Report_m Sales, Shop by Price Revised, SQL Updater, Who's Online Enhancement, Footer, GNA EP Assistant and still going.

Posted

Well, here is an update: my site was shut down for 18 hours when the bandwidth quota was exceeded >_<

 

I continued to work with my host and they added an additional 2 gig of bandwidth to get it going again. We were both hoping that with the connection lost, the yahoo bot would go away.

 

It didn't; so far it's racked up well over $100k in product sales and is sucking up bandwidth like a leech.

 

I added this to my robots.txt:

 

User-agent: eXavaBot, yahoobot, yahoo
Disallow: /catalog/

 

But it's still there. Is there any way I can kick them out before I lose any more bandwidth?

Installed Contributions: CCGV, Close Popup, Dynamic Meta Tags, Easy Populate, Froogle Data Feeder, Google Position, Infobox Header Entire Row, Live Support for OSC, PayPal Seal with CC images, Report_m Sales, Shop by Price Revised, SQL Updater, Who's Online Enhancement, Footer, GNA EP Assistant and still going.

Posted

put in a redirection check for that particular ip and redirect it to the major search engines with your shop name posted as a search query.

 

See how the spider bait contribution works to do this.

 

That should give you more web references :lol:
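
Something along these lines would do it (a hedged sketch, not the actual spider bait contribution code; the ip and shop name are placeholders), placed early in index.php or application_top.php:

<?php
// Sketch of the redirect idea: bounce one misbehaving IP to a search
// engine query for the shop name instead of serving it any more pages.
$bad_ip = '68.142.230.187';          // the offending crawler from this thread
$shop_name = 'Your Shop Name';       // placeholder -- use your own
if ($_SERVER['REMOTE_ADDR'] == $bad_ip) {
    header('Location: http://www.google.com/search?q=' . urlencode($shop_name));
    exit;
}
?>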

Posted

Your robots.txt file is completely wrong. Those entries belong in the spiders.txt file.

 

Have you actually researched robots.txt and spiders.txt files and how they work, on this forum?
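
(On the syntax point: a robots.txt record takes one crawler name per User-agent line, so the combined line above would not match anything anyway. Yahoo's crawler answers to Slurp in robots.txt, so if you did still want a robots.txt rule it would look like this:

User-agent: Slurp
Disallow: /catalog/

That is separate from the spiders.txt question above.)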

Posted
put in a redirection check for that particular ip and redirect it to the major search engines with your shop name posted as a search query.

 

See how the spider bait contribution works to do this.

 

That should give you more web references :lol:

 

Interesting, I did a little searching on how to do that but came up blank.

 

Do you have an example of the code for this?

Installed Contributions: CCGV, Close Popup, Dynamic Meta Tags, Easy Populate, Froogle Data Feeder, Google Position, Infobox Header Entire Row, Live Support for OSC, PayPal Seal with CC images, Report_m Sales, Shop by Price Revised, SQL Updater, Who's Online Enhancement, Footer, GNA EP Assistant and still going.

Posted

Yahoo and Inktomi spiders are a particular problem, because adding Yahoo or Inktomi to the spiders.txt file will not stop them from generating session ids. The only way around this problem (and it is quite simple to do) is to add the ip address of the bad spider to the includes/spiders.txt file. This will stop Yahoo and Inktomi (which Yahoo now owns) from generating session ids.

 

No need to block ip addresses or redirect them or do anything via .htaccess.

 

Vger
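
If that matches your osC version, the change is just one line appended to includes/spiders.txt (one entry per line), e.g. for the crawler in this thread:

68.142.230.187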

Posted

Came across this:

 

Would it mess with them if I added it to my index.php?

 

<?php
// Send the offending crawler a client-side redirect to the 404 page
if ($_SERVER['REMOTE_ADDR'] == "68.142.230.187") {
    echo "<script>location.href='./404.html';</script>";
    exit;
}
?>
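
One caveat on that sketch: crawlers generally don't execute JavaScript, so a server-side answer is more likely to actually turn the bot away than a <script> redirect. A minimal alternative, same assumption about the IP:

<?php
// Alternative sketch: answer the offending IP with an HTTP error
// instead of relying on the bot running JavaScript.
if ($_SERVER['REMOTE_ADDR'] == "68.142.230.187") {
    header('HTTP/1.0 403 Forbidden');
    exit;
}
?>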

 

PS, I added the ip address to includes/spiders.txt as you suggested.

 

Thanks,

Installed Contributions: CCGV, Close Popup, Dynamic Meta Tags, Easy Populate, Froogle Data Feeder, Google Position, Infobox Header Entire Row, Live Support for OSC, PayPal Seal with CC images, Report_m Sales, Shop by Price Revised, SQL Updater, Who's Online Enhancement, Footer, GNA EP Assistant and still going.

Archived

This topic is now archived and is closed to further replies.
