Updated spiders.txt Official Support Topic


stevel

Correct. This contribution is simply an update for the spiders.txt file that is included in the osC distribution. It adds new spiders and is optimized. No code is changed.


hi Steve,

Thanks for the quick response. I installed your contribution a few minutes ago by copying the spiders.txt file into my /catalog/includes directory and setting "Prevent Spider Sessions" to true. Is there any way I can test that it is working OK? I don't want to "scare" away the spiders!

thanks,

Ray


Got Firefox? Install the "User Agent Switcher" extension and set the useragent to "Googlebot". That's how I test it. Be sure you close your browser session and reopen it to clear session cookies. Then try adding something to your cart.

 

Note that this does not prevent spiders from indexing your store. All it does is keep them from obtaining sessions.
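For anyone without Firefox, the same check can be scripted. The sketch below is a rough illustration, not part of the contribution: it requests a page with the user agent set to "Googlebot" and looks for a session ID in the response. It assumes the store uses the default osC session name osCsid; the function names are made up for this example.

```python
import urllib.request

SESSION_NAME = "osCsid"  # assumption: default osC session name; change if your store differs


def has_session(set_cookie_headers, body):
    """Return True if the response appears to carry a session ID."""
    in_cookie = any(
        h.strip().startswith(SESSION_NAME + "=") for h in set_cookie_headers
    )
    in_links = (SESSION_NAME + "=") in body  # session ID appended to page links
    return in_cookie or in_links


def fetch_as(url, user_agent="Googlebot"):
    """Fetch url with the given User-Agent; return (Set-Cookie headers, body text)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=15) as resp:
        cookies = resp.headers.get_all("Set-Cookie") or []
        body = resp.read().decode("utf-8", errors="replace")
    return cookies, body
```

To try it, call fetch_as() with your own store's URL and pass the two results to has_session(). With spider sessions blocked, the "Googlebot" request should come back False, while a normal browser user agent should come back True.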

Steve,

I actually use IE on both my PCs. Is there any other way to test?

One thing I just noticed while looking at "Who's Online" is that there are four "Mozilla" bots from the same IP checking out various links. Three of them have a "yes" under the session column and are checking out products, while the fourth has a "no". The one with the "no" is viewing /catalog/cookie_usage.php.

Is Who's Online an accurate way to gauge whether this is working OK or not?

thanks.

Ray

www.specopstactical.com


"Mozilla" is not a bot. Actually, if you see Mozilla there, you have no idea what it is, since just about every browser includes "Mozilla" in its UA.

 

I tried your store and the Prevent Spider Sessions is working fine.

Steve,

Thanks a million! I'm glad to hear that the mod is working fine. Now off to further osC refinements!

-Ray


Hi!

 

I like your list! :)

 

Could you please add the following:

  • findlinks/1.1-a8 (+http://wortschatz.uni-leipzig.de/findlinks/) also known as findlinks/1.1.1-a1 (+http://wortschatz.uni-leipzig.de/findlinks/)
  • ilse

Thanks for the great work!

 

- Jasper


Please post the complete user agent string as found in your access log. If "ilse" is the one I'm thinking of, it should already be covered by "crawl". findlinks is already there.
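To see why a short entry such as "crawl" can cover many agents: as far as I understand the stock code, osC lowercases the user agent and treats the visitor as a spider if any line of spiders.txt occurs anywhere inside it. A rough Python rendering of that check follows. The function name, the three-entry list, and the "ExampleCrawler" agent are illustrative only, not taken from osC or from the real file.

```python
def is_spider(user_agent, spider_entries):
    """Return True if any non-empty entry is a substring of the lowercased user agent."""
    ua = user_agent.lower()
    for entry in spider_entries:
        entry = entry.strip()
        if entry and entry in ua:
            return True
    return False


entries = ["crawl", "findlinks", "googlebot"]  # small illustrative subset, not the real file

print(is_spider("findlinks/1.1-a8 (+http://wortschatz.uni-leipzig.de/findlinks/)", entries))
print(is_spider("ExampleCrawler/1.0", entries))  # hypothetical agent, matched via "crawl"
print(is_spider("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)", entries))
```

Because the user agent is lowercased before comparison, entries in the file only match if they are lowercase too.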

Edited by stevel

I don't know "OSCMAX". There isn't anything else in a standard osC store to change. What do you mean by "can't see any bots"?

 

You can add debug code to application_top.php to see if you can find out why one store is misbehaving.

 

Note that, unless you're on a Windows host, the case of the filename is important - it is looking for spiders.txt not SPIDERS.TXT.
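If you want to rule out the filename-case problem quickly, a small script can list the includes directory and report whether the file exists under the exact lowercase name. This is only a sketch; the function name and the status labels are invented for the example, and the demo uses a temporary directory in place of a real /catalog/includes.

```python
import os
import tempfile


def find_spiders_file(includes_dir, expected="spiders.txt"):
    """Return ("ok", name), ("wrong_case", name), or ("missing", None)."""
    names = os.listdir(includes_dir)
    if expected in names:
        return ("ok", expected)
    for name in names:
        # Same letters but different case: a case-sensitive host will not find it.
        if name.lower() == expected.lower():
            return ("wrong_case", name)
    return ("missing", None)


# Demo with a throwaway directory holding an upper-case copy of the file.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "SPIDERS.TXT"), "w").close()
    print(find_spiders_file(d))  # ('wrong_case', 'SPIDERS.TXT')
```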


I don't know "OSCMAX". There isn't anything else in a standard osC store to change. What do you mean by "can't see any bots"?

 

I check via "Who's On-Line". One store shows bots, with their names in red. The other store never shows any bots, only guests. The stores are identical (well, obviously there's something different). I also have a straight OSC MS 2.2 store, which works fine as well. I checked the CHMOD settings, and they are all identical too. Hmmmm.

 

You can add debug code to application_top.php to see if you can find out why one store is misbehaving.

 

Note that, unless you're on a Windows host, the case of the filename is important - it is looking for spiders.txt not SPIDERS.TXT.

 

I understand the caps - it is lower case on the server, just wanted to emphasize it in the post.

John Skurka


Maybe there are no bots visiting the other store? If instead what you see is that there are visitors that are clearly bots but that have sessions, you have some further analysis to do to find out why. If you'll give me the URL of the store that is a problem, I can check to see if spiders get sessions.


If you'll give me the URL of the store that is a problem, I can check to see if spiders get sessions.

 

The store that works: www.atoolcrib.com

 

The store that doesn't: www.vehitronix.com

 

I know some of the "visitors" to Vehitronix are bots, based on the IP address of the visitor as reported in the Who's Online contrib.

John Skurka


I tried your site with my user agent set to "Googlebot" and I did not get a session. So whatever issue you have with the "Who's Online" feature, it isn't related to use of spiders.txt.


So whatever issue you have with the "Who's Online" feature, it isn't related to use of spiders.txt.

 

You got it - I overwrote the "WOL" Code with the most current version and everything is working correctly now! Thanks for your help in debugging this.

John Skurka


There would certainly be a problem with the 3/31 file but there shouldn't be with the newer ones. Please make sure that your spiders.txt does NOT contain the line:

 

ox/
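The likely reason "ox/" is a problem, assuming entries are matched as substrings of the lowercased user agent: it occurs inside "Firefox/", so every Firefox visitor would be treated as a spider and refused a session. A small sanity check along these lines can flag such entries before a file goes live. The browser strings are typical examples picked for illustration, and the function name is hypothetical.

```python
# Typical browser user agents of the period, used here only as test samples.
BROWSER_SAMPLES = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.3) Gecko/20060426 Firefox/1.5.0.3",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
]


def overbroad_entries(spider_entries, browser_agents=BROWSER_SAMPLES):
    """Return the entries that would match at least one ordinary browser."""
    flagged = []
    for entry in spider_entries:
        entry = entry.strip().lower()
        if entry and any(entry in ua.lower() for ua in browser_agents):
            flagged.append(entry)
    return flagged


print(overbroad_entries(["googlebot", "ox/", "crawl"]))  # ['ox/']
```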


  • 4 weeks later...

I have updated to the latest spiders.txt, but a spider at 64.124.140.15x has been making 4 - 5 connections at a time, 24 hours a day, for the last few days. From what I can see in Who's Online, it is loading up the cart with each connection. Is there anything I can do about this?


I hope this is correct; I have never done anything with my access logs before. I downloaded the access log file and found the entries for the IPs in question. This is what it says.

Is this the info you were asking for?

 

 

64.124.140.150 - - [17/May/2006:10:39:40 -0500] "GET /product_info.php?products_id=6129 HTTP/1.1" 200 50494 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.151 - - [17/May/2006:03:40:39 -0500] "GET /product_info.php?products_id=3258 HTTP/1.1" 200 45109 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.152 - - [17/May/2006:08:19:10 -0500] "GET /product_info.php?products_id=4023 HTTP/1.1" 200 27728 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.153 - - [17/May/2006:08:10:01 -0500] "GET /product_info.php?products_id=4311 HTTP/1.1" 200 28744 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.154 - - [17/May/2006:08:25:39 -0500] "GET /product_info.php?products_id=4309 HTTP/1.1" 200 28188 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.176 - - [17/May/2006:14:37:50 -0500] "GET /index.php?cPath=63 HTTP/1.1" 200 41878 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.177 - - [17/May/2006:14:43:57 -0500] "GET /privacy.php HTTP/1.1" 200 23373 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.178 - - [17/May/2006:14:53:14 -0500] "GET /index.php?cPath=232 HTTP/1.1" 200 41679 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.180 - - [17/May/2006:14:36:48 -0500] "GET /index.php?cPath=231 HTTP/1.1" 200 43138 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
64.124.140.181 - - [17/May/2006:17:37:35 -0500] "GET /index.php?cPath=222 HTTP/1.1" 200 41866 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"
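For anyone else digging a user agent out of an access log for the first time, here is a rough sketch that pulls the agent string out of each line and counts requests per agent. It assumes the log is in the Apache "combined" format, which is what the lines above look like; the regular expression and function name are my own, and the two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_agents(lines):
    """Count requests per user agent string, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            counts[m.group("agent")] += 1
    return counts


sample = [
    '64.124.140.150 - - [17/May/2006:10:39:40 -0500] "GET /product_info.php?products_id=6129 HTTP/1.1" 200 50494 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"',
    '64.124.140.177 - - [17/May/2006:14:43:57 -0500] "GET /privacy.php HTTP/1.1" 200 23373 "-" "Mozilla/5.0 (compatible; FatBot 2.0; www.FatLens.com)"',
]
for agent, n in count_agents(sample).most_common():
    print(n, agent)
```

The last quoted field is the complete user agent string. Its distinctive part here is "FatBot", not "Mozilla", which nearly every browser sends.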
