osCommerce

Spiders and Robots


kalric

I've read some posts about spiders and robots creating SIDs, and that somewhere you can see whether they are indexing your site.

Can someone tell me where I find this info? Our site isn't on any of the search engines yet, but I have the spiders.txt and robots.txt files configured as the discussions and contributions suggest.

I'm assuming that if you have Yahoo and Googlebot listed in the spiders.txt file, you WANT them to index your site...right? This isn't a file where you list the bots you DON'T want to come...right?

 

Any help would be appreciated!

 

Shayne


Shayne,

 

Here's how it works. If you have "Prevent Spider Sessions" = True in admin->Configuration->Sessions, then osCommerce will look in catalog/includes/spiders.txt for spider/bot names. If the User Agent of the visitor matches one of them, that visitor is treated as a bot and will not be given a session ID.
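
As a rough illustration of that check (this is not osCommerce's actual PHP code), here is a minimal Python sketch. It assumes the spider list holds one name fragment per line and that a "match" means the fragment appears anywhere in the lowercased User Agent. The function names and the sample entries are made up for the example.

```python
def load_spiders(text):
    """Return the non-empty, lowercased entries from spiders.txt-style text."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_spider(user_agent, spiders):
    """True if any spider fragment occurs in the User Agent string."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in spiders)


def should_start_session(user_agent, spiders, prevent_spider_sessions=True):
    """Skip the session only when the setting is on and the visitor looks like a bot."""
    return not (prevent_spider_sessions and is_spider(user_agent, spiders))


# Hypothetical file contents, for illustration only.
SAMPLE_SPIDERS = load_spiders("googlebot\nslurp\nmsnbot\n")
```

So listing a bot in spiders.txt does not block it or invite it; under these assumptions it only decides whether that visitor gets a session ID.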

 

ed


robots.txt is generally used to tell spiders what they can and cannot index.

 

spiders.txt is used to identify the crawlers that should have the osCsid suppressed.
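
To see how a well-behaved crawler reads robots.txt, here is a small sketch using Python's standard urllib.robotparser module. The rules and URLs are hypothetical examples, not a recommended configuration for any shop.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for a shop; the paths are examples only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /account/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, allowed("Googlebot", "http://www.example.com/checkout/cart.php") comes back False, while the same call for /index.php comes back True. Nothing here enforces anything; it only shows what a crawler that honors the file would decide.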

 

Bobby


  • 1 month later...
Bobby wrote: robots.txt is generally used to tell spiders what they can and cannot index.


Chemo & All,

I have renamed and hidden my admin directory, and the directory is also password protected. I would still like to "Disallow" it from being spidered, using an entry in robots.txt.

However, robots.txt is readily accessible for anyone to view, e.g. http://www.microsoft.com/robots.txt

Since we have all gone to the trouble of hiding our admin tools, wouldn't listing the directory in robots.txt kind of defeat the purpose? Maybe I am being overly paranoid, since it is password protected anyway.

QUESTION: If you password protect a directory, can a spider/bot still access/index that directory? Is it necessary to "Disallow" a directory that is password protected?

 

Thanks in advance!!!

 

Regards,

T-DOGG


No. Remove the admin entry from your robots.txt file, along with any other directories you don't want people to know about.

Bots can't get into password-protected areas.
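
To show why listing a hidden directory is a giveaway, here is a short Python sketch that pulls every Disallow path out of a robots.txt file, which is something any visitor can do. The file contents and the directory name are invented for the example.

```python
def disallowed_paths(robots_txt):
    """Collect every path named in a Disallow line, as any visitor could."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file that gives away a renamed admin directory.
LEAKY = "User-agent: *\nDisallow: /my-secret-admin/\nDisallow: /images/\n"
```

Calling disallowed_paths(LEAKY) hands back the renamed admin path along with everything else in the file.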

To keep people from browsing directories that have no index file, add

Options -Indexes

to your .htaccess file.

Also, add

<Files .htaccess>
order allow,deny
deny from all
</Files>

to your .htaccess file to prevent the .htaccess file itself from being viewed by anyone.


T-DOGG wrote: If you password protect a directory, can a spider/bot still access/index that directory? Is it necessary to "Disallow" a directory that is password protected?

The robots file is only a guide that bots are supposed to follow. The major ones (Google, Yahoo, MSN) do, but many others may not. You shouldn't list admin in the robots file, but even if you do, bots can't get past a login prompt, so even the ones that misbehave won't get in.

 

Jack
