robots.txt and prevent spider session not working


papillon


Posted

Hi, I have just installed Who's Online Enhancement 3.4 and there's something I don't understand. I added my URL and sitemap.xml to Google two days ago.

- In admin->sessions, "prevent spider sessions" is set to true (so the bots should not create a session)

- I have a robots.txt in the domain root with the line Disallow: /cookie_usage.php (so the bots should not request that page)

 

But I just went to admin->who's online and I see this:

 

name - guest

ip address - crawl-66-249-72-137.googlebot.com

last url - cookie_usage.php

session? - Yes

 

OK, so this is supposed to be Googlebot, but it seems it is creating a session and crawling cookie_usage.php?

 

Any advice?

Posted

The bot is ending up on cookie_usage.php because it arrived at your site using an old session id it had previously created.

 

Google will soon learn that it cannot create a session id. It will then go back to crawling your whole site without a session id, and without ending up on the cookie_usage.php page.
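To make the two ideas above a bit more concrete, here is a rough Python sketch. It is not osCommerce's actual PHP code. It only illustrates (1) deciding from the user agent whether a visitor looks like a spider and so should not get a session, and (2) what a URL looks like once an old session id has been dropped from it. The parameter name `osCsid`, the short fragment list, and the function names are all assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, shortened list of user-agent fragments.
# A real shop would maintain its own, much longer list.
SPIDER_FRAGMENTS = ["googlebot", "bingbot", "slurp"]

# Assumed name of the session parameter carried in the URL.
SESSION_PARAM = "osCsid"


def is_spider(user_agent):
    """Return True if the user agent contains a known spider fragment."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in SPIDER_FRAGMENTS)


def should_start_session(user_agent):
    """With spider sessions prevented, only non-spiders get a session."""
    return not is_spider(user_agent)


def strip_session_id(url):
    """Remove the session parameter from a URL, keeping everything else."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `strip_session_id("http://example.com/index.php?cPath=1&osCsid=abc123")` gives back the same URL without the `osCsid` part, which is the kind of clean URL a bot is expected to end up crawling.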

 

robots.txt files are pretty useless these days, and can be harmful. For instance, if you list the folders you don't want genuine bots crawling, such as your osCommerce 'admin' folder, the file tells hacker bots exactly which folders and files to go for.

 

Vger

Posted
Thanks!

OK, so there's nothing I need to worry about?

About the robots.txt files, I was also thinking: as far as I know (please correct me if I'm wrong, I'm a newbie), bots are supposed to start at the first page of the domain and follow all the links they find, and so on. So even if I DON'T have a line in robots.txt blocking the /admin directory, if no page links to anything under /admin, the bots should never find anything there? Or am I getting this completely wrong?

 

thanks
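The link-following idea in this question can be sketched as a small breadth-first walk over a made-up site. The page names and the `LINKS` table below are hypothetical. The sketch only models a crawler that starts at the home page and follows links. It does not model URLs learned from sitemaps or external links, or bots that simply guess common folder names.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/index.php": ["/products.php", "/cookie_usage.php"],
    "/products.php": ["/index.php", "/product_info.php"],
    "/product_info.php": ["/index.php"],
    "/cookie_usage.php": [],
    "/admin/index.php": ["/admin/orders.php"],  # no public page links here
    "/admin/orders.php": [],
}


def discover(start, links):
    """Breadth-first walk over links, returning every reachable page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = discover("/index.php", LINKS)
print(sorted(found))
```

In this toy model the pages under /admin never show up in `found`, because nothing reachable from /index.php links to them.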

Posted
Vger wrote: "robots.txt files are pretty useless these days, and can be harmful. For instance, if you put the names of folders you don't want genuine bots crawling in that file, such as your oscommerce 'admin' folder then this will tell hacker bots exactly which folders/files it should go for."

The search engines always look for a robots.txt file. If they can't find one, they get the 404 page, and this causes two problems. If the server's default 404 page is returned, it wastes bandwidth. If it is a custom 404 page, it probably wastes more bandwidth, and the search engines may not be able to handle it properly. So a basic robots.txt file is needed, if for nothing else than to save bandwidth and keep your logs from filling up with "page not found" errors.

Beyond that, it doesn't matter whether admin (or whatever the admin folder is named) is included in the file. The good bots will simply stay out of it if it is disallowed, while the bad ones won't be able to get in anyway, because it is password protected.
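As an illustration of how a well-behaved bot reads a basic robots.txt, here is a small sketch using Python's standard `urllib.robotparser`. The file contents, the `www.example.com` host, and the `allowed` helper are assumptions for the example, not anyone's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt, similar to the one discussed in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cookie_usage.php
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if a compliant bot with this user agent may fetch the path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("/index.php"))
print(allowed("/cookie_usage.php"))
print(allowed("/admin/login.php"))
```

A compliant bot may fetch /index.php but not /cookie_usage.php or anything under /admin/. A bot that ignores robots.txt is not bound by any of this, which is why the password protection on the admin folder is what actually keeps it out.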

 

Jack


Archived

This topic is now archived and is closed to further replies.
