papillon Posted April 29, 2008

Hi, I have just installed Who's Online Enhancement 3.4 and there's something I don't understand. I submitted my URL and sitemap.xml to Google two days ago.

- In admin -> Sessions, "Prevent Spider Sessions" is set to true (so bots should not create a session).
- I have a robots.txt in the domain root with the line Disallow: /cookie_usage.php (so bots should not request that page).

But I just went to admin -> Who's Online and I see this:

name: guest
IP address: crawl-66-249-72-137.googlebot.com
last URL: cookie_usage.php
session?: Yes

OK, so this is supposed to be Googlebot, but it seems it's creating a session and crawling cookie_usage.php? Any advice?
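For reference, here is how a rule like the one in that robots.txt is read by a compliant crawler. This is a minimal sketch using Python's standard `urllib.robotparser` as a stand-in for a search engine's own parser (Googlebot's parser may differ in details); the host `www.example.com` is a placeholder. Note that robots.txt only tells well-behaved bots which URLs not to request; it has nothing to do with whether the shop starts a session.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content with the single rule described in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cookie_usage.php
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Placeholder host; only the path part is matched against the rules.
    for path in ("/index.php", "/cookie_usage.php"):
        url = "http://www.example.com" + path
        print(path, "allowed" if rp.can_fetch("Googlebot", url) else "disallowed")
```

The Disallow value is matched as a path prefix, so any URL starting with /cookie_usage.php is skipped, while everything else stays crawlable.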
♥Vger Posted April 29, 2008

The bot is ending up on cookie_usage.php because it arrived at your site using an old session id that it had created earlier. Google will soon learn that it cannot create a session id, and will then go back to crawling your whole site without a session id and without ending up on the cookie_usage.php page.

robots.txt files are pretty useless these days, and can be harmful. For instance, if you list in that file the folders you don't want genuine bots crawling, such as your osCommerce 'admin' folder, then the file tells hacker bots exactly which folders and files to go for.

Vger
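To illustrate the idea behind a "prevent spider sessions" setting, here is a minimal sketch, assuming the shop compares the visitor's user-agent string against a list of known spider substrings and skips starting a session on a match. This is not the actual osCommerce code; the names `SPIDER_AGENTS`, `is_spider` and `should_start_session` and the three list entries are hypothetical. It also shows why an old session id can linger: the check only stops new sessions from being created, so URLs the bot stored earlier may still carry an id until it recrawls without one.

```python
# Hypothetical list of user-agent substrings; a real shop's spider list
# is much longer and is maintained separately.
SPIDER_AGENTS = ("googlebot", "slurp", "msnbot")


def is_spider(user_agent: str, spider_agents=SPIDER_AGENTS) -> bool:
    """Return True if any known spider substring appears in the user agent."""
    ua = user_agent.lower()
    return any(agent in ua for agent in spider_agents)


def should_start_session(user_agent: str, block_spiders: bool = True) -> bool:
    """Decide whether to start a new session for this request.

    With spider blocking on, a recognised bot never gets a new session,
    so links served to it carry no freshly created session id.
    """
    if block_spiders and is_spider(user_agent):
        return False
    return True


if __name__ == "__main__":
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/5.0 (Windows NT 5.1) Firefox/2.0"
    print("Googlebot gets a session:", should_start_session(googlebot))
    print("Browser gets a session:", should_start_session(browser))
```

Substring matching is a simple design choice: it catches version changes in the user agent (Googlebot/2.1, Googlebot/2.2, ...) but relies on the bot identifying itself honestly.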
papillon Posted April 30, 2008 (Author)

Thanks! OK, so there's nothing I need to worry about?

About robots.txt files, I was also thinking: as far as I know (please correct me if I'm wrong, I'm a newbie), bots are supposed to go to the first page on the domain and just follow all the links they find, and so on. So even if I DON'T have a line in robots.txt blocking the /admin directory, as long as I don't link to any page in /admin, bots should never find anything there? Or am I getting this completely wrong?

Thanks
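The "start at the home page and follow links" model described above can be sketched as a breadth-first search over a link graph. The site map below is entirely hypothetical. The sketch only models link-following: real crawlers also pick up URLs from submitted sitemaps (such as the sitemap.xml mentioned earlier), links on other sites and their own crawl history, so an unlinked page is not a hidden page. Protecting /admin still depends on its password.

```python
from collections import deque

# Hypothetical link graph for a small shop: page -> pages it links to.
SITE_LINKS = {
    "/index.php": ["/product_info.php", "/shopping_cart.php"],
    "/product_info.php": ["/index.php", "/reviews.php"],
    "/shopping_cart.php": ["/index.php"],
    "/reviews.php": [],
    "/admin/index.php": ["/admin/categories.php"],  # no public page links here
    "/admin/categories.php": [],
}


def crawl(start: str, links: dict) -> set:
    """Breadth-first link following from `start`; returns every reachable page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    found = crawl("/index.php", SITE_LINKS)
    print(sorted(found))
    print("admin reached by link-following:", "/admin/index.php" in found)
```

If a single external page or sitemap entry pointed at /admin/index.php, it would become a new starting point and the whole admin subgraph would be reachable.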
Jack_mcs Posted April 30, 2008

Quoting Vger: "robots.txt files are pretty useless these days, and can be harmful. For instance, if you put the names of folders you don't want genuine bots crawling in that file, such as your oscommerce 'admin' folder then this will tell hacker bots exactly which folders/files it should go for."

The search engines always look for a robots file. If they can't find one, they get your 404 page instead. This causes two problems. If the server's default 404 page is returned, it wastes bandwidth. If it is a custom 404 page, it probably wastes even more bandwidth, and the search engines may not be able to handle it properly. So a basic robots file is needed, if only to save bandwidth and keep your logs from filling up with page-not-found errors.

Beyond that, it doesn't matter whether admin (or whatever your admin folder is named) is included in the file. The good bots will simply skip it if it is disallowed, while the bad ones won't be able to get in anyway, because it is password protected.

Jack
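Both points in this exchange can be seen with a basic robots.txt. This is a minimal sketch, assuming an admin folder at /admin/ and the placeholder host `www.example.com`; `disallowed_paths` and `polite_bot_may_fetch` are hypothetical helpers. Serving this file gives crawlers a normal response instead of a 404, a well-behaved bot (modelled with Python's standard `urllib.robotparser`) skips the disallowed folder, and anyone who downloads the file can read the same list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical basic robots.txt; the admin folder name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def disallowed_paths(robots_text: str) -> list:
    """List every non-empty Disallow path, as any client fetching the file sees it."""
    paths = []
    for line in robots_text.splitlines():
        field, _, value = line.partition(":")
        # An empty "Disallow:" means "allow everything", so it is skipped.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def polite_bot_may_fetch(robots_text: str, path: str) -> bool:
    """What a rule-following crawler concludes for `path` on a placeholder host."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch("*", "http://www.example.com" + path)


if __name__ == "__main__":
    print("Listed for everyone to read:", disallowed_paths(ROBOTS_TXT))
    print("Polite bot fetches /index.php:", polite_bot_may_fetch(ROBOTS_TXT, "/index.php"))
    print("Polite bot fetches /admin/index.php:", polite_bot_may_fetch(ROBOTS_TXT, "/admin/index.php"))
```

Which side of the argument you take, the sketch shows that the Disallow list only works as a request to bots that choose to honour it; the password on the admin folder is what actually keeps others out.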
This topic is now archived and is closed to further replies.