kitchenniche Posted April 18, 2005

Hi there,

Are there any errors in my robots.txt? There are always spiders in "checkout_shipping", "write_review", etc., although I have those pages in my robots.txt.

Here is the robots file:

Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /cookie_usage.php
Disallow: /product_review_write.php
User-agent: Googlebot-Image
Disallow: /

Is something wrong here?

Thanks a bunch!
sandra

HIM - Dark Light - Out on 26/09/05
WiseWombat Posted April 18, 2005

Before I installed the robots.txt file in the web root, I found that the spiders had scanned the session IDs in the database, so I removed them with phpMyAdmin.

( WARNING ) I think I know what I'm talking about. BACK UP BACK UP BACK UP BACK UP
kitchenniche Posted May 26, 2005

WiseWombat wrote: "Before I installed the robots txt file to web root I found that the spiders had scanned the session ids in database so I removed them with phpmyadmin"

Where can I find those "session IDs" to remove them in phpMyAdmin?

Thanks
kitchenniche Posted May 26, 2005

help :P
FalseDawn Posted May 26, 2005

Good grief - I've never read such misinformed nonsense.

Bots have no way of scanning your database. What happens is that if your store is not configured correctly, all your links will have the session ID appended to the URL. If you also do not turn on "prevent spider sessions" with an up-to-date spiders.txt file, then when a bot crawls your site (just like a human visitor), it will index your links, and these links will include the session IDs. Furthermore, sessions and your robots.txt file are not related in any way.

The problem then occurs if users click these links from within a search engine page and start adding things to their cart, checking out, etc. Another user could then come along, click the same link and "inherit" this session. Especially so if you don't have "recreate sessions" set, or have no idea what this does.

Going back to your original problem: the robots.txt file does not prevent spiders from accessing certain files or directories; it merely "indicates" to the "well behaved" spiders that you don't want them in there. It is in any case a bad idea to have a robots.txt file like yours, since it is plain text and anybody can read it. It exposes your directory structure, for a start...

The best you can do (without using .htaccess and checking the referer) is to make sure that "prevent spider sessions" is turned on and get the latest spiders.txt file. You might also want to allow only logged-in users to write reviews - this will keep the bots out of this area.

If you insist on using a robots.txt file, make sure it is in your web root.
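For anyone wondering what "prevent spider sessions" amounts to, here is a rough Python sketch of the general idea: compare the visitor's User-Agent against a list of known spider tokens and skip session creation on a match, so no session ID gets appended to the links the bot indexes. This is not osCommerce's actual code; the function names and the three sample tokens are made up for illustration, and a real spiders.txt list is much longer.

```python
def is_spider(user_agent: str, spider_tokens: list[str]) -> bool:
    """Return True if any known spider token occurs in the User-Agent string."""
    ua = user_agent.lower()
    return any(tok.strip().lower() in ua for tok in spider_tokens if tok.strip())


def should_start_session(user_agent: str, spider_tokens: list[str]) -> bool:
    """Skip session creation for recognised spiders (hypothetical helper)."""
    return not is_spider(user_agent, spider_tokens)


# Hypothetical excerpt of a spiders.txt-style list: one lowercase token per entry.
SPIDER_TOKENS = ["googlebot", "msnbot", "slurp"]

if __name__ == "__main__":
    for agent in (
        "Mozilla/5.0 (compatible; Googlebot/2.1)",
        "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0",
    ):
        print(agent, "-> start session:", should_start_session(agent, SPIDER_TOKENS))
```

Note that this only works for bots whose token is actually in the list, which is why an out-of-date spiders.txt lets session IDs leak into search results.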
FalseDawn Posted May 26, 2005

Oh, and adding:

User-agent: *

just before:

Disallow: /admin

will probably help as well.
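To see why the missing "User-agent: *" line matters, you can feed both versions of the file to Python's standard urllib.robotparser. This is only a sketch of how one parser reads the file (real crawlers may differ in details), and the helper name allowed is made up here. Disallow lines that appear before any User-agent line do not belong to any group, so this parser ignores them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt as originally posted: the first rules have no User-agent line.
ORIGINAL = """\
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /cookie_usage.php
Disallow: /product_review_write.php
User-agent: Googlebot-Image
Disallow: /
"""

# The suggested fix: put the first group under "User-agent: *".
FIXED = "User-agent: *\n" + ORIGINAL


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if this parser thinks `agent` may fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("original", ORIGINAL), ("fixed", FIXED)):
        print(label, "msnbot /checkout_shipping.php ->",
              allowed(text, "msnbot", "/checkout_shipping.php"))
```

Even with the fixed file, this only tells you what a well-behaved bot should do; it does not block anything on the server side.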
♥Vger Posted May 26, 2005

FalseDawn wrote: "Good grief - I've never read such misinformed nonsense."

Yes. I don't want to be mean to anyone, as we've all been newbies at the start - but I do just wish that if people do not know the answer they would not offer one anyway. I'm surprised that Martin said that after 480 posts to the forum.

Vger

P.S. Whilst a properly constructed robots.txt file is invaluable, this only applies to 'good bots', as 'bad bots' ignore any robots file you have anyway - unless it's to gain more information.
kitchenniche Posted May 27, 2005

Thanks guys,

- I already had "Prevent Spider Sessions" turned on.
- In my shop, you have to log in to write a review; nonetheless, Googlebot, msnbot, Yahoo, etc. are there all the time.
- Now I tried what FalseDawn said: I placed "User-agent: *" at the beginning of the robots file, which now looks like this:

User-agent: *
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /cookie_usage.php
Disallow: /product_review_write.php
User-agent: Googlebot-Image
Disallow: /

I hope this will do it.

Thanks a lot! :thumbsup:
Sandra
kitchenniche Posted May 27, 2005

Hmm, seems like nothing has changed; msnbot is visiting /cookie_usage.php right now.

Or is this usual, and I have to wait a while until the bots stop crawling those pages?

Thanks in advance
sandra
FalseDawn Posted May 27, 2005

If you want to mimic what a bot will "see" as it travels your site, simply set your browser to "reject all cookies". You'll see that if you have "force cookie use" set to true, you'll get this page as soon as you try to log on.

Turning off "force cookie use" is one option, although bots indexing this page is not a big problem. You can add it to your robots.txt file if desired, or add <meta name="robots" content="noindex,nofollow"> in the <head> section of the page if you don't want the "good" bots to index it.

Edit: For security, you should also change the name of your "admin" directory to something unguessable, and remove this entry from the robots.txt file.
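If you go the meta tag route, a quick way to confirm the tag really ended up in the page is to parse the HTML and look for it. Below is a small Python sketch using only the standard library; the class and function names are invented for this example, and the sample page is a made-up stand-in for the real cookie_usage.php output.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots" ...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex,nofollow" into individual lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in `html` (hypothetical helper)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Made-up sample page standing in for the rendered HTML of the page in question.
SAMPLE_PAGE = """<html><head>
<title>Cookie Usage</title>
<meta name="robots" content="noindex,nofollow">
</head><body>...</body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

As with robots.txt, the tag is only a request that well-behaved bots honour; they still have to fetch the page to read it, so you will keep seeing them visit.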
FalseDawn Posted May 27, 2005

Actually, scratch that - bots will probably hit this page regardless of your "force cookie use" setting - I can't remember exactly... and they won't get sessions like a normal user, ho hum.

I wouldn't worry too much about it - all "good" bots will respect your robots.txt file. Is your robots.txt file in your web root?
kitchenniche Posted May 27, 2005

Yes, my robots.txt is in my web root (in /public_html).

You said that the "good" bots will respect my robots.txt file, but MSN and Yahoo are already "writing" reviews for my site :D and crawling cookie_usage.php and so on.
FalseDawn Posted May 27, 2005

http://www.oscommerce.com/forums/index.php?showtopic=126617
WiseWombat Posted May 27, 2005

kitchenniche wrote: "Yes, my robots.txt is in my web root (in /public_html). You said that the "good" bots will respect my robots.txt file, but msn and yahoo are already "writing" reviews for my site :D and crawl cookie_usage.php and so on."

If you are running Apache, this might also be a good add-on to your .htaccess file: Spider Session Remover
WiseWombat Posted May 27, 2005

Vger wrote: "Yes. I don't want to be mean to anyone, as we've all been newbies at the start - but I do just wish that if people do not know the answer they would not offer one anyway. I'm surprised that Martin said that after 480 posts to the forum."

Maybe I should have worded it better. The spiders had indexed my site, and they had indexed some pages with sessions before I installed the robots.txt file. I found that these sessions were stored inside my database under the session ID, so I deleted them and then installed the robots.txt file.
FalseDawn Posted May 27, 2005

I don't want to flog a dead horse here, but you are confusing sessions with the robots.txt file - the two are totally unrelated.
WiseWombat Posted May 27, 2005

The horse is dead, and yes, I know that. Look at my original reply: I said "before I installed the robots.txt file".
FalseDawn Posted May 27, 2005

Your reply to me indicates that you believe the robots.txt file will in some way prevent spiders from creating sessions, when this is not the case. If this is not what you believe, then I apologize.