robots.txt not working properly


kitchenniche


Hi there,

 

Are there errors in my robots.txt? There are always spiders in "checkout_shipping", "write_review", etc., although I have those pages in my robots.txt.

 

here the robots file:

 

Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /cookie_usage.php
Disallow: /product_review_write.php

User-agent: Googlebot-Image
Disallow: /

 

something wrong here?

 

Thanks a bunch!

 

sandra

HIM - Dark Light - Out on 26/09/05

Before I installed the robots.txt file in the web root, I found that the spiders had scanned the session IDs in the database, so I removed them with phpMyAdmin.

( WARNING )

I think I know what Im talking about.

BACK UP BACK UP BACK UP BACK UP


  • 1 month later...
Where can I find those "session ids" so that I can remove them in phpMyAdmin?

 

thanks


Good grief - I've never read such misinformed nonsense.

 

Bots have no way of scanning your database - what happens is that if your store is not configured correctly, then all your links will have the SessionID appended to the URL. If you also do not turn on "prevent spider sessions" with an up-to-date spiders.txt file, then when a bot crawls your site (just like a human visitor), it will index your links, and these links will include the session IDs.
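
As a rough illustration of the mechanism described above, here is a minimal sketch in Python (osCommerce itself is PHP, so this is not its actual code). It shows what a "prevent spider sessions" check amounts to: compare the visitor's User-Agent against a list of known spider substrings, and only append a session ID to links for visitors who are neither spiders nor cookie users. The `SPIDERS` list, the function names, and the `osCsid` parameter name are assumptions made for the example.

```python
from urllib.parse import urlencode

# Hypothetical excerpt of a spiders.txt list: lowercase substrings
# that identify known crawlers. A real list is much longer.
SPIDERS = ["googlebot", "msnbot", "slurp"]


def is_spider(user_agent):
    """Return True if the User-Agent contains a known spider substring."""
    ua = user_agent.lower()
    return any(name in ua for name in SPIDERS)


def build_link(page, session_id, user_agent, has_cookie):
    """Build a store link, appending the session ID only when needed.

    The session ID goes into the URL only for visitors who are not
    spiders and who have not accepted the session cookie.
    """
    if is_spider(user_agent) or has_cookie:
        return page
    # "osCsid" is an assumed parameter name for this sketch.
    return page + "?" + urlencode({"osCsid": session_id})


if __name__ == "__main__":
    print(build_link("/index.php", "abc123", "Mozilla/5.0 (compatible; Googlebot/2.1)", False))
    print(build_link("/index.php", "abc123", "Mozilla/5.0 (Windows)", False))
```

If the spider list is out of date, a new crawler falls through to the second branch and gets links with a session ID, which is how those IDs end up in search results.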

 

Furthermore, sessions and your robots.txt file are not related in any way, anyway.

 

The problem then occurs if users click these links from within a search engine page and start adding things to their cart, checking out, etc. Another user could then come along, click the same link, and "inherit" this session. This is especially likely if you don't have "recreate sessions" set, or have no idea what it does.

 

Going back to your original problem, the robots.txt file does not prevent spiders from accessing certain files/directories, it is merely to "indicate" to the "well behaved" spiders that you don't want them in there.

 

It is in any case a bad idea to have a robots.txt file like you have, since it is plain text and anybody can read it. It exposes your directory structure for a start...

 

The best you can do (without using .htaccess and checking the referer) is to make sure that "prevent spider sessions" is turned on and get the latest spiders.txt file.

 

You might also want to only allow logged-in users to write reviews - this will keep the bots out of this area.

If you insist on using a robots.txt file, make sure it is in your web root.


Yes. I don't want to be mean to anyone, as we've all been newbies at the start - but I do just wish that if people do not know the answer they would not offer one anyway. I'm surprised that Martin said that after 480 posts to the forum.

 

Vger

 

p.s. Whilst a properly constructed robots text file is invaluable, this only applies to 'good bots' as 'bad bots' ignore any robots file you have anyway - unless it's to gain more information.


Thanks guys,

 

- I already had the "Prevent Spider Sessions" turned on.

 

- In my shop, you have to log in to write a review; nonetheless, Googlebot, msnbot, Yahoo, etc. are there all the time.

- Now I tried what FalseDawn said: I placed "User-agent: *" at the beginning of the robots file, which now looks like this:

 

User-agent: *
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /cookie_usage.php
Disallow: /product_review_write.php

User-agent: Googlebot-Image
Disallow: /
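
One way to see why the added "User-agent: *" line matters is to feed both versions of the file to a parser. The sketch below uses Python's standard `urllib.robotparser` on a shortened copy of the rules. The `allowed` helper and the variable names are made up for illustration, and a real crawler's parser may differ in details. In this parser, Disallow lines that come before any User-agent line are ignored, so the first version of the file only restricted Googlebot-Image.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from the thread.
RULES = """\
Disallow: /admin
Disallow: /checkout_shipping.php
Disallow: /product_review_write.php
"""

IMAGE_BOT = """\
User-agent: Googlebot-Image
Disallow: /
"""

# First version: rules with no User-agent line above them.
original = RULES + "\n" + IMAGE_BOT
# Second version: the same rules under "User-agent: *".
fixed = "User-agent: *\n" + RULES + "\n" + IMAGE_BOT


def allowed(robots_txt, agent, path):
    """Return True if this parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("original", original), ("fixed", fixed)):
        print(label, allowed(text, "msnbot", "/checkout_shipping.php"))
```

Crawlers typically cache robots.txt for a while, so visits to disallowed pages can continue for some time after the file is corrected.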

 

I hope this will do it.

 

Thanks a lot! :thumbsup:

Sandra


Hmm, it seems like nothing has changed: msnbot is visiting /cookie_usage.php right now. Or is this usual, and do I have to wait a while until the bots stop crawling those pages?

 

thanks in advance

 

sandra


If you want to mimic what a bot will "see" as it travels your site, simply set your browser to "reject all cookies". If you have "force cookie use" set to true, you'll get this page as soon as you try to log on.

Turning off "force cookie use" is one option, although bots indexing this page is not a big problem. You can add it to your robots.txt file if desired, or add <meta name="robots" content="noindex,nofollow"> in the <head> section of the page if you don't want the "good" bots to index it.

 

Edit: For security, you should also change the name of your "admin" directory to something unguessable, and remove this entry from the robots.txt file
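
To confirm that the robots <meta> tag mentioned above actually made it into a page, the served HTML can be checked with a few lines of Python. This is only a sketch using the standard `html.parser` module; the class and function names are made up for the example, and the sample page is a stand-in for whatever your store really outputs.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            for item in content.split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    # Stand-in page for illustration only.
    page = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'
    print(sorted(robots_directives(page)))
```

As with robots.txt, this tag is only a request: well-behaved bots honour it, and others can ignore it.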


Actually, scratch that - bots will probably hit this page regardless of your "force cookie use" setting - I can't remember exactly... and they won't get sessions like a normal user, ho hum.

 

I wouldn't worry too much about it - all "good" bots will respect your robots.txt file.

 

Is your robots.txt file in your web root?


Yes, my robots.txt is in my web root (in /public_html).

 

You said that the "good" bots will respect my robots.txt file, but msn and yahoo are already "writing" reviews for my site :D and crawling cookie_usage.php and so on.

If you are running Apache, this might also be a good add-on to your .htaccess file.

 

Spider Session Remover
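
The details of that add-on are not shown here, but the general idea of removing a session ID from a URL can be sketched in a few lines of Python. This is an illustration only, not the contribution's actual rewrite rules; the function name is hypothetical, and `osCsid` is assumed to be the session parameter name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="osCsid"):
    """Return `url` with the session parameter removed from its query string."""
    parts = urlsplit(url)
    # Keep every query parameter except the session ID.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(strip_session_id("http://example.com/product_info.php?products_id=5&osCsid=abc123"))
```

A server-side rule built on this idea would redirect a spider that requests a URL containing a session ID to the cleaned URL, so that only the clean form gets indexed.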

Maybe I should have worded it better.

The spiders had indexed my site, including some pages with session IDs, before I installed the robots.txt file. I found these sessions stored in my database under the session ID, so I deleted them and then installed the robots.txt file.


I don't want to flog a dead horse here, but you are confusing sessions with the robots.txt file - the two are totally unrelated.

The horse is dead, and yes, I know that.

Look at my original reply: I said "before I installed the robots.txt file".


Archived

This topic is now archived and is closed to further replies.
