osCommerce

Help: search engines displaying session IDs in the results


yanarasod

I don't know why search engines are showing session IDs in my URLs.

Google shows the session ID on product popup links and the account page, whereas MSN shows it on every link.

I have these settings under admin/configuration/sessions:

 

Force Cookie Use: False
Check SSL Session ID: False
Check User Agent: False
Check IP Address: False
Prevent Spider Sessions: True
Recreate Session: False

 

 

I am really worried about this. What should I do, and where? Is there any way to fix it? Thanks in advance for any help. :'(


Do you have an updated spiders.txt file in place?

 

Without the spiders.txt file in place, Prevent Spider Sessions won't work... that's my understanding anyway.

In addition, you need a robots.txt set up to keep the bots out of the areas you don't want them in: shopping cart, create account, etc.


Thanks very much for the reply.

Yes, I do have a spiders.txt, in

root/includes/spiders.txt

but it hasn't been updated; in other words, it is unchanged since the default osC install.

There is currently no robots.txt file.

What should I do next? Thanks very much.


I used this for spiders.txt, and Yahoo was the only one to pick up SIDs, but I think that was because I was too slow adding the file.

 

http://www.oscommerce.com/community/contri...rch,spiders.txt

 

There's a thread with tips for the robots.txt file:

 

http://www.oscommerce.com/forums/index.php?sho...7&hl=robots.txt

 

And there's an example robots.txt file here

 

http://www.oscommerce.com/community/contri...l/search,robots

 

The robots.txt file goes in your site root.


I will make the robots.txt file and paste it here for confirmation.

I am also going to make whatever changes you suggested. I will come back here in a few days or weeks, once I can see the new Google search results.

Are there any other pages that need to be disallowed (like popup_image.php)? I think I need to disallow popup_search_help.php.

Also, I see Google doesn't display results for pages with URLs like these:

 

site.com/folder/file.php

 

site.com/trends/new.php

 

The catalog is at the root:

 

site.com


This forum doesn't let me edit my post twice. <_<

OK, so do I need to disallow popup_search_help.php and all the other popup pages?

Also, I went to download the latest spiders.txt from the contributions section (by stonebridgecomputing, 13 Jun 2006), but it says the file is password protected. Does it require a password? :blink: What should I do now? I'm really confused.

 

 



Try the skip file button...see if the rest of it will open. If that doesn't work, just download the one below it, from 20 May 06. Looks like Stonebridge computing listed the 2 additions on the contribution page anyway. You could add those 2 manually.

 

I think all of the versions have the same readme.txt file. Here is the one from May '06:

 

spiders.txt contribution

 

Steve Lionel


 

The file catalog/includes/spiders.txt is used in conjunction with the "Prevent Spider Sessions" feature in Admin->Sessions. When that feature is enabled, osC downcases the "user agent" from the HTTP request and looks to see if any of the strings in spiders.txt are found in it. This is to identify search engine spiders (also called robots). If there is a match, the spider is not assigned a PHP session, but can still access the page. The benefits of this are:

 

- The search engines do not include session IDs in their index.

- The spiders are not able to add items to a "cart", which would otherwise fill up your database with worthless carts.

 

This feature does NOT prevent spiders from indexing your site. Rather, it improves the quality of the indexing.

 

osCommerce 2.2-MS2 was released in early 2003 and many new spiders have been launched since that time. Keeping spiders.txt up to date is highly recommended.

 

This contribution contains a spiders.txt that has been updated with new spiders that I have seen visiting my sites. It has also been streamlined somewhat, as the stock one has redundant and incorrect lines - these do no harm but make page loading slower.

 

I will keep updating this as I identify new active spiders. I am not trying to identify every spider ever created. A second file, spiders-large.txt, is included. This adds spider strings as supplied by ChrisW123, and I have optimized it somewhat, removing redundant strings. I have not tried to validate all the strings in this file. You can use the "large" file if you want to further reduce the chance that a spider will start a session, but at the cost of slower loads of every page. If you use this, rename it to spiders.txt when you place it in your catalog/includes folder.

 

Please note that the strings here are not necessarily what spiders look for in robots.txt - a separate issue.

 

The purpose of this file is to tell if an incoming request is from a spider, NOT to identify a particular spider. Therefore, there are some common substrings in the list such as "spider", "crawl", and "obot" which match many different spiders. For example, "ebot" matches Googlebot, "nbot" matches msnbot. The strings in this file MUST be all lowercase, or else they will be ignored. If you think a particular spider is missing from the list, please post in the support topic and include a line from your access log showing the spider access including the full user agent string. Please do not update this contribution unless you fully understand how it works.
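The matching described above can be sketched roughly as follows. This is written in Python purely for illustration; osCommerce itself is PHP and this is not its actual code. `SPIDER_STRINGS` and `is_spider` are hypothetical names, and the list holds only the substrings quoted in this readme, not the full file.

```python
# Hypothetical stand-in for a few lines of spiders.txt (all lowercase).
SPIDER_STRINGS = ["spider", "crawl", "obot", "ebot", "nbot"]


def is_spider(user_agent, spider_strings=SPIDER_STRINGS):
    """Return True if any listed substring occurs in the lowercased user agent."""
    agent = user_agent.lower()
    # Empty lines are skipped; entries are compared as-is, so an entry
    # containing uppercase letters can never match the lowercased agent.
    return any(s in agent for s in spider_strings if s)


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5",
    ):
        print(is_spider(ua), ua)
```

Note that the word "googlebot" never has to appear in the list: "ebot" is enough, and an entry written as "Googlebot" with a capital G would simply never match.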

 

Comments or questions should go in the support topic at http://www.oscommerce.com/forums/index.php?showtopic=112609 Please contact me if you have suggestions or changes for this list.

 

Released under the GNU Public License

 

---

The following is a list of spiders NOT included, as they appear to be "link validators" and not actual search engine spiders:

 

linkalarm

validator

zealbot

zeus

feedchecker


Yes, I would just use the previous one. Stevel is the one that posts the updates usually anyway.

 

Maybe post in the spiders contribution forum that you had trouble w/ the update:

 

http://www.oscommerce.com/forums/index.php?showtopic=112609

 

All I get inside it is a GPL file. I tried to skip, but it says all the other files are password-protected: readme, spiders-large, spiders. So should I rely entirely on stevel's update, or is there another way?

OK, I downloaded stevel's update.

But this spiders file has no googlebot entry or any reference to it. Have you had any problems with it?

This is what the readme says:

 

The purpose of this file is to tell if an incoming request is from a spider, NOT to identify a particular spider. Therefore, there are some common substrings in the list such as "spider", "crawl", and "obot" which match many different spiders. For example, "ebot" matches Googlebot, "nbot" matches msnbot. The strings in this file MUST be all lowercase, or else they will be ignored. If you think a particular spider is missing from the list, please post in the support topic and include a line from your access log showing the spider access including the full user agent string. Please do not update this contribution unless you fully understand how it works.


What do you think of my earlier post?

Also, this is my robots.txt. Can you please spot anything wrong? You will see a page called tea_photos.php here; it is a members-only page. Should I make it hidden or private? Also, don't account_orders.php and similar pages need a Disallow in this file? Many thanks.

 

# Pages Protection
User-agent: *
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /tea_photos.php
# Images Protection
User-agent: Googlebot-Image
Disallow: /


OK, I have made some changes: I moved the Googlebot-Image group to the top, as I was told to put the more specific group first. Do you spot anything wrong now? And what about the members-only file I mentioned in my earlier post? Many thanks.

 

# Images Protection
User-agent: Googlebot-Image
Disallow: /

# Pages Protection
User-agent: *
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /tea_photos.php


OK, I modified it even more. I saw your robots.txt; you have password_forgotten.php twice. Also, do we need to add the directories called downloads, tmp, etc.? Please see this for details:

http://www.wheeloftime.nl/robots.txt

 

My modified robots.txt is below. Please tell me if you see anything wrong. Thanks.

 

# Bhura Tea robots.txt

# Images Protection
User-agent: Googlebot-Image
Disallow: /

# Pages Protection
User-agent: *
Disallow: /admin/
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_newsletters.php
Disallow: /account_notifications.php
Disallow: /account_password.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment_address.php
Disallow: /checkout_shipping_address.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /info_shopping_cart.php
Disallow: /tell_a_friend.php
Disallow: /cookie_usage.php
Disallow: /ssl_check.php
Disallow: /product_reviews_write.php
Disallow: /tea_photos.php


If you are using my contribution, it contains the string "ebot" which catches Googlebot. It certainly works on my store and many others. If you are using my contribution and Googlebot really is STILL getting a session, then that needs investigation. It could be that Google got these session IDs before you turned on Prevent Spider Sessions, as even the stock osC file would catch Googlebot.

 

If you'll provide me the URL of your store, I can test this.

 

As for stonebridgecomputing's update, ignore it. It is wrong.


Your site is fine as regards Google and sessions. Googlebot is not getting sessions now from your site. If it did in the past, well that's another problem. There is a contribution Spider Session Remover which can help, but it is not a cure-all. If you can get a list of the session IDs that Google has for you, you can delete them from the database (or session files). You can't really tell if a user is coming in from one of those links - I suppose you could add code to check to see if the referrer is google (or some other search engine) and recreate the session if it is, but that's a lot of work.
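To make the referrer idea a bit more concrete, here is a rough sketch in Python (illustration only; the store itself is PHP and this is not code from osCommerce or from the Spider Session Remover contribution). It assumes the session ID travels in a query parameter named `osCsid`; the helper names and the list of search-engine host fragments are hypothetical. It only shows how to detect a search-engine referrer and how to drop the session parameter from a URL, not how to recreate the session.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="osCsid"):
    """Return url with the session-ID query parameter removed (param name is an assumption)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def from_search_engine(referrer, engines=("google.", "msn.", "yahoo.")):
    """Return True if the referrer's host contains one of the (hypothetical) engine fragments."""
    host = urlsplit(referrer).netloc.lower()
    return any(fragment in host for fragment in engines)
```

A visitor arriving with `from_search_engine(referrer)` true and a session ID in the URL would be the case to treat specially, for example by sending them to `strip_session_id(url)` so they start with a fresh session.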

 

As for your robots.txt - I suggest removing the entry for your admin area. If there are no links in, search engines won't try them. Your admin should be password-protected anyway. I don't like to advertise my admin folder location (on my store it is NOT /admin) to help dissuade those who think it is fun to try to break into people's osC admin areas (and they do exist.)


Your site is fine as regards Google and sessions

No, Google does show the osCsid.

I think you searched this way:

site:www.mysite.com

Try this instead:

site:mysite.com

(replace mysite.com with my site's domain) and see the results.

 

 

There is a contribution Spider Session Remover which can help, but it is not a cure-all.

 

As for removing the session IDs the search engines have already grabbed, what do you think about this contribution?

 

http://www.oscommerce.com/community/contributions,2819

 

 

If you can get a list of the session IDs that Google has for you, you can delete them from the database (or session files).

 

Google has 2 and MSN has 2, so it's no big task. How do I remove them?

 

 

As for your robots.txt - I suggest removing the entry for your admin area. If there are no links in, search engines won't try them. Your admin should be password-protected anyway. I don't like to advertise my admin folder location (on my store it is NOT /admin) to help dissuade those who think it is fun to try to break into people's osC admin areas (and they do exist.)

 

Thanks, I will remove that. Also, can I add Disallow entries for downloads, tmp, includes, cache and other such folders? I think includes is password-protected, so do I need to add it?

 

thanks


You misunderstood what I wrote. I said that YOUR SITE is fine and that Google visiting your site NOW does not get sessions. I did not search Google's index, that's another thing entirely.

 

The session remover contribution you found is the one I was referring to.

 

Those other folders do not need to be added. No search engine can see them. I'll comment that there are people who deliberately look in robots.txt and access files listed to see what's there...


You misunderstood what I wrote. I said that YOUR SITE is fine and that Google visiting your site NOW does not get sessions. I did not search Google's index, that's another thing entirely.

 

If not that way, how did you find out that Google visiting my site does not get sessions? Is there a tool? :o

 

 

The session remover contribution you found is the one I was referring to.

 

OK, I am going to place that in my site's root. Will it collide in any way with the current .htaccess inside includes? After installing this contribution, is there still a need to remove the IDs from the database, as you said earlier?

 

 

Those other folders do not need to be added. No search engine can see them. I'll comment that there are people who deliberately look in robots.txt and access files listed to see what's there...

 

OK, this is the final version of the file. Can you spot anything wrong?

 

# sandalwood robots.txt

# Images Protection
User-agent: Googlebot-Image
Disallow: /

# Pages Protection
User-agent: *
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_newsletters.php
Disallow: /account_notifications.php
Disallow: /account_password.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /checkout_confirmation.php
Disallow: /checkout_payment_address.php
Disallow: /checkout_shipping_address.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /info_shopping_cart.php
Disallow: /tell_a_friend.php
Disallow: /cookie_usage.php
Disallow: /ssl_check.php
Disallow: /product_reviews_write.php
Disallow: /tea_photos.php

 

Do you also think that I should add the following files to the robots.txt?

 

add_checkout_process.php
advanced_search_result.php
checkout_process.php
checkout_success.php
popup_search_help.php
redirect.php
download.php

And this one for CCGV:

gv_redeem.php


I use the "User Agent Switcher" extension to Firefox to set my browser's user agent to Googlebot. I then visit your site and see if I have a session. When I did that with your site, I did not have a session.
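The same check can be scripted. Below is a rough Python sketch, assuming the session ID would show up as an `osCsid=` parameter in the page's links; the function names, the user-agent string, and that parameter name are assumptions for illustration, and a regular expression is only a crude way to pull out links.

```python
import re
import urllib.request

# A commonly seen Googlebot user-agent string, used here only as an example value.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_as(url, user_agent=GOOGLEBOT_UA):
    """Fetch url while presenting the given User-Agent header; return the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def links_with_session_id(html, param="osCsid"):
    """Return the href values in html that carry the session parameter."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [href for href in hrefs if param + "=" in href]
```

Calling `links_with_session_id(fetch_as(store_url))` on the store's front page and getting an empty list back suggests that a visitor identifying as Googlebot is not being handed a session ID in the links.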

 

As for the robots.txt, some of those files I don't recognize. Only files which have links from a page the robot would see should be listed. That means none of the checkout_xxx files except checkout_shipping. Actually, you can have just /checkout and that will cover what you need.
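A quick way to see why a single `/checkout` line is enough is to run a trimmed-down robots.txt through Python's standard-library parser, which applies simple prefix matching. This is only a sketch: the file contents below are a shortened, illustrative version of the one in this thread, `allowed` is a hypothetical helper, and real crawlers may differ in details from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened, illustrative robots.txt based on the one discussed in this thread.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /checkout
Disallow: /login.php
"""


def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if agent may fetch path according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in (
        ("Googlebot", "/checkout_shipping.php"),
        ("Googlebot", "/checkout_confirmation.php"),
        ("Googlebot", "/index.php"),
        ("Googlebot-Image", "/images/tea.jpg"),
    ):
        print(agent, path, allowed(agent, path))
```

Because every checkout page starts with `/checkout`, the one prefix covers all of them, while `/index.php` stays crawlable.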


As for the robots.txt, some of those files I don't recognize.

 

All the other files below come with osC by default, except gv_redeem.php, which comes with CCGV.

Do you think I should also include popup_search_help.php in the robots.txt, just like popup_image.php, along with the files below?

 

advanced_search_result.php

checkout_success.php

popup_search_help.php

redirect.php

download.php

 

And this one for CCGV:

 

gv_redeem.php

 

 

 

Only files which have links from a page the robot would see should be listed. That means none of the checkout_xxx files except checkout_shipping. Actually, you can have just /checkout and that will cover what you need.

 

 

My new robots.txt:

 

# game robots.txt

# Images Protection
User-agent: Googlebot-Image
Disallow: /

# Pages Protection
User-agent: *
Disallow: /account.php
Disallow: /account_edit.php
Disallow: /account_history.php
Disallow: /account_history_info.php
Disallow: /account_newsletters.php
Disallow: /account_notifications.php
Disallow: /account_password.php
Disallow: /address_book_process.php
Disallow: /advanced_search.php
Disallow: /checkout
Disallow: /create_account.php
Disallow: /login.php
Disallow: /logoff.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /info_shopping_cart.php
Disallow: /tell_a_friend.php
Disallow: /cookie_usage.php
Disallow: /ssl_check.php
Disallow: /product_reviews_write.php
Disallow: /tea_photos.php

 

 

OK, I am going to place that in my site's root. Will it collide in any way with the current .htaccess inside includes? After installing this contribution, is there a need to remove the SIDs from the database, as you said earlier? Also, do I need to keep this .htaccess once the SIDs are gone from the search engines after their next crawl, i.e. can I delete this file after a few weeks?

Hi, can you please answer the above question?


The .htaccess lines that the Remover contrib specifies should be in the .htaccess of your "catalog" or "store" folder. You can ADD them to one in your site's root if you want, but don't just replace an existing .htaccess with that one.

 

It will take more than a few weeks for the effect to be seen. Personally, I would remove the sessions from the database anyway.
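Removing specific sessions from the database could look roughly like the sketch below. It assumes sessions are stored in a table named `sessions` with a `sesskey` column holding the session ID. Python's built-in `sqlite3` is used only as a stand-in so the example runs anywhere; a real store would run the equivalent DELETE against its MySQL database, and the session IDs shown are made up. Back up the database first.

```python
import sqlite3


def delete_sessions(conn, session_ids):
    """Delete rows whose sesskey is in session_ids; return how many were removed."""
    ids = list(session_ids)
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    cursor = conn.execute(
        "DELETE FROM sessions WHERE sesskey IN (%s)" % placeholders, ids
    )
    conn.commit()
    return cursor.rowcount


def demo():
    """Build a throwaway in-memory table and delete two made-up session IDs from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (sesskey TEXT PRIMARY KEY, expiry INTEGER, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [("aaa111", 0, ""), ("bbb222", 0, ""), ("ccc333", 0, "")],
    )
    removed = delete_sessions(conn, ["aaa111", "bbb222"])
    remaining = [row[0] for row in conn.execute("SELECT sesskey FROM sessions")]
    return removed, remaining
```

The IDs to pass in are the ones visible in the search-engine results. If the store keeps sessions in files rather than in the database, the matching session files would be deleted instead.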


Personally, I would remove the sessions from the database anyway.

 

But should I keep this new .htaccess only until the search engines drop the old results, or should I keep it forever?

About deleting them from the database: which table should I delete these sessions from?

Under sessions I find: sesskey, expiry, value

 

 

 

Also, is my robots.txt all right now? And what about adding those extra files I mentioned; do you have any thoughts on them too?

 

thanks.... :thumbsup:


Archived

This topic is now archived and is closed to further replies.
