christian.tauber Posted June 1, 2008
Google has apparently started indexing pages behind PHP forms - in my case duplicate entries show up in the index for all pages behind the currency form. http://googlewebmastercentral.blogspot.com...html-forms.html
Eventually this might end up in duplicate content penalties, or in PageRank being split between two URLs referring to the same page (the only difference being the currency). Does anyone have the same problem ...or even better, an idea on how to prevent this? So far I have been thinking along two lines:
1. As Google does not index pages behind POST forms, using the POST method in the currency form might take care of it...
2. Utilizing spiders.txt to check the user agent and then redirecting to the "original" (i.e. default currency) URL in case a spider string is found...
Neither idea seems very elegant - more of a "rough and tumble" approach. Anyone in for a leaner, smarter approach?
christian.tauber (Author) Posted June 1, 2008
...lacking any better ideas, I went on to check $spider_flag (set in application_top.php) in includes/boxes/currencies.php before creating the currency form. If $spider_flag is true, the form is now simply not created. This works only if "Prevent Spider Sessions" is enabled (true). You may then replace the following code in includes/boxes/currencies.php:
    $info_box_contents[] = array('form' => tep_draw_form('currencies', tep_href_link(basename($PHP_SELF), '', $request_type, false), 'get'),
                                 'align' => 'center',
                                 'text' => tep_draw_pull_down_menu('currency', $currencies_array, $currency, 'onChange="this.form.submit();" style="width: 100%"') . $hidden_get_variables . tep_hide_session_id());
with:
    if ($spider_flag == false) {
      $info_box_contents[] = array('form' => tep_draw_form('currencies', tep_href_link(basename($PHP_SELF), '', $request_type, false), 'get'),
                                   'align' => 'center',
                                   'text' => tep_draw_pull_down_menu('currency', $currencies_array, $currency, 'onChange="this.form.submit();" style="width: 100%"') . $hidden_get_variables . tep_hide_session_id());
    } else {
      $info_box_contents = '';
    }
Pretty rough ...so if anybody can think of something more sophisticated, please leave a short comment. Plus you still need to create a 301 redirect to get the duplicate URLs that already made it into Google's index out of there again. You may simply tweak the htaccess solution for getting pages with an appended osCsid out of the index to do just that...
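For readers wondering what a spider check of this kind boils down to, here is a minimal, self-contained sketch. It is not the osCommerce code itself: the function name isSpider() and the sample spider strings are assumptions made for illustration, and a real shop would take its list from includes/spiders.txt rather than a hard-coded array.

```php
<?php
// Hypothetical helper (not part of osCommerce): returns true when the
// user agent contains any of the given spider substrings, case-insensitively.
function isSpider(string $userAgent, array $spiderStrings): bool
{
    $userAgent = strtolower($userAgent);
    foreach ($spiderStrings as $needle) {
        $needle = strtolower(trim($needle));
        // Skip blank entries so an empty line never matches every visitor.
        if ($needle !== '' && strpos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Assumed sample entries for illustration only.
$spiders = ['googlebot', 'slurp', 'msnbot'];

// The user agent is missing on the command line, so fall back to an empty string.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

// Only build the currency form for visitors that do not look like spiders.
$showCurrencyForm = !isSpider($userAgent, $spiders);
```

Keep in mind that user agent strings are easy to fake and spider lists go stale, so this only hides the form from crawlers that identify themselves.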
Jack_mcs Posted June 1, 2008
Why not just add the pages to a robots file?
Jack
christian.tauber (Author) Posted June 1, 2008
Thanks Jack, ...I guess adding pages to the robots.txt is definitely an option - plus one that does not require any changes to the codebase. I shied away from that, as the only difference between the URLs behind the currency form would be the currency identifier (e.g. "?currency=GBP"). For a stock installation a duplicate entry in Google might then materialize as:
1. http://<site>/product_info.php?products_id=1
2. http://<site>/product_info.php?currency=GBP&products_id=1
So if I now added product_info.php to robots.txt, none of the product detail pages would ever be indexed. If I added product_info.php?currency=<currency id>, I would have to add one entry for each currency and each page that could be indexed with a currency identifier in its URL (basically all osCommerce pages) - i.e. quite a few entries... Or am I down the wrong path here? I would really appreciate your insight on that...
christian.tauber (Author) Posted June 1, 2008
...the moment I posted my reply I realized that I was indeed down "the wrong path". Basically you only need to add each page to the robots.txt once - blocking only the URLs with a currency identifier in them, right? I.e. "index.php?currency=", "product_info.php?currency=", ... That should work!
Yet one more question for this to be watertight: Is the currency identifier always appended first in osC URLs? Or could a URL be generated as follows:
http://<site>/product_info.php?osCsid=<osCsid>&products_id=<products id>&<...>&currency=<currency id>
If the answer seems very obvious, please excuse me ...it has been a while since I last worked with osCommerce. Thanks again!
Jack_mcs Posted June 1, 2008
You shouldn't be adding the product_info page to the robots file. That will ruin your listings in the search engines. The problem, as I understand it, is that Google may get to a page that shouldn't be listed, like login.php. Those pages shouldn't ever be listed, so adding them to the robots file won't hurt anything.
Jack
christian.tauber (Author) Posted June 1, 2008
...cheers Jack. No worries ...I'll definitely not add anything to robots.txt that I do want people to find. The issue is not that Google indexes pages it should, by design, not index (like login.php). The problem is that Googlebot started making selections on GET forms and now indexes all pages behind these forms. In osCommerce the currency selection is done through a GET form. Googlebot therefore indexes all osC pages twice or more in my case (once with no currency identifier and once or even twice with another one - refer to my earlier post for a stock osC example). The real issue is penalties for duplicate content or split PageRank...
Things get more complicated if you use some kind of SEO URL contribution. Therefore I would only use the robots.txt solution if you do not have any SEO URL contribution installed...
Jack_mcs Posted June 1, 2008
Oh, I see. Sorry for the confusion. I don't have a fix for you, but I'm not sure it is an issue. If you read through that post, you will see that this problem is mentioned several times. So they are aware of the problem, and I would think they check for it causing duplicate content pages, which is the real issue. But since it is experimental, I guess we won't know until the dust settles. You must be feeling great to be selected as one of the few sites they chose to test this on though. :)
Jack
christian.tauber (Author) Posted June 1, 2008
Jack, thanks for your reply ...and the words they chose to describe the new ability of their bot are charming for sure. Then again, I think people running osCommerce and experiencing the same issue as me would rather not end up being penalized once "the dust settles". And whereas I currently see no marketing potential in having the (content-wise) exact same page listed in Google's index under two or more different URLs, I do see a lot of risk: especially the aforementioned penalties for duplicate content, or two URLs sharing the same (amount of) PageRank.
I guess I'll stick with hiding the currency form from spiders for now - plus I'll plug some 301 redirects into my htaccess to get the duplicates out of the index. Again, please, if you Jack or anybody else has similar issues and/or, most importantly, a smarter approach to prevent Google (and other spiders) from indexing pages behind the currency form - leave a quick message here... thanks.
♥toyicebear Posted June 1, 2008
you might add a rel="nofollow" attribute to the links
christian.tauber (Author) Posted June 2, 2008
toyicebear - thanks for the reply. They did mention that Googlebot would continue to respect the nofollow attribute... Off the top of my head I couldn't think of a way to code a nofollow into a GET form. Would you mind pointing me in a direction here?
♥toyicebear Posted June 2, 2008
In includes/boxes/currencies.php try to change this:
    'text' => tep_draw_pull_down_menu('currency', $currencies_array, $currency, 'onChange="this.form.submit();" style="width: 100%"') . $hidden_get_variables . tep_hide_session_id());
to
    'text' => tep_draw_pull_down_menu('currency', $currencies_array, $currency, 'onChange="this.form.submit();" style="width: 100%" rel="nofollow"') . $hidden_get_variables . tep_hide_session_id());
Guest Posted June 2, 2008
why not just change the meta tags of the page, if currency is set? something like:
    if (isset($_GET['currency'])) {
      // tell spiders not to index or follow the currency variant of the page
      echo '<meta name="robots" content="noindex,nofollow" />';
    }
christian.tauber (Author) Posted June 2, 2008
...toyicebear and eww ...thanks a lot for your hints. These are exactly the kind of "more sophisticated ideas" I was hoping to find here. Thanks guys!
For now I am not sure incorporating nofollow into the form would do the trick (I didn't know you could actually do that - and W3 does not specify forms as being able to take "rel" as an attribute: http://www.w3.org/TR/html401/interact/forms.html )...
As for checking whether currency is set on each affected page: That would be every PHP page not behind the login form - not too many in a stock installation actually, so it should be fairly feasible. Thanks again ...if I bump into an even slimmer solution I'll for sure post it here!
Guest Posted June 2, 2008
isn't the currency only linked on index.php and product_info.php? so you'd only have 2 edits to incorporate the meta tag attributes. if you have the box showing on any login pages, google wouldn't see those anyway.. just as long as you have them blocked in robots.txt
christian.tauber (Author) Posted June 3, 2008
...eww, I guess you're right for a stock install. And even for a more customized store, the total number of underlying .php pages should be small enough to tweak each and every one of them. In my case I went on to include your suggestion directly in my header tag contribution ...so I actually only needed to touch one single file. Figuring out a smart way to create 301 redirects is proving a bit more cumbersome.
Jack, toyicebear, eww - thanks again for caring. It seems we have arrived at a slim, easy-to-implement solution. Now all it takes is for Google to truly respect the noindex, nofollow attributes...
jobs.steven Posted June 17, 2008
Hello, Christian. For several days I have been looking for a way to turn www.epathchina.com/index.php?currency=GBP into www.epathchina.com/. You could have a look at www.chinavasion.com: when they switch the currency on index.php, the URL is rewritten to www.chinavasion.com/, that's so cool! I tried to find a way to do this, but without success :(
jobs.steven Posted June 17, 2008
Maybe a 301 redirect could achieve what we are after...
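A 301 redirect of this kind could be sketched in PHP roughly as follows. This is only an illustration under stated assumptions, not a tested osCommerce modification: stripQueryParam() is a hypothetical helper, and the redirect is assumed to run only after the shop has already stored the selected currency in the session, otherwise the visitor's choice would be lost. The helper removes the parameter wherever it appears in the query string, so it does not matter whether "currency" comes first or last.

```php
<?php
// Hypothetical helper: returns $url without the query parameter $param,
// regardless of its position. Simplifications: a #fragment and any
// user:password part are dropped, and the remaining values are re-encoded.
function stripQueryParam(string $url, string $param): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // nothing to strip
    }

    parse_str($parts['query'], $query);
    if (!array_key_exists($param, $query)) {
        return $url; // parameter not present, leave the URL untouched
    }
    unset($query[$param]);

    // Rebuild the URL from the pieces that were present.
    $result = '';
    if (isset($parts['scheme'])) {
        $result .= $parts['scheme'] . '://';
    }
    if (isset($parts['host'])) {
        $result .= $parts['host'];
    }
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $parts['path'] ?? '';
    if (count($query) > 0) {
        $result .= '?' . http_build_query($query);
    }
    return $result;
}

// Redirect permanently when the requested URL carries a currency identifier.
// REQUEST_URI is missing on the command line, so nothing happens there.
$requested = $_SERVER['REQUEST_URI'] ?? '';
$canonical = stripQueryParam($requested, 'currency');
if ($canonical !== $requested) {
    header('Location: ' . $canonical, true, 301);
    exit;
}
```

With this in place, a request for /index.php?currency=GBP would be answered with a permanent redirect to /index.php, which also gives spiders a single URL per page.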
This topic is now archived and is closed to further replies.