Duplicate content for osC sort functions


FWR Media

Check Google webmaster tools because Google has started showing duplicate titles/descriptions due to the standard oscommerce sort and paging functions.

 

Example shown with SEO URLs.

 

Product URL

 

http://www.mysite.com/a-great-product-c-32.html

 

Duplicate titles/descriptions

 

http://www.mysite.com/a-great-product-c-32...t=2a&page=2

http://www.mysite.com/a-great-product-c-32...t=2d&page=2

http://www.mysite.com/a-great-product-c-32...t=3a&page=2

http://www.mysite.com/a-great-product-c-32...t=3d&page=2

 

Once you factor in the number of result pages each category can have, the count of duplicates grows quickly.

I've had an answer from webmaster tools and they seem to recommend the use of rel="nofollow".

The following is my initial suggestion, but it is untested:

 

1) includes/classes/split_page_results.php

 

Find all instances of tep_href_link

 

in each link find ..

 

class="pageResults"

 

 

change to ..

 

 

class="pageResults" rel="nofollow"

 

 

2) includes/functions/general.php

 

Find function tep_create_sort_heading

 

find in the function ..

 

 

title="

 

 

Change to ..

 

 

rel="nofollow" title="

 

 

3) includes/boxes/languages.php

 

Find ..

 

language=' . $key, $request_type) . '">

 

 

Replace with ..

 

 

language=' . $key, $request_type) . '" rel="nofollow">

 

It would be good if others added or changed this over time as we find out how it works and/or which additions are needed.
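
For anyone unsure what the result of step 1 should look like, here is a minimal sketch of the kind of anchor the paging links would produce after the edit. The helper name page_results_link() and the example URL are made up for illustration; they are not part of osCommerce, and the real links are built inside split_page_results.php via tep_href_link.

```php
<?php
// Hypothetical helper (NOT an osCommerce function): builds one paging link
// in the form the edited split_page_results.php is expected to output.
function page_results_link($url, $label) {
    return '<a href="' . htmlspecialchars($url) . '"'
         . ' class="pageResults" rel="nofollow">'
         . htmlspecialchars($label) . '</a>';
}

// Illustrative URL only; substitute whatever your store generates.
echo page_results_link('index.php?cPath=32&sort=2a&page=2', '2'), "\n";
```

If the rel="nofollow" attribute shows up in the page source of a category listing, the edit has taken effect.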

 

I followed these instructions. Now Google no longer caches my website.

Link to comment
Share on other sites


This would not stop Google indexing your site, period.

 

However, the preferred method is the code which adds a NOINDEX meta tag if specific $_GET variables are set.

Link to comment
Share on other sites

Another method likely to work is the following which also covers more exclusions.

 

The following array or similar should be available.

 

$spiderNoFollow = array('sort', 'page', 'language', 'currency');

 

The following code would go in the <head></head> of the page ..

 

 

<?php
if( (true === $spider_flag) && spiderNoFollow($spiderNoFollow) ) {
  echo '<meta name="ROBOTS" content="NOINDEX, FOLLOW">';
}
?>

 

The following function could e.g. go into includes/functions/general.php

 

// Returns true if any of the listed $_GET variables is present in the request
function spiderNoFollow($spiderNoFollow){
  foreach( $spiderNoFollow as $value ){
    if( isset($_GET[$value]) ){
      return true;
    }
  }
  return false;
}

 

Hi Rob,

 

Tried using this... where are you setting $spider_flag? At the moment it's not adding the meta tag because I'm missing $spider_flag, I think. Cheers mate :)

 

I have also read up elsewhere and saw people have added this to robots.txt

Disallow: /*sort=*
Disallow: /*manufacturers_id=*products_id=*

 

Gonna try it out, thought I'd share that with all :)

Link to comment
Share on other sites

Ignore $spider_flag as it really doesn't matter (in this case)

 

Put ..

 

$spiderNoFollow = array('sort', 'page', 'language', 'currency');

 

at the very bottom of includes/application_top.php immediately above the closing ?> (this list is not exhaustive; you can add as many $_GET variables as needed).

 

then in the <head></head> of the files put ..

 

<?php
if( isset($spiderNoFollow) && spiderNoFollow($spiderNoFollow) ) {
 echo '<meta name="ROBOTS" content="NOINDEX, FOLLOW">';
}
?>

 

The following function could e.g. go into includes/functions/general.php

 

 

// Returns true if any of the listed $_GET variables is present in the request
function spiderNoFollow($spiderNoFollow){
  if( is_array($spiderNoFollow) ){
    foreach( $spiderNoFollow as $value ){
      if( isset($_GET[$value]) ){
        return true;
      }
    }
  }
  return false;
}
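
To check the logic without a live request, something like the following sketch may help. It is an assumption-laden variant, not the code above: queryNeedsNoIndex() and robotsMetaFor() are hypothetical names, and the query is parsed from a URL string instead of being read from $_GET. The example URLs are illustrative only.

```php
<?php
// Hypothetical variant of spiderNoFollow(): the query array is passed in
// rather than read from $_GET, so it can be exercised from the command line.
function queryNeedsNoIndex(array $noIndexParams, array $query) {
    foreach ($noIndexParams as $name) {
        if (isset($query[$name])) {
            return true;
        }
    }
    return false;
}

// Returns the robots meta tag for URLs carrying one of the listed
// parameters, or an empty string for every other URL.
function robotsMetaFor($url, array $noIndexParams) {
    $query = array();
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    return queryNeedsNoIndex($noIndexParams, $query)
        ? '<meta name="ROBOTS" content="NOINDEX, FOLLOW">'
        : '';
}

$spiderNoFollow = array('sort', 'page', 'language', 'currency');

// Illustrative URLs: the first should get the tag, the second should not.
var_dump(robotsMetaFor('/index.php?cPath=32&sort=2a&page=2', $spiderNoFollow) !== '');
var_dump(robotsMetaFor('/index.php?cPath=32', $spiderNoFollow) !== '');
```

The point of the design is that only the parameterised variants of a page get the tag, so the plain category or product URL stays indexable.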

Link to comment
Share on other sites

Thanks mate...had everything else the way you mentioned except the $spider_flag var :) legend it showed up cheers :D

Link to comment
Share on other sites

I am also having problems with duplicate tags. I have done the first part of the code:

$spiderNoFollow = array('sort', 'page', 'language', 'currency');

For the second part of the code, you say to put it in the <head></head>. Which file is this in? Also, does the third part of the code need to be in a certain part of the file (top, middle, bottom), or does it not matter?

Link to comment
Share on other sites

The <head></head> is in catalog/index.php, or in other catalog files if needed.

The function spiderNoFollow($spiderNoFollow) goes at the bottom of catalog/includes/functions/general.php, before the closing ?>

Link to comment
Share on other sites

I think I have gone wrong somewhere. I may not explain this right, so please bear with me.

I put all the code in all the right places, I think, and used a tags analyzer to see what happens.

 

www.allpawstogether.co.uk/christmas-crinkle-p-136.html

all details came out fine titles, description, web address

 

i then put in

www.allpawstogether.co.uk/charms-for-dogs-c-116.html

 

this is how it says it would be displayed on search engine page

 

No title

list view 24.99

http://www.allpawstogether.co.uk/charms-for-dogs-c-116.html

 

I'm sure this is wrong.

Would you know where I'm going wrong?

Link to comment
Share on other sites

Not sure what you mean. I looked at those pages and all the meta tags seemed to be in place. I then used ..

 

http://www.allpawstogether.co.uk/charms-fo...rer&sort=2a

 

And sure enough there is ..

 

<meta name="ROBOTS" content="NOINDEX, FOLLOW">

 

Seems perfect to me.

Link to comment
Share on other sites

Hi,

I'm having a problem with duplicate title tags and meta descriptions. One of the problems is the sort pages, and Google has even found that catalog and catalog/index.php are the same page.

Another problem is that the osCsid is appended to the URL. For this I used Chemo's code in includes/application_top.php. Under these lines:

// include the language translations
require(DIR_WS_LANGUAGES . $language . '.php');

I pasted:

// Redirect spiders to the same URL without the session ID
if ( $spider_flag == true ){
  // stripos() stands in for eregi(), which was removed in PHP 7
  if ( stripos($_SERVER['REQUEST_URI'], tep_session_name()) !== false ){
    $location = tep_href_link(basename($_SERVER['SCRIPT_NAME']), tep_get_all_get_params(array(tep_session_name())), 'NONSSL', false);
    header("HTTP/1.0 301 Moved Permanently");
    header("Location: $location"); // redirect...bye bye
    exit; // stop here so the rest of the page is not rendered
  }
}

 

I added a robots.txt file.

 

I am thinking of following Robert's advice in post 29 (the three steps).

 

I have set prevent spider sessions to true from the admin panel.

 

Is there anything else I should do? I don't like the osCsid appended to the URL and I know it can create other problems as well. I noticed that when Yahoo is crawling it gets a session ID.

How will I know if these steps have helped? Will I see fewer duplicate pages soon, or is it going to take quite a while?

 

Thank you in advance,

Alexandra

Link to comment
Share on other sites

I admit I find this whole thread a little confusing, as people appear to be talking at cross-purposes.

 

I am not sure if you are referring to Google indexing the same product under separate categories or to Google indexing the same product on the same pages.

 

What I mean by the latter is that if you add a new product to more than one Category then it can turn up more than once under New Products, or Best Sellers or Specials etc.

 

As osCommerce is database driven this duplication can be prevented by changing the 'select' query from 'select' to 'select DISTINCT'. osCommerce already uses this in places but once modifications start to be made it is easy to overlook the need to alter the select query.

 

Vger

Link to comment
Share on other sites

If the osCsid is persistent then your configure settings are incorrect.

 

Post the first 9 defines of catalog/includes/configure.php here and I'll see if I can see the problem.

Link to comment
Share on other sites

Hi Robert,

Thanks for the quick reply. This is how my file looks.

 

// Define the webserver and path parameters

// * DIR_FS_* = Filesystem directories (local/physical)

// * DIR_WS_* = Webserver directories (virtual/URL)

define('HTTP_SERVER', 'http://www.mysite.com'); // eg, http://localhost - should not be empty for productive servers

define('HTTPS_SERVER', ''); // eg, https://localhost - should not be empty for productive servers

define('ENABLE_SSL', false); // secure webserver for checkout procedure?

define('HTTP_COOKIE_DOMAIN', 'www.mysite.com');

define('HTTPS_COOKIE_DOMAIN', '');

define('HTTP_COOKIE_PATH', '/catalog/');

define('HTTPS_COOKIE_PATH', '');

define('DIR_WS_HTTP_CATALOG', '/catalog/');

define('DIR_WS_HTTPS_CATALOG', '');

 

 

I looked into what you said Rhea. You may be right because I noticed that there is a filter id before the sort. I sell paintings where I have to put them in as oil paintings and then a sub category according to size. I changed the manufacturers to subject and use that to sort them according to whether the painting is abstract, landscape etc. I have added quite a few contributions. I'm not sure where to look to correct this.

 

Thanks again

Alexandra

Link to comment
Share on other sites

checked google today and now have 25 duplicate meta descriptions

 

and 27 duplicate title tags

 

why is this still happening?????

Where am i going wrong???????

Link to comment
Share on other sites

The code I put up PREVENTS duplicates; it does not remove existing duplicates.

Link to comment
Share on other sites

Playing with the session code of osC is a very bad idea; used correctly, it is perfectly good "out of the box". All of the "remove osCsid" contributions are a further danger: they are not needed and they create complexity when trying to debug such issues.

The osCsid should disappear after one or two clicks if the site is working correctly and your configure.php is correct.

Known bots (you gave Yahoo as an example) will NOT generate sessions if you have the base osC code in place, "prevent spider sessions" is set to on, and you have kept up to date with includes/spiders.txt (a contribution exhaustively maintained by stevel).

You should also set "recreate sessions" to true to prevent session riding.

Link to comment
Share on other sites

Hi Robert,

Thanks again for your reply.

I haven't used any of the remove osCsid contributions.

I have your version of SEO URLs and the validation, which work well. When I enter my site the osCsid disappears after one or two clicks. When I check who's online I see, for example, a Yahoo IP with a session ID appended to the URL. Google has indexed some URLs with the osCsid appended. It's been a few days since I set prevent spider sessions to true. Is it possible that it takes a while to take effect? Here is an example of what I see in Google's webmaster tools. Is this normal?

 

Pages with duplicate meta descriptions (page count after each description)

products new description - (3 pages)
/catalog/acropolis-pauls-church-wedding-invitations-p-221.html
/catalog/poppies-triptych-painting-p-489.html
/catalog/trees-modern-painting-p-790.html?language=en&osCsid=d6944a991cbd4cb2cfe1acf54b6a72b8

Golden Brown Abstract Modern Oil Painting A - Oil Paintings - Big selection of oil paintings. Modern (2 pages)
/catalog/golden-brown-abstract-modern-painting-p-789.html
/catalog/golden-brown-abstract-modern-painting-p-789.html?osCsid=df512c83a62204a56016f6d63ac95687

Musical Children Modern Oil Painting - Oil Paintings - Big selection of oil paintings. Modern, abstr (2 pages)
/catalog/musical-children-modern-painting-p-787.html
/catalog/musical-children-modern-painting-p-787.html?osCsid=e4b56bb3cabb05038572fd2d1ad2464d

Big selection of Greek scene oil paintings, watercolours, prints and ceramics. Beautiful paintings o
/
/catalog/
/catalog/abstract-modern-m-10.html
/catalog/abstract-modern-m-10.html?filter_id=21&sort=2a&osCsid=044b7bd12c5954b0cf898bd64baf7a6c
/catalog/abstract-modern-m-10.html?filter_id=36&sort=2a
/catalog/abstract-modern-m-10.html?filter_id=40&sort=2a&osCsid=a9c23cd1758066c1465cbacb9ff25a78
/catalog/abstract-modern-m-10.html?filter_id=42&sort=2a&osCsid=ea42417452e05a60de801073b3e2ee4a
/catalog/abstract-modern-m-10.html?filter_id=43&sort=2a&osCsid=390b97a31d1adca6af66a520d5f34133
/catalog/abstract-modern-m-10.html?filter_id=46&sort=2a&osCsid=2a2f187b8cb2bad937faa290df8a5546
/catalog/abstract-modern-m-10.html?language=en&sort=2a&osCsid=51ea591fc2820e7b0a00e046df607d13
/catalog/abstract-modern-m-10.html?language=en&sort=3a&page=1&filter_id=42&osCsid=352a3ecfa233dbd385d0b236c7bd3905
/catalog/abstract-modern-m-10.html?language=gr&page=2&filter_id=40&sort=2d&osCsid=df512c83a62204a56016f6d63ac95687
/catalog/abstract-modern-m-10.html?language=gr&sort=2a&filter_id=37&osCsid=bdaed0b0070c96be821c2d5f020a089d
/catalog/abstract-modern-m-10.html?language=gr&sort=2a&filter_id=40&page=3&osCsid=88c9995bf7a62fddc82f0aca67c5cd77
/catalog/abstract-modern-m-10.html?language=gr&sort=2a&osCsid=82f64786befc6f0d7b59447f1575924d
/catalog/abstract-modern-m-10.html?osCsid=352a3ecfa233dbd385d0b236c7bd3905&sort=2a&currency=USD
/catalog/abstract-modern-m-10.html?osCsid=5f36f9e3e5b18c1f650e327912d746d4&language=gr&sort=2d&page=1&filter_id=47
/catalog/abstract-modern-m-10.html?osCsid=88c9995bf7a62fddc82f0aca67c5cd77&language=gr&sort=3d&page=1&filter_id=47
/catalog/abstract-modern-m-10.html?osCsid=b21808d478786f51f12b207f91992f55&language=gr&sort=4d&page=1&filter_id=46
/catalog/abstract-modern-m-10.html?page=2&sort=2a&osCsid=88c9995bf7a62fddc82f0aca67c5cd77
/catalog/abstract-modern-m-10.html?page=4&sort=2a&osCsid=044b7bd12c5954b0cf898bd64baf7a6c
/catalog/abstract-modern-m-10.html?page=5&sort=2a&osCsid=8a82ffa31ee4b65006e55052cfdb1db4
/catalog/abstract-modern-m-10.html?sort=2a&page=1&filter_id=41&language=en&osCsid=88c9995bf7a62fddc82f0aca67c5cd77
/catalog/abstract-modern-m-10.html?sort=2d&page=1&osCsid=82f64786befc6f0d7b59447f1575924d
/catalog/abstract-modern-m-10.html?sort=3d&page=1&filter_id=43&language=en&osCsid=df512c83a62204a56016f6d63ac95687
/catalog/abstract-modern-m-10.html?sort=4a&page=1&filter_id=36&language=gr&osCsid=390b97a31d1adca6af66a520d5f34133
/catalog/abstract-modern-m-10.html?sort=4a&page=1&osCsid=51ea591fc2820e7b0a00e046df607d13
/catalog/abstract-modern-m-10.html?sort=4d&page=1&filter_id=47&language=gr&osCsid=5f36f9e3e5b18c1f650e327912d746d4
/catalog/animals-m-11.html
/catalog/animals-m-11.html?filter_id=38&sort=2a&osCsid=590e8933386b60e31f75cb8d082caf2d

 

 

I will try the recreate sessions. I had read forums but was unsure whether to touch that.

 

Thanks again,

Alexandra

Link to comment
Share on other sites

The osCsid should NEVER be indexed by the known and valid bots.

Link to comment
Share on other sites

Hi,

I know and that's why I am concerned.

I updated the spiders file as you suggested and as I posted before I put in the codes in the 3 files (index, includes/application_top and includes/functions/general files).

I set prevent spider sessions and recreate sessions to true.

I also added Chemo's redirect code in includes/application_top file.

I guess I have to wait and see if the situation gets better. If you have any more suggestions I'm all ears!

Thanks again for your help. You're always very helpful and I appreciate it.

Alexandra

Link to comment
Share on other sites

It will always take some time for the search engines to update their data centers, so you will continue to see problems for quite a while. You can manually remove listings in Google webmaster tools, but take care: you can't go back.

Sam

 

Remember, What you think I ment may not be what I thought I ment when I said it.

 

Contributions:

 

Auto Backup your Database, Easy way

 

Multi Images with Fancy Pop-ups, Easy way

 

Products in columns with multi buy etc etc

 

Disable any Category or Product, Easy way

 

Secure & Improve your account pages et al.

Link to comment
Share on other sites

Hi,

Thanks for the tip. As I looked into that I noticed that Google says to put the robots.txt in the root of the site. This was a big mistake on my part: I had it in my catalog directory. I will be changing hosts soon and will then do away with the intro page. In the meantime I put the robots file in the root. Should I also add Robert's code in the <head> of the root's index file as well as in the <head> of catalog/index? I noticed that Google is ignoring these codes: in the admin panel's who's online, the URL showed Google is still indexing currency and sort pages. I now have 496 pages with duplicate title tags and 223 pages with duplicate meta descriptions.

 

Thanks again,

Alexandra

Link to comment
Share on other sites

After doing fixes wait a month or so to see changes in listings.

Sam

 


Link to comment
Share on other sites

I noticed that Google is ignoring these codes and in the admin panel/ who's online the url showed Google is still indexing currencies and sort pages.

 

Thanks again,

Alexandra

 

That is not the case, Alexandra. My code does the following:

NOINDEX - Read: Google, please don't index this page.

FOLLOW - Read: even though we don't want you to index it, Google, we would still like you to follow the links on it.

Also bear in mind that the page still has to be accessed for Google to read the meta tags, so it will still show in who's online.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.
