Google Duplicate Content Manager version 1.0a


FWR Media

I would say it is recommended to have some kind of sitemap installed with this contribution so crawlers can index all of your main categories and product pages.

 

In my case, my URLs carry some additional information for tooltip functions, so a regular crawler cannot find the plain version of the page.

All my product URLs look like this:

18x13-mm-oval-golden-citrine-aaa-grade-p-26399.html?prodID=26399

where prodID=26399 is for the tooltip. I recently found that Google had indexed over 3000 duplicated pages because of my tooltip, so I hope this contrib will be very helpful.

 

Thanks a lot for your work! I made a lot of changes to my own store and I know how much effort it takes to create a contribution that will work for everybody.

 

I think duplicated content will be a concern, and not only in my case.

 

For example page:

My-categoty-c-21.html

Will have the same content as:

My-categoty-c-21.html?page=1

 

A solution for that will be more difficult because no one knows (to the best of my knowledge) how Google detects duplicated content.

If adding one paragraph of text would make the content count as new, we could add text like “You are on page_xx of category_xxx and searching by_xxx, let us know how we can help...”

Or add some random text that changes every time the page is reloaded.

Does anyone have any ideas or different thoughts about this?
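
One alternative to padding pages with filler text is to pick a single preferred URL for each page and treat the variants as the same page. Below is a minimal sketch of that idea in PHP, assuming relative URLs like the ones above. The function name normalize_listing_url and the default ignore list are hypothetical and not part of this contribution.

```php
<?php
// Hypothetical helper (not part of the contribution): reduce a relative URL
// to one preferred form by dropping parameters that do not change the content.
// Scheme and host are not preserved, so pass relative URLs only.
function normalize_listing_url($url, array $ignoredParams = array('prodID'))
{
    $parts = parse_url($url);
    $path  = isset($parts['path']) ? $parts['path'] : '';

    // Split the query string into an associative array.
    $query = array();
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
    }

    // Drop parameters that only drive presentation (e.g. the tooltip ID).
    foreach ($ignoredParams as $name) {
        unset($query[$name]);
    }

    // page=1 shows the same listing as the URL without a page parameter.
    if (isset($query['page']) && (string) $query['page'] === '1') {
        unset($query['page']);
    }

    return $query ? $path . '?' . http_build_query($query) : $path;
}
```

With this sketch, My-categoty-c-21.html?page=1 and My-categoty-c-21.html reduce to the same string, while My-categoty-c-21.html?page=2 stays distinct.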

Edited by marcinmf
Here's some food for thought, quoted from Google on robots.txt:

 

"To block access to all URLs that include a question mark (?), you could use the following entry:"

 

User-agent: *
Disallow: /*?

 

Unfortunately, I only found this recently. I had a lot of duplicates too, but since I added that entry they have been dropping daily.

I hope I don't offend anyone by posting this here but it might be an easy answer for some.

John Wisdom


That would block basically all dynamic URLs, which is not a good idea at all.
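
To see how broad that rule is, here is a rough PHP sketch of wildcard matching in the style Google documents for robots.txt, where * matches any run of characters and a trailing $ anchors the end of the URL. The function name robots_rule_matches is hypothetical, and this is only an approximation for experimenting, not Google's actual matcher.

```php
<?php
// Hypothetical helper: test whether a robots.txt path pattern matches a URL path.
// Assumes Google-style wildcards: '*' = any characters, trailing '$' = end of URL.
function robots_rule_matches($pattern, $path)
{
    // A trailing '$' means the pattern must match up to the end of the URL.
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }

    // Escape each literal piece, then join the pieces with '.*' for each '*'.
    $pieces = array();
    foreach (explode('*', $pattern) as $piece) {
        $pieces[] = preg_quote($piece, '#');
    }
    $regex = '#^' . implode('.*', $pieces) . ($anchored ? '$' : '') . '#';

    return preg_match($regex, $path) === 1;
}
```

Under these assumptions, the pattern /*? matches /product_info.php?products_id=15 and /My-categoty-c-21.html?page=2 alike, so every URL with a query string is blocked, not only the duplicates.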


One quick question: will this work OK with STS and Header Tag Controller?

I haven't got a clue about STS; I've never used it (and never will), but it would just be a case of finding out what prints out the meta tags.

Re: Header Tags and Header Tags SEO: I have included the relevant code in the instructions, but it is untested.


I use a template system and there are no problems.

It depends on the version and the contribution, but most likely you will make the modification in: /templates/templatename/main_page.tpl.php


Has anyone seen any success with this contribution? I'm wondering if this will help with page ranking and thus help avoid being sandboxed by Google. I know that when you're in the sandbox, your traffic from organic results suffers. This sounds like a great solution to SEO headaches. What about performance in organic search results? Any improvements? Thanks for your comments.


Hi. I installed this contrib, and it looks great. I think it works: the number of duplicate pages in Google Webmaster Tools actually dropped a bit. But the site loads much slower. Look here: SITE. What could be wrong? The same thing happened a while ago when I added some code to .htaccess, and it went back to normal after I removed it. I have no clue what the problem could be now, but it surely started immediately after installing Google Duplicate...

 

Thank you.


This is not going to affect your site's loading speed. To confirm, turn it off in the file includes/classes/preventDuplicates.php. Find:

var $turnServiceOn = true;

 

Set to

 

var $turnServiceOn = false;

 

If your problems remain (and they will), then it is not this contribution causing your issues.


Thank you for the fast reply. Indeed, turning it off doesn't help. I'll try uninstalling it and see what happens, just to be sure, but you're probably right.

 

And just wondering if you may have an idea... where can I ask for help to fix this loading problem? Just to find out what's the problem at least. Thanks.

Edited by roxanacaz

First, I would like to say great work! I know this will help with our duplicate content issues.

 

I have two requests. I will wait on an update.

 

(1) Whenever you have time, could you adjust the code so the page numbers could appear after the title?

 

ALSO

 

(2) Take a look at the All Products contribution; the duplicate content issue is partially solved there.

What I mean is that there are A through Z links at the top of allprods.php. All Products groups products alphabetically.

 

Here is the code for allprods.php:

<?php 
/* 
$Id: allprods.php,v 4.4 2006/09/18 20:28:47 Mgx Co. Exp $


All Products v4.3 MS 2.2 with Images http://www.oscommerce.com/community/contributions,1501

osCommerce, Open Source E-Commerce Solutions
http://www.oscommerce.com

Copyright (c) 2004 osCommerce

Released under the GNU General Public License

*/ 

require('includes/application_top.php'); 
include(DIR_WS_LANGUAGES . $language . '/' . FILENAME_ALLPRODS); 

$breadcrumb->add(HEADING_TITLE, tep_href_link(FILENAME_ALLPRODS, '', 'NONSSL')); 

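// Escape the first-letter filter before it is placed inside the SQL string.
$firstletter = isset($HTTP_GET_VARS['fl']) ? tep_db_input(tep_db_prepare_input($HTTP_GET_VARS['fl'])) : '';
// The original if/else on $HTTP_GET_VARS['page'] built the same clause in both branches.
$where = "where pd.products_name like '" . $firstletter . "%' AND p.products_status='1' ";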


?> 
<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"> 
<html <?php echo HTML_PARAMS; ?>> 
<head>
<?php
  if (!isset($lng) || (isset($lng) && !is_object($lng))) {
	include(DIR_WS_CLASSES . 'language.php');
	$lng = new language;
  }

  reset($lng->catalog_languages);
  while (list($key, $value) = each($lng->catalog_languages)) {
?>
<link rel="alternate" type="application/rss+xml" title="<?php echo STORE_NAME . ' - ' . BOX_INFORMATION_RSS; ?>" href="<?php echo FILENAME_RSS, '?language=' . $key; ?>">
<?php
  }
?> 
<?php
// BOF: WebMakers.com Changed: Header Tag Controller v2.55 
// Replaced by header_tags.php 
if ( file_exists(DIR_WS_INCLUDES . 'header_tags.php') ) {
ob_start(); 
 require(DIR_WS_INCLUDES . 'header_tags.php');
 $preventDuplicates->checkTarget(ob_get_clean());
echo $preventDuplicates->finalMeta . "\n"; 
} else { 
?> 
 <title><?php echo TITLE ?></title> 
<?php 
} 
// EOF: WebMakers.com Changed: Header Tag Controller v1.0
?> 

<base href="<?php echo (getenv('HTTPS') == 'on' ? HTTPS_SERVER : HTTP_SERVER) . DIR_WS_CATALOG; ?>"> 
<link rel="stylesheet" type="text/css" href="stylesheet.css">
</head> 
<body marginwidth="0" marginheight="0" topmargin="0" bottommargin="0" leftmargin="0" rightmargin="0"> 
<!-- header //--> 
<?php require(DIR_WS_INCLUDES . 'header.php'); ?> 

<!-- header_eof //--> 

<!-- body //--> 
<table border="0" width="100%" cellspacing="3" cellpadding="3"> 
<tr> 
  <td width="<?php echo BOX_WIDTH; ?>" valign="top"><table border="0" width="<?php echo BOX_WIDTH; ?>" cellspacing="0" cellpadding="2"> 
<td class="col_left">
<!-- left_navigation //-->
<?php require(DIR_WS_INCLUDES . 'column_left.php'); ?>
<!-- left_navigation_eof //-->
</td> 
  </table></td> 
<!-- body_text //--> 
  <td width="100%" valign="top"><table border="0" width="100%" cellspacing="0" cellpadding="0">
 <tr>
   <td><table border="0" width="100%" cellspacing="0" cellpadding="0">
	 <tr>
	  <?php if ( file_exists(DIR_WS_INCLUDES . 'header_tags.php') ) {?> 
	   <td><h1><?php echo HEADING_TITLE; ?></h1></td>
	  <?php } else { ?>		   
	   <td class="pageHeading"><?php echo HEADING_TITLE; ?></td>
	  <?php } ?>
	   <td class="pageHeading" align="right"><?php echo tep_image(DIR_WS_IMAGES . 'table_background_products_new.gif', HEADING_TITLE, HEADING_IMAGE_WIDTH, HEADING_IMAGE_HEIGHT); ?></td>
	 </tr>
	 <tr>
	   <td class="main"><?php echo HEADING_SUB_TEXT; ?></td>
	 </tr>		   
	 <tr>
	  <td><?php echo tep_draw_separator('pixel_trans.gif', '100%', '10'); ?></td>
	 </tr>
   </table></td>
 </tr>
 <tr>
   <td align="center" class="smallText"><?php $firstletter_nav=
	'<a href="' . tep_href_link("allprods.php",  'fl=A', 'NONSSL') . '"> A |</A>' . 
	'<a href="' . tep_href_link("allprods.php",  'fl=B', 'NONSSL') . '"> B |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=C', 'NONSSL') . '"> C |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=D', 'NONSSL') . '"> D |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=E', 'NONSSL') . '"> E |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=F', 'NONSSL') . '"> F |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=G', 'NONSSL') . '"> G |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=H', 'NONSSL') . '"> H |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=I', 'NONSSL') . '"> I |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=J', 'NONSSL') . '"> J |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=K', 'NONSSL') . '"> K |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=L', 'NONSSL') . '"> L |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=M', 'NONSSL') . '"> M |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=N', 'NONSSL') . '"> N |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=O', 'NONSSL') . '"> O |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=P', 'NONSSL') . '"> P |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=Q', 'NONSSL') . '"> Q |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=R', 'NONSSL') . '"> R |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=S', 'NONSSL') . '"> S |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=T', 'NONSSL') . '"> T |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=U', 'NONSSL') . '"> U |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=V', 'NONSSL') . '"> V |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=W', 'NONSSL') . '"> W |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=X', 'NONSSL') . '"> X |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=Y', 'NONSSL') . '"> Y |</A>' .
	'<a href="' . tep_href_link("allprods.php",  'fl=Z', 'NONSSL') . '"> Z</A>  '   .
	'<a href="' . tep_href_link("allprods.php",  '',	 'NONSSL') . '"> FULL</A>';

	echo $firstletter_nav; ?></td>
 </tr>
<tr>
<td><?php echo tep_draw_separator('pixel_trans.gif', '100%', '10'); ?></td>
</tr>
 <tr>
   <td>
<?php
// create column list
$define_list = array('PRODUCT_LIST_MODEL' => PRODUCT_LIST_MODEL,
				  'PRODUCT_LIST_NAME' => PRODUCT_LIST_NAME,
				  'PRODUCT_LIST_MANUFACTURER' => PRODUCT_LIST_MANUFACTURER, 
				  'PRODUCT_LIST_PRICE' => PRODUCT_LIST_PRICE,
				  'PRODUCT_LIST_QUANTITY' => PRODUCT_LIST_QUANTITY,
				  'PRODUCT_LIST_WEIGHT' => PRODUCT_LIST_WEIGHT,
				  'PRODUCT_LIST_IMAGE' => PRODUCT_LIST_IMAGE,
				  'PRODUCT_LIST_BUY_NOW' => PRODUCT_LIST_BUY_NOW);
asort($define_list);

$column_list = array();
reset($define_list);
while (list($column, $value) = each($define_list)) {
  if ($value) $column_list[] = $column; 
}

$select_column_list = '';

for ($col=0, $n=sizeof($column_list); $col<$n; $col++) {
  if ( ($column_list[$col] == 'PRODUCT_LIST_BUY_NOW') || ($column_list[$col] == 'PRODUCT_LIST_NAME') || ($column_list[$col] == 'PRODUCT_LIST_PRICE') ) {
 continue;
  }
}

// listing all products
$listing_sql = "select p.products_id, p.products_model, pd.products_name, pd.products_description, p.products_image, p.products_price, p.products_tax_class_id, IF(s.status, s.specials_new_products_price, NULL) as specials_new_products_price, p.products_date_added, m.manufacturers_name from " . TABLE_PRODUCTS . " p left join " . TABLE_MANUFACTURERS . " m on p.manufacturers_id = m.manufacturers_id left join " . TABLE_PRODUCTS_DESCRIPTION . " pd on p.products_id = pd.products_id and pd.language_id = '" . $languages_id . "' left join " . TABLE_SPECIALS . " s on p.products_id = s.products_id  $where order by pd.products_name";

if (ALL_PRODUCTS_DISPLAY_MODE == 'true')
include(DIR_WS_MODULES . 'product_listing.php'); //display in standard format
else
include(DIR_WS_MODULES . 'allprods.php');
?>
   </td>
 </tr>
  </table></td>
<!-- body_text_eof //-->
  <td width="<?php echo BOX_WIDTH; ?>" valign="top"><table border="0" width="<?php echo BOX_WIDTH; ?>" cellspacing="0" cellpadding="2">
<!-- right_navigation //-->
<?php require(DIR_WS_INCLUDES . 'column_right.php'); ?>
<!-- right_navigation_eof //-->
  </table></td>
</tr>
</table>
<!-- body_eof //-->

<!-- footer //-->
<?php require(DIR_WS_INCLUDES . 'footer.php'); ?>
<!-- footer_eof //-->
<br>
</body>
</html>
<?php require(DIR_WS_INCLUDES . 'application_bottom.php'); ?>

 

Thanks for reading.


Any idea how to make this work with the Easy Meta Tags contribution? Do I need to add the code to the easy_meta_tag.php file or just to the individual pages? Thanks.

In the individual pages you should just need to make the changes as per the instructions, around the code that generates the meta tags. I've never even seen the code for Easy Meta Tags, however.


Hi there

Do I put the code only in the index file, or do I put it in all my root files?

thanks danta67

 

Depends on your site and its added contributions. Most sites are experiencing duplicates via index.php.

 

Are your duplicates index.php related?


  • 1 month later...

Robert,

Thanks for this very useful and clearly written contribution.

 

For a site that already has duplicated content in Google's index, does anyone know if Google would eventually drop the duplicated pages from its index if I use this setting?

 

var $IhaveDuplicateContent = false;

 

(I understand the purpose of the 'false' setting is to prevent duplicate content from occurring in the first place).

 

For example, Google has these duplicates:

product_info.php?cPath=1&products_id=15

product_info.php?currency=EUR&products_id=15

product_info.php?currency=GBP&products_id=15

product_info.php?currency=NZD&products_id=15

 

By setting the flag to "false" Google would no longer crawl the last 3 versions of the page. So would Google eventually drop them from its index?

 

For me this would seem to be the ideal situation as then Google would not hold pages with duplicate content in its index at all. It would avoid having pages in the index with essentially identical content and with title tags which are less than ideal.

 

Kind regards

Peter


I use STS version 2.01. I have installed SEO, SPPC and Header Tags Controller.

When I install Google Duplicate Content Manager it doesn't make any difference. It doesn't work.

I have tried adding the code to the STS files, as I did for Google Analytics and other contributions, but it doesn't make any difference.

 

Any idea?


It just has to be added around the code that actually prints your tags to the screen. If installed correctly, I've yet to find a situation where "it doesn't work".


The jury is out on this one, nubbin. I did a lot of research and testing, but what G does and doesn't index, drop, or keep in the index is something that still needs testing.

The contribution has the two settings for good reasons, but it is also possible, as you suggest, that false would have worked anyway. The reason for the true setting is that it "forces" G to see the pages as different, which I viewed at the start as a pretty surefire way to remove the nasties.

 

As I haven't received any feedback I'm still at point one.

 

I suppose the good news for users is that nobody has returned to state that I have destroyed their rankings :D


Robert, thanks for your reply. I will do an experiment to try to find the answer to this. For my most important site, which has been thoroughly indexed and so has much duplicate content, I will use the True setting, as that seems to be a relatively risk-free method. I have 2 other OSC sites which are relatively new, have far fewer pages and thus don't have much ranking. For those sites I will set the flag to False and keep an eye on Google's site statistics over the next few months to see if the duplicate content gets dropped from the index. I'll let you know if it works as I hope.

 

Cheers

Peter


Thanks Peter that would be very much appreciated.

 

Rob


Rob,

OK, my experiment seems to be working, and remarkably quickly. In short, setting this contribution's mode to false on a site that is already indexed achieves the desired effect: the duplicates are removed from Google's index AND the pages left in Google's index have the title tags I want (not tags modified by the contribution).

Yesterday I implemented this contribution on a site which Google had indexed. Google was reporting 77 duplicate title tags for the site. I set the contribution's mode to false. This adds a "NOINDEX,FOLLOW" tag to the targeted pages.

 

Overnight Google crawled the site.

 

Today, 17 duplicated pages have been removed from Google's list of duplicate title tags; Google is now reporting only 60 duplicate title tags.

 

This is in line with what I expected based on Google's help page here : http://www.google.com/support/webmasters/b...py?answer=93710
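
The "NOINDEX,FOLLOW" behaviour described above can be illustrated with a small PHP sketch. This is not the contribution's actual code: the function name robots_meta_for_request and the parameter list passed to it are hypothetical, and the real class may decide which pages to target differently.

```php
<?php
// Hypothetical sketch: emit a robots meta tag when the request carries a GET
// parameter that is known to create duplicate versions of a page.
function robots_meta_for_request(array $getVars, array $duplicateParams)
{
    foreach ($duplicateParams as $name) {
        if (array_key_exists($name, $getVars)) {
            // Ask crawlers not to index this variant but still follow its links.
            return '<meta name="robots" content="noindex,follow">';
        }
    }

    // No duplicate-causing parameter present: leave the page indexable.
    return '';
}

// Example: product_info.php?currency=EUR&products_id=15
$meta = robots_meta_for_request(
    array('currency' => 'EUR', 'products_id' => '15'),
    array('currency')
);
```

In this example $meta holds the noindex tag because the currency parameter is present; a request with only products_id would get an empty string and stay indexable.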

 

The major remaining problem I have is duplications like this:

/product_info.php?cPath=6&products_id=40

/product_info.php?products_id=40

 

If I were to add 'cpath' to var $getValues, would that prevent these duplicates? Can you foresee any problems with that? I think it would work and am happy to try it if someone could reassure me that I haven't overlooked a problem with doing this. I am a bit wary of trying it for fear of accidentally screwing up my site's ranking!

 

Kind regards

Peter
