Preventing duplicate content by customizing htaccess


WebDev22

Posted

I'm trying to customize the htaccess file so we won't have issues with duplicate content. Currently, there are two different URLs pointing to the same category or product, depending on how you navigate to a category or product page. When I add the code below to the htaccess file, I get this error:

 

Internal Server Error

 

The server encountered an internal error or misconfiguration and was unable to complete your request.

 

Please contact the server administrator, [email protected] and inform them of the time the error occurred, and anything you might have done that may have caused the error.

 

More information about this error may be available in the server error log.

 

Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.

 

Here's the code. Everything beneath "########## start block" is what I've added:

########## Begin - Rewrite rules to block out some common exploits
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} (\<|%3C).*iframe.*(\>|%3E) [NC,OR]
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
RewriteRule ^(.*)$ index.php [F,L]
########## start block 
RewriteEngine on
RewriteBase /
RewriteCond %{QUERY_STRING} cPath=[0-9_] &products_id=([0-9_] ) [NC,OR]
RewriteCond %{QUERY_STRING} manufacturers_id=[0-9_] &products_id=([0-9_] ) [NC]
RewriteRule (.*) product_info.php?products_id=%1 [R=301,L]

Posted

The "RewriteEngine on" statement has to come before any of the others. And it wouldn't hurt to move the "RewriteBase /" statement as well.
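
A hedged sketch of how the added block might look with the ordering fixed. It assumes the store sits in the web root, and that the space inside each posted pattern stood for a `+` quantifier that got lost somewhere. An unquoted space in a RewriteCond pattern is read as an extra argument, which by itself is enough to cause a 500 error. Restricting the rule to product_info.php is an addition for illustration, not part of the original post.

```apache
# Sketch only: assumes mod_rewrite is available and the store is in the web root.
RewriteEngine On
RewriteBase /

# (the existing exploit-blocking RewriteCond / RewriteRule lines would go here)

# Redirect category- or manufacturer-qualified product URLs to the bare product URL.
# The "+" quantifiers are an assumption; the posted patterns show a space in that position.
RewriteCond %{QUERY_STRING} cPath=[0-9_]+&products_id=([0-9]+) [NC,OR]
RewriteCond %{QUERY_STRING} manufacturers_id=[0-9]+&products_id=([0-9]+) [NC]
# %1 is the products_id captured by whichever condition matched.
RewriteRule ^product_info\.php$ product_info.php?products_id=%1 [R=301,L]
```

Because the substitution carries its own query string, any other parameters on the original request are dropped by the redirect, so this is worth testing on a copy of the store first.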

Check out Chad's News.

Posted

Chad, not to hijack a thread here, but I have made several .htaccess mods and now have multiple instances of RewriteEngine On. After the first one are the others redundant?

 

Examples

 

# Begin Ultimate SEO V2.2d

Options +FollowSymLinks

RewriteEngine On

RewriteBase /

(then the code for SEO)

 

# XXS Security Add On

Options +FollowSymLinks

RewriteEngine On

(then more code for this add on)

 

So in other words since I have RewriteEngine On at the beginning of the file, do I really need the others?

 

Thanks

I am not a professional webmaster or PHP coder by background or training but I will try to help as best I can.

I remember what it was like when I first started with osC. It can be overwhelming.

However, I strongly recommend considering hiring a professional for extensive site modifications, site cleaning, etc.

There are several good pros here on osCommerce. Look around, you'll figure out who they are.

Posted

So in other words since I have RewriteEngine On at the beginning of the file, do I really need the others?

 

You only need the first one.

 

Here's a link to the mod_rewrite documentation for Apache 2.0.
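
As a rough sketch of that answer, a consolidated .htaccess could declare the shared directives once at the top and let each add-on block contribute only its own rules. The block labels are taken from the examples earlier in the thread; the elided rule bodies are placeholders, not real add-on code.

```apache
# Declared once for the whole file; later repeats of these lines are redundant.
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

# Begin Ultimate SEO V2.2d
# (the add-on's RewriteCond / RewriteRule lines go here, without a second RewriteEngine On)

# XXS Security Add On
# (the add-on's RewriteCond / RewriteRule lines go here, without a second RewriteEngine On)
```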

Posted

Chad, I did some reading on this and apparently I also have unnecessary multiple instances of Options +FollowSymLinks.

 

I have changed my .htaccess file and now have:

Options +FollowSymLinks

RewriteEngine On

at the start of my .htaccess file.

 

But I am not sure how to use RewriteBase /

 

One reference I read stated that RewriteBase / is to immediately follow RewriteEngine On

 

However, the various mods in my .htaccess file each use RewriteCond or RewriteRule, serving various needs including redirecting index.php to domain.com, adding www to domain.com, automating a Google Base upload, and so on.

 

So my question is, once

Options +FollowSymLinks

RewriteEngine On

is set at the beginning, where must

RewriteBase /

be located so that the various snippets of code work correctly?

 

Thanks

Posted

I'm not an expert on RewriteBase either.

 

I think the use of the "RewriteBase /" in your original .htaccess code is to ensure that the rewritten URL is always in the website's root directory.
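
A small sketch of that behaviour, assuming a store installed under a hypothetical /catalog/ directory rather than the root. RewriteBase only affects relative substitutions in per-directory (.htaccess) rules, and as far as I know it acts as a single setting for the file rather than something that must sit directly above each rule, so one declaration near the top is the usual arrangement. The page names below are made up for illustration.

```apache
RewriteEngine On
# Hypothetical install path; a store in the web root would use "RewriteBase /".
RewriteBase /catalog/

# Relative target: resolved against RewriteBase, so this redirects to /catalog/new-page.php.
RewriteRule ^old-page\.php$ new-page.php [R=301,L]

# Absolute target (leading slash): RewriteBase is not applied, so this redirects to /new-page.php.
RewriteRule ^other-page\.php$ /new-page.php [R=301,L]
```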

Posted

I redid my .htaccess on a couple of stores, placing RewriteBase / before any code that depends on it. I tested the stores; they loaded and I found no other bugs. Thanks for the tips, much appreciated.

Posted

No problem on the hijacked thread. Circling back to the original topic, I've discovered that there are actually three URLs pointing to the same content. I'm not sure if we're allowed to post specific links on these forums so here's the end of each URL:

 

This first one is from clicking the item on the home page:

/product_info.php?products_id=287

 

The second one is when you navigate to it using the categories in the left margin:

/product_info.php?cPath=25_53&products_id=287

 

The last one is when you navigate to it from the manufacturers drop-down:

/product_info.php?manufacturers_id=18&products_id=287

Posted

Brett, thanks for understanding. As for your issue, the SEO URL contributions will deal with that, if I understand your situation correctly.

There are at least a couple out there. I use this one.

Posted

I had originally thought that the Ultimate SEO URLs add-on would take care of the duplicate content issue until I received a recommendation to also locate and install a contribution that deals with duplicate content as well: http://www.oscommerce.com/forums/topic/154166-contribution-ultimate-seo-urls-v21-by-chemo/page__hl__duplicate__st__5480.

Posted

I think I found the post you are referring to. I do have Header Tags SEO and that does allow you to tackle a lot of the duplicate content issues. I also use Sam's remove & duplicate content add on located here

 

Those have helped but I still do get duplicate issues viewable through Google Analytics; I try to tackle those as they come up.

Posted

A few weeks ago we were dealing with security issues and now there's this. What am I going to discover next week that will need to be dealt with? It's like death by a thousand cuts.

Posted

I feel your pain. Aside from the endless masochism of do-it-yourself coding upgrades, there is the satisfaction of the "it works!" moment after successfully installing the add-on.

Posted

What is the ultimate purpose of all this? Is it to stop search engines from indexing duplicate content? The better solution would be to use robots.txt to tell search engines to not index certain paths, such as cPath= and manufacturers_id=.

Posted

I believe that is what Brett's original issue was. I use Spook's add-on to exclude those parameters, but I am interested in the syntax for the same purpose in robots.txt. My robots.txt disallows directories or specific PHP files, e.g.:

 

Disallow: /includes

Disallow: /account.php

 

I am interested in how robots.txt would deal with cPath=, etc. Thanks

Posted

For our next step, it sounds like it would make sense to go ahead and install the Ultimate SEO URLs add-on.

Posted

I think it would be something like

Disallow: /product_info.php?cPath=*
Disallow: /product_info.php?manufacturers_id=*

Posted

I just installed Ultimate SEO URLs. Do you think this is still needed? I'm concerned about duplicate content with the new URLs containing the same content as the old URLs, which were indexed prior to installing the add-on.

Posted

Sorry, I don't know how that add-on is going to interact with other things. The key is whatever URLs a search engine can see, it will try to explore pages on. So whether it's a URL Query String or a chain of fake directories that get you to a certain page, you need to tell search engines to ignore all but one. If you are using an SEO add-on, it might be best to ask in a discussion specific to that code.

 

If you're concerned about existing indexed pages causing duplicates, won't they go away all by themselves (be purged from the index) when the spider visits your site using the new SEO URLs?

Posted

 

Not necessarily.

 

I am using several add ons to deal with duplicate meta descriptions and title tags. Maybe that really isn't all that big a deal, but I am doing it anyway.

 

The combination of Jack's Ultimate SEO URLs and Spook's Remove Duplicate Content, for example, lets me deal with the page=, sort=, etc. parameters that can lead to duplicates.

 

It's a lot of work for sure.

Archived

This topic is now archived and is closed to further replies.
