Google reading rewritten URL incorrectly


bill110


I have my URLs rewritten by .htaccess and a file, seflt.php, which works fine for customers browsing the site. For some reason, when Google spiders the site, it sometimes adds amp;amp;amp;amp;amp;amp;amp; to the end of the URL.

My urls are rewritten like this:

http://mysite.com/product5/product_info.html

 

What I am trying to find out is whether Google reads the rewrite info differently than browsers do.

Any ideas?

 

Here are the .htaccess rewrite rules, in case that helps. I can post seflt.php too if needed.

 

DirectoryIndex index.php default.php

Options +FollowSymLinks

RewriteEngine on

RewriteBase /

RewriteRule ^([a-z]{2})/(.*)$ $2?language=$1&%{QUERY_STRING}

RewriteRule ^manufacturer([0-9{}]+_?[0-9{}]*)(/?.*)$ $2?manufacturers_id=$1&%{QUERY_STRING}

RewriteRule ^product([0-9{}]+_?[0-9{}]*)(/?.*)$ $2?products_id=$1&%{QUERY_STRING}

RewriteRule ^category([_0-9]+)/(.*)$ $2?cPath=$1&%{QUERY_STRING}

RewriteRule ^(.*)\.html(.*)$ $1.php?%{QUERY_STRING}

 

Thanks for any help

My Contributions

 

Stylesheet With Descriptions Glassy Grey Boxtops Our Products Meta Tags On The Fly

Password Protect Admin

"No matter where you go....There you are" - Buccaroo Bonsai


Is this from a contribution, and if so, which one?

The rewrite rules won't filter much, since the generated URLs that hit the PHP code can differ depending on how the server is configured, so the PHP code has plenty of processing to do. Without looking at all the PHP processing, it is hard to tell.


The contribution is SEF Link Transformer. The support thread is dead, so no help there. Here is the PHP code:

<?php
/*
SEF Link Transformer for osCommerce  (SEF stand for Search Engine Friendly)
Version: Lite 0.8.0 Alpha 
Author: Silencer ([email protected])
Release date: 26 November 2003
Legal notices: i don't care about all legal stuff, too lazy to attach GNU GPL licence, 
so forget it. But DO NOT remove my name and as always NO WARRANTIES.

Installation instructions: see readme.txt
Warning - do not use this on heavy loaded shops (more than 10000 visitors per day) 
if you not on dedicated server.

*/


function callback($pagecontent) {
 $pagecontent = preg_replace_callback("/(<[Aa][ \r\n\t]{1}[^>]*href[^=]*=[ '\"\n\r\t]*)([^ \"'>\r\n\t#]+)([^>]*>)/",'wrap_href',$pagecontent);
 return $pagecontent;

}

function transform_uri($param) {
$uriparts = parse_url($param[2]);
$newquery='';
$scheme = $uriparts['scheme'].'://';
if (($scheme != 'http://') && ($scheme != 'https://')) return $param[1].$param[2].$param[3];
$host = $uriparts['host'];
if ($host != $_SERVER['SERVER_NAME'] && $host != $_SERVER['SERVER_ADDR']) return $param[1].$param[2].$param[3];
$host .= '';
$path = $uriparts['path'];
list($file,$extension) = explode('.', basename($path));
if($extension != 'php') return $param[1].$param[2].$param[3];
$extension = ".html";
$path = rtrim(dirname($path),'');
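$query = isset($uriparts['query']) ? $uriparts['query'] : '';
// parse_url() returns the part after '#' under the key 'fragment'; there is no 'anchor' key.
$anchor = isset($uriparts['fragment']) ? '#'.$uriparts['fragment'] : '';
if ($a = explode('&',$query)){
foreach ($a as $b) {
  // split() was removed in PHP 7; explode() is equivalent for a literal '='.
  list($key,$val) = array_pad(explode('=',$b),2,'');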
  switch ($key) {
	case 'cPath':
			$path = '/' .'category'.$val.''.$path;
		break;
	case 'language':
		$path = $val.'/'.$path;
		break;
	case 'products_id':
		$path .= 'product'.$val.'/';
		break;
	case 'manufacturers_id':
		$path .= 'manufacturer'.$val.'/';
		break;
	case 'osCsid':
		if(strstr($_SERVER["HTTP_USER_AGENT"],'Mozilla'))  $newquery .= $key.'='.$val.'&';
		break;
	default:
		if($newquery || $key) $newquery .= $key.'='.$val.'&';	  
  }
}
}
if ($newquery) $newquery = '?'.rtrim($newquery,'&');
return $param[1].$scheme.$host.$path.$file.$extension.$newquery.$anchor.$param[3];

}
function wrap_href($param) {
return transform_uri($param);
}



ob_start("callback");

?>

 

I could not get Ultimate SEO URLs to work (I think due to other contributions and coding), so I tried this one. It seemed to work fine, and then I noticed the problem with Google.


OK, as for decoding: there isn't any processing in that file, so this module relies solely on the .htaccess RewriteRules.

The sefit.php only builds the SEO URLs, i.e. it does the encoding. I haven't tested the preg_replace_callback regular expression to see whether it encodes all the URLs properly; that is one thing to verify.

It's hard to diagnose from the spider alone, but have you checked whether the regular links have trailing ampersands? The rules don't filter anything like that (see arg2 here):

RewriteRule ^([a-z]{2})/(.*)$ $2?language=$1&%{QUERY_STRING}

This filters arg1 all right, but arg2 is not filtered.

Also, this rule

RewriteRule ^(.*)\.html(.*)$ $1.php?%{QUERY_STRING}

will translate every .html request to .php, so real HTML files won't work.

Anyway, that's my $.02.

Thanks for the reply. I will look further for the ampersands. Do you know of any reason the spider would add so many ampersands?


The module you have patches the URLs at the end of page generation, so if it filters characters in between arguments, that can cause it.

The next time the osC tep_href_link function executes, it will not strip entity-encoded ampersands ('&amp;'), only a bare '&', so the stock code won't work in that case:

 

	while ( (substr($link, -1) == '&') || (substr($link, -1) == '?') ) $link = substr($link, 0, -1);

 

It searches for '&', not '&amp;'. Adding another filter there for the encoded ampersands might reduce this side effect.

Otherwise the invalid parameter goes through and propagates the next time a link is built. That's my opinion, anyway.
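
As a rough sketch of that extra filter (the helper name trim_trailing_separators is hypothetical and not part of osCommerce), the trailing-character loop could be replaced by a single pattern that also strips the entity-encoded form:

```php
<?php
// Hypothetical helper, not part of stock osCommerce.
// Removes any run of trailing '?', '&' or '&amp;' from a link.
function trim_trailing_separators($link) {
    return preg_replace('/(?:&amp;|&|\?)+$/', '', $link);
}

echo trim_trailing_separators('product_info.php?products_id=5&amp;&amp;'), "\n";
```

This only tidies the end of the link; encoded separators in the middle of the query string are left as they are.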

 

I forgot to mention earlier: have a look at another SEO module. It is more recent but still in its infancy, and it needs some manual steps to generate the names for the different entities, so the store owner has to do a few things.

http://www.oscommerce.com/community/contributions,5080


I forgot to ask: do you have the stock tep_href_link function, or do you have mods there? Normally the ampersand encoding won't even reach the $_GET array; it is translated to '&' well before that.

Sorry for the late reply. I have 4 kids, all needing to go in different directions.

The tep_href_link function is stock.

I think I found it in functions/general.php, in tep_get_all_get_params:

stock

$get_url .= $key . '=' . rawurlencode(stripslashes($value)) . '&';

mine

$get_url .= $key . '=' . rawurlencode(stripslashes($value)) . '&amp';

I'm not sure which contribution changed that.
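
To illustrate how an entity-encoded separator like this could pile up into amp;amp;amp;, here is a small simulation. It is only a sketch: build_params() is a simplified stand-in for the modified tep_get_all_get_params(), parse_query() mimics a plain split on '&', and it assumes the generated query string is fed back in without the entities being decoded first (which may or may not be what the spider actually did).

```php
<?php
// Illustration only: build_params() is a simplified stand-in for the modified
// tep_get_all_get_params(), and parse_query() mimics a naive split on '&'.

function build_params(array $get) {
    $out = '';
    foreach ($get as $key => $value) {
        // The modified line appends the entity instead of a bare '&'.
        $out .= $key . '=' . rawurlencode($value) . '&amp;';
    }
    return $out;
}

function parse_query($query) {
    $get = array();
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty pieces such as a trailing separator
        }
        $parts = explode('=', $pair, 2);
        $get[$parts[0]] = isset($parts[1]) ? rawurldecode($parts[1]) : '';
    }
    return $get;
}

// Assumption: each generated query string is requested again as-is,
// without the '&amp;' entities being decoded back to '&'.
$query = 'page=2&sort=3';
for ($round = 1; $round <= 3; $round++) {
    $query = build_params(parse_query($query));
    echo $round . ': ' . $query . "\n";
}
```

Each round, the leftover "amp;" becomes part of the next parameter name, so the prefix grows by one "amp;" per pass.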

However, now I am having problems with updating or removing cart quantities.

If I remove a product, the cart is actually updated, but the cart page still shows the product. When I go to a new page, the cart shows as empty.


Yes, I see. That's not very good, because tep_href_link tries to trim a trailing '&'.

So if the parameters end with '&amp;' (because your tep_get_all_get_params does the encoding there), then this filtering, for instance, won't work, as I mentioned earlier:

 

while ( (substr($link, -1) == '&') || (substr($link, -1) == '?') ) $link = substr($link, 0, -1);

 

And if you do the opposite, encoding at the end of the tep_href_link function, then you risk breaking the payment gateway forms, where the '&amp;' will be appended and the return links won't work.
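
One way to sidestep that trade-off, sketched here under the assumption that links can be kept with plain '&' internally and encoded only at the point where they are written into HTML (html_link is a hypothetical helper, not an osCommerce function):

```php
<?php
// Hypothetical helper: keep URLs raw everywhere, and entity-encode them
// only when they are placed inside an HTML attribute.
function html_link($url, $label) {
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">'
         . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

// The raw URL (plain '&') is what would be handed to a gateway or redirect.
$return_url = 'checkout_process.php?order_id=12&token=abc';

// The encoded form appears only in the HTML output.
echo html_link($return_url, 'Return to shop'), "\n";
```

The raw string stays usable for redirects and gateway parameters, while the page markup still gets a valid '&amp;'.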


Well, I guess I'm going to scrap it. No page rank yet at Google, so no real harm there, I guess. They have 169 pages indexed, so I'll either use another contribution and try a rewrite (permanently moved) to the new URLs, or just resubmit a site map and have them remove the page-not-found links as they occur.


Archived

This topic is now archived and is closed to further replies.
