bill110 Posted May 8, 2007

My URLs are rewritten by .htaccess and a file, seflt.php, and that works fine for customers browsing the site. For some reason, when Google spiders the site it sometimes adds amp;amp;amp;amp;amp;amp;amp; to the end of the URL. My URLs are rewritten like this:

http://mysite.com/product5/product_info.html

What I am trying to find out is whether Google reads the rewrite info differently than browsers do. Any ideas? Here are the .htaccess rewrite rules, if that helps, and I can post seflt.php if needed.

DirectoryIndex index.php default.php
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule ^([a-z]{2})/(.*)$ $2?language=$1&%{QUERY_STRING}
RewriteRule ^manufacturer([0-9{}]+_?[0-9{}]*)(/?.*)$ $2?manufacturers_id=$1&%{QUERY_STRING}
RewriteRule ^product([0-9{}]+_?[0-9{}]*)(/?.*)$ $2?products_id=$1&%{QUERY_STRING}
RewriteRule ^category([_0-9]+)/(.*)$ $2?cPath=$1&%{QUERY_STRING}
RewriteRule ^(.*)\.html(.*)$ $1.php?%{QUERY_STRING}

Thanks for any help.

My Contributions Stylesheet With Descriptions Glassy Grey Boxtops Our Products Meta Tags On The Fly Password Protect Admin
"No matter where you go....There you are" - Buccaroo Bonsai
Guest Posted May 8, 2007

Is this from a contribution, and if so, which one? The rewrite rules won't filter much, because the generated URLs that reach the PHP code can differ depending on how the server is configured, so the PHP code has plenty of processing to do. Without seeing all of the PHP processing, it is hard to tell.
bill110 Posted May 8, 2007

The contribution is SEF Link Transformer. The support thread is dead, so no help there. Here is the PHP code:

<?php
/*
SEF Link Transformer for osCommerce (SEF stand for Search Engine Friendly)
Version: Lite 0.8.0 Alpha
Author: Silencer ([email protected])
Release date: 26 November 2003
Legal notices: i don't care about all legal stuff, too lazy to attach GNU GPL licence, so forget it. But DO NOT remove my name and as always NO WARRANTIES.
Installation instructions: see readme.txt
Warning - do not use this on heavy loaded shops (more than 10000 visitors per day) if you not on dedicated server.
*/

function callback($pagecontent) {
  $pagecontent = preg_replace_callback("/(<[Aa][ \r\n\t]{1}[^>]*href[^=]*=[ '\"\n\r\t]*)([^ \"'>\r\n\t#]+)([^>]*>)/",'wrap_href',$pagecontent);
  return $pagecontent;
}

function transform_uri($param) {
  $uriparts = parse_url($param[2]);
  $newquery='';
  $scheme = $uriparts['scheme'].'://';
  if (($scheme != 'http://') && ($scheme != 'https://')) return $param[1].$param[2].$param[3];
  $host = $uriparts['host'];
  if ($host != $_SERVER['SERVER_NAME'] && $host != $_SERVER['SERVER_ADDR']) return $param[1].$param[2].$param[3];
  $host .= '';
  $path = $uriparts['path'];
  list($file,$extension) = explode('.', basename($path));
  if($extension != 'php') return $param[1].$param[2].$param[3];
  $extension = ".html";
  $path = rtrim(dirname($path),'');
  $query = $uriparts['query'];
  $anchor = $uriparts['anchor'];
  if ($a = explode('&',$query)){
    foreach ($a as $b) {
      list($key,$val) = split('=',$b);
      switch ($key) {
        case 'cPath':
          $path = '/' .'category'.$val.''.$path;
          break;
        case 'language':
          $path = $val.'/'.$path;
          break;
        case 'products_id':
          $path .= 'product'.$val.'/';
          break;
        case 'manufacturers_id':
          $path .= 'manufacturer'.$val.'/';
          break;
        case 'osCsid':
          if(strstr($_SERVER["HTTP_USER_AGENT"],'Mozilla')) $newquery .= $key.'='.$val.'&';
          break;
        default:
          if($newquery || $key) $newquery .= $key.'='.$val.'&';
      }
    }
  }
  if ($newquery) $newquery = '?'.rtrim($newquery,'&');
  return $param[1].$scheme.$host.$path.$file.$extension.$newquery.$anchor.$param[3];
}

function wrap_href($param) {
  return transform_uri($param);
}

ob_start("callback");
?>

I could not get Ultimate SEO URLs to work (I think due to other contributions and coding), so I tried this one. It seemed to work fine, and then I noticed the problem with Google.
Guest Posted May 8, 2007

OK. As for decoding, there isn't any processing in that file, so this module relies solely on the .htaccess RewriteRules. seflt.php only builds the SEO URLs, that is, it does the encoding. (I haven't tested that preg_replace_callback pattern to see whether it encodes all the URLs properly; that regular expression is one thing to verify.) It is hard to find the cause from the spider alone, but have you checked whether the regular links have trailing ampersands? The rules don't filter anything like that. See the second argument here:

RewriteRule ^([a-z]{2})/(.*)$ $2?language=$1&%{QUERY_STRING}

This filters the first argument all right, but the second argument is not filtered. Also, this rule

RewriteRule ^(.*)\.html(.*)$ $1.php?%{QUERY_STRING}

will translate every .html request to .php, so real HTML files won't work. Anyway, that's my $.02.
bill110 Posted May 8, 2007

Thanks for the reply. I will look further for the ampersands. Do you know of any reason the spider would add so many ampersands?
Guest Posted May 8, 2007

This module patches the URLs at the end of page generation. So if it filters characters between arguments, that can cause it. The next time the osCommerce tep_href_link function executes, it won't filter encoded ampersands ('&amp;'), only a bare '&', so the stock code won't work in that case:

while ( (substr($link, -1) == '&') || (substr($link, -1) == '?') ) $link = substr($link, 0, -1);

as it searches for '&', not '&amp;'. Adding another filter there for encoded ampersands might reduce this side effect. Otherwise the invalid parameter will go through and propagate the next time. That's my opinion, anyway.

I forgot to mention earlier: check another SEO module. It is more recent but still in its infancy, and it needs some manual steps to generate the names for the different entities, so the store owner has to do a few things.

http://www.oscommerce.com/community/contributions,5080
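A minimal sketch of the extra filter suggested above, assuming the only unwanted trailing pieces are '?', '&' and the entity '&amp;'. The function name strip_trailing_separators is hypothetical and not part of osCommerce; in a real shop the same check would have to be worked into the trimming loop inside tep_href_link.

```php
<?php
// Hypothetical helper (not an osCommerce function): extends the stock
// trimming loop so that a trailing '&amp;' is removed as well as a
// trailing '&' or '?'.
function strip_trailing_separators($link) {
    while (true) {
        if (substr($link, -5) === '&amp;') {
            // Drop a trailing HTML-encoded ampersand (5 characters).
            $link = substr($link, 0, -5);
        } elseif (substr($link, -1) === '&' || substr($link, -1) === '?') {
            // Same single-character check the stock loop performs.
            $link = substr($link, 0, -1);
        } else {
            break;
        }
    }
    return $link;
}

echo strip_trailing_separators('product_info.php?products_id=5&amp;&amp;'), "\n";
// prints: product_info.php?products_id=5
```

The entity check comes first because a trailing '&amp;' ends in ';', which the single-character test alone would never match.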
Guest Posted May 8, 2007

I forgot to ask: do you have the stock tep_href_link function, or do you have mods there? Normally the ampersand encoding won't even reach the $_GET array; it is translated to '&' well before that.
bill110 Posted May 9, 2007

Sorry for the late reply; four kids all needing to go in different directions. The tep_href_link function is stock. I think I found it in functions/general.php, in tep_get_all_get_params:

stock:
$get_url .= $key . '=' . rawurlencode(stripslashes($value)) . '&';

mine:
$get_url .= $key . '=' . rawurlencode(stripslashes($value)) . '&amp;';

I'm not sure which contribution changed that. However, now I am having problems with updating or removing cart quantities. If I remove a product, the cart is actually updated but still shows the product. When I go to a new page, the cart shows empty.
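One plausible way the stray "amp;" pieces arise, sketched under the assumption that links reach transform_uri() with '&amp;' as the separator: the transformer splits the query string on a bare '&', so the rest of each entity ends up glued to the next key. The helper query_keys below is hypothetical and only mimics that explode('&', ...) step; the thread does not confirm this is the exact path Google follows.

```php
<?php
// Hypothetical helper: returns the parameter names that result from
// splitting a query string on a bare '&', as transform_uri() does.
function query_keys($query) {
    $keys = array();
    foreach (explode('&', $query) as $pair) {
        // Split each pair on the first '=' and keep the key.
        $parts = explode('=', $pair, 2);
        $keys[] = $parts[0];
    }
    return $keys;
}

// Bare '&' separator with a trailing '&'.
print_r(query_keys('cPath=3&products_id=5&'));
// keys: 'cPath', 'products_id', ''

// Entity-encoded separator with a trailing '&amp;'.
print_r(query_keys('cPath=3&amp;products_id=5&amp;'));
// keys: 'cPath', 'amp;products_id', 'amp;'
```

In the second case 'products_id' is no longer matched by the switch in transform_uri(), and a key named 'amp;' is passed along as if it were a real parameter.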
Guest Posted May 9, 2007

Yes, I see. That's not very good, because tep_href_link tries to trim this '&'. So if the parameters have '&amp;' (as your tep_get_all_get_params does the encoding there), then this filtering, for instance, won't work, as I mentioned earlier:

while ( (substr($link, -1) == '&') || (substr($link, -1) == '?') ) $link = substr($link, 0, -1);

And if you do the opposite, encoding at the end of the tep_href_link function, then you risk breaking those payment gateway forms where the '&' will be appended and the return links won't work.
bill110 Posted May 9, 2007

Well, I guess I'm going to scrap it. No PageRank yet at Google, so no real harm there, I guess. They have 169 pages indexed, so I'll either use another contribution and try a rewrite (permanently moved) to the new URLs, or just resubmit a sitemap and have them remove the page-not-found links as they occur.
This topic is now archived and is closed to further replies.