adbart Posted May 29, 2006
Hi all,
I've just seen this contribution, have a look: http://www.oscommerce.com/community/contri...,spider+remover
My indexed search results currently include SID numbers, which isn't great. Ideally, I would like Googlebot and all other search engine bots to be redirected to the normal URL for each product, without the session ID. The "Spider Session Remover" seems like exactly what I need, except that it's built for Apache and I'm using IIS (a Windows-based server). Does anyone have an idea of how to do this purely in PHP, so that I can include it in my osC store?
Thanks in advance for your ideas and help on this one!
matrix2223 Posted May 29, 2006
There are a few ways to do this:
1) Force cookies and set Prevent Spider Sessions to True.
2) Use the SID Killer contrib, which works great.
3) Use Ultimate SEO URLs. This will typically show only the product's URL; however, I've seen it show the SID for a click or two and then go back to normal.
Hope this helps
adbart Posted May 30, 2006 Author
Hi Eric,
Cheers for the feedback. I'm using the SID Killer contrib (just installed it today) and yes, it's great. However, the URLs that Google has already indexed still work. I need a way so that if a client requests a URL with an SID, it gets a 301 redirect to the new URL, which doesn't carry the SID.
matrix2223 Posted May 30, 2006
The best thing I would say for that is to modify your Product Not Found message in english.php. There is a thread in Tips and Tricks about changing this, or you can change it to whatever you like. There is also a Product Not Found contrib that was made specifically for the case where Google has a product indexed and you no longer have it.
Hope this helps
adbart Posted May 30, 2006 Author
But I do still have the products, and I want them to stay indexed with Google! I just need to get rid of the session IDs in the index.
boxtel Posted May 30, 2006
You could add this to application_top.php after the spider identification part:

function banned_redirect($to, $code = '301 Moved Permanently') {
  global $request_type;

  header("HTTP/1.1 " . $code);
  if ($to != '') {
    // $to arrives without a scheme; put back the one matching this request
    if ($request_type == 'SSL') {
      header("Location: https://$to");
    } else {
      header("Location: http://$to");
    }
  }
  exit();
}

$X301_redirect = false;
if ($spider_flag) {
  $p_str = '';
  if (is_array($_GET) && (sizeof($_GET) > 0)) {
    foreach ($_GET as $key => $value) {
      // tep_session_name() returns a string, so compare rather than use in_array()
      if ((string)$key === tep_session_name()) {
        // session id in url - ignore for new url and activate 301 redirect
        $X301_redirect = true;
      } else {
        // add normal parameter to url
        $p_str .= $key . '=' . rawurlencode(stripslashes($value)) . '&';
      }
    }
  }
  // remove last & character
  $p_str = rtrim($p_str, '&');
  // create new url via seo but strip the scheme part
  $new_url = str_replace(array('http://', 'https://'), '', tep_href_link(basename($PHP_SELF), $p_str));
  if ($X301_redirect) {
    // perform the 301 redirect
    banned_redirect($new_url);
  }
}

Treasurer MFC
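For anyone who wants to try the URL-cleaning step outside the store first, here is a minimal sketch of the same idea as a standalone function. It is not boxtel's code and not part of osCommerce: the function name clean_url_for_spider(), the example host, and the session name 'osCsid' are assumptions for illustration, and it builds the URL by simple concatenation instead of calling tep_href_link().

```php
<?php
// Hypothetical helper (not an osCommerce function). Returns the URL a
// spider should be redirected to, or null when the request carried no
// session ID and therefore needs no redirect.
function clean_url_for_spider($base, $page, $params, $session_name) {
  if (!isset($params[$session_name])) {
    return null;
  }
  $pairs = array();
  foreach ($params as $key => $value) {
    // drop the session parameter; skip nested arrays to keep the sketch simple
    if ((string)$key === $session_name || is_array($value)) {
      continue;
    }
    $pairs[] = rawurlencode($key) . '=' . rawurlencode($value);
  }
  $query = implode('&', $pairs);
  return $base . $page . ($query != '' ? '?' . $query : '');
}

// Example request: product_info.php?products_id=42&osCsid=abc123
$target = clean_url_for_spider(
  'http://www.example.com/',          // assumed store base URL
  'product_info.php',
  array('products_id' => '42', 'osCsid' => 'abc123'),
  'osCsid'                            // assumed session name
);
// $target is now 'http://www.example.com/product_info.php?products_id=42'
```

In the store itself, the session name would come from tep_session_name() and the parameters from $_GET, as in the snippet above.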
matrix2223 Posted May 30, 2006
Boxtel,
With this bit of code, are you saying that you don't need the SID Killer contrib, or should you use both?
Thanks,
boxtel Posted May 31, 2006
I would never use SID Killer, as it is a very shaky hacking method. I would always recommend using the normal spiders.txt method to prevent spiders from getting a session in the first place.
This code redirects spiders that already have a session ID indexed to the same page without the session ID. It is basically a PHP alternative to the contribution that uses Apache rewrite to do the same thing. I personally use that one, but you cannot if you are not running Apache.
I do, however, use this code to redirect on other parameters I no longer use, like page or sort, which have been indexed and now result in multiple URLs with the same content.
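The idea of redirecting on other retired parameters could be sketched roughly like this. This is only an illustration under assumptions: the function name strip_retired_params() and the list of retired parameters are made up for the example, and boxtel did not post his own version of this part.

```php
<?php
// Hypothetical helper (not an osCommerce function). Rebuilds the query
// string without any parameter named in $retired and reports whether
// something was dropped, i.e. whether a 301 redirect is warranted.
function strip_retired_params($params, $retired) {
  $kept = array();
  $changed = false;
  foreach ($params as $key => $value) {
    if (in_array((string)$key, $retired, true)) {
      $changed = true;   // parameter is no longer used: leave it out
      continue;
    }
    if (is_array($value)) {
      continue;          // nested arrays are skipped to keep the sketch simple
    }
    $kept[] = rawurlencode($key) . '=' . rawurlencode($value);
  }
  return array(implode('&', $kept), $changed);
}

// Example request: index.php?cPath=3_10&sort=2a&page=2
list($query, $needs_redirect) = strip_retired_params(
  array('cPath' => '3_10', 'sort' => '2a', 'page' => '2'),
  array('osCsid', 'sort', 'page')   // assumed list of retired parameters
);
// $query is 'cPath=3_10' and $needs_redirect is true
```

Only retire a parameter such as page if the store genuinely no longer uses it; otherwise spiders would be redirected away from pages that still have distinct content.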
matrix2223 Posted May 31, 2006
I always look for the updated spiders.txt, have a robots.txt, and use the Prevent Spider Sessions setting. For some reason, all search engines still get the session IDs attached, which is the funny thing, I think. I also use Ultimate SEO URLs, and the SID only shows once during normal browsing; maybe it's not the same for search engines. What do you recommend?
Thanks,
Eric
boxtel Posted May 31, 2006
Test it. Use Firefox, change the user agent to that of any spider, disable cookies and turn off JavaScript. In other words, make your site believe you are a genuine spider.
Then visit your site. If you get no session ID in the URLs, neither do the real spiders. If you do, then it is time for some tracing.
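When tracing, it can help to know how the store decides that a visitor is a spider. The stock session block in application_top.php lowercases the user agent and looks for each spiders.txt entry as a substring. A minimal sketch of that check as a standalone function follows; the name is_spider() is hypothetical, and the sketch assumes the entries in spiders.txt are lowercase.

```php
<?php
// Hypothetical standalone version of the spiders.txt check; is_spider()
// is not an osCommerce function. $spiders is a list of lines as returned
// by file('spiders.txt'), so entries may carry trailing newlines.
function is_spider($user_agent, $spiders) {
  $user_agent = strtolower($user_agent);
  if ($user_agent === '') {
    return false;
  }
  foreach ($spiders as $spider) {
    $spider = trim($spider);
    // an entry matches when it occurs anywhere in the user agent string
    if ($spider !== '' && strpos($user_agent, $spider) !== false) {
      return true;
    }
  }
  return false;
}

$spiders = array("googlebot\n", "slurp\n", "\n");   // sample entries
$bot = is_spider('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $spiders);
$browser = is_spider('Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/1.5', $spiders);
// $bot is true, $browser is false
```

If the user agent you spoof in Firefox does not contain any entry from your spiders.txt, the store treats you as a normal visitor and starts a session.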
zalik22 Posted June 15, 2006
Where do you enter the code above? After this?

// start the session
$session_started = false;
if (SESSION_FORCE_COOKIE_USE == 'True') {
  tep_setcookie('cookie_test', 'please_accept_for_session', time()+60*60*24*30, $cookie_path, $cookie_domain);

  if (isset($HTTP_COOKIE_VARS['cookie_test'])) {
    tep_session_start();
    // user_tracking modifications
    if (!$referer_url) {
      $referer_url = $HTTP_SERVER_VARS['HTTP_REFERER'];
      if ($referer_url) {
        tep_session_register('referer_url');
      }
    }
    $session_started = true;
  }
} elseif (SESSION_BLOCK_SPIDERS == 'True') {
  $user_agent = strtolower(getenv('HTTP_USER_AGENT'));
  $spider_flag = false;

  if (tep_not_null($user_agent)) {
    $spiders = file(DIR_WS_INCLUDES . 'spiders.txt');

    for ($i=0, $n=sizeof($spiders); $i<$n; $i++) {
      if (tep_not_null($spiders[$i])) {
        if (is_integer(strpos($user_agent, trim($spiders[$i])))) {
          $spider_flag = true;
          break;
        }
      }
    }
  }

  if ($spider_flag == false) {
    tep_session_start();
    $session_started = true;
  }
} else {
  tep_session_start();
  $session_started = true;
}

and before this:

// set SID once, even if empty
$SID = (defined('SID') ? SID : '');

Thanks!
matrix2223 Posted June 15, 2006
zalik22,
I added it after and it works.
Hope this helps
zalik22 Posted June 15, 2006
So enter it after the:

// set SID once, even if empty
$SID = (defined('SID') ? SID : '');

Is there an easy way to check and see if it works?
Thanks Eric!
boxtel Posted June 16, 2006
Well, there are 3 ways:
1) Put your own user agent in the spiders.txt file so that your system identifies you as a spider.
2) Use a known spider user agent with Firefox so that you are identified as a spider.
3) Simply force the $spider_flag variable to true so that everybody is identified as a spider.
Then request a page from your site with a session ID attached to the request, and check that you are redirected to the same page without that session ID.
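If you would rather check the response headers than watch the browser's address bar, something along these lines might do. It is only a sketch: the function name is_clean_301() is hypothetical, 'osCsid' is an assumed session name, and the header lines are expected in the form PHP's get_headers() returns them (status line first).

```php
<?php
// Hypothetical checker (not an osCommerce function). Takes the response
// header lines of the first response and returns true only when it is a
// 301 whose Location header no longer carries the session parameter.
function is_clean_301($headers, $session_name) {
  if (count($headers) == 0 || !preg_match('#^HTTP/\S+\s+301\b#', $headers[0])) {
    return false;   // not a permanent redirect at all
  }
  foreach ($headers as $line) {
    if (stripos($line, 'Location:') === 0) {
      return strpos($line, $session_name . '=') === false;
    }
  }
  return false;     // a 301 without a Location header is not useful
}

// Sample headers for a request to product_info.php?products_id=42&osCsid=abc123
$good = is_clean_301(array(
  'HTTP/1.1 301 Moved Permanently',
  'Location: http://www.example.com/product_info.php?products_id=42',
), 'osCsid');
$bad = is_clean_301(array('HTTP/1.1 200 OK', 'Content-Type: text/html'), 'osCsid');
// $good is true, $bad is false
```

The header lines could come from get_headers() for a URL that carries a session ID, requested while one of the three options above makes the store treat you as a spider.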