[Contribution] Googlebot/Spider session id killer


Ian

Below is the fix for the login, my account, and checkout links on the default page.

They generate a SID on the spider's second pass.

I hope this is the final dance with the robots.

 

1. in catalog/includes/header.php find this line

 

<td align="right" class="headerNavigation"><?php if (tep_session_is_registered('customer_id')) { ?><a href="<?php echo tep_href_link(FILENAME_LOGOFF, '', 'SSL'); ?>" class="headerNavigation"><?php echo HEADER_TITLE_LOGOFF; ?></a>  |  <?php } ?><a href="<?php echo tep_href_link(FILENAME_ACCOUNT, '', 'SSL'); ?>" class="headerNavigation"><?php echo HEADER_TITLE_MY_ACCOUNT; ?></a>  |  <a href="<?php echo tep_href_link(FILENAME_SHOPPING_CART, '', 'NONSSL'); ?>" class="headerNavigation"><?php echo HEADER_TITLE_CART_CONTENTS; ?></a>  |  <a href="<?php echo tep_href_link(FILENAME_CHECKOUT_PAYMENT, '', 'SSL'); ?>" class="headerNavigation"><?php echo HEADER_TITLE_CHECKOUT; ?></a>   </td>

 

change to

 

<td align="right" class="headerNavigation">
<?php // Each link becomes a POST form, so a spider has no href to follow and no SID is generated. ?>
<?php if (tep_session_is_registered('customer_id')) { ?>
<form name="form_logoff" method="post" action="<?php echo tep_href_link(FILENAME_LOGOFF, '', 'SSL'); ?>"><input name="Submit" type="image" src="logoff.gif" border="0"></form>
 |  <?php } ?>
<form name="form_my_account" method="post" action="<?php echo tep_href_link(FILENAME_ACCOUNT, '', 'SSL'); ?>"><input name="Submit" type="image" src="my_account.gif" border="0"></form>
 | 
<form name="form_shopping_cart" method="post" action="<?php echo tep_href_link(FILENAME_SHOPPING_CART, '', 'SSL'); ?>"><input name="Submit" type="image" src="cart_contents.gif" border="0"></form>
 | 
<form name="form_checkout" method="post" action="<?php echo tep_href_link(FILENAME_CHECKOUT_PAYMENT, '', 'SSL'); ?>"><input name="Submit" type="image" src="checkout.gif" border="0"></form>
  
</td>
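
The same pattern can be wrapped in a small helper so the four forms are not written out by hand. This is only a sketch: post_image_button() is a made-up name, not an osCommerce function, and the URL in the example is a placeholder for what tep_href_link() would return.

```php
<?php
// Hypothetical helper (not part of osCommerce): renders a POST form with an
// image button, so a spider that only follows <a href> links never requests the URL.
function post_image_button($formName, $action, $image) {
    return '<form name="' . htmlspecialchars($formName, ENT_QUOTES) . '" method="post"'
         . ' action="' . htmlspecialchars($action, ENT_QUOTES) . '">'
         . '<input name="Submit" type="image" src="' . htmlspecialchars($image, ENT_QUOTES) . '" border="0">'
         . '</form>';
}

// In header.php the action would come from tep_href_link(); a literal URL stands in here.
echo post_image_button('form_checkout', 'https://example.com/catalog/checkout_payment.php', 'checkout.gif');
```

Escaping the action matters because tep_href_link() output can contain '&' once parameters are added.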

 

Note: I added 4 GIFs under the catalog dir; you can modify the code to use any other dir.

 

http://members.shaw.ca/cesun/logoff.gif

http://members.shaw.ca/cesun/checkout.gif

http://members.shaw.ca/cesun/my_account.gif

http://members.shaw.ca/cesun/cart_contents.gif

 

2. In catalog/includes/languages/english.php

 

find this line

 

define('TEXT_GREETING_GUEST', 'Welcome <span class="greetUser">Guest!</span> Would you like to <a href="%s"><u>log yourself in</u></a>? Or would you prefer to <a href="%s"><u>create an account</u></a>?');

 

 

change to

 

define('TEXT_GREETING_GUEST', 'Welcome <span class="greetUser">Guest!</span> Would you like to

<form name="form_login" method="post" action="%s">

<input type="submit" name="Submit" value=" Log In ">

</form>

or

<form name="form_create_an_account" method="post" action="%s">

<input type="submit" name="Submit" value="Sign Up">

</form>

?

');

 

Note: here I used the default submit button; you can change it to your own image.

Those seem to be all the changed files I remember. If I forgot any, I will post them later.

 

regards

 

david

Link to comment
Share on other sites

Hi All,

 

I'm pretty new to all of this, but I've put in a couple of mods based on these postings. Google came by and seems to have picked up only my first page. As I make more changes, is there a good way to test without having to wait until the next time Google comes by?

 

Thanks in advance.

Link to comment
Share on other sites


AFAIK (correct me if I am wrong), Google's visit consists of two sessions: (1) the Googlebot comes to check your site for updated content; (2) if there is something new compared with the prior indexing session, the Google Indexer visits a few days later and follows your updated links etc. so they can be included.

 

Sunny

Link to comment
Share on other sites

Hello @ll,

 

First of all, I'd like to thank you all for your work on the Googlebot/Spider session id killer.

I implemented Ian's code and have some trouble with the default language setting, a problem already mentioned in this thread.

I am using osC in German only, and I get the following error message when opening default.php on my test system:

 

Parse error: parse error, unexpected T_STRING in d:\apache\apache\htdocs\catalog\includes\application_top.php on line 435

Fatal error: Failed opening required 'DIR_WS_LANGUAGES/FILENAME_DEFAULT' (include_path='.;c:\php4\pear') in d:\apache\apache\htdocs\catalog\default.php on line 33

 

Do you have an idea how to fix this?

 

If you have any idea please post it :P

 

Thanks a lot!

 

Christian

Link to comment
Share on other sites

@ Ian

 

 

Could you post the complete installation of your add-on? Which files, which lines...

Most of the questions come from the differences between snapshots. I also have a problem, but I don't know where it is, so I'd like to check my install.

I have the problem with the add-on for small, medium, and big images.

If a bot goes to product_info with the medium image, it gets a session like this:

 

URL: /shop/catalog/images/Peitsche_Paddel_schmal_med.jpg/osCsid/6defc5b0fe8300f9953ddd32deeff0b9

User Address: 66.196.72.61

User Agent: Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)

Referer:

 

It works on all other files. But product_info is the most important file...

 

 

Chris

Link to comment
Share on other sites

Ian,

 

Please could you help? I have loaded up a new snapshot and applied your mod as stated in this forum.

After a day of trying to get this mod to work, I finally have to admit defeat, which means osCommerce is unusable in its current state, as search engine listings are the top priority for any e-commerce shop on the net.

 

The result I get on the screen is as follows:

 

===========

Parse error: parse error in /Users/darren/Sites/test/catalog/includes/application_top.php on line 413

 

Fatal error: Failed opening required 'DIR_WS_LANGUAGES/FILENAME_DEFAULT' (include_path='.:/usr/lib/php') in /Users/darren/Sites/test/catalog/default.php on line 21

===========

 

Without the mod the site works fine, but it is search engine unfriendly.

Please could you help me out, as I would imagine there are others here who only have access to the latest snapshot.

 

Thank you in advance.

 

Dar

Link to comment
Share on other sites

here is what the code reads at the mentioned errors.

 

// calculate category path
if ($HTTP_GET_VARS['cPath']) {
  $cPath = $HTTP_GET_VARS['cPath'];
} elseif ($HTTP_GET_VARS['products_id'] && !$HTTP_GET_VARS['manufacturers_id']) {   // <-- line 413
  $cPath = tep_get_product_path($HTTP_GET_VARS['products_id']);
} else {
  $cPath = '';
}

if (strlen($cPath) > 0) {
  $cPath_array = explode('_', $cPath);
  $current_category_id = $cPath_array[(sizeof($cPath_array)-1)];
} else {
  $current_category_id = 0;
}

 

and at line 21:

 

require(DIR_WS_LANGUAGES . $language . '/' . FILENAME_DEFAULT);

 

I hope that is enough info

 

Thanks again

 

Dar

Link to comment
Share on other sites

The problem is in your application_top.php.

 

Check that you have added my code correctly, and that you added it just after

 

require(DIR_WS_CLASSES . 'breadcrumb.php');

$breadcrumb = new breadcrumb;

Trust me, I'm an Accountant.

Link to comment
Share on other sites

Hi all,

 

Ian's solution for removing the session ID ("If no customer is logged in or there is nothing in the cart, we kill the session id") is much better than spider-recognition code that maintains IPs and matches HTTP_USER_AGENT. Maintaining the list of IPs and agents is tedious work every time a new bot/spider appears, and it stops working as soon as a bot/spider changes its IP or its agent name.

Ian's solution is better, but his code still has some drawbacks and problems:

1. A customer with cookies disabled and an empty cart, or a visiting bot/spider, generates a new session file on each request (or a new entry in the sessions table). This is because no cookie can be retrieved, so session_start() is called every time. This becomes a performance issue when a file is created for every link a customer clicks on!

2. The "Buy Now" button's direct link adds the item to the cart and redirects to shopping_cart.php with the session ID in its URL. This can be prevented by using a form instead of a link for the Buy Now button. There is a contribution from Joshua Dechant: http://www.oscommerce.com/community/contributions,864 .

An alternative solution is to disallow the shopping_cart.php and login.php pages in your robots.txt. The login.php page also introduces a session ID in the URL, because $kill_sid = false when the customer is on this page.

 

# Your robots.txt file

User-agent: *

Disallow: /catalog/shopping_cart.php

Disallow: /catalog/login.php

 

In your shopping_cart.php and login.php, add this line to the meta header of each page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

 

3. When a customer with cookies disabled and an empty cart switches to another language or currency, the next link the customer clicks on switches back to the default language or currency. This can be solved by checking $parameter in the tep_href_link() function for 'language=' or 'currency='. If $parameter contains neither, and the customer previously chose a language or currency, the link is given that previous choice, so the next link always carries the customer's language or currency.

4. A customer with cookies disabled and an empty cart who successfully creates an account is then required to log in. This is not the normal behaviour. It occurs because tep_session_register('customer_id') in create_account_process.php is called after application_top.php (Ian's code is at the end of that file). To solve this, place the modified version of Ian's code in the tep_href_link() function.

Overall, Ian's code is better than spider-recognition code, but it needs a rewrite to address the above problems and drawbacks.

I have rewritten it to address issues 3 and 4 above; the code should be placed within the tep_href_link() function. If you modify Ian's code, always test it with cookies enabled and disabled, and with an empty and a non-empty cart.
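
The rewritten code itself is not posted here, so the following is only a sketch of the idea behind issue 3, not the actual fix. propagate_choice() is a hypothetical name; inside tep_href_link() the second argument would be the current request's GET variables.

```php
<?php
// Sketch only: carry the visitor's chosen language/currency on every link when
// no session holds it. propagate_choice() is hypothetical, not an osCommerce function.
function propagate_choice($parameters, array $current) {
    parse_str($parameters, $params);                 // parameters the link already has
    foreach (array('language', 'currency') as $key) {
        // Copy the choice from the current request unless the link sets its own.
        if (!isset($params[$key]) && !empty($current[$key])) {
            $params[$key] = $current[$key];
        }
    }
    return http_build_query($params, '', '&');
}

// The visitor arrived with ?language=de, and the page builds a product link.
echo propagate_choice('products_id=248', array('language' => 'de'));
// prints: products_id=248&language=de
```

A link that already names a language (for example the language box itself) is left alone, so switching still works.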

 

Regards,

Bn

Link to comment
Share on other sites

Ok, I'm dumb. I'm not sure I understand everything Bao recommended and how it applies to osC v2.2. My understanding is that you're saying to implement Ian's SID killer and the Buy Now link-to-form change, and that the second part of item #2 is optional.

For 3 and 4, can you tell us which code to change and which files are affected? I'm new to PHP and osCommerce.

Lastly, is there a final version of Ian's SID killer as a contrib? I've seen several posts, starting as early as October and November, and I'm not sure if they're still valid for v2.2 or even the most recent snapshots.

These posts seem to make sense only to people experienced with osC and/or PHP.

Link to comment
Share on other sites

I saw a thread that combined most of the code but the date was rather old.

I also searched through the contrib area and haven't found it yet.

 

Where is the combined code?

 

Even if it's still beta I would be very appreciative if I could have everything put in one place.

 

This is the last mod I want to install before I put up a beta version of my site on the net. That way, while debugging is going on, I can wait for the bots to start hitting :wink:

NewsDesk(934) / FAQDesk(1106) / OrderCheck(1168) :::

Link to comment
Share on other sites

I'm currently working on an upgrade to the script. This should address a couple of points.

Firstly, moving the script to before the session_start call in application_top.php, and wrapping the session_start call in an if statement so that useless sessions are not generated.
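
The upgraded script is not shown in this thread, so the following is just one way the wrapped session_start call might look. The conditions in needs_session() are assumptions made for illustration (the function name is hypothetical), and osCsid is assumed as the session name because that is what appears in the URLs above.

```php
<?php
// Sketch only: start a session when the visitor already carries a session id
// or is doing something that needs one. needs_session() is a hypothetical helper.
function needs_session(array $cookies, array $get, array $post, $sessionName = 'osCsid') {
    if (isset($cookies[$sessionName])) return true;   // returning visitor with cookies on
    if (isset($get[$sessionName]) || isset($post[$sessionName])) return true;   // id carried on the URL or in a form
    if (isset($get['action']) || !empty($post)) return true;   // e.g. add to cart, login form
    return false;                                     // plain page view: spider or first hit
}

if (needs_session($_COOKIE, $_GET, $_POST)) {
    session_name('osCsid');
    session_start();
}
```

With a rule like this, a spider that only requests plain pages never creates a session file or a row in the sessions table.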

 

Secondly, propagating any change of language/currency on the URL. This does away with generating a session id in this case when cookies are off.

However, the shelf life of this is diminishing, given Harald's thread at:

 

http://www.oscommerce.com/forums/viewtopic.php?t=31928

Trust me, I'm an Accountant.

Link to comment
Share on other sites

My site is correctly spidered by Google; I got over 900 links thanks to the session id killer.

If people find me in Google and click on the link, they go to the page with the product on it. If they press the "order product" button there, the cart stays empty. So people have to go to the default page and then go back to the product to get it into the cart.

Is my conclusion right that the session id is only applied at the default page?

Is there a solution for this?

Can anybody relate to this?

 

Greetings...

Link to comment
Share on other sites


Can anything be done about this? I have had people phoning up all the time and complaining. If they put something into the cart a second time, though, all was fine.

Try it; here's a direct product link:

http://www.medisave.net/product_info.php/c...products_id/248

Can anything be done? Lots of people come direct from Google to individual products, then try to buy, and get put off when that happens.

Graham Wright

________________

Link to comment
Share on other sites

hmmm..

 

odd that you 2 are getting this problem... I just tried on mine, typing in a direct link, and I don't have that same problem.

 

are you using the most up to date $kill_sid code in application_top.php???

 

it looks like:

//================================================================ 

if ( ($HTTP_GET_VARS['currency']) ) { 

  tep_session_register('kill_sid'); 

  $kill_sid=false; 

 } 

if ( ($HTTP_GET_VARS['language']) ) { 

 tep_session_register('kill_sid'); 

 $kill_sid = false; 

 } 

if (basename($_SERVER['HTTP_REFERER']) == 'allprods.php' ) $kill_sid = true; 

if ( ( !tep_session_is_registered('customer_id') ) && ( $cart->count_contents()==0 ) && (!tep_session_is_registered('kill_sid') ) ) $kill_sid = true; 

if (basename($PHP_SELF) == FILENAME_LOGIN ) $kill_sid = false; 

//================================================================
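
For anyone who wants to check their own copy, the rules in that block can be restated as a plain function and tested without a shop. This is a sketch: should_kill_sid() and the array keys are made up here, $kill_sid is assumed to start out false, and FILENAME_LOGIN is assumed to be 'login.php'.

```php
<?php
// Restates the $kill_sid rules as a pure function. Names are hypothetical.
function should_kill_sid(array $s) {
    $kill = false;
    // A language or currency switch registers kill_sid, which keeps the session.
    $keep = $s['kill_sid_registered'] || $s['has_currency_param'] || $s['has_language_param'];
    if ($s['referer_page'] == 'allprods.php') $kill = true;
    if (!$s['logged_in'] && $s['cart_count'] == 0 && !$keep) $kill = true;
    if ($s['current_page'] == 'login.php') $kill = false;   // login always keeps its SID
    return $kill;
}

// A spider (or a new visitor) landing directly on a product page.
$visitor = array(
    'kill_sid_registered' => false, 'has_currency_param' => false,
    'has_language_param'  => false, 'referer_page' => '',
    'logged_in' => false, 'cart_count' => 0, 'current_page' => 'product_info.php',
);
var_dump(should_kill_sid($visitor));   // bool(true)
```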

The only thing necessary for evil to flourish is for good men to do nothing

- Edmund Burke

Link to comment
Share on other sites

I have just read through the whole topic, but first let me give you some background, as I think I am one of many who use OSC.

I am not PHP knowledgeable, and MySQL is being picked up as needed. I think that OSC is a great product and congratulate the people who work on it; I have even made some contributions. But with entries like

 

 

"The PostNuke session ID"

"testing user_agent, ip address etc,"

"are no ppl on my site "

"but the login, my account, checkout link on the default page generate sid on the 2nd round. "

"and for those of us who don't have the line"

 

What chance do people like myself have ? :(

 

When you talk of

URL: /shop/catalog/images/Peitsche_Paddel_schmal_med.jpg/osCsid/6defc5b0fe8300f9953ddd32deeff0b9  

User Address: 66.196.72.61  

User Agent: Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)  

Referer:

Which I assume comes from the server, I ask the question: "Can most people using OSC get this info, and if so, would they know how to deal with it?" I sure don't.

What I need, and I am sure there are many others out there who need it too, is a definitive list of instructions on what to do and how to implement this code.

 

Boa goes a long way towards this

Ian's solution for removing Session ID "If no customer is logged in or there is nothing in the cart, we kill the session id" is much better than using spider recognization code by maintaining IP and matching HTTP_USER_AGENT. Mantaining ip and its agent is a tedious job for updating new Bot/Spider, and it will not last if Bot/Spider changes its ip or its agent name.  

 

But there is also the problem with OSC V2.2: is there a difference between pre-Nov 2002 OSC and post-Nov 2002?

Everyone wants to get listed, but I for one am very confused after looking at this thread. What do I need to do to get listed :?:

Phil Townsend

Waterslap Farm, Airth

Falkirk Stirlingshire FK2 8QW

Link to comment
Share on other sites

hmmm..  

 

odd that you 2 are getting this problem... I just tried on mine, typing in a direct link, and I don't have that same problem.  

 

are you using the most up to date $kill_sid code in application_top.php???  

 

it looks like:  

 

Yes, I have the latest application_top.php and I am running the latest SID killer code. Does your site run search engine safe URLs? I just turned them off, and it did add to the cart the first time.

Graham Wright

________________

Link to comment
Share on other sites


No, I do not run search engine safe URLs, for a few reasons: #1 they are not fully developed, and #2 they are not necessary anymore, since search engines no longer have trouble handling the ? in URLs.

The only thing necessary for evil to flourish is for good men to do nothing

- Edmund Burke

Link to comment
Share on other sites

I had them switched on because I had done quite well with other sites in search engines with them turned on!

 

However I "think" you may be right about search engines not caring so have just switched the safe urls off!

 

I will see how it goes with google.

Graham Wright

________________

Link to comment
Share on other sites

I have everything mentioned above but... search engine safe URLs ON.

Hmmm, I have turned it OFF now. But now everyone clicking a direct link gets a screen saying "product not found"... I have to wait for the Google spider to pick up the correct links again.

 

Thanks for the help...hope it helps...

Link to comment
Share on other sites

Trying to keep abreast of this thread and understand what is happening.

 

Could someone let me know what is meant by

but.....search engine safe url's ON.  

Hmmm, i turned it OFF now.

 

My admin has it as True or False; is True = ON and False = OFF?

Also, what is the difference between the two URLs?

 

I changed my shop to include the SID killer and "buy now" form today but after that nothing was added to the basket so I changed the "safe url's" and then removed the "buy now" column and it now seems to work.

 

I think :?
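
On the difference between the two URLs: judging by the links quoted in this thread (product_info.php/c...products_id/248), the "safe" form replaces the '?', '&' and '=' of a normal query string with '/'. A minimal sketch of that rewrite, with safe_url() as a made-up name; presumably True in the admin is ON and False is OFF.

```php
<?php
// Sketch of the rewrite behind "search engine safe URLs": the query string
// separators become path separators. safe_url() is a hypothetical name.
function safe_url($link) {
    return str_replace(array('?', '&', '='), '/', $link);
}

echo safe_url('product_info.php?cPath=1&products_id=248');
// prints: product_info.php/cPath/1/products_id/248
```

This would also explain the "product not found" report above: links that Google indexed in one form no longer match once the setting is switched.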

Phil Townsend

Waterslap Farm, Airth

Falkirk Stirlingshire FK2 8QW

Link to comment
Share on other sites

Ian,

 

I have installed your SID killer and tested it using http://www.searchengineworld.com/cgi-bin/

 

And it seems to work properly: NO osCsid. But when I spider the AllProds.php page, the osCsid comes back on all files except the Product_info.PHP files.

 

Is that ok?

 

I am sorry if this is a stupid question
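
A check like this can also be run locally instead of through a spider simulator. The sketch below lists every link in a page's HTML that still carries a session id; links_with_sid() is a hypothetical helper and osCsid is assumed as the session name.

```php
<?php
// Sketch: list every link on a page that still carries a session id.
function links_with_sid($html, $sessionName = 'osCsid') {
    preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    $leaks = array();
    foreach ($m[1] as $url) {
        if (stripos($url, $sessionName) !== false) $leaks[] = $url;
    }
    return $leaks;
}

// Sample markup; for a real test, use the HTML of a page fetched without cookies.
$html = '<a href="product_info.php?products_id=248">ok</a>'
      . '<a href="login.php?osCsid=6defc5b0">leak</a>';
print_r(links_with_sid($html));
```

To test a live page, fetch it the way a spider would (no cookies, for example with file_get_contents() and a stream context that sets a spider-like User-Agent) and pass the result in. An empty list means no link on that page exposes a SID.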

Link to comment
Share on other sites
