Mark Russell Posted November 20, 2002

I've read all the threads - more than once - including the 14-pager. Here is what I gathered in terms of the contribs and settings. I'm not too confident that I have all this correct, but I hope it helps those who are also still confused, by getting right down to the details.

- Spider catcher: the code that goes at the root (e.g. index.html) and either serves up allprods via readme or redirects to the default page. Purpose: to detect a bot/spider and feed it product links via allprods, or to redirect a 'real user' to the default page.

- Ian's Kill SID code: the code that detects a bot/spider once it is inside the site and kills SIDs. Purpose: to prevent bots from getting trapped in the site, and to prevent product links from being listed in Google with SIDs appended? QUESTION: I saw code in Ian's thread that looked just like the spider catcher code (the one with bots in a footprint array and all the IP addresses), and it goes into html_output. How is that code different from the spider catcher that goes into an index.html?

- all_prods: code to facilitate product link submittals to the engines and to let customers view all products on one page. The link to all_prods should be visible in the header, main page, or footer - not in the left info box, which does not get crawled.

- Meta tags: contrib that generates meta tags specific to pages/categories. Google might not use them, but they are good to have for other bots/spiders.

- Search engine safe URLs: turn them off. Having them on prevents a user who has cookies disabled from buying, and the setting doesn't affect the SEs either way.

Is that it? Can anyone shed additional light on the above? To check my own understanding, I've put a rough sketch of the general spider-check idea right below this post.

Much thanks for everyone's ongoing efforts...

Mark
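A minimal sketch of the spider-check idea these contributions share, as I understand it. The $spider_agents list and the variable names are illustrative placeholders, not the actual footprint array from either contribution:

<?php
// Sketch only: decide whether to hand out a session ID based on the
// user agent. The footprint list here is illustrative, not the real one.
$spider_agents = array('googlebot', 'slurp', 'scooter', 'lycos');

function is_spider($user_agent, $spider_agents) {
    $user_agent = strtolower($user_agent);
    foreach ($spider_agents as $footprint) {
        // match the footprint anywhere in the user-agent string
        if (strpos($user_agent, $footprint) !== false) {
            return true;
        }
    }
    return false;
}

if (is_spider(getenv('HTTP_USER_AGENT'), $spider_agents)) {
    $sid = '';                          // spider: append no osCsid to links
} else {
    $sid = 'osCsid=' . session_id();    // real visitor: keep the session ID
}
?>

The difference between the two contributions would then be placement: the spider catcher runs this kind of test at the root index.html and redirects accordingly, while the Kill SID code runs it inside html_output so that links built for spiders simply omit the SID.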
wizardsandwars Posted November 20, 2002

Mark,

That pretty much nails it down, with just a couple of deviations. Here are my comments on your summary.

On the spider catcher (the root-level index.html that serves allprods to bots and redirects real users): I wouldn't use this at all. There seems to be some dissent about whether or not Google frowns upon this practice. Ultimately, they have said that we should not direct a bot someplace other than where the general public goes.

On Ian's Kill SID code: almost, but not quite right. As I'm sure you have read, Google has some trouble with the SID. I suggested that we use the aforementioned spider catcher to determine whether we assign an SID in html_output.php. To make a long story short, it turned out that this was a good idea: as long as the spider was in the IP address array, Google didn't get stuck indexing our site. However, it was brought to my attention by Burt (thanks, Burt) that the USER_AGENT array in this hack does not work properly, probably because of something in the substr() call (see the sketch below this post). There has been a suggestion for a fix, but I just haven't had the time yet. Until it is fixed, you'll have to maintain your list of spider IP addresses yourself.

Ian knew that this hack involved a lot of maintenance, and devised a way to keep osC from assigning an SID until the user either tries to log in or adds something to the cart - something Google would never try to do. Ian's code is designed to work in place of the spider catcher. It still has some bugs, but I would contact him directly for more information. His way is definitely better.

On all_prods: exactly.

On meta tags and turning off search engine safe URLs: you hit the nail on the head.

In addition, I'd like to add that I moved the catalog up a level to the web root directory. I received an email from Google that said that sites with redirects would not be indexed. They consider a redirect anything that cannot get back to Google with one click of the 'back' button in the browser. I originally had an index.html that redirected to the catalog's default.php, and you had to click 'back' twice with that method. So I moved my whole catalog up a level. I'm not sure what HPDL et al. think of this, but it is what I did, and I believe I am efficiently indexed on Google.
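I don't have the hack in front of me, so treat this as a guess at the failure mode rather than the actual code: a fixed-position substr() comparison misses footprints that appear in the middle of the user-agent string, while strpos() finds them anywhere.

<?php
// Guess at the USER_AGENT bug (illustration, not the hack's real code).
$user_agent = strtolower(getenv('HTTP_USER_AGENT'));
// e.g. "mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)"
$footprint = 'googlebot';
$spider = false;

// Broken pattern: only matches when the footprint sits at position 0,
// which it almost never does in a real user-agent string.
if (substr($user_agent, 0, strlen($footprint)) == $footprint) {
    $spider = true;
}

// Working pattern: match the footprint anywhere in the string.
// strpos() can legitimately return 0, so compare with !== false.
if (strpos($user_agent, $footprint) !== false) {
    $spider = true;
}
?>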
Mark Russell Posted November 21, 2002

Chris,

Thanks for the follow-up. My next two goals are the move to root and the session killer. Then I should be all set.

Mark
winterradio Posted November 21, 2002

I believe the session-id killer is now unnecessary; it was addressed in a November 18 commit. Directly from CVS:

Quote from http://www.searchengineworld.com/spiders/g...faq.htm#item355:

"Does Google index dynamic content? A: It will in certain instances. What criteria are used to determine whether a dynamic page is indexable is debatable. Most have found that clean, high-ranking (high PageRank) sites can get dynamic content indexed."

This makes me think that Googlebot does indeed index dynamic pages, but not everybody's. Another quote:

"Sites that use session-tracking URLs give each visitor a dynamic URL. These sites can generate an infinite number of pages for a spider to visit. These types of pages are usually blocked from being indexed by Google."

With this modification, the session ID is back in the parameter area (even with Search Engine Friendly URLs turned on), so no spiders get trapped in our infinite number of pages and we are not banned from Google. Any extra info on the subject is welcome.

I also have my entire storefront at root level and have noticed a mind-blowing difference. Search engines frown upon redirects from domain level, no matter how they are accomplished. Search engine safe URLs are going the way of Betamax.

I've also read that Google prefers to serve dynamic pages pertaining to the most accurate content, which brings me to the meta tag controller. The meta tag controller is probably the second most important ingredient next to the session_id fix. If it could be extended to category level as well, the possibilities would be endless.

Some tips on meta tags (a sketch of the resulting tags follows this post):

Title: I prefer to capitalize my words, use no punctuation, and stay within 6-10 words, keeping the keyword density high. e.g. (Winter Clothing Jackets Coats Outerwear)

Description: Capital letters again, with no punctuation. I follow the same pattern as the title tag, except I use between 15-22 words. e.g. (Clothing Coats Ski Jacket Down Winter Leather Down Outerwear [and so on])

Keywords: all lowercase, with commas separating keyword phrases; 25-35 of them. (Probably the least important tag now.) e.g. (winter jacket, jacket, ski jacket, ski clothing, down winter jacket, winter coat, coat, [and so on])

I've found that following this system, and making subtle changes to page content with a high ranking priority, gets you listed just about anywhere you want. I hope this helps some people, and I'd appreciate anyone else's ideas and suggestions. (Sharing information seems to be the only way to really get a grasp on this.)

Henry
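Following Henry's pattern, the generated head tags might come out something like this. The store content is invented for illustration, and the $page_tags array is a made-up placeholder - the real meta tag controller contribution organizes its per-page data differently:

<?php
// Illustration only: per-page tags following the 6-10 / 15-22 / 25-35
// guidelines above. $page_tags is a placeholder, not the data structure
// the actual meta tag controller contribution uses.
$page_tags = array(
    'title'       => 'Winter Clothing Jackets Coats Outerwear',
    'description' => 'Clothing Coats Ski Jackets Down Winter Leather '
                   . 'Outerwear Parkas Fleece Vests Gloves Hats Scarves '
                   . 'Boots Snowboards',
    'keywords'    => 'winter jacket, jacket, ski jacket, ski clothing, '
                   . 'down winter jacket, winter coat, coat'
);

echo '<title>' . $page_tags['title'] . '</title>' . "\n";
echo '<meta name="description" content="' . $page_tags['description'] . '">' . "\n";
echo '<meta name="keywords" content="' . $page_tags['keywords'] . '">' . "\n";
?>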
mrsym2 Posted April 3, 2004 (edited)

The more I read about this, the more confused I get. I want to make sure that my site is as search-engine-friendly as possible. I am using the Dynamic Product Meta Tags contribution. Is there any advantage to it over the Header Tags Controller contribution?

I have the following set (my understanding of what the spider setting does is sketched below this post):

Force Cookie Use: True
Check SSL Session ID: False
Check User Agent: False
Check IP Address: False
Prevent Spider Sessions: True
Recreate Session: False
Use Search-Engine Safe URLs: True

Could someone look at my site and advise me on whether I am doing anything wrong in the big search engines' view? http://aquatin.com

Edited April 3, 2004 by mrsym2
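For reference, this is roughly what the Prevent Spider Sessions setting does in osCommerce MS2's application_top.php - a simplified paraphrase from memory, not the verbatim code:

<?php
// Simplified paraphrase of osCommerce MS2's spider check: compare the
// user agent against the footprints in includes/spiders.txt and, on a
// match, skip starting a session for that request.
$spider_flag = false;

if (SESSION_BLOCK_SPIDERS == 'true') {
    $user_agent = strtolower(getenv('HTTP_USER_AGENT'));
    $spiders = file(DIR_WS_INCLUDES . 'spiders.txt');

    for ($i = 0, $n = sizeof($spiders); $i < $n; $i++) {
        $footprint = trim($spiders[$i]);
        if (strlen($footprint) > 0 && strpos($user_agent, $footprint) !== false) {
            $spider_flag = true;   // known spider: no session, no osCsid
            break;
        }
    }
}

if ($spider_flag == false) {
    tep_session_start();           // normal visitors still get a session
}
?>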