wizardsandwars Posted March 22, 2004
The session keeps track of what the user has in their cart, and the session ID uniquely identifies that session.
spiders.txt is a list of the user agents of known spiders. It is used by the 'Prevent Spider Sessions' feature so that spiders are not assigned a session, and it is required if you want to use that feature.
robots.txt is a file that well-behaved spiders request when they visit your site. You can use it to tell spiders which directories or files you do not want them to visit. It is not required.
----------
NOTE: As of Oct 2006, I'm not as active in this forum as I used to be, but I still work with osC quite a bit. If you have a question about any of my posts here, your best bet is to contact me through either email or PM in my profile, and I'll be happy to help.
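A crawler looks for robots.txt at the root of the site (for example http://www.example.com/robots.txt). The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler reads such a file. The rules in `ROBOTS_TXT` are hypothetical examples, not something osC ships with, and whether a given spider honors them is up to that spider.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a store installed in the web root.
# The disallowed paths are illustrative assumptions, not osC defaults.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout_shipping.php
Disallow: /shopping_cart.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)
# can_fetch(user_agent, url) applies the "User-agent: *" group to any bot.
category_ok = rules.can_fetch("Googlebot", "/index.php?cPath=22")
admin_ok = rules.can_fetch("Googlebot", "/admin/categories.php")
```

Here `category_ok` is True and `admin_ok` is False: catalog pages stay crawlable while the admin directory is excluded.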
tammy507 Posted March 22, 2004
Thank you so much! So robots.txt goes in the root directory, and where does spiders.txt go? Thanks
tammy507 Posted March 22, 2004
This is what I have as a spiders.txt. I don't remember creating it, so I assume it is part of a contribution or of osC? LOL. I found it in includes/spiders.txt:
$Id: spiders.txt,v 1.2 2003/05/05 17:58:17 dgw_ Exp $
almaden.ibm.com
appie 1.1
architext
ask jeeves
asterias2.0
augurfind
baiduspider
bannana_bot
bdcindexer
crawler
crawler@fast
docomo
fast-webcrawler
fluffy the spider
frooglebot
geobot
googlebot
gulliver
henrythemiragorobot
ia_archiver
infoseek
kit_fireball
lachesis
lycos_spider
mantraagent
mercator
moget/1.0
muscatferret
nationaldirectory-webspider
naverrobot
ncsa beta
netresearchserver
ng/1.0
osis-project
polybot
pompos
scooter
seventwentyfour
sidewinder
sleek spider
slurp/si
[email protected]
steeler/1.3
szukacz
t-h-u-n-d-e-r-s-t-o-n-e
teoma
turnitinbot
ultraseek
vagabondo
voilabot
w3c_validator
zao/0
zyborg/1.0
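The 'Prevent Spider Sessions' check that reads this file can be sketched as below. This is a minimal Python sketch, assuming, as the stock 2.2 MS2 includes/application_top.php appears to do (check your own copy), that osC lowercases the visitor's User-Agent, trims each line of spiders.txt, and treats the visitor as a spider if any line appears anywhere inside the user agent. The function names `load_spiders` and `is_spider` are hypothetical. Under that assumption, a mixed-case entry such as `YahooSeeker` can never match, because it is compared against an already-lowercased string.

```python
def load_spiders(text: str) -> list[str]:
    """Return the non-empty, whitespace-trimmed lines of a spiders.txt file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_spider(user_agent: str, spiders: list[str]) -> bool:
    """True if any entry occurs as a substring of the lowercased user agent.

    Assumption: only the user agent is lowercased, not the entries, so an
    entry must be written in lowercase to ever match.
    """
    ua = user_agent.lower()
    # An empty user agent is never treated as a spider.
    return bool(ua) and any(entry in ua for entry in spiders)


spiders = load_spiders("googlebot\nslurp/si\nYahooSeeker\n")
is_spider("Googlebot/2.1 (+http://www.google.com/bot.html)", spiders)  # True
is_spider("YahooSeeker/1.1", spiders)  # False: "YahooSeeker" is not lowercase
```

Adding a lowercase `yahooseeker` line is what makes the second call return True in this sketch.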
mrjkb Posted March 23, 2004
I have a quick question on the SEO issue. I picked up the recommendation to run my site through http://www.webconfs.com/search-engine-spider-simulator.php, and many of my links look like this:
/catalog/index.php?cPath=22&osCsid=
I have a feeling that the trailing osCsid= will prevent the search engine from following the link. How do I get rid of this? It only happens on the category links; the product links are clean.
Laser labels, barcode labels, custom labels
mrjkb Posted March 23, 2004
Clarification: I do have "Prevent Spider Sessions" turned on in my admin, so I would think the session IDs would be off. Is that a correct assumption? If so, why the sid= suffix?
Rwfresh Posted March 23, 2004
Quote:
"Jack, check how Google sees your page by using this link: http://www.webconfs.com/search-engine-spider-simulator.php, and see whether the spiders/bots are getting any errors on your pages. I had a similar problem; it was caused by having the 'prevent spider sessions' option turned on. Since there was no session, a variable was getting a blank value for the language rather than the language name, which caused an error on my pages for Google and other spiders. I went through and corrected my code, and now I'm getting traffic from the bots. I had this problem because I was using a development version of the software, one of the mid-releases from late last fall. Put a link to your website in your profile; that way, when you ask questions, people can look at your site."
Hey, can you tell us (well, me) exactly what you did? What code changes did you make? I am having the exact same problem. I get the language/session error when a spider comes to a particular site I set up with a snapshot from late last fall. When I turn on spider sessions (i.e. turn 'prevent spider sessions' off), the error disappears, but then the osCsid shows in the URL. What did you do? Thanks!!
Rwfresh Posted March 23, 2004
To all,
About 4 months ago I was tearing my hair out, convinced my site(s) would never be spidered and indexed. I will say that YES, osC can be both spidered and indexed by Google. I had many of the same exchanges with Burt and Wizard & Wars. If you are using a snapshot that contains the new session code, you may have problems, but they can be resolved. MS2.2 out of the box can and will be spidered and indexed.
Unique title tags are the most important thing for indexing. Try your own custom code to generate title, description and keyword tags from the text in your pages.
This is not to say that some snapshots of osC will not cause problems. They do. Track them down with the available tools and fix them. There really needs to be a definitive resource for SEO issues with osC. It can be difficult to wade through the conflicting information on these forums, especially for beginners.
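The idea of generating per-page title, description and keyword tags from the page's own text can be sketched as follows. This is written in Python purely for illustration (in a live store it would live in the PHP templates); the function name `page_meta`, the stopword list, the 155-character description cap and the frequency-based keyword pick are all hypothetical choices, not osC features or search-engine rules.

```python
import html
import re
from collections import Counter

# Hypothetical stopword list; extend it for your own catalog's vocabulary.
STOPWORDS = {"the", "and", "for", "with", "your", "from", "this", "that"}


def page_meta(product: str, category: str, description: str, store: str,
              desc_len: int = 155, n_keywords: int = 8) -> str:
    """Build a per-page <title>, meta description and meta keywords block."""
    title = f"{product} - {category} | {store}"

    # Collapse runs of whitespace, then cut at a word boundary if too long.
    text = " ".join(description.split())
    if len(text) > desc_len:
        text = text[:desc_len].rsplit(" ", 1)[0] + "..."

    # Most frequent words of 3+ characters that are not stopwords.
    words = [w for w in re.findall(r"[a-z0-9]+", description.lower())
             if len(w) > 2 and w not in STOPWORDS]
    keywords = ", ".join(w for w, _ in Counter(words).most_common(n_keywords))

    # Escape everything so characters like "&" or quotes cannot break the HTML.
    return (f"<title>{html.escape(title)}</title>\n"
            f'<meta name="description" content="{html.escape(text)}">\n'
            f'<meta name="keywords" content="{html.escape(keywords)}">')
```

Because the title combines the product and category names, every product page ends up with a different title, which is the property the post singles out as most important.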
mrjkb Posted March 24, 2004
Quote:
"This is not to say that some snapshots of OSC will not cause problems. They do. Track them down with the available tools and fix them."
What tools are available to fix these pages? Can you point me to some recommendations?
ari Posted March 30, 2004
I have 'Prevent Spider Sessions' set to true, and it is working fine for Googlebot but not for YahooSeeker. I have added the YahooSeeker bot to spiders.txt, and I still see many lines like this in my log file:
[66.196.93.4 /product_info.php?cPath=3_23&products_id=73&osCsid=c41356a5f4d1673670a96036cc3a0de5 "YahooSeeker/1.1 ]
whereas Googlebot looks like this:
[64.68.82.201 /articles.php?tPath=1 "Googlebot/2.1]
My spiders.txt has these lines (partial list):
slurp/si
[email protected]
steeler/1.3
...
ultraseek
vagabondo
voilabot
voila
w3c_validator
yahooseeker
YahooSeeker
YahooSeeker/1.1
zao/0
zyborg/1.0
Any idea what is wrong here? Thanks a lot, Ari
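One way to see which bots are requesting session-ID URLs is to count them per user agent straight from the access log. A minimal sketch, assuming an Apache "combined" format log (the abbreviated lines quoted above look like trimmed versions of such entries, but that is an assumption); the patterns and the name `sid_hits_by_agent` are hypothetical. It only shows which URLs were *requested*, so a bot replaying URLs it gathered earlier will still show up here even if the store no longer hands it a session.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# ip ident user [time] "METHOD path PROTOCOL" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Any osCsid parameter in the query string, including an empty one.
SID_PARAM = re.compile(r"[?&]osCsid=", re.IGNORECASE)


def sid_hits_by_agent(lines):
    """Count requests whose URL carries osCsid, grouped by user agent."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)  # lines in other formats are skipped
        if m and SID_PARAM.search(m.group("path")):
            hits[m.group("agent")] += 1
    return hits
```

For example, passing an open log file (`sid_hits_by_agent(open("access_log"))`, where the path is hypothetical) returns a `Counter` of user agents that requested osCsid URLs.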
wizardsandwars Posted March 30, 2004
It appears that nothing is wrong. How do you know that those sessions weren't gathered earlier?
tammy507 Posted March 30, 2004
I have a question I hope someone here can help me with. My store is in my root directory, and I have a subdomain for a message board forum. I need to know the best way to link to the forum from my Info box. Currently I have created a PHP page that uses a redirect. However, from everything I've read, redirects are a no-no with Google. So this is what I need to know: how can I add "Forum" to my Info box and have it go to the forum on my subdomain? Thanks, Tammy
wizardsandwars Posted March 30, 2004
Quoting tammy507: "How can I add 'Forum' to my Info box and have it go to the forum on my subdomain?"
By starting a new thread, or by posting in a thread with relevant discussion. What in the world does that have to do with search engines?
tammy507 Posted March 30, 2004
Quote: "What in the world does that have to do with Search Engines?"
Because the way I have it now works fine, but I need a way that is appropriate and friendly to Google (the major search engine).
ari Posted April 2, 2004
Quote: "It appears that nothing is wrong. How do you know that those sessions weren't gathered earlier?"
I thought that the spider scans the pages in real time and doesn't come to the site armed with URLs from previous scans. Anyway, this has been going on for more than a month; YahooSeeker visits about once a week. I will say that the result set in Yahoo search does not have the session ID. I am just concerned that the spider is not doing as good a job as Googlebot, and I do get much better results with Google. Thanks, Ari
wizardsandwars Posted April 2, 2004
Most likely, it is returning to URLs it gathered previously. It may continue to do so for several months.
Guest Posted April 7, 2004
I had the same problem with Yahoo. It seems that you have to have the entry in spiders.txt in lowercase. Try it. I used this tool to check spiders: http://submitexpress.com/analyzer (you can set the user agent). Anyway, lowercase did it for me.
ari Posted April 8, 2004
I think Wizard is right: the spider comes armed with URLs. By the way, my spiders.txt has the entry in both lowercase and title case, and the SID is still there. The results in Yahoo don't have the SID. Unless you have other ideas, I will just let time do its thing. Thanks, Ari
Guest Posted April 8, 2004
Did you try the submitexpress tool from the link? It'll tell you exactly whether the SIDs are killed for the specified user agent. I had the same problem with the MSN bot.
ari Posted April 8, 2004
I just did, and the SID does not show when I type YahooSeeker in the user agent box. In the past I did this test with the Mozilla Firefox browser (http://www.mozilla.org/products/firefox/) and a user agent switcher, which does the same thing. But the real log file has the SID for YahooSeeker and not for Googlebot. That's what is so strange. Ari
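The submitexpress analyzer and a user agent switcher both do the same basic thing: request a page while presenting a spider's User-Agent, then look at the links that come back. A minimal sketch of that check with Python's standard library; the function names are hypothetical, and it assumes that a fresh request with no cookies is roughly what a spider's visit looks like to the store. It only reports what the server returns to that user agent right now; it says nothing about URLs a bot collected on earlier visits, which may explain why this kind of test and the real log disagree.

```python
import re
import urllib.request

# href values (single- or double-quoted) that carry an osCsid parameter,
# whether the separator is written as "?", "&" or the HTML-escaped "&amp;".
SID_LINK = re.compile(r'href=["\']([^"\']*[?&;]osCsid=[^"\']*)["\']',
                      re.IGNORECASE)


def links_with_session_id(page_html: str) -> list[str]:
    """Return every quoted href in the page whose URL includes osCsid."""
    return SID_LINK.findall(page_html)


def fetch_as(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch a page with the given User-Agent header and no cookies.

    A plain urlopen() keeps no cookie jar, which (as an assumption)
    approximates a spider's first visit to the store.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "iso-8859-1"
        return response.read().decode(charset, errors="replace")
```

For example, `links_with_session_id(fetch_as("http://www.example.com/catalog/index.php", "YahooSeeker/1.1"))`, with www.example.com standing in for your own store, would be expected to return an empty list if the store recognizes that user agent as a spider.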
expert Posted May 23, 2004
;) The latest list of robot user agent strings can be found here!
Pixxi Posted May 28, 2004
Quoting ari: "I just did, and the SID does not show when I type YahooSeeker in the user agent box. ... But the real log file has the SID for YahooSeeker and not for Googlebot. That's what is so strange."
I have the same problem with the submitexpress tester, but the other way round: Google (and some others) get a SID, but Yahoo (and a few others) don't. There doesn't seem to be any logic to it, since all of them are listed in the spiders.txt file. Perhaps the submitexpress tester isn't working the way it should, or at least not the way osC expects? 'Prevent Spider Sessions' is on.
Archived
This topic is now archived and is closed to further replies.