salewit Posted June 5, 2004

My site just went live a few days ago, and I'm really uneasy about the whole SEO thing. I've got the Header Tags Controller mod installed, and I've got "Prevent Spider Sessions" set to true. I ran a few of those "spider simulators" and got back kind of a mess. Here's the result of one:

Spidered Links:
http://www.trainvideodepot.com
http://www.trainvideodepot.com/index.php
https://secure24.nocdirect.com/~lakeminn/tr...pot/account.php
http://www.trainvideodepot.com/shopping_cart.php
https://secure24.nocdirect.com/~lakeminn/tr...ut_shipping.php
http://www.trainvideodepot.com/advanced_search.php
http://www.trainvideodepot.com/shipping.php
http://www.trainvideodepot.com/privacy.php
http://www.trainvideodepot.com/conditions.php
http://www.trainvideodepot.com/contact_us.php
product_info.php?products_id=41
product_info.php?products_id=40
product_info.php?products_id=28
product_info.php?products_id=29
product_info.php?products_id=30
product_info.php?products_id=31
product_info.php?products_id=32
product_info.php?products_id=33
product_info.php?products_id=34
product_info.php?products_id=35
product_info.php?products_id=36
product_info.php?products_id=37
product_info.php?products_id=38
product_info.php?products_id=39
http://www.trainvideodepot.com/shopping_cart.php
http://www.trainvideodepot.com/reviews.php
http://www.trainvideodepot.com/product_rev...28&reviews_id=3
http://www.trainvideodepot.com/product_rev...28&reviews_id=3

I've only got 14 items for now, and they all seem to be listed there, but the links to them look "relative". In other words, the web address isn't in front of the links. Is this a problem? What about the other links that were picked up? Should I create a robots.txt file to keep robots off those files? And I can't figure out why it picked up some secure paths (I'm using a shared SSL).
Another spider simulator I used had the above product path as:

http://product.info/product_info.php?products_id=37

This scares the heck out of me. There are so many mods out there for SEO, and when I read people talking about them here, I get more worried. Like the one that uses JavaScript redirects. OK... I've got to lay off the caffeine...

But seriously, am I worrying for nothing? I went to Google, and they say their index is updated every *4 WEEKS*. If I get spidered and the results aren't correct, 4 weeks is kind of a long time to wait for another shot.

Thanks for putting up with my neurotic mannerisms.

Sam
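On the relative-link question: a crawler normally resolves a relative href against the URL of the page it was found on, so a bare product_info.php?products_id=41 found on the home page should end up pointing at the store's own domain. Here is a minimal Python sketch of that resolution using the standard library's urljoin. The page URL is taken from the list above, and the helper name resolve_links is made up for illustration. An address like http://product.info/... would not come out of this kind of resolution, which suggests the oddity lies with that particular simulator.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href against the page it appeared on (hypothetical helper)."""
    return [urljoin(page_url, href) for href in hrefs]


page = "http://www.trainvideodepot.com/index.php"
links = [
    "product_info.php?products_id=41",             # relative link
    "http://www.trainvideodepot.com/reviews.php",  # already absolute, left unchanged
]

resolved = resolve_links(page, links)
for url in resolved:
    print(url)
```

Relative links are therefore not a problem in themselves, as long as the page they sit on is served from the address you want indexed.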
peterr Posted June 5, 2004

Hi,

I tried two different spider tests, and it appears that you don't have "turn session IDs off" set for spiders; the osCsid values were showing everywhere. Is your /spiders.txt file up to date?

Peter
salewit Posted June 5, 2004

Yeah, I got SIDs on one test also (not on another, though). I see no flag for "turn session IDs off". Everything under "Sessions" in the admin panel is set to false except for "Prevent Spider Sessions". As for spiders.txt, I didn't even know I had that file, but I found it, and it came with the most recent version of OSC. The date on it is 5/5/03.

Thanks for helping me through this... I appreciate it.

Sam
peterr Posted June 5, 2004

Hi,

You have "Prevent Spider Sessions" set to true, which is correct. My statement ("turn sessions off for spiders") was my roundabout way of saying the same thing. :)

Yes, some spider tests return the session ID. Just note the agent they use, check your web server logs to confirm the agent name, then add that agent to the file "spiders.txt". Make sure it is all in lowercase.

Peter
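The advice to keep spiders.txt entries in lowercase makes sense if the store lowercases the visitor's user-agent string and then looks for each entry as a substring. A rough Python paraphrase of that kind of check follows (the store itself is PHP). The function name is_spider, the sample entries, and the sample user-agent strings are illustrative assumptions, not the contents of the real spiders.txt.

```python
def is_spider(user_agent, spider_entries):
    """Return True if any non-empty entry occurs in the lowercased user agent."""
    ua = user_agent.lower()  # assumption: the agent is lowercased before matching
    for entry in spider_entries:
        entry = entry.strip()  # drop the newline left over from reading the file
        if entry and entry in ua:
            return True
    return False


# Illustrative entries only; "Slurp" is deliberately left in mixed case.
entries = ["googlebot", "msnbot", "Slurp"]

print(is_spider("Mozilla/5.0 (compatible; Googlebot/2.1)", entries))  # matches "googlebot"
print(is_spider("msnbot/0.11", entries))                              # matches "msnbot"
print(is_spider("Mozilla/5.0 (compatible; Yahoo! Slurp)", entries))   # mixed-case entry never matches
print(is_spider("Mozilla/4.0 (compatible; MSIE 6.0)", entries))       # ordinary browser
```

Under that assumption, an entry with a capital letter can never match, which is why a mixed-case line in the file would silently do nothing.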
salewit Posted June 5, 2004

Thanks again. I think I'm getting a handle on this. The reason I'm getting the relative links is that I hard coded those products into index.php. Simple fix.

So I guess what happens when I have "prevent spider sessions" set to true is that when OSC sees a known spider from the spiders.txt file, it simply drops the SID, correct? And I have no worries just because the simulators are seeing them; they will be dropped.

One more question. I set up a robots.txt file to try to get rid of some of the extraneous stuff, i.e.:

User_agent: *
Disallow: /shipping.php
Disallow: /privacy.php
etc.

I did this, but the simulators still picked those pages up. Do *they* look at the robots.txt file, or am I doing something wrong?

Thanks again... last question... promise.

Sam
peterr Posted June 5, 2004

Hi,

> So I guess what happens when I have "prevent spider sessions" set to true is that when OSC sees a known spider from the spiders.txt file, it simply drops the SID, correct? And I have no worries just because the simulators are seeing them, they will be dropped.

To answer the first question: correct. See this code in /catalog/includes/application_top.php:

    if (tep_not_null($user_agent)) {
      $spiders = file(DIR_WS_INCLUDES . 'spiders.txt');

      for ($i=0, $n=sizeof($spiders); $i<$n; $i++) {
        if (tep_not_null($spiders[$i])) {
          if (is_integer(strpos($user_agent, trim($spiders[$i])))) {
            $spider_flag = true;
            break;
          }
        }
      }
    }

It just loops through the file, and if it finds an entry inside the agent name, osC sets the variable $spider_flag to true, which then prevents a session from being started.

To answer the second question: if the simulator's agent is used ONLY by the website that hosts the tool (the simulator) and NOT by actual spiders/crawlers, then you really don't have to alter "spiders.txt" to cater for the simulator. On the other hand, if a simulator uses (for example) the same 'bot' name as a spider/crawler that visits your site, then you would need to add that agent name. Btw, add the name 'msnbot'.

> One more question? I set up a robots.txt file to try and get rid of some of the extraneous stuff. I.e.: User_agent: * Disallow: /shipping.php Disallow: /privacy.php etc. I did this, but the simulators still picked it up. Do *they* look at the robots.txt file or am I doing something wrong?

They _may_ look at it, but as I said before (this thread, I think), they are under NO obligation to follow the "rules" we place. The ones that don't follow the rules are often called 'naughty' user agents. But it is spiders/crawlers that you need to be concerned about; spider simulators only give some indication, and at least show you whether the session ID is appearing.
I tried two simulators; one gave absolute links, while the other gave only relative ones and actually made a mess of some of the links.

Peter
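Two things may be worth checking on the robots.txt side. First, the field name defined for robots.txt is User-agent, with a hyphen; a group headed by User_agent may simply be ignored by a parser. Second, whether a given tool honors the file at all is up to that tool. The small Python sketch below uses the standard library's urllib.robotparser to show what a rules-following client would conclude. The bot name "examplebot" is invented, and the result only shows how Python's parser treats the two spellings, not how any particular search engine or simulator does.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="examplebot"):  # "examplebot" is a made-up name
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


hyphen = """User-agent: *
Disallow: /shipping.php
Disallow: /privacy.php
"""
# Same rules, but with the underscore spelling of the field name.
underscore = hyphen.replace("User-agent", "User_agent")

shipping = "http://www.trainvideodepot.com/shipping.php"
product = "http://www.trainvideodepot.com/product_info.php?products_id=41"

print(allowed(hyphen, shipping))      # blocked by the Disallow rule
print(allowed(hyphen, product))       # not listed, so still crawlable
print(allowed(underscore, shipping))  # group not recognized, so nothing is blocked
```

If the pages still show up in a simulator after the spelling is corrected, that points to a tool that does not read robots.txt rather than to a mistake in the file.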