xnewbi3x
Posted December 29, 2005

Hi,

I want to place a robots.txt in my htdocs root. Do I need to allow all search engines to access /catalog/includes/, or can I block them from indexing the /includes/ and /admin/ directories? And if I use the Ultimate SEO contribution, should I block those directories or not?

I also noticed a spiders.txt and a tld.txt in my catalog/includes/ folder. What do they do?

Thanks
Guest
Posted December 29, 2005

Do not add your admin directory to robots.txt; this file is a common target for hackers. Rename your admin directory to a secret name (myadminfile4545412, for example) and, if you're paranoid, add the "noindex,nofollow" robots meta tags.

By default, osC denies anyone direct access to /includes.

If you do not create a robots.txt, spiders and bots will roam freely over your site until they are denied by a script (for example, password protection via PHP) or by .htaccess.

If you don't want your customers' pages showing up on Google, your best bet is to disallow all of the account and checkout pages via robots.txt.
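To illustrate that last suggestion, here is a minimal sketch that checks such a robots.txt with Python's standard urllib.robotparser. It assumes the store is installed under /catalog/ and that the pages to hide are account*.php, checkout_*.php, login.php and shopping_cart.php; those paths and the example.com host are assumptions, so adjust them to your own install. Disallow values are path prefixes, so /catalog/account also covers account_edit.php, account_history.php and so on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a store installed under /catalog/.
# Disallow values are prefixes: "/catalog/account" also matches
# account_edit.php, account_history.php, and so on.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/account
Disallow: /catalog/checkout_
Disallow: /catalog/login.php
Disallow: /catalog/shopping_cart.php
"""


def build_parser(text):
    """Parse robots.txt text locally instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # example.com is a placeholder host; only the path matters here.
    for path in ("/catalog/index.php",
                 "/catalog/product_info.php?products_id=1",
                 "/catalog/account_edit.php",
                 "/catalog/checkout_payment.php"):
        allowed = rp.can_fetch("Googlebot", "http://www.example.com" + path)
        print(path, "allowed" if allowed else "blocked")
```

Run as a script, the first two URLs come back allowed and the last two blocked. Note that robots.txt only asks well-behaved crawlers to stay away; it does not protect the pages themselves.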
Guest
Posted December 29, 2005

spiders.txt prevents spiders (like Googlebot) from getting session IDs.
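The idea behind spiders.txt can be sketched as a case-insensitive substring match: if the visitor's User-Agent contains any token listed in the file, the store does not start a session for it, so no session ID ends up in the links the spider indexes (assuming the store's spider-session option is enabled). This is a rough Python illustration of that idea, not osCommerce's own PHP code; the function names and the three sample tokens are hypothetical.

```python
# Hypothetical excerpt of a spiders.txt: one lowercase token per line.
SAMPLE_SPIDERS = """\
googlebot
msnbot
slurp
"""


def load_spider_tokens(text):
    """Return the non-empty tokens from spiders.txt-style text, lowercased."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def is_spider(user_agent, tokens):
    """True if the User-Agent contains any token (case-insensitive substring)."""
    ua = user_agent.lower()
    return any(token in ua for token in tokens)


def should_start_session(user_agent, tokens):
    """Hypothetical policy: start a session only for visitors that are not spiders."""
    return not is_spider(user_agent, tokens)


if __name__ == "__main__":
    tokens = load_spider_tokens(SAMPLE_SPIDERS)
    for ua in ("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
               "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"):
        print(ua, "->", "session" if should_start_session(ua, tokens) else "no session")
```

Substring matching keeps the list short (one token covers every version string of a bot), at the cost of occasionally matching an unrelated agent that happens to contain the same text.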
xnewbi3x
Posted December 29, 2005

Quote: "Do not add your admin directory to robots.txt, this file is usually a target for hackers."

What does "do not add the admin directory to robots.txt" mean? Can I use this syntax instead?

disallowed: /admin/
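On the syntax itself: robots.txt field names are matched case-insensitively, but they must be spelled exactly, and the directive is Disallow. A parser skips a field it does not recognise, such as disallowed, so that line would have no effect. The sketch below shows this with Python's urllib.robotparser; the helper name and the example.com URL are placeholders, and other crawlers' parsers may differ in details.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_text, agent, url):
    """Parse robots.txt text locally and ask whether agent may fetch url."""
    rp = RobotFileParser()
    rp.parse(robots_text.splitlines())
    return rp.can_fetch(agent, url)


# Unknown field name: ignored by the parser, so nothing is blocked.
MISSPELLED = "User-agent: *\ndisallowed: /admin/\n"
# Correct directive: /admin/ and everything under it is blocked.
CORRECT = "User-agent: *\nDisallow: /admin/\n"

if __name__ == "__main__":
    url = "http://www.example.com/admin/index.php"
    print("misspelled field:", can_fetch(MISSPELLED, "Googlebot", url))
    print("Disallow field:  ", can_fetch(CORRECT, "Googlebot", url))
```

The misspelled version reports the admin page as fetchable (True) and the correct one as blocked (False).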
xnewbi3x
Posted December 29, 2005

Here is a copy of my robots.txt. I intend to add bots manually and disallow * (wildcard bots). I chopped off the rest so the list is short and easier for you guys to look at. I added Disallow lines to every User-agent, and Disallow: / for User-agent: *. Please let me know if this is the right way to do it.

User-agent: Mozilla/3.0 (compatible;miner;mailto:[email protected])
Disallow:
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/

User-agent: WebFerret
Disallow:
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/

User-agent: Due to a deficiency in Java it's not currently possible to set the User-agent.
Disallow:
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/

User-agent: no
Disallow:
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/

User-agent: 'Ahoy! The Homepage Finder'
Disallow:
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/

User-agent: Arachnophilia
Disallow:
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/

User-agent: *
Disallow: /
Disallow: /images/
Disallow: /admin/
Disallow: /shop/
Disallow: /includes/
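One way to sanity-check a file like this is to feed it to a parser. The sketch below runs a shortened copy (one named robot plus the catch-all record) through Python's urllib.robotparser; example.com and "SomeOtherBot" are placeholders. Two things show up. Robots that are not named fall through to User-agent: * and are blocked everywhere. And an empty Disallow: line means "allow everything": Python's parser applies the first rule that matches, so with the empty line listed first, the /admin/ line after it never takes effect for WebFerret. Crawlers that resolve conflicts differently (for example by choosing the longest matching rule) may still honour /admin/, so treat this as an illustration of one parser, not of every crawler.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the posted file: one named robot plus the catch-all.
POSTED = """\
User-agent: WebFerret
Disallow:
Disallow: /admin/
Disallow: /includes/

User-agent: *
Disallow: /
"""

# Same named record without the empty "Disallow:" line.
WITHOUT_EMPTY = """\
User-agent: WebFerret
Disallow: /admin/
Disallow: /includes/

User-agent: *
Disallow: /
"""


def check(text, agent, path):
    """Return True if agent may fetch path according to the given robots.txt."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    # Unlisted robots fall through to "User-agent: *" and are blocked everywhere.
    print(check(POSTED, "SomeOtherBot", "/index.php"))      # False
    # First matching rule wins: the empty Disallow allows everything.
    print(check(POSTED, "WebFerret", "/admin/"))            # True
    print(check(WITHOUT_EMPTY, "WebFerret", "/admin/"))     # False
    print(check(WITHOUT_EMPTY, "WebFerret", "/index.php"))  # True
```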
Guest
Posted December 31, 2005

Quote: "What does it mean, do not add the admin directory to robots.txt? Can I use this syntax instead? disallowed: /admin/"

Let's say I'm a bad guy and want to hack your site. A quick way to find your private directories is to read your robots.txt, which is public: ANYONE can read it.

I would advise against keeping your admin directory named "admin" and listing it in robots.txt. I would also advise against putting any private filenames in robots.txt.
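Because robots.txt is public, one precaution in the spirit of this advice is to check the file for anything you would rather not advertise before uploading it. Below is a minimal Python sketch; the function names and the list of private prefixes are hypothetical, and it only inspects the paths in Allow/Disallow lines.

```python
# Hypothetical robots.txt to check before uploading it.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /catalog/checkout_
Disallow: /catalog/account
"""


def rule_paths(robots_text):
    """Return the path of every Allow/Disallow line, ignoring comments."""
    paths = []
    for raw_line in robots_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            value = value.strip()
            if value:  # an empty Disallow reveals nothing
                paths.append(value)
    return paths


def leaked_paths(robots_text, private_prefixes):
    """Return rule paths that start with any of the private prefixes."""
    return [path for path in rule_paths(robots_text)
            if any(path.startswith(prefix) for prefix in private_prefixes)]


if __name__ == "__main__":
    # Hypothetical list of directories that should stay unadvertised.
    print(leaked_paths(SAMPLE_ROBOTS, ["/admin", "/myadminfile4545412"]))
```

Here the check flags /admin/, which is exactly the kind of line the advice above says to leave out; the renamed admin directory would then be kept out of search results with the noindex,nofollow meta tags or password protection instead.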
This topic is now archived and is closed to further replies.