barakas Posted February 22, 2009

Hi,

I recently put my site online after a while of offline fiddling. I'm aware that robots.txt isn't an ironclad security measure, but I'd feel more comfortable if people weren't getting pages to do with my hosting when they search for my site. I tried adding a robots.txt, but so far it has had no effect. I believe the problem is the port in the URL, but I'm unsure how to remedy it.

Here are the links that are appearing on Google (I have anonymized the actual address):

www.mydomain.com/cpanel
www.mydomain.com:2095/unprotected/loader.html

And here is the text in my robots.txt:

Disallow: /frontend
Disallow: /unprotected/loader.html
Disallow: /unprotected
Disallow: /cpanel
Disallow: :2095

I put my robots.txt in the main public_html. Can anyone help?
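Editor's sketch, not part of the original post: the robots.txt quoted above has no User-agent line. If the live file really looks like that, parsers that follow the usual grouping convention will likely ignore every Disallow rule. The snippet below uses Python's standard urllib.robotparser to show the difference. The helper name is_allowed and the http:// scheme are assumptions made for illustration, and Python's parser is not necessarily how Google's crawler behaves.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Rules with no preceding "User-agent:" line, as in the post.
WITHOUT_GROUP = "Disallow: /cpanel\nDisallow: /unprotected\n"

# The same rules placed inside a group that applies to every crawler.
WITH_GROUP = "User-agent: *\n" + WITHOUT_GROUP

# Hypothetical URL; the scheme is assumed, the host is the anonymized one.
URL = "http://www.mydomain.com/cpanel"

print("without User-agent line, allowed:", is_allowed(WITHOUT_GROUP, URL))
print("with User-agent line, allowed:   ", is_allowed(WITH_GROUP, URL))
```

With this parser the ungrouped rules are skipped, so /cpanel counts as allowed until a User-agent line is added above them.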
barakas Posted February 22, 2009

I think I found the error, but I still can't work out how to fix it. My robots.txt isn't loading properly. I put it in my root directory, but when I try to access it over the web, it comes up with this:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, [email protected] and inform them of the time the error occurred, and anything you might have done that may have caused the error. More information about this error may be available in the server error log.

Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.

I think that is basically the error for files that don't exist.
Guest Posted February 22, 2009

Does the error go away if you remove or rename robots.txt? Have you made any recent changes to .htaccess?
barakas Posted February 22, 2009

Hi,

It turns out that was just an unrelated server error that happened while I was messing around with robots.txt. The original problem remains the same. My robots.txt looks like this:

User-agent: *
Disallow: /admin
Disallow: /account.php
Disallow: /advanced_search.php
Disallow: /checkout_shipping.php
Disallow: /create_account.php
Disallow: /login.php
Disallow: /login.php
Disallow: /password_forgotten.php
Disallow: /popup_image.php
Disallow: /shopping_cart.php
Disallow: /frontend
Disallow: /unprotected/loader.html
Disallow: /unprotected
Disallow: /cpanel

But on Google, the result right under my website when you search for its name is the /cpanel listing. I'm not sure what else I can do. This URL especially is stumping me, as I don't know how to incorporate the :2095 part into a robots.txt:

www.mydomain.com:2095/unprotected/loader.html
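Editor's sketch, not part of the original post: the :2095 part cannot go into a Disallow line, because Disallow values are matched against URL paths only. Crawlers look for robots.txt separately for each scheme, host and port combination, so the file in public_html on the default port does not cover URLs on port 2095. The function below shows which robots.txt location would govern a given URL. The name robots_url_for and the http:// scheme are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL for the scheme, host and port of `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus optional port); replace the path
    # with /robots.txt and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in (
    "http://www.mydomain.com/cpanel",
    "http://www.mydomain.com:2095/unprotected/loader.html",
):
    print(url, "is governed by", robots_url_for(url))
```

The port-2095 URL maps to a robots.txt on port 2095. That port is presumably served by the hosting control panel rather than by public_html, in which case any rule for it would probably have to come from the host.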
Guest Posted February 22, 2009

Once something has been indexed by a search engine, the only ways to get it out are to wait for the site to be reindexed or to request that the content be removed.

-jared
germ Posted February 22, 2009

Another thing to remember is that this behavior is voluntary, not mandatory. Hackers and "bad bots" read the robots.txt file for places you want them to stay out of, and those are the first places they hit.

So NEVER put anything in robots.txt that they couldn't find by other means. You just "tip your hand" if you do.
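Editor's sketch, not part of the original post: the point above can be made concrete. robots.txt is a public file, so anyone can fetch it and list the paths it names. The function below is a minimal illustration of that, assuming a simplified line format; the name disallowed_paths is hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value from robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /cpanel
"""

# Everything printed here is visible to any visitor, well-behaved or not.
print(disallowed_paths(SAMPLE))
```

A crawler that ignores the rules gets a ready-made list of interesting paths, so anything sensitive needs real access control in addition to a Disallow line.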
Guest Posted February 22, 2009

True. robots.txt is for giving directives to search engine spiders; .htaccess is for security.

-jared
Archived
This topic is now archived and is closed to further replies.