bonbec Posted July 3, 2019 An interesting read: https://webmasters.googleblog.com/2019/07/a-note-on-unsupported-rules-in-robotstxt.html
♥Gyakutsuki Posted July 3, 2019 Thank you for this information. Regards, Loïc
♥JcMagpie Posted July 3, 2019 Should not be a problem; most pages can simply replace noindex with Disallow: to stop Google indexing. I personally haven't used noindex for a long time.
Hotclutch Posted July 3, 2019 You can't use a robots.txt Disallow directive to stop Google indexing. You have to use a noindex meta tag for that, which has nothing to do with robots.txt.
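For anyone new to the distinction being drawn here, a minimal illustration of the two mechanisms (the /example-page.php path is just a placeholder, not something from this thread):

# robots.txt – a Disallow rule only controls crawling
User-agent: *
Disallow: /example-page.php

<!-- noindex controls indexing; the page must stay crawlable so Google can see the directive -->
<meta name="robots" content="noindex">

The same directive can also be sent as an HTTP response header (X-Robots-Tag: noindex) if you can't edit the page markup.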
Jack_mcs Posted July 3, 2019 Google says not to try to block indexing using the robots file (see the "You should not use robots.txt..." section). Their reason for the robots.txt change is that they are trying to establish a standard, which they will probably achieve. So we all need to start adjusting our thinking to be what they want. :(
♥JcMagpie Posted July 3, 2019 3 minutes ago, Hotclutch said: You can't use a robots.txt Disallow directive to stop Google indexing I know it has nothing to do with indexing; it is, however, one of the recommended alternatives listed by Google and I have been using it for years. As Google says, if you have content you don't wish to be seen, then you can password protect it or use Disallow; if, however, you don't want it to be indexed but still wish it to be seen, then you have to use one of the other alternatives. As always, if you're not sure, get professional help.
"For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options:
Noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
404 and 410 HTTP status codes: Both status codes mean that the page does not exist, which will drop such URLs from Google's index once they're crawled and processed.
Password protection: Unless markup is used to indicate subscription or paywalled content, hiding a page behind a login will generally remove it from Google's index.
Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
Search Console Remove URL tool: The tool is a quick and easy method to remove a URL temporarily from Google's search results."
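To make the two less familiar options from that list concrete, here is a rough sketch of the HTTP responses involved; the example.com URLs are placeholders, not anything from this thread.

Response for https://www.example.com/private-page.php (stays viewable but drops from the index):
HTTP/1.1 200 OK
Content-Type: text/html
X-Robots-Tag: noindex

Response for https://www.example.com/retired-page.php (permanently gone; Google drops it once recrawled):
HTTP/1.1 410 Gone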
Hotclutch Posted July 3, 2019 16 minutes ago, JcMagpie said: Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future. This is not true, and most often misunderstood. If you have something in the index, then putting a Disallow in robots.txt won't cause it to drop out of the index. In fact it will now stay there forever, because Google cannot crawl the URL to see a noindex directive. Alternatively, if you don't have something in the index, and you put a Disallow in robots.txt because you think it will prevent search engines from listing the content, then you're mistaken, because an external link to that URL will cause the search engine to still list the URL. There are only two ways to prevent indexing: 1) a meta noindex in the header, or 2) a 301 on the URL. A URL that 404s eventually drops out of the index, but search engines continue to crawl that URL indefinitely, with reduced frequency between crawls. And there's doubt as to how Google handles a 410 response code.
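To make that trap concrete, a sketch using placeholder paths: if /old-page.php is both disallowed in robots.txt and carries a noindex, Googlebot never fetches the page, so the noindex is never seen and any existing listing just sits in the index.

# robots.txt – blocks crawling, so the meta tag below is never read
User-agent: *
Disallow: /old-page.php

<!-- invisible to Googlebot while the Disallow is in place -->
<meta name="robots" content="noindex">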
♥JcMagpie Posted July 3, 2019 Thank you for your feedback; I'm happy with my understanding of Google's recommendations. Others will have to decide what's best for their website for themselves. As I said above... 1 hour ago, JcMagpie said: As always, if you're not sure, get professional help. It's not a big issue, as all you need to do is turn on the Robot NoIndex header_tags module in CE, so most people should be fine.
Allen Solly Posted July 5, 2019 If this is implemented in September, then for what purpose will we use the robots.txt file?
Hotclutch Posted July 5, 2019 2 hours ago, Allen Solly said: If this is implemented in September, then for what purpose will we use the robots.txt file? The only thing I put in my robots.txt file is a link to the sitemap. But putting a Disallow in robots.txt can be useful if you're trying to optimise your crawl budget.
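For reference, a minimal robots.txt along those lines might look like this; the domain and the disallowed path are placeholders, not taken from this thread.

Sitemap: https://www.example.com/sitemap.xml

# optional: keep crawlers away from pages that only burn crawl budget
User-agent: *
Disallow: /search-results/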
This topic is now archived and is closed to further replies.