en.wikipedia.org/wiki/Robots.txt
2 corrections found
Google ignores this directive, but provides an interface in its search console for webmasters, to control the Googlebot's subsequent visits.
This is outdated. Google deprecated the Search Console Crawl Rate Limiter tool in January 2024, and its current documentation says site owners should return server error responses or file a special request instead.
Full reasoning
Google still ignores the crawl-delay robots.txt directive, but the article's statement that Google provides an interface in Search Console is no longer correct.
Google announced in an official Search Central blog post that "The crawl rate limiter tool in Search Console is being deprecated on Jan 8th, 2024." After that change, Google's current documentation for reducing crawl rate no longer points users to a Search Console interface; instead it says to return 500, 503, or 429 responses in emergencies, or to file a special request for an unusually high crawl rate.
So the outdated part is specifically the claim that Search Console still offers an interface for this. It did historically, but Google's own documentation says that tool was deprecated and replaced with other mechanisms.
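For illustration, here is a minimal sketch of the emergency mechanism the current documentation describes: returning a 503 (or 429) to ask a crawler to slow down. The user-agent check, port, and Retry-After value below are illustrative assumptions, not anything Google's documentation prescribes.

```python
# Minimal sketch, not Google tooling: per the "Reduce the Google crawl
# rate" documentation, returning 500, 503, or 429 is the emergency
# mechanism for slowing Googlebot. The UA check and Retry-After value
# are illustrative assumptions.
from http.server import BaseHTTPRequestHandler, HTTPServer

class ThrottleHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if "Googlebot" in ua:
            # 503 signals temporary overload to well-behaved crawlers;
            # Retry-After hints (in seconds) when to come back.
            self.send_response(503)
            self.send_header("Retry-After", "3600")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello\n")

if __name__ == "__main__":
    HTTPServer(("", 8000), ThrottleHandler).serve_forever()
```

Note that Google's documentation treats these error responses as a short-term signal; serving them for extended periods can affect how a site is crawled and indexed.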
2 sources
- Upcoming deprecation of Crawl Rate Limiter Tool in Search Console | Google Search Central Blog
The crawl rate limiter tool in Search Console is being deprecated on Jan 8th, 2024... we'll be deprecating this tool in Search Console.
- Reduce the Google crawl rate | Google Crawling Infrastructure
If serving errors to Google's crawlers is not feasible on your infrastructure, file a special request to report a problem with unusually high crawl rate... You cannot request an increase in crawl rate.
The Robot Exclusion Standard does not mention the "*" character in the Disallow: statement.
This is incorrect under the current robots.txt standard. RFC 9309 explicitly defines `*` as a supported special character in robots.txt path matching.
Full reasoning
The article says the robots exclusion standard does not mention `*` in Disallow rules, but the current IETF standard both mentions the character and requires crawlers to support it.
RFC 9309, Robots Exclusion Protocol (published September 2022), includes a section titled "Special Characters" and states that crawlers "MUST support" `*`, describing it as "0 or more instances of any character." The RFC also includes examples such as `allow: /this/*/exactly` and explains wildcard matching in rule paths.
Because RFC 9309 is the current formal specification for robots.txt, the page's statement is factually wrong as written.
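To make the wildcard semantics concrete, here is a minimal sketch of RFC 9309 path matching that translates a rule path into a regular expression, using the RFC's two special characters: `*` (0 or more of any character) and `$` (end of the match pattern). The helper names are made up for illustration, and this is not a complete robots.txt parser (it ignores user-agent grouping, longest-match precedence, and percent-encoding).

```python
# Illustrative sketch of RFC 9309 path matching via regex translation.
# "*" matches 0 or more of any character; a trailing "$" anchors the
# end of the match; otherwise a rule is a prefix match.
import re

def rule_to_regex(path_pattern: str) -> re.Pattern:
    anchored_end = path_pattern.endswith("$")
    if anchored_end:
        path_pattern = path_pattern[:-1]
    # Escape regex metacharacters, then turn the escaped "*" back
    # into ".*" so it behaves as the RFC's wildcard.
    regex = re.escape(path_pattern).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored_end else ""))

def matches(rule: str, url_path: str) -> bool:
    return rule_to_regex(rule).match(url_path) is not None

# The RFC's own example rule:
assert matches("/this/*/exactly", "/this/path/exactly")
# Without "$", a rule is a prefix match, so longer paths also match:
assert matches("/this/*/exactly", "/this/path/exactly/more")
# With "$", the path must end where the rule ends:
assert not matches("/this/*/exactly$", "/this/path/exactly/more")
```

The regex translation is just one way to express the RFC's semantics; the key point is that `*` is part of the current standard's matching rules, contrary to the article's claim.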
2 sources
- RFC 9309: Robots Exclusion Protocol
Section 2.2.3 'Special Characters' says crawlers MUST support '*' and describes it as: 'Designates 0 or more instances of any character.'
- RFC 9309 (text version)
2.2.3. Special Characters: Crawlers MUST support the following special characters ... * | Designates 0 or more instances of any character.