Google May Expand Unsupported Robots.txt Rules List

April 23, 2026

Google may expand the list of unsupported robots.txt rules in its documentation, based on analysis of real-world robots.txt data collected through HTTP Archive.

Gary Illyes and Martin Splitt described the project on the latest episode of Search Off the Record. The work began after a community member submitted a pull request to Google’s robots.txt repository proposing that two new tags be added to the unsupported list.

Illyes explained why the team broadened the scope beyond the two tags in the PR:

“We tried to not do things arbitrarily, but rather collect data.”

Rather than add only the two proposed tags, the team decided to look at the top 10 or 15 most-used unsupported rules. Illyes said the goal was “a decent starting point, a decent baseline” for documenting the most common unsupported tags in the wild.

How The Research Worked

The team used HTTP Archive to study which rules websites use in their robots.txt files. HTTP Archive runs monthly crawls across millions of URLs using WebPageTest and stores the results in Google BigQuery.

The first attempt hit a wall. The team “quickly figured out that no one is actually requesting robots.txt files” during the default crawl, meaning the HTTP Archive datasets don’t typically include robots.txt content.

After consulting with Barry Pollard and the HTTP Archive community, the team wrote a custom JavaScript parser that extracts robots.txt rules line by line. The custom metric was merged before the February crawl, and the resulting data is now available in the custom_metrics dataset in BigQuery.

What The Data Shows

The parser extracted every line that matched a field-colon-value pattern. Illyes described the resulting distribution:

“After allow and disallow and user agent, the drop is extremely drastic.”

Beyond those three fields, rule usage falls into a long tail of less common directives, plus junk data from broken files that return HTML instead of plain text.
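To make the field-colon-value idea concrete, here is a minimal Python sketch of a line-by-line extractor. It is not the HTTP Archive custom metric, which the team wrote in JavaScript. The function names `extract_fields`, `looks_like_html`, and `count_fields`, the regular expression, and the HTML check are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed pattern: a field name, a colon, then a value.
# This is an illustration, not the regex used by Google or HTTP Archive.
FIELD_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_-]*)\s*:\s*(.*?)\s*$")


def extract_fields(robots_txt: str) -> list[tuple[str, str]]:
    """Return (field, value) pairs for every line shaped like 'field: value'."""
    pairs = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0]  # drop trailing comments
        match = FIELD_LINE.match(line)
        if match:
            pairs.append((match.group(1).lower(), match.group(2)))
    return pairs


def looks_like_html(body: str) -> bool:
    """Crude guess that a response is an HTML page rather than plain text."""
    head = body.lstrip()[:200].lower()
    return head.startswith("<!doctype") or head.startswith("<html")


def count_fields(files: list[str]) -> Counter:
    """Tally field names across many robots.txt bodies, skipping HTML responses."""
    counts = Counter()
    for body in files:
        if looks_like_html(body):
            continue
        counts.update(field for field, _ in extract_fields(body))
    return counts
```

Calling `count_fields(bodies).most_common(15)` on a list of fetched robots.txt bodies would give a ranked list of fields, which is the shape of distribution Illyes describes. Filtering out HTML responses first keeps error pages from polluting the tail.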

Google currently supports four fields in robots.txt: user-agent, allow, disallow, and sitemap. The documentation says other fields “aren’t supported” without listing which unsupported fields are most common in the wild.

Google has clarified that unsupported fields are ignored. The current project extends that work by identifying specific rules Google plans to document.

The top 10 to 15 most-used rules beyond the four supported fields are expected to be added to Google’s unsupported rules list. Illyes did not name specific rules that would be included.

Typo Tolerance May Expand

Illyes said the analysis also surfaced common misspellings of the disallow rule:

“I’m probably going to expand the typos that we accept.”

His phrasing implies the parser already accepts some misspellings. Illyes didn’t commit to a timeline or name specific typos.
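The episode did not list which misspellings Google accepts, so the sketch below does not try to reproduce Google’s behavior. It shows one generic way a site owner might spot likely typos in their own file, using fuzzy matching from Python’s standard `difflib` module. The helper name `suggest_field` and the 0.8 similarity cutoff are assumptions.

```python
from difflib import get_close_matches
from typing import Optional

# The four fields the article lists as supported by Google.
SUPPORTED_FIELDS = ["user-agent", "allow", "disallow", "sitemap"]


def suggest_field(field: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the supported field that `field` most resembles, or None.

    Exact matches return themselves. This is generic fuzzy matching,
    not the typo list used by Google's parser.
    """
    name = field.strip().lower()
    if name in SUPPORTED_FIELDS:
        return name
    matches = get_close_matches(name, SUPPORTED_FIELDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A field that returns a supported name without being an exact match is probably a misspelling worth correcting by hand, rather than relying on any crawler’s tolerance.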

Why This Matters

Search Console already surfaces some unrecognized robots.txt tags. If Google documents more unsupported directives, its public documentation could more closely reflect the unrecognized tags people already see in Search Console.

Looking Ahead

The planned update would affect Google’s public documentation and how disallow typos are handled. Anyone maintaining a robots.txt file with rules beyond user-agent, allow, disallow, and sitemap should audit for directives that have never worked for Google.
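One way to run that audit is a short script that flags every rule line whose field is not among the four that Google documents as supported. This is a sketch under assumptions: `audit_robots_txt` is a hypothetical helper, the line pattern is simplified, and a flagged field may still be a misspelling that Google tolerates or a directive that other crawlers honor.

```python
import re

# The four fields the article says Google supports.
GOOGLE_SUPPORTED = {"user-agent", "allow", "disallow", "sitemap"}

# Simplified 'field:' pattern (an assumption, not Google's parser).
RULE_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_-]*)\s*:")


def audit_robots_txt(text: str) -> list[tuple[int, str]]:
    """Return (line_number, field) for lines whose field is not one of the four."""
    flagged = []
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.split("#", 1)[0]  # comments are not rules
        match = RULE_LINE.match(line)
        if match and match.group(1).lower() not in GOOGLE_SUPPORTED:
            flagged.append((number, match.group(1)))
    return flagged
```

Passing the contents of a robots.txt file to `audit_robots_txt` returns the line numbers to review. An empty list means every rule line uses one of the four supported fields.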

The HTTP Archive data is publicly queryable on BigQuery for anyone who wants to examine the distribution directly.

Featured Image: Screenshot from YouTube.com/GoogleSearchCentral, April 2026.
