Some developers have been experimenting with bot-specific Markdown delivery as a way to reduce token usage for AI crawlers.
Google Search Advocate John Mueller pushed back on the idea of serving raw Markdown files to LLM crawlers, raising technical concerns on Reddit and calling the concept “a stupid idea” on Bluesky.
What’s Happening
A developer posted on r/TechSEO, describing plans to use Next.js middleware to detect AI user agents such as GPTBot and ClaudeBot. When these bots hit a page, the middleware intercepts the request and serves a raw Markdown file instead of the full React/HTML payload.
The developer claimed early benchmarks showed a 95% reduction in token usage per page, which they argued should increase the site’s ingestion capacity for retrieval-augmented generation (RAG) bots.
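For readers unfamiliar with the pattern, here is a minimal sketch of what such middleware could look like. The bot list, the `/markdown/` path convention, and the matcher are illustrative assumptions; the developer’s actual implementation wasn’t shared.

```typescript
// middleware.ts — a sketch of the approach described in the thread (assumptions noted inline).
import { NextRequest, NextResponse } from 'next/server';

// Illustrative (not exhaustive) list of AI crawler user-agent substrings.
const AI_BOTS = ['GPTBot', 'ClaudeBot'];

export function middleware(request: NextRequest) {
  const userAgent = request.headers.get('user-agent') ?? '';

  if (AI_BOTS.some((bot) => userAgent.includes(bot))) {
    // Rewrite to a pre-generated static Markdown copy of the page,
    // e.g. /blog/post -> /markdown/blog/post.md (assumed directory layout).
    const url = request.nextUrl.clone();
    url.pathname = `/markdown${url.pathname}.md`;
    return NextResponse.rewrite(url);
  }

  // Everyone else gets the normal React/HTML page.
  return NextResponse.next();
}

export const config = {
  // Skip Next.js internals, API routes, and static assets.
  matcher: ['/((?!_next|api|favicon.ico).*)'],
};
```

Note that a setup like this strips the site’s headers, footers, and navigation from what the bot receives, which is exactly what Mueller’s questions target.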
Mueller responded with a series of questions.
“Are you sure they’ll even recognize MD on a website as anything other than a text file? Can they parse & follow the links? What will happen to your site’s internal linking, header, footer, sidebar, navigation? It’s one thing to give it a MD file manually, it seems very different to serve it a text file when they’re looking for a HTML page.”
On Bluesky, Mueller was more direct. Responding to technical SEO consultant Jono Alderson, who argued that flattening pages into Markdown strips out meaning and structure, Mueller wrote:
“Converting pages to markdown is such a stupid idea. Did you know LLMs can read images? WHY NOT TURN YOUR WHOLE SITE INTO AN IMAGE?”
Alderson expanded on the point, arguing that collapsing a page into Markdown removes important context and structure, and framed Markdown-fetching as a convenience play rather than a lasting strategy.
Other voices in the Reddit thread echoed the concerns. One commenter questioned whether the effort might limit crawling rather than increase it, noting that there’s no evidence LLMs are trained to favor documents that are less resource-intensive to parse.
The original poster defended the theory, arguing that LLMs are better at parsing Markdown than HTML because they’re heavily trained on code repositories. That claim is untested.
Why This Matters
Mueller has been consistent on this. In a previous exchange, he responded to a question from Lily Ray about creating separate Markdown or JSON pages for LLMs. His position then was the same: focus on clean HTML and structured data rather than building bot-only content copies.
That response followed SE Ranking’s analysis of 300,000 domains, which found no connection between having an llms.txt file and how often a domain gets cited in LLM answers. Mueller has also compared llms.txt to the keywords meta tag, a format major platforms haven’t documented as something they use for ranking or citations.
So far, public platform documentation hasn’t shown that bot-only formats, such as Markdown versions of pages, improve ranking or citations. Mueller has raised the same objections across multiple discussions, and SE Ranking’s data found nothing to suggest otherwise.
Looking Ahead
Until an AI platform publishes a spec requesting Markdown versions of web pages, the best practice remains what it was: keep HTML clean, reduce unnecessary JavaScript that blocks content parsing, and use structured data where platforms have documented schemas.
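On that last point, structured data can ship inside the same HTML that every visitor and crawler receives, rather than in a separate bot-only copy. The sketch below assumes a Next.js App Router page and uses an illustrative schema.org Article object; the right schema depends on what a given platform actually documents.

```tsx
// app/blog/[slug]/page.tsx — illustrative only; all field values are placeholders.
export default function BlogPostPage() {
  // schema.org Article markup embedded as JSON-LD in the regular HTML response.
  const jsonLd = {
    '@context': 'https://schema.org',
    '@type': 'Article',
    headline: 'Example headline',
    datePublished: '2026-01-01',
    author: { '@type': 'Person', name: 'Example Author' },
  };

  return (
    <article>
      {/* The same markup is served to users, search crawlers, and AI crawlers alike. */}
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
      />
      <h1>Example headline</h1>
      <p>Body content rendered as plain, clean HTML.</p>
    </article>
  );
}
```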
