For roughly twenty years, the SEO discipline operated on a quiet assumption that turned out to be one of its most valuable features. Guidance from one search engine traveled. If Google said sitemaps mattered, Bing said sitemaps mattered. If Bing said structured data deserved real effort, Google said the same. Practitioners optimized for Google with reasonable confidence that the work would carry across the other engines, and most of the time it did. That portability was not luck. It was the product of a structurally large overlap layer that the major search engines had collectively built, brick by brick, over two decades.
That world doesn’t exist in LLM-land. The major providers train on different corpora, run different crawlers under different policies, route different queries through different retrieval systems, and apply different alignment processes that shape the final response in ways the upstream signals can’t predict. Guidance from any one provider, including Google’s guidance about its own Gemini products, is one data point. Practitioners carrying the SEO habit forward, the habit of treating one engine’s guidance as roughly the whole map, will optimize confidently for one platform and miss the others.
Sidebar: As I was finalizing this piece, Google published fresh guidance on optimizing for their generative AI features. Their framing is explicit: from Google Search’s perspective, optimizing for AI search is still SEO. That framing is accurate for Google Search. It doesn’t extend to ChatGPT, Claude, Perplexity, or any other LLM, and that’s precisely the trap this article is about.
The Shared Standards That Made SEO Guidance Portable
The era of portable guidance was built on actual collaboration, not coincidence. The Sitemaps protocol became the joint property of Google, Yahoo, and Microsoft in November 2006, when the three engines formally agreed to support a common protocol at version 0.90, building on Google’s earlier Sitemaps 0.84 from June 2005. Five years later, on June 2, 2011, the same three engines launched Schema.org, with Yandex joining shortly after, to create a common vocabulary for structured data markup. That was the announcement that got made on stage at SMX Advanced. I was on the Bing team at the time, and what struck me then is what still matters now. The engines were competitors, but they had decided that a shared vocabulary served all of them. Webmasters got one set of rules. The web got cleaner data. The engines got better signals. Everybody won.
The pattern repeated with robots.txt, the 1994 convention that became RFC 9309 at the IETF in 2022, formalizing what every serious crawler already honored. And it repeated again, more recently, with IndexNow, the protocol Microsoft Bing and Yandex launched in October 2021. IndexNow is now supported by Bing, Yandex, Naver, Seznam, and Yep. Google has tested the protocol since 2021, but has not adopted it.
That overlap layer is exactly why Google’s guidance felt safe to follow, even if you cared about Bing traffic. The signals the engines used weren’t identical, but the inputs they accepted, the protocols they honored, and the standards they promoted were. Optimization had a shared substrate.
Where The LLM Stacks Actually Diverge
The LLM environment doesn’t have a shared substrate of comparable size. The differences are not cosmetic, and they are not temporary. They’re baked into how the systems are built.
Start with training data. OpenAI has signed disclosed licensing deals with News Corp worth up to $250 million over five years, Axel Springer at roughly $13 million per year, Reddit at an estimated $70 million per year, plus the Financial Times, Condé Nast, Hearst, Vox Media, The Atlantic, the Associated Press, Le Monde, and others. Google has its own Reddit deal, estimated at $60 million per year, granting real-time data API access. Anthropic has not publicly disclosed equivalent publisher licensing deals, and that undisclosed status is itself the practitioner-facing point. The corpora that fed these models, and that continue to refresh them, are not the same documents. Practitioners can’t know what any given provider has paid for and what it hasn’t.
The crawler infrastructure diverges next. OpenAI runs three separate bots: GPTBot for training, OAI-SearchBot for search indexing, and ChatGPT-User for user-initiated retrieval. Anthropic runs three of its own: ClaudeBot for training, Claude-SearchBot for search, and Claude-User for user-initiated retrieval. Perplexity runs PerplexityBot and Perplexity-User. Google introduced Google-Extended in September 2023 as the user-agent that controls whether Google can use a site’s content to train Gemini, separate entirely from the Googlebot that handles traditional search indexing. There is no single AI user-agent. Every provider requires a separate rule, and the rules don’t translate cleanly across providers because the bots don’t do equivalent jobs in equivalent ways.
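To make the one-rule-per-bot point concrete, here is a minimal sketch in Python that audits a robots.txt file against each of the user-agent tokens named above. The sample file and the `audit_robots` helper are hypothetical and written for illustration only. The check uses the standard library’s `urllib.robotparser`, whose matching may not mirror how each provider’s crawler actually interprets the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the article, grouped by provider.
AI_USER_AGENTS = {
    "OpenAI": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Anthropic": ["ClaudeBot", "Claude-SearchBot", "Claude-User"],
    "Perplexity": ["PerplexityBot", "Perplexity-User"],
    "Google": ["Google-Extended", "Googlebot"],
}

# Hypothetical robots.txt: opt out of the training-related tokens,
# leave search and user-initiated retrieval open.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def audit_robots(robots_txt: str, path: str = "/") -> dict[str, dict[str, bool]]:
    """Return, per provider and token, whether `path` may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        provider: {agent: parser.can_fetch(agent, path) for agent in agents}
        for provider, agents in AI_USER_AGENTS.items()
    }


if __name__ == "__main__":
    for provider, agents in audit_robots(SAMPLE_ROBOTS_TXT).items():
        for agent, allowed in agents.items():
            print(f"{provider:<11} {agent:<17} {'allowed' if allowed else 'blocked'}")
```

With this sample file, the three training-related tokens come back blocked while the search and user-retrieval tokens stay allowed. A single blanket rule could not express that split.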
The retrieval architectures diverge structurally. ChatGPT has historically used Bing’s index as its primary web search source, and that connection appears to still be primary, though OpenAI continues to build out additional infrastructure alongside it. Perplexity built its retrieval system on a Vespa-based pipeline that treats documents and sub-document chunks as first-class retrievable units. Google’s Gemini uses Google’s own index plus Knowledge Graph grounding. Claude uses Brave Search as a retrieval partner. Same query, four different retrieval systems, four different views of which sources exist and which sources are worth surfacing.
Then comes the alignment layer, which is where SEO had no equivalent at all. After a model is trained on its corpus, providers run post-training to shape how the model actually behaves: tone, refusal patterns, format, safety posture, what counts as a good answer. OpenAI’s primary approach has been RLHF, or Reinforcement Learning from Human Feedback, where human raters score model outputs and the model learns to produce highly rated responses. Anthropic developed Constitutional AI, which trains models to critique and revise their own outputs against a written set of principles. These methodologies produce demonstrably different behavior in the final products. The same retrieved content, fed into two models aligned by two methodologies, can yield two materially different responses about the same brand.
When One Provider’s Guidance Demonstrably Fails To Port
The clearest single example of guidance that doesn’t port is llms.txt. Jeremy Howard of Answer.AI proposed the file in September 2024 as a markdown manifest, placed at a site’s root, that would guide LLMs to the most important content. The proposal got picked up across the SEO community. Yoast built a generator. Agencies added llms.txt creation to their service catalogs. Conference speakers declared it essential.
As of mid-2026, no major LLM provider has confirmed they consume the file. Not OpenAI. Not Anthropic. Not Google. Server-log analyses across hundreds of thousands of domains show major AI crawlers don’t routinely request /llms.txt at all. Google’s John Mueller publicly compared it to the deprecated meta keywords tag. Gary Illyes confirmed at Search Central Live in July 2025 that Google doesn’t support llms.txt and isn’t planning to.
I’ve written about this elsewhere, so I won’t repeat the technicalities here. What matters for this argument is the structural lesson. Schema.org succeeded because three engines built it together and then enforced it together. Llms.txt was proposed by one researcher, picked up by tooling vendors, and ignored by the platforms it was supposed to serve. The shared-standards model that gave SEO its portable guidance is not available to LLM practitioners at the same scale, because the platforms are not building the standards together. They’re building their own pipelines.
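A site owner can run the same kind of check on their own logs. The sketch below is a minimal illustration, assuming access logs in the common "combined" format and assuming each crawler announces itself with its token somewhere in the user-agent string. The `count_llms_txt_requests` helper and the sample log lines are hypothetical, and the user-agent strings are simplified rather than copied from any provider’s documentation.

```python
import re
from collections import Counter

# Crawler tokens named earlier in the article (OpenAI, Anthropic, Perplexity).
AI_CRAWLER_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]

# Matches the request, status, size, referrer, and user-agent fields of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_llms_txt_requests(log_lines):
    """Return {token: (llms_txt_hits, total_requests)} for AI crawlers seen."""
    totals, llms_hits = Counter(), Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        path = match.group("path").split("?")[0]
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                totals[token] += 1
                if path == "/llms.txt":
                    llms_hits[token] += 1
                break
    return {token: (llms_hits[token], totals[token]) for token in totals}


# Hypothetical log lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/May/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "GPTBot/1.0"',
    '203.0.113.5 - - [10/May/2026:10:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 310 "-" "GPTBot/1.0"',
    '198.51.100.7 - - [10/May/2026:10:01:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "ClaudeBot/1.0"',
    '192.0.2.9 - - [10/May/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for token, (hits, total) in count_llms_txt_requests(SAMPLE_LOG).items():
        print(f"{token}: {hits} of {total} requests were for /llms.txt")
```

Requests from ordinary browsers are deliberately ignored, since the question is whether the AI crawlers themselves ever ask for the file.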
The Gemini Inversion
The cleanest illustration of how far guidance portability has degraded sits inside one company. Google publishes its own SEO documentation at Search Central, the canonical guidance the industry has followed for two decades. Those documents emphasize traditional ranking signals, E-E-A-T, content quality, technical accessibility, and structured data. That guidance is still useful for Google Search itself.
Google also makes Gemini, the model that powers AI Overviews and Google’s separate AI Mode surface. And the citation behavior of those surfaces doesn’t appear to track the guidance the same company publishes for its own search results.
In late 2024, roughly three-quarters of pages cited in AI Overviews also ranked in Google’s top 12 for the same query. By early 2026, after Google upgraded AI Overviews to Gemini 3 in January, Ahrefs analyzed 4 million AI Overview URLs and found that only 38% of cited pages also appeared in the top 10 for the same query. A separate BrightEdge analysis put the overlap closer to 17%. SE Ranking’s post-upgrade work found that Gemini 3 replaced roughly 42% of the domains previously cited under earlier model versions and generates 32% more sources per response.
The gap widens further when you look at Google’s AI Mode, which is a separate conversational surface that runs on the same Gemini family. Semrush data shows AI Mode and AI Overviews reach semantically similar conclusions 86% of the time, but cite the same URLs only 13.7% of the time. Only 14% of AI Mode citations rank in Google’s traditional top 10.
It appears, so far, that the canonical relationship has shifted. Google’s published SEO guidance is still the cleanest path to ranking in Google Search. But that ranking is no longer a reliable proxy for being cited by Google’s own AI surfaces. The same guidance, the same content, the same domain, can produce three meaningfully different outcomes across Google Search, AI Overviews, and AI Mode, even though all three live inside the same company. The old playbook of following the search engine’s guidance and trusting that the engine’s other surfaces would behave consistently doesn’t appear to be delivering the same returns it used to.
What Still Ports, And Why It’s Smaller Than It Looks
A universal layer does survive. Crawler accessibility still matters across every provider. Primary-source factual content still wins more citations than aggregator restatement. Clear retrievable structure still helps every system understand what a page is about. Presence on the high-authority sources that all major LLMs disproportionately cite, Wikipedia, YouTube, Reddit, major news outlets, still functions as a force multiplier across platforms. Earning visibility on those sources gives content a chance to surface in any LLM that draws on them.
But the universal layer is much smaller than it was in the SEO era. Qwairy’s analysis of 118,000 AI responses across ChatGPT, Perplexity, Google AI Mode, and Claude found that only 11% of cited domains appeared across multiple platforms. The other 89% were platform-specific. A brand that wins citations on Perplexity may be largely invisible on Claude. A brand that’s a regular reference on ChatGPT may not show up in AI Overviews at all. The same content can be the right answer for one system and the wrong answer for the system next to it.
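The same kind of overlap figure can be computed on your own citation samples. The following is a minimal sketch, not Qwairy’s methodology: it assumes you have already collected the URLs each platform cited for a set of prompts, and it treats "appeared across multiple platforms" as "cited by at least two platforms," which is my reading rather than a documented definition. The `cross_platform_share` helper and the sample data are hypothetical.

```python
from urllib.parse import urlparse


def cross_platform_share(citations: dict[str, list[str]]) -> float:
    """Share of cited domains that were cited by more than one platform.

    `citations` maps a platform name to the URLs it cited.
    """
    platforms_by_domain: dict[str, set[str]] = {}
    for platform, urls in citations.items():
        for url in urls:
            # Normalize to a bare hostname so www and non-www count as one domain.
            domain = urlparse(url).netloc.lower().removeprefix("www.")
            if domain:
                platforms_by_domain.setdefault(domain, set()).add(platform)
    if not platforms_by_domain:
        return 0.0
    shared = sum(1 for seen_on in platforms_by_domain.values() if len(seen_on) > 1)
    return shared / len(platforms_by_domain)


# Hypothetical citation samples gathered by running the same prompts everywhere.
SAMPLE_CITATIONS = {
    "ChatGPT": ["https://en.wikipedia.org/wiki/Example", "https://example.com/a"],
    "Perplexity": ["https://www.example.com/b", "https://example.org/c"],
    "Claude": ["https://example.net/d"],
}

if __name__ == "__main__":
    share = cross_platform_share(SAMPLE_CITATIONS)
    print(f"{share:.0%} of cited domains appear on more than one platform")
```

Tracking this number per prompt set over time gives a rough read on whether your own visibility is portable or platform-specific.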
What This Means For The Work
The practical implication is not abandoning all hope. It’s that practitioners need to stop treating any single LLM provider’s guidance as the universal map and start treating it as one input among several. Read what every major provider publishes about their own systems. Test your visibility across platforms, not just on the platform you happen to use most. Treat divergence as the default and overlap as the exception, not the other way around.
This is not how SEO worked, and the difference matters. The old reflex was to optimize for Google and trust the portability. The new reality is that following one LLM’s guidance, even Google’s guidance about Gemini, will leave you optimized for a slice of the landscape and potentially blind to the rest. The discipline is being rebuilt on platform-specific work that didn’t exist in the SEO era, and the practitioners who recognize that first are going to spend the next two years setting the standards everyone else follows.
The overlap has shrunk. You now have more work than ever to accomplish.
If you have thoughts on where the divergence between providers is sharpest in your own work, reach out directly. I’d genuinely like to hear what’s showing up in the data.
This post was originally published on Duane Forrester Decodes.
Featured Image: Rawpixel.com/Shutterstock; Paulo Bobita/Search Engine Journal
