The standard technical SEO audit checks crawlability, indexability, site speed, mobile-friendliness, and structured data. That checklist was designed for one client: Googlebot.
That’s how it has always been.
In 2026, your website has, at a minimum, a dozen additional non-human clients. AI crawlers like GPTBot, ClaudeBot, and PerplexityBot train models and power AI search results. User-triggered agents like the newly announced Google-Agent, or its “siblings” Claude-User and ChatGPT-User, browse websites on behalf of specific people in real time. A Q1 2026 analysis across Cloudflare’s network found that 30.6% of all web traffic now comes from bots, with AI crawlers and agents making up a growing share. Your technical audit needs to account for all of them.
Here are the five layers to add to your existing technical SEO audit.
Layer 1: AI Crawler Access
Your robots.txt was probably written for Googlebot, Bingbot, and maybe a few scrapers. AI crawlers need their own robots.txt rules, and they should be handled separately from Googlebot and Bingbot.
What To Check
Review your robots.txt for rules targeting AI-specific user agents: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, AppleBot-Extended, CCBot, and ChatGPT-User. If none of these appear, you’re running on defaults, and those defaults might not reflect what you actually want. Never accept the defaults unless you know they’re exactly what you need.
The key is making a conscious decision per crawler rather than blanket allowing or blocking everything. Not all AI crawlers serve the same purpose. AI crawler traffic can be split into three categories: training crawlers that collect data for model training (89.4% of AI crawler traffic according to Cloudflare data), search crawlers that power AI search results (8%), and user-triggered agents like Google-Agent and ChatGPT-User that browse on behalf of a specific human in real time (2.2%). Each category warrants a different robots.txt decision.
The crawl-to-referral ratios from Cloudflare’s Radar report can make this an informed decision for you. Anthropic’s ClaudeBot crawls 20,600 pages for every referral it returns. OpenAI’s ratio is 1,300:1. Meta sends no referrals. Blocking OpenAI’s OAI-SearchBot or PerplexityBot reduces your visibility in ChatGPT Search and Perplexity’s AI answers. Blocking training-focused crawlers like CCBot or Meta’s crawler prevents data extraction by a provider that sends zero traffic back. The crawl-to-referral ratios tell you who is taking without giving.
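To make the per-crawler decisions concrete, here is a minimal robots.txt sketch assuming a site that blocks training crawlers and allows search crawlers. The split is an example, not a recommendation; your own crawl-to-referral data should drive the choices.

```
# Training crawlers: blocked in this example because they return little or no traffic
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Google-Extended governs use of content for Gemini training, separate from Googlebot
User-agent: Google-Extended
Disallow: /

# Search crawlers: allowed to preserve visibility in ChatGPT Search and Perplexity
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```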
There is one crawler that requires special attention. Google added Google-Agent to its official list of user-triggered fetchers on March 20, 2026. Google-Agent identifies requests from AI systems running on Google infrastructure that browse websites on behalf of users. Unlike traditional crawlers, Google-Agent ignores robots.txt. Google’s position is that since a human initiated the request, the agent acts as a user proxy rather than an autonomous crawler. Blocking Google-Agent requires server-side authentication, not robots.txt rules. That is both interesting and important for the future, even if it’s outside the scope of this article.
Official documentation for each crawler:
Layer 2: JavaScript Rendering
Googlebot renders JavaScript using headless Chromium. There is nothing new about that. What’s new and different is that almost every major AI crawler doesn’t render JavaScript.
| Crawler | Renders JavaScript |
|---|---|
| GPTBot (OpenAI) | No |
| ClaudeBot (Anthropic) | No |
| PerplexityBot | No |
| CCBot (Common Crawl) | No |
| AppleBot | Yes |
| Googlebot | Yes |
AppleBot (which uses a WebKit-based renderer) and Googlebot are the only major crawlers that render JavaScript. Four of the six major web crawlers (GPTBot, ClaudeBot, PerplexityBot, and CCBot) fetch static HTML only, making server-side rendering a requirement for AI search visibility, not an optimization. If your content lives in client-side JavaScript, it’s invisible to the crawlers training OpenAI, Anthropic, and Perplexity’s models and powering their AI search products.
What To Check
Run curl -s [URL] on your important pages and search the output for key content like product names, prices, or service descriptions. If that content isn’t in the curl response, GPTBot, ClaudeBot, and PerplexityBot can’t see it either. Alternatively, use View Source in your browser (not Inspect Element, which shows the rendered DOM after JavaScript execution) and check whether the important information is present in the raw HTML.
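A minimal shell sketch of that check, using a hypothetical URL and strings you would expect to find on the page:

```bash
# Fetch the raw HTML the way a non-rendering crawler does (no JavaScript execution)
curl -s https://www.example.com/products/widget-pro -o raw.html

# Search for content that must be visible; no match means these crawlers can't see it
grep -i "widget pro" raw.html || echo "Product name missing from static HTML"
grep -F '$149' raw.html || echo "Price missing from static HTML"
```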

Single-page applications (SPAs) built with React, Vue, or Angular are particularly at risk unless they use server-side rendering (SSR) or static site generation (SSG). A React SPA that renders product descriptions, pricing, or key claims entirely on the client side is sending AI crawlers a blank page with a link to the JavaScript bundle.
The fix isn’t complicated. Server-side rendering (SSR), static site generation (SSG), or pre-rendering solves this for every major framework. Next.js supports SSR and SSG natively for React, Nuxt provides the same for Vue, and Angular Universal handles server rendering for Angular applications. The audit just needs to flag which pages depend on client-side JavaScript for critical content.
Layer 3: Structured Data For AI
Structured data has been part of technical SEO audits for years, but the evaluation criteria need updating. The question is no longer just “does this page have schema markup?” It’s “does this markup help AI systems understand and cite this content?”
What To Check
- JSON-LD implementation (preferred over Microdata and RDFa for AI parsing).
- Schema types that go beyond the basics: Organization, Article, Product, FAQ, HowTo, Person.
- Entity relationships: sameAs, author, publisher connections that link your content to known entities.
- Completeness: are all relevant properties populated, or are you just ticking a box with skeleton schemas that stop at name and URL? (A fuller sketch follows this list.)
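As a hedged sketch of what “complete and linked” looks like in practice (the organization, URLs, and person below are hypothetical), note the sameAs and founder connections that tie the entity to known profiles:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Analytics",
  "url": "https://www.example.com",
  "logo": "https://www.example.com/logo.png",
  "foundingDate": "2012",
  "sameAs": [
    "https://www.linkedin.com/company/example-analytics",
    "https://www.crunchbase.com/organization/example-analytics"
  ],
  "founder": {
    "@type": "Person",
    "name": "Jane Doe",
    "sameAs": "https://www.linkedin.com/in/janedoe"
  }
}
```

A skeleton schema that stops at name and url will pass a validator, but it carries none of these entity signals.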
Why This Matters Now
Microsoft’s Bing principal product manager Fabrice Canel confirmed in March 2025 that schema markup helps LLMs understand content for Copilot. The Google Search team said in April 2025 that structured data gives an advantage in search results.
No, you can’t win with schema alone. Yes, it can help.
The data density angle matters too. The GEO research paper by Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (presented at ACM KDD 2024, the first to publicly use the term “GEO”) found that adding statistics to content improved AI visibility by 41%. Yext’s analysis found that data-rich websites earn 4.3x more AI citations than directory-style listings. Structured data contributes to data density by giving AI systems machine-readable facts rather than requiring them to extract meaning from prose.
An important caveat: no peer-reviewed academic studies exist yet on schema’s impact on AI citation rates specifically. The industry data is promising and consistent, but treat these numbers as signals rather than guarantees.
W3Techs reports that roughly 53% of the top 10 million websites use JSON-LD as of early 2026. If your site isn’t among them, you’re missing signals that both traditional and AI search systems use to understand your content.
Duane Forrester, who helped build Bing Webmaster Tools and co-launched Schema.org, argues that schema markup is just step one. As AI agents continue shifting from merely interpreting pages to making decisions, brands will also need to publish operational truth (pricing, policies, constraints) in machine-verifiable formats with versioning and cryptographic signatures. Publishing machine-verifiable source packs is beyond the scope of a typical audit today, but auditing structured data completeness and accuracy is the foundation verified source packs build on.
Layer 4: Semantic HTML And The Accessibility Tree
The first three layers of the AI-readiness audit cover crawler access (robots.txt), JavaScript rendering, and structured data. The final two address how AI agents actually read your pages and what signals help them discover and evaluate your content.
Most SEOs evaluate HTML for search engine consumption. Agentic browsers like ChatGPT Atlas, Chrome with auto browse, and Perplexity Comet don’t parse pages the way Googlebot does. They read the accessibility tree instead.
The accessibility tree is a parallel representation of your page that browsers generate from your HTML. It strips away visual styling, layout, and decoration, keeping only the semantic structure: headings, links, buttons, form fields, labels, and the relationships between them. Screen readers like VoiceOver and NVDA have used the accessibility tree for decades to make websites usable for people with visual impairments. AI agents now use the same tree to understand and interact with web pages.
And the reason is simple: efficiency. Processing screenshots is both more expensive and slower than working with the accessibility tree.

This matters because the accessibility tree exposes what your HTML actually communicates, not what your CSS (or JS) makes it look like.
Microsoft’s Playwright MCP, the standard tool for connecting AI models to browser automation, uses accessibility snapshots rather than raw HTML or screenshots. Playwright MCP’s browser_snapshot function returns an accessibility tree representation because it’s more compact and semantically meaningful for LLMs. OpenAI’s documentation states that ChatGPT Atlas uses ARIA tags to interpret page structure when browsing websites.
Web accessibility and AI agent compatibility are now the same discipline. Proper heading hierarchy (H1-H6) creates meaningful sections that AI systems use for content extraction. Semantic elements like `<nav>`, `<main>`, `<article>`, and `<section>` tell machines what role each content block plays. Form labels and descriptive button text make interactive elements understandable to agents that parse the accessibility tree instead of rendering visual design.
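A minimal sketch of what that looks like in markup (the page content is hypothetical); each element below lands in the accessibility tree as a named landmark or section rather than an anonymous block:

```html
<!-- Landmarks (header, nav, main, aside, footer) and headings define the tree structure -->
<header>
  <nav aria-label="Primary">
    <a href="/pricing">Pricing</a>
    <a href="/docs">Docs</a>
  </nav>
</header>
<main>
  <article>
    <h1>Widget Pro</h1>
    <section>
      <h2>Pricing</h2>
      <p>Widget Pro costs $149 per year, billed annually.</p>
      <button type="button">Start free trial</button>
    </section>
  </article>
  <aside>
    <h2>Related guides</h2>
    <a href="/docs/setup">Setting up Widget Pro</a>
  </aside>
</main>
<footer>
  <p>Contact: support@example.com</p>
</footer>
```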
What To Check
- Heading hierarchy: logical H1-H6 structure that machines can use to understand content relationships.
- Semantic elements: nav, main, article, section, aside, header, footer, used appropriately.
- Form inputs: every input has a label, every button has descriptive text.
- Interactive elements: clickable things use `<button>` or `<a>`, not `<div>`.
- Accessibility tree: run a Playwright MCP snapshot or test with VoiceOver/NVDA to see what agents actually see.
Somehow, things are getting worse on this front. The WebAIM Million 2026 report found that the average web page now has 56.1 accessibility errors, up 10.1% from 2025.
ARIA (Accessible Rich Internet Applications) usage increased 27% in a single year. ARIA is a set of HTML attributes that add extra semantic information to elements, telling screen readers and AI agents things like “this div is actually a dialog” or “this list functions as a menu.” But what’s important is this: pages with ARIA present had significantly more errors (59.1 on average) than pages without ARIA (42 on average). Adding ARIA without understanding it makes things worse, not better, because incorrect ARIA overrides the browser’s default accessibility tree interpretation with wrong information. Start with proper semantic HTML. Add ARIA only when native elements aren’t sufficient.
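A small illustration of that rule, with a hypothetical addToCart handler: the native element is complete on its own, while the div version needs ARIA plus scripting just to approximate it, and any piece you forget misleads both screen readers and agents.

```html
<!-- Native element: correct role, focusability, and keyboard support with no ARIA at all -->
<button type="button" onclick="addToCart()">Add to cart</button>

<!-- div pressed into service as a button: role, tabindex, and key handling must all be added by hand -->
<div role="button" tabindex="0" onclick="addToCart()"
     onkeydown="if (event.key === 'Enter' || event.key === ' ') addToCart()">
  Add to cart
</div>
```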
Technical SEOs don’t have to become accessibility experts. But treating accessibility as someone else’s problem is no longer viable when the same tree that screen readers parse is now the primary interface between AI agents and your website.
Sidenote: The Markdown Shortcut Doesn’t Work
Serving raw markdown files to AI crawlers instead of HTML can result in a 95% reduction in token usage per page. However, Google Search Advocate John Mueller called this “a silly idea” in February 2026 on Bluesky. Mueller’s argument was this: “Meaning lives in structure, hierarchy and context. Flatten it and you don’t make it machine-friendly, you make it meaningless.” LLMs were trained on normal HTML pages from the beginning and have no problem processing them. The answer isn’t to create a flat, simplified version for machines. It’s to make the HTML itself properly structured. Well-written semantic HTML already is the machine-readable format. Besides, that simplified version already exists in the accessibility tree, and it’s what AI agents already use.
Layer 5: AI Discoverability Signals
The final layer covers signals that don’t fit neatly into traditional audit categories but directly affect how AI systems discover and evaluate your site.
llms.txt (dishonourable mention). Listed first for one reason only: ask any LLM what you should do to make your website more visible to AI systems, and llms.txt will be at or near the top of the list. It’s their world, I guess. The llms.txt specification provides a simple markdown file that helps AI agents understand your site’s purpose, structure, and key content. No large-scale adoption data has been published yet, and its actual impact on AI citations is unproven. But LLMs consistently recommend it, which means AI-powered audit tools and consultants will flag its absence. It takes minutes to create and costs nothing to maintain.
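The file is plain markdown served at the site root; a minimal sketch for a hypothetical site, following the llms.txt convention of a title, a short summary, and sectioned link lists:

```markdown
# Example Analytics

> Example Analytics makes web analytics software for small ecommerce teams.

## Docs
- [Getting started](https://www.example.com/docs/getting-started): Install and run a first report
- [API reference](https://www.example.com/docs/api): Endpoints, authentication, rate limits

## Company
- [Pricing](https://www.example.com/pricing): Plans, billing, and refund policy
- [About](https://www.example.com/about): Team, history, and contact details
```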
OK, now that we’ve got that out of the way, let’s look at what might really matter.
AI crawler analytics. Are you monitoring AI bot traffic? Cloudflare’s AI Audit dashboard shows which AI crawlers visit, how often, and which pages they hit. If you’re not on Cloudflare, check server logs for Google-Agent, ChatGPT-User, and ClaudeBot user agent strings; a quick sketch of that check follows below.
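A hedged sketch of that log check, assuming an Nginx access log in the default combined format at a hypothetical path:

```bash
# Count requests from the main AI user agents (log path is hypothetical)
grep -ciE "GPTBot|ClaudeBot|PerplexityBot|ChatGPT-User|Google-Agent" /var/log/nginx/access.log

# Most-requested paths for one agent, e.g., ClaudeBot ($7 is the request path in combined format)
grep -i "ClaudeBot" /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head
```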
Google publishes a `user-triggered-agents.json` file containing the IP ranges that Google-Agent uses, so you can verify whether incoming requests are genuinely from Google rather than spoofed user agent strings.
Entity definition. Does your site clearly define what the business is, who runs it, and what it does? Not in marketing copy, but in structured, machine-parseable markup. Organization schema should include name, URL, logo, founding date, and sameAs links to verified profiles on LinkedIn, Crunchbase, and Wikipedia. Person schema for key people should connect them to the organization via author and employee properties. AI systems need to resolve your identity as a distinct entity before they can confidently recommend you over competitors with similar names or offerings. Don’t slap this on top of your site when your designer is done with their work. Start here; it will make your life easier.
Content position. Where you place information on the page directly impacts whether AI systems cite it. Kevin Indig’s analysis of 98,000 ChatGPT citation rows across 1.2 million responses found that 44.2% of all AI citations come from the top 30% of a page. The bottom 10% earns only 2.4-4.4% of citations regardless of industry. Duane Forrester calls this “dog-bone thinking”: strong at the beginning and end, weak in the middle, a pattern Stanford researchers have confirmed as the “lost in the middle” phenomenon. Audit your key pages: are the most important claims and data points in the first 30%, or buried in the middle?
Content extractability. Pull any key claim out of your page and read it in isolation. Does it still make sense without the surrounding paragraphs? AI retrieval systems like ChatGPT, Perplexity, and Google AI Overviews extract and cite individual passages, and sentences that rely on “this,” “it,” or “the above” for meaning become unusable when pulled from their original context. Ramon Eijkemans’ excellent utility-writing framework maps these ideas to documented retrieval mechanisms: self-contained sentences, explicit entity relationships, and quotable anchor statements that AI systems can confidently cite without additional inference.
The Audit Checklist
| Check | Tool/Method | What You’re Looking For |
|---|---|---|
| AI crawler robots.txt | Manual review | Conscious per-crawler decisions |
| JavaScript rendering | curl, View Source, Lynx browser | Critical content in static HTML |
| Structured data | Schema validator, Rich Results Test | Complete, linked JSON-LD |
| Semantic HTML | axe DevTools, Lighthouse | Proper elements, heading hierarchy |
| Accessibility tree | Playwright MCP snapshot, screen reader | What agents actually see |
| AI bot traffic | Cloudflare, server logs | Volume, pages hit, patterns |
From Audit To Action
This audit identifies gaps. Fixing them requires a sequence, because some fixes depend on others. Optimizing content structure before establishing a machine-readable identity means agents can extract your information but can’t confidently attribute it to your brand. I wrote Machine-First Architecture to provide that sequence: identity, structure, content, interaction, each pillar building on the previous one.
Why The Technical SEO Audit Is Where This Belongs
None of this is technically SEO. Robots.txt rules for AI crawlers don’t affect Google rankings. Accessibility tree optimization doesn’t move keyword positions. Content position scoring has nothing to do with search indexing.
But most of it did grow out of technical SEO. Crawl management, structured data, semantic HTML, JavaScript rendering, server log analysis: these are skills technical SEOs already have. The audit methodology transfers directly. The consumer it serves is what changed.
The websites that get cited in AI responses, that work when Chrome auto browse visits them, that show up when someone asks ChatGPT for a recommendation, they won’t be the ones with the best content alone. They’ll be the ones whose technical foundation made that content accessible to machines. Technical SEOs are the people best equipped to build that foundation. The old audit template just needs a new section to reflect it.
Featured Image: Anton Vierietin/Shutterstock
