The Science Of How AI Picks Its Sources

March 24, 2026

In “The science of how AI pays attention,” I analyzed 1.2 million ChatGPT responses to understand exactly how AI reads a page. This is Part 2.

Where Part 1 told you where on a page AI looks, this one tells you which pages AI routinely considers.

The data clarifies:

Why ~30 domains own 67% of citations in any topic.
The page structure that earns citations across 50+ distinct queries vs. the one that gets cited once.
Whether the ski ramp from Part 1 is actually steeper or flatter in your vertical.

Image Credit: Kevin Indig

1. ~30 Domains Own 67% Of AI Citations Per Topic

Classic search is a winner-takes-all game. The top result gets disproportionately more clicks than the second. Is that also true for ChatGPT answers? Is the distribution of cited domains democratic or totalitarian?

Approach:

Compute the citation share per domain per vertical.
Calculate the cumulative share captured by the top 10% of domains.
Dataset: 21,482 ChatGPT citation rows, 670 unique domains, 2,344 unique URLs, 127 unique prompts.
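
The two steps in the approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the study: the input shape (one `(vertical, domain)` pair per citation row), the function name `top_share`, and the rounding-up rule for "top 10%" are all hypothetical.

```python
from collections import Counter, defaultdict
from math import ceil


def top_share(rows, top_fraction=0.10):
    """Share of citations captured by the top `top_fraction` of domains, per vertical.

    rows: iterable of (vertical, domain) pairs, one pair per citation row (assumed shape).
    """
    per_vertical = defaultdict(Counter)
    for vertical, domain in rows:
        per_vertical[vertical][domain] += 1

    shares = {}
    for vertical, counts in per_vertical.items():
        total = sum(counts.values())
        # Assumption: round the number of "top" domains up, with at least one domain.
        k = max(1, ceil(len(counts) * top_fraction))
        top_total = sum(n for _, n in counts.most_common(k))
        shares[vertical] = top_total / total
    return shares


# Made-up data: 10 domains, one of which holds 11 of the 20 citations.
demo_rows = [("education", "a.example")] * 11
demo_rows += [("education", f"site{i}.example") for i in range(9)]
print(top_share(demo_rows))
```

With real data, a value near 0.6 would correspond to a winner-take-most vertical and a value near 0.13 to a diffuse one.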

Results: The top 10 domains take 46% of all citations in a topic. The top 30 take 67%.

AI citation is slightly less concentrated than traditional organic search, but still extreme:

Effectively, there are ~30 seats (domains) at the citation table for any given topic. Everything else is almost invisible.
Example: storylane.io appears as a cited source across 102 distinct prompts (unique questions asked of ChatGPT), reprise.com across 98. Even though reprise.com has more total citations (1,369 vs. storylane.io’s 968), storylane.io shows up in answers to a broader range of different questions.
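
The difference between total citations and the number of distinct prompts a domain answers is easy to see in code. The sketch below assumes each citation row carries a prompt identifier and a domain; the function name `reach_and_count` and the sample domains are made up for illustration.

```python
from collections import defaultdict


def reach_and_count(rows):
    """Return, per domain, its raw citation count and its number of distinct prompts.

    rows: iterable of (prompt_id, domain) pairs, one pair per citation row (assumed shape).
    """
    prompts = defaultdict(set)
    totals = defaultdict(int)
    for prompt_id, domain in rows:
        prompts[domain].add(prompt_id)  # a set keeps each prompt once
        totals[domain] += 1             # every row counts toward the raw total
    return {d: {"citations": totals[d], "reach": len(prompts[d])} for d in totals}


# Made-up data: one domain is cited more often, the other answers more prompts.
demo_rows = [
    ("p1", "narrow.example"), ("p1", "narrow.example"), ("p1", "narrow.example"),
    ("p2", "narrow.example"), ("p2", "narrow.example"),
    ("p1", "broad.example"), ("p2", "broad.example"), ("p3", "broad.example"),
]
stats = reach_and_count(demo_rows)
print(stats["narrow.example"])  # more citations, fewer prompts
print(stats["broad.example"])   # fewer citations, more prompts
```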

These findings come from product-comparison verticals (SaaS tools, financial advisors). The pattern is not uniform across industries: concentration, measured as the share of citations captured by the top 10% of domains, ranges from 13.0% in Healthcare to 59.5% in Education.

What The Industry Patterns Showed

Concentration is weaker in healthcare and open web topics, where no single domain dominates, and stronger in the education sector.

Education is winner-take-most: the top 10% of domains capture 59.5% of all citations.

If you are not already in the top 5-10 domains in education, achieving citation breadth is exceptionally hard.
tefl.org alone answers 102 unique prompts and holds 18.75% of all Education citations.

Crypto is the second most concentrated at 43.0% for the top 10%.

A small set of technical documentation and comparison sites (alchemy.com, quicknode.com, chainstack.com) dominate Solana RPC and infrastructure queries.
The technical nature of Solana queries means few credible sources exist; once a domain earns trust in this niche, it captures a large share.

Finance sits at 29.4% for the top 10%.

Concentration is query-type specific: Financial advisor locator pages (forfiduciary.com at 139 unique prompts, smartasset.com at 168 unique prompts) dominate city-level advisor queries.
But the long tail of financial product queries keeps total concentration moderate.

Healthcare is the least concentrated at 13.0% for the top 10%.

No single domain dominates. New entrants have a realistic path to citation reach.
The citation surface is spread across hundreds of domains, each covering a small slice of telehealth, HIPAA compliance, and healthcare app queries.

CRM/SaaS and HR Tech are similarly diffuse (16.1% and 14.4% for the top 10%).

These are multi-product software categories where dozens of comparison sites, review platforms, and vendor pages split citations.
Monday.com leads CRM with only 2.88% of all citations (37 unique prompts). A genuinely open competitive field.

Top Takeaways

1. Breadth of topic coverage matters more than domain authority. A single well-structured comparison page (learn.g2.com: 65 unique prompts, 495 citations) can still outperform the entire domain portfolio of a well-known brand. The goal is not to rank for one query, but to answer a cluster.

2. Concentration reflects category maturity. Fragmentation is an opportunity. Education and Crypto have narrow, well-defined query spaces where a few authoritative sources have locked in trust. Healthcare and CRM are broad, fragmented categories where no single domain dominates. That fragmentation is your opening.

3. Citation reach (the number of distinct prompts a domain answers) is a more useful strategic metric than raw citation count. In low-concentration verticals like Healthcare and CRM, a focused 30-50 page strategy can realistically compete for a seat at the table. In high-concentration verticals like Education and Crypto, the path is narrower: become the definitive resource on a specific sub-topic or accept that you’re fighting for scraps.

2. The Citation Advantage Begins At 10,000 Words

In classic Search, word count and page length are somewhat indicative of rankings, as long as the quality is high. I wondered, again, whether that is also true for showing up in ChatGPT answers.

Approach:

Measure the raw text length of every cited page.
Group lengths into seven buckets.
For each bucket, calculate average citations per page.
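
A rough sketch of the bucketing step follows. The bucket edges are an assumption: the text names under 500, 1,000-2,000, 2,000-5,000, 5,000-10,000, 10,000-20,000, and 20,000+, so a 500-1,000 bucket is inferred here to reach seven. The names `bucket_label` and `avg_citations_by_bucket` are hypothetical, and the length unit is whatever the page measurement uses.

```python
from bisect import bisect_right
from collections import defaultdict

# Assumed bucket boundaries (upper edges are exclusive).
EDGES = [500, 1_000, 2_000, 5_000, 10_000, 20_000]
LABELS = ["<500", "500-1,000", "1,000-2,000", "2,000-5,000",
          "5,000-10,000", "10,000-20,000", "20,000+"]


def bucket_label(length):
    """Map a page length to one of the seven bucket labels."""
    return LABELS[bisect_right(EDGES, length)]


def avg_citations_by_bucket(pages):
    """pages: iterable of (length, citation_count) pairs, one per cited page."""
    sums = defaultdict(lambda: [0, 0])  # label -> [total citations, page count]
    for length, citations in pages:
        entry = sums[bucket_label(length)]
        entry[0] += citations
        entry[1] += 1
    # Report buckets in length order, skipping empty ones.
    return {label: sums[label][0] / sums[label][1] for label in LABELS if label in sums}


# Made-up pages: (length, citations).
demo_pages = [(300, 2), (400, 4), (7_000, 10), (25_000, 9)]
print(avg_citations_by_bucket(demo_pages))
```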

Results: More words do indeed correlate with more citations, but there’s a ceiling.

The 5,000-to-10,000 jump is the biggest single step, nearly 2x. Pages above 20,000 characters average 10.18 citations each vs. 2.39 for pages under 500 characters.

The length effect is vertical-specific: Finance inverts it entirely. High-cited Finance pages average 1,783 words vs. 2,084 for low-cited pages, a 0.86x lift. Authoritative compact sources, rate tables, and regulatory summaries outperform comprehensive guides there. The 10,000-character rule holds for SaaS and editorial content.

Finance peaks at 5,000-10,000 words (10.9 citations/page), then drops sharply at 10,000-20,000 (4.92 citations/page).

Finance also shows the steepest absolute gain: Pages under 500 words earn only 3.84 citations/page while 5,000-10,000-word pages earn 10.9, which is a 2.8x multiplier from length optimization alone.
Very long Finance pages may dilute the citation-triggering content with redundant detail.
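
The Finance multipliers are plain ratios of the averages quoted in this section, which a short check makes explicit. The helper name `lift` is only illustrative; the inputs are the figures from the text.

```python
def lift(numerator, denominator):
    """Return the ratio of two averages, rounded to two decimals."""
    return round(numerator / denominator, 2)


# Citations per page: 5,000-10,000 bucket vs. the under-500 bucket.
finance_length_gain = lift(10.9, 3.84)
# Average word count: high-cited vs. low-cited Finance pages.
finance_word_lift = lift(1783, 2084)

print(finance_length_gain, finance_word_lift)  # prints: 2.84 0.86
```

The first ratio rounds to the 2.8x multiplier and the second matches the 0.86x lift: high-cited Finance pages are slightly shorter on average, even though the mid-length bucket earns the most citations per page.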

Education shows the clearest length-wins-everything pattern.

Citations per page climb steadily from 1.85 (under 500 words) to 6.05 (20K+ words) with no drop-off.

Crypto and Product Analytics behave similarly to Education.

Length consistently pays off, plateauing around the 10,000-20,000 tier (5.34 and 4.01, respectively). Both are technical verticals where comprehensiveness signals authority.

SaaS shows the weakest length effect: Citations per page range from 1.06 (1,000-2,000 words) to 2.77 (20,000+ words).

Even the longest CRM pages only get 2.77 citations per page on average.
In this vertical, length alone doesn’t determine citations. Format, structure, and domain authority appear more important.

Healthcare shows a moderate length effect (1.74 to 3.92 citations/page).

But with one anomaly: 5,000-10,000 words (2.80) underperforms vs. 2,000-5,000 words (3.36).
Very long Healthcare pages may include too much clinical detail that dilutes citation-triggering content.

Top Takeaways

1. Universal finding: Very short pages (under 1,000 words) underperform in every vertical. The underperformance of thin content is consistent, but the reward for long content is vertical-specific.

2. Target your length based on industry, content type, and query intent, not a universal word count. For Finance verticals: Aim for 5,000-10,000 words. Education, Crypto, and Product Analytics: Go as long as possible. CRM/SaaS: Prioritize structure over word count.

3. 58% Of Cited URLs Are Cited Once

When we look at the citations within a topic, we often see many pages on a domain getting cited. So, how many citations can a single page get?

Approach:

Count the number of unique prompts for each page.
Classify the number of citations into: 1, 2-5, 6-10, 11+.
Examine the top URLs per vertical for structural patterns.
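
The first two steps could look roughly like this in Python. It is a minimal sketch that assumes each citation row is a `(prompt_id, url)` pair and that the tiers are based on unique prompts per URL; `reach_tiers` and the sample URLs are hypothetical names.

```python
from collections import Counter, defaultdict

TIERS = ("1", "2-5", "6-10", "11+")


def tier(unique_prompts):
    """Map a URL's unique-prompt count to one of the four tiers."""
    if unique_prompts == 1:
        return "1"
    if unique_prompts <= 5:
        return "2-5"
    if unique_prompts <= 10:
        return "6-10"
    return "11+"


def reach_tiers(rows):
    """Return the share of cited URLs that falls into each tier.

    rows: iterable of (prompt_id, url) pairs, one pair per citation row (assumed shape).
    """
    prompts_by_url = defaultdict(set)
    for prompt_id, url in rows:
        prompts_by_url[url].add(prompt_id)
    counts = Counter(tier(len(p)) for p in prompts_by_url.values())
    total = len(prompts_by_url)
    return {t: counts[t] / total for t in TIERS}


# Made-up data: two one-hit URLs, one URL in 3 prompts, one URL in 12 prompts.
demo_rows = [("p1", "u1"), ("p2", "u2")]
demo_rows += [(f"p{i}", "u3") for i in range(3)]
demo_rows += [(f"p{i}", "u4") for i in range(12)]
print(reach_tiers(demo_rows))
```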

Results: On average, 58% of cited URLs appear in only one prompt.

Think of it like a footprint game. Raw citation count tells you how popular a page is. Citation breadth tells you how strategically valuable it is. An evergreen page in AI citation is not one that gets cited a lot; it’s one that keeps appearing across diverse queries.

The top 4.8% of URLs (cited 10+) are all category-level comparisons or guides answering “what is it,” “who uses it,” “how to choose,” and “pricing” in a single URL.

The citation pool isn’t a meritocracy of the best answer, but the degree varies sharply by vertical.

CRM/SaaS has the highest one-hit rate at 84.7%.
Finance produces the highest-reach evergreen pages: forfiduciary.com covers 119 unique prompts.
Crypto generates the most concentrated evergreen pages at 55.4% in the technical tier: chainstack.com/best-solana-rpc-providers-in-2026 (63 prompts), alchemy.com/overviews/solana-rpc (62 prompts), and rpcfast.com/blog/rpc-node-providers (61 prompts). All three are comparison pages covering the Solana RPC provider landscape from slightly different angles.
Education evergreen pages follow a different logic: tefl.org, internationalteflacademy.com, and gooverseas.com get cited broadly because they answer TEFL-adjacent queries (cost, location, certification type) from a single resource. One URL serves many query angles.

1. Evergreen pages share consistent structural patterns: Category-level guide format (best X for 2025/2026), broad topic coverage within a single page (what is X, how to choose X, top X vendors, pricing), and explicit year anchoring in URL or title. Pages that answer a class of questions earn citation breadth.

2. The top 5 evergreen pages in every vertical are either comparison roundups, authoritative guides, or directory/listing pages. No thin single-topic page reaches the 11+ prompt tier in any vertical.

3. A single evergreen page covering 10+ query intents is worth more in AI citation reach than 10 single-intent pages. The ROI of comprehensive content is front-loaded: one well-built page compounds citation reach over time. The long tail exists, but the top 5% of pages capture a disproportionate share of ongoing citation activity.

4. The Ski Ramp Is Steeper In Some Verticals

“The science of how AI pays attention” showed that ChatGPT draws 44.2% of its citations from the top 30% of any page. Does that trend hold across different verticals?

Approach: Re-run the same positional analysis across 7 verticals with 42,460 matched citations.
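
Once each matched citation has a relative position on its page, the per-decile shares are a simple tally. The sketch below assumes positions are expressed as fractions between 0 (top of page) and 1 (bottom); `decile_shares` is a hypothetical name and the sample positions are made up.

```python
def decile_shares(positions):
    """Return the share of citations that falls into each tenth of the page.

    positions: list of relative positions in [0, 1], one per matched citation (assumed shape).
    """
    counts = [0] * 10
    for p in positions:
        # A position of exactly 1.0 belongs to the last decile.
        counts[min(int(p * 10), 9)] += 1
    total = len(positions)
    return [c / total for c in counts]


# Made-up positions for ten matched citations on one vertical.
demo_positions = [0.05, 0.12, 0.15, 0.18, 0.25, 0.33, 0.45, 0.62, 0.95, 1.0]
shares = decile_shares(demo_positions)
print("10-20% band:", shares[1])
print("first 30% of page:", sum(shares[:3]))
```

Running the same tally once per vertical gives the per-vertical ramps compared in this section.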

Results: The trend is real but varies by topic. One number holds everywhere: The bottom 10% of any page earns 2.4-4.4% of citations, roughly a quarter of what the peak band earns. The conclusion section is almost invisible to AI, regardless of vertical.

What The Industry Patterns Showed

The true peak decile is not the very opening. The 10-20% band is where AI reads hardest in almost every vertical. The first 10% is often navigation, headlines, and intro fluff that AI skips.

Finance is the extreme case. 43.7% of citations land in the first 30% of the page. Finance pages front-load rate data, percentages, and key figures. AI grabs them and rarely reads past the halfway point.
Healthcare and HR Tech have the flattest ramps. Useful content is distributed more evenly across these pages.
Education peaks at the 30-40% decile rather than 10-20%, because educational content tends to bury the key answer slightly deeper after the intro.

Top Takeaways

1. Put your most citable claims and data in the first 30% of the page, no matter what industry you’re in. Summaries and conclusions rarely get cited.

2. For Finance brands: Front-load your thesis and statistics as much as possible.

What This Means For How You Build LLM Visibility

The domains that own citation share didn’t get there by writing better sentences. They built pages that hold true topical authority, addressing multiple queries in one place, and then repeated that authority across enough sub-topics to hold multiple seats at the table.

Getting cited across 30, 60, or 100 distinct prompts requires a targeted content architecture: pages built around query clusters and owning whole topics rather than individual keywords. Teams that keep the traditional “one keyword, one page” model will be structurally locked out of AI citation, even if their individual pages are beautifully written.

But as the data shows, there is no universal playbook. The tactics that work for a broad CRM platform might actively harm a Finance brand.

Methodology

We analyzed ~98,000 ChatGPT citation rows pulled from approximately 1.2 million ChatGPT responses from Gauge.

Because AI behaves differently depending on the topic, we isolated the data across 7 distinct, verified verticals to ensure the findings weren’t skewed by one specific industry.

Analyzed verticals:

B2B SaaS
Finance
Healthcare
Education
Crypto
HR Tech
Product Analytics

To reverse-engineer the citation selection, I ran the data through several layers of analysis:

Structural parsing: I measured the raw character length of every cited page and mapped heading hierarchies (H1s, H2s, H3s) to see how information architecture affects visibility.
Positional mapping: I used Jaccard sliding-window similarity to pinpoint exactly where on the page the AI extracted its answers from, down to the specific decile.
Entity & Sentiment extraction: I ran the opening text of unique cited URLs through the Google Natural Language API to classify named entities (dates, prices, products) and used TextBlob to score sentiment, comparing the performance of corporate content against user-generated content (UGC).
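
For readers unfamiliar with the positional-mapping step, here is one way a Jaccard sliding-window match could work. It is a sketch under assumptions and not the exact method used for this analysis: whitespace tokenization, a window as long as the snippet, a one-token step, and the function names are all illustrative choices.

```python
def jaccard(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def best_match_decile(page_text, answer_snippet):
    """Slide a window over the page and return (decile, score) of the best match.

    The decile is 0-based: 0 is the first 10% of the page, 9 is the last 10%.
    """
    page = page_text.lower().split()
    snippet = answer_snippet.lower().split()
    if not page or not snippet:
        return None

    window = len(snippet)
    best_score, best_start = -1.0, 0
    for start in range(max(1, len(page) - window + 1)):
        score = jaccard(page[start:start + window], snippet)
        if score > best_score:  # keep the earliest window on ties
            best_score, best_start = score, start

    # Locate the centre of the best window as a fraction of the page length.
    center = (best_start + min(window, len(page)) / 2) / len(page)
    return min(int(center * 10), 9), best_score


# Made-up page of 100 distinct tokens; the snippet sits in the 10-20% band.
demo_page = " ".join(f"w{i}" for i in range(100))
print(best_match_decile(demo_page, "w12 w13 w14 w15"))
```

In practice, a minimum score threshold would probably be needed so that weak matches are not counted as citations; any such threshold is a tuning choice not specified here.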

Featured Image: Roman Samborskyi/Shutterstock; Paulo Bobita/Search Engine Journal
