Written by Lawrence Hitches | AI SEO Consultant | April 19, 2026 | 8 min read

85% of pages that ChatGPT retrieves are never cited in the final answer. They make it into the pool. They don't make it onto the page. Most AI search advice treats this as one problem. It's two.

The AirOps x Kevin Indig study published this month is the largest empirical AI citation study to date: 16,851 queries, three separate runs each, and 353,799 pages scraped across 10 industry verticals. I've spent time going through the findings and mapping them against what I've been seeing across client sites at StudioHawk.

The data confirmed something I've suspected for a while. We're optimising for citation when half the time the real problem is retrieval eligibility. And those two problems have almost nothing in common.

Two stages, two strategies: where most AI SEO goes wrong

AI search works in two distinct stages.

In stage one, ChatGPT issues search queries (often 8-12 rewritten sub-queries per user prompt) and retrieves roughly 100 content chunks from the web. Your page either makes it into that pool or it doesn't.

In stage two, the model selects which retrieved chunks to actually cite in the final response. Only about 15% of retrieved pages make the cut.

Getting into the retrieval pool is a Google ranking problem. Being cited from that pool is a content structure problem. They overlap, but they're not the same, and conflating them leads to strategies that optimise neither.

Almost everything currently sold as "AI SEO" focuses on citation signals while ignoring whether the page can even get retrieved. That's backwards for most sites.

Retrieval rank is the gate: the numbers are stark

The clearest finding in the AirOps study: retrieval position is a 4x signal.

Pages at position 1 in ChatGPT's retrieval layer earn a 58% citation rate. Pages at position 10 earn 14%. That gap compounds: a page at position 1 with strong heading match hits 79.6% citation rate. Position 11 or lower, even with perfect heading match, falls to 21.5%.
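If you want to feel how hard that gate is, the arithmetic is worth running. Here's a quick Python sketch using the rates quoted above; the lookup table is transcribed from the study's headline figures, not the full dataset, and the function is purely illustrative:

```python
# Citation rates by retrieval position and heading match, transcribed
# from the study's headline figures quoted above.
CITATION_RATE = {
    (1, "strong"): 0.796,    # position 1 with strong heading match
    (1, "overall"): 0.58,    # position 1 overall
    (10, "overall"): 0.14,   # position 10 overall
    (11, "strong"): 0.215,   # position 11+ even with perfect heading match
}

def expected_citations(position: int, heading: str, retrievals: int = 100) -> float:
    """Expected citations per `retrievals` appearances in the pool."""
    return retrievals * CITATION_RATE[(position, heading)]

# Perfect headings can't buy back a bad retrieval position:
print(f"{expected_citations(1, 'strong'):.1f}")   # 79.6 per 100 retrievals
print(f"{expected_citations(11, 'strong'):.1f}")  # 21.5 per 100 retrievals
```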

What drives retrieval position? Mostly traditional Google ranking. Aleyda Solis confirmed this with a first-party test at the Shenzhen SEO Conference in April: an unindexed page was invisible to ChatGPT. Once Google indexed it, ChatGPT cited it. The Google index is the entry ticket to ChatGPT retrieval.

This is why citation mechanics can't be separated from SEO fundamentals. If your page doesn't rank in Google's top 10 for the relevant query, the chance of it entering the ChatGPT retrieval pool at all is marginal. The citation optimisation work is wasted if the retrieval problem isn't solved first.

Once you're retrieved: heading match beats topical breadth

Inside the retrieval pool, the signal that matters most is heading-to-query match, not coverage breadth.

Pages with a heading similarity score of 0.8 or above to the query earned a 41% citation rate. Pages with broad topical coverage but weaker heading match earned 29%. That's a 12 percentage point gap driven purely by whether your H2s and H3s mirror the specific language of the query.

The practical application: write heading tags that answer the exact question, not the category. "How to improve AI search visibility" as a heading beats "AI Search Strategy" every time. The heading is the first signal the model uses to decide whether this chunk is relevant to the sub-query it's answering.
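The study doesn't publish its exact similarity method, but you can approximate heading-to-query match with off-the-shelf sentence embeddings. A minimal sketch using the sentence-transformers library; the model choice here is mine, not the study's, and the 0.8 cut-off is the threshold the study reports:

```python
# Approximate heading-to-query similarity with sentence embeddings.
# Assumption: cosine similarity over a general-purpose embedding model
# stands in for whatever scoring the AirOps study used internally.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is an assumption

def heading_match(query: str, headings: list[str]) -> list[tuple[str, float]]:
    """Score each page heading against the target query, best first."""
    q_emb = model.encode(query, convert_to_tensor=True)
    h_embs = model.encode(headings, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, h_embs)[0]
    return sorted(zip(headings, scores.tolist()), key=lambda x: -x[1])

for heading, score in heading_match(
    "how to improve AI search visibility",
    ["AI Search Strategy", "How to improve AI search visibility"],
):
    flag = "strong" if score >= 0.8 else "weak"  # the study's reported threshold
    print(f"{score:.2f}  {flag}  {heading}")
```

Run this over your existing H2s and H3s against the queries you're targeting, and the vague category headings fall out of the list immediately.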

I've seen this play out on client sites. Pages with vague section headings get retrieved but rarely cited. Restructuring to question-based H2s with specific answers in the opening sentence consistently improved citation rates across the pages we track.

The "ultimate guide" trap: why more coverage hurts

This is the finding that should change how content teams think about format.

Pages with moderate fan-out coverage (addressing 26-50% of the query cluster's sub-queries) outperformed pages with exhaustive coverage (100%) when query match was held constant. The "ultimate guide" playbook actively hurts AI citation rates at the page level.

Here's why this doesn't contradict Surfer's finding that pages ranking for fan-out queries are 161% more likely to earn AI citations. Surfer measures ranking for fan-out queries, a cluster-level measurement. AirOps measures citation rate at the individual page level. Both are true at different layers:

  • At the cluster level: cover all the sub-queries to rank across the topic
  • At the page level: write each page focused on one query, not all of them

The strategy that reconciles both: build a tight cluster of focused pages rather than one sprawling pillar. Each page owns one query and earns higher citation rates. Together they cover the cluster and earn higher retrieval across the fan-out. This maps directly to how topical authority should be built for AI search.
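One way to operationalise this: enumerate the cluster's sub-queries, give each page one primary query, and check every page lands in the 26-50% coverage band the study identifies. A planning sketch in Python; the sub-queries and page assignments are illustrative placeholders, not a real cluster:

```python
# Cluster-level planning sketch: each page owns one primary query and
# touches a slice of the sub-query cluster, never all of it.
cluster = [
    "what is ai search retrieval",
    "how does chatgpt pick citations",
    "heading match for ai citations",
    "ideal word count for ai citations",
    "does domain authority affect ai citations",
    "json-ld and ai citations",
]

pages = {
    "/retrieval-vs-citation": {"what is ai search retrieval",
                               "how does chatgpt pick citations"},
    "/heading-match": {"heading match for ai citations",
                       "ideal word count for ai citations"},
}

# Page-level check: each page should sit in the 26-50% coverage band.
for url, covered in pages.items():
    share = len(covered) / len(cluster)
    band = "in the 26-50% sweet spot" if 0.26 <= share <= 0.50 else "outside it"
    print(f"{url}: covers {share:.0%} of the cluster ({band})")

# Cluster-level check: together, the pages should cover everything.
uncovered = set(cluster) - set().union(*pages.values())
print("uncovered sub-queries:", uncovered or "none")
```

The uncovered sub-queries at the end are your next focused pages, not extra sections on an existing one.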

Stop pitching 5,000-word guides as the AI search content product. The data says 500-2,000 words per page is the citation sweet spot. Focused wins.

Domain authority has no positive correlation with citation rate

Across 353,799 pages, domain authority showed no positive correlation with AI citation rate. The finding lines up with Mahmoud Elsaid's Ahrefs analysis from February 2026: only 38% of AI Overview citations now come from Google's top 10, down from 76% just seven months earlier. Rankings and AI citations are decoupling.

The practical proof: HubSpot at DR93, ranking position 3, received zero AI citations. A DR17 site at position 23 was cited repeatedly, because its content was chunked cleanly, entity-dense, and answered directly without filler.

Domain authority helps you get retrieved (it correlates with Google ranking, which is the retrieval gate). But once you're in the retrieval pool, authority is irrelevant. The model doesn't know your DR. It knows whether your content answers the query cleanly. This is why information gain matters more than it ever did for traditional SEO.

The structural signals that actually move citation rates

Inside the retrieval pool, three structural signals had measurable positive correlation with citation rates:

Word count: 500-2,000 words. Below 500, there's not enough content to be useful. Above 2,000, you're adding noise relative to the query's scope. The health publisher vertical confirms this: median 2,111 words, 46.4% citation rate, best retrieval rank in the study.

Readability: Flesch-Kincaid grade 16-17. That's college-level prose. Precise, technical, and specific. This isn't about dumbing down. It's about removing filler, hedging, and meandering that makes content harder to extract cleanly.
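Both bands are cheap to audit programmatically. A sketch using the third-party textstat package; its Flesch-Kincaid score is an approximation of whatever the study computed, and the thresholds are simply the bands reported above:

```python
# Audit a draft against the study's word-count and readability bands.
# Requires the third-party package: pip install textstat
import textstat

def audit(body: str) -> dict:
    words = len(body.split())
    grade = textstat.flesch_kincaid_grade(body)
    return {
        "word_count": words,
        "in_word_band": 500 <= words <= 2000,   # study's citation sweet spot
        "fk_grade": grade,
        "in_grade_band": 16 <= grade <= 17,     # precise, college-level prose
    }

with open("draft.txt") as f:   # point this at the page body you're auditing
    print(audit(f.read()))
```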

JSON-LD structured data: +6.5 percentage points. This seems to contradict the existing evidence that schema markup gets stripped before it reaches AI models. The likely explanation: JSON-LD improves how Google understands and indexes the page, which improves retrieval rank, which then improves citation probability downstream. The schema isn't directly helping the citation layer. It's helping the retrieval layer. Same outcome, different mechanism.
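What this looks like in practice: emit the JSON-LD in a script tag so Google can parse the page's structure at index time. A minimal Article payload built in Python; every field value below is a placeholder:

```python
import json

# Minimal Article JSON-LD payload; all values are placeholders.
article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Retrieval vs citation in AI search",
    "author": {"@type": "Person", "name": "Lawrence Hitches"},
    "datePublished": "2026-04-19",
}

# Embed in the page head so it's available to Google when it indexes the page.
print(f'<script type="application/ld+json">{json.dumps(article_schema)}</script>')
```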

The Wikipedia exception and why it doesn't apply to you

Wikipedia achieved a 59.2% citation rate despite the worst retrieval rank in the study (median position 24) and the lowest heading-to-query match (0.576). It wins on encyclopedic density: an average of 4,383 words, 31 lists, and 6.6 tables per page.

Don't try to replicate this.

Wikipedia wins because it's Wikipedia. It has a trust relationship with every LLM that no other site can match. Its citation rate at rank 24 is the exception that proves the rule. For every other site in the study, retrieval rank was the dominant factor.

The lesson isn't "write longer with more tables." The lesson is: if you're not Wikipedia, you need retrieval rank first and citation structure second. Health publishers show the correct model: rank well, write focused content, and get cited at 46.4%.

What to build instead: the two-layer content strategy

Based on the data, here's how to structure AI search content properly.

Layer 1: Earn retrieval eligibility. Build topical clusters that rank in Google's top 10 for the query cluster. Traditional SEO. Technical foundation. Internal linking. Authority building. No shortcut here. If you're not ranking, you're not getting retrieved.

Layer 2: Optimise for citation selection. For pages already ranking, structure for extraction. Answer-first openings in the first 150 words. Question-based H2s and H3s that mirror query language. Focused scope (one query per page, 26-50% of the sub-query cluster covered). Clean prose at readability grade 16-17. JSON-LD where applicable.

Run those two layers as separate audits. Most sites have a retrieval problem. Some have a citation problem. Very few have both fully solved. Knowing which you're fixing changes everything about what you prioritise next.
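A simple way to keep the two audits separate is to triage each page by which gate it fails. A conceptual sketch; the rank cut-off and the ~15% citation baseline are drawn from the figures above, and the inputs are whatever your rank tracker and citation monitoring report:

```python
# Triage: which layer is each page's problem?
def triage(google_rank: int | None, retrieved: int, cited: int) -> str:
    # Layer 1 gate: not indexed or outside the top 10 means the page
    # rarely enters the retrieval pool at all.
    if google_rank is None or google_rank > 10:
        return "retrieval problem: fix ranking first (Layer 1)"
    # Layer 2 gate: retrieved but converting below the ~15% baseline.
    if retrieved and cited / retrieved < 0.15:
        return "citation problem: restructure for extraction (Layer 2)"
    return "both layers healthy"

print(triage(google_rank=3, retrieved=40, cited=2))    # citation problem
print(triage(google_rank=None, retrieved=0, cited=0))  # retrieval problem
```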

Frequently asked questions

How many words should a page be for AI search citation?

The AirOps study found the citation sweet spot is 500-2,000 words. Health publishers with a median of 2,111 words achieved the highest citation rates in the study. Pages above 2,000 words showed diminishing returns at the page level, though covering the broader topic cluster across multiple focused pages remains important for retrieval.

Does domain authority help with AI citations?

Domain authority doesn't directly correlate with AI citation rates. It helps you rank in Google, which helps you get retrieved by ChatGPT. But once you're in the retrieval pool, the model doesn't factor in domain authority. A DR17 site with clean, focused content can be cited over a DR93 site with dense, filler-heavy pages.

What schema type earns the most AI citations?

The AirOps study found JSON-LD adds approximately +6.5 percentage points to citation rate. The mechanism is likely indirect: schema improves Google's understanding of the page, which improves retrieval rank, which improves citation probability. Article, FAQPage, and HowTo schema are the most applicable for content pages.

Is Wikipedia an example of what to copy for AI visibility?

No. Wikipedia achieves exceptional citation rates despite poor retrieval rank because of its unique trust relationship with every major LLM. No other site shares that relationship. The correct model to copy is health publishers: strong retrieval rank, focused content around 2,000 words, 46.4% citation rate.

What's the difference between retrieval and citation in AI search?

Retrieval is stage one: ChatGPT issues search queries and pulls approximately 100 content chunks into a candidate pool. Citation is stage two: the model selects which chunks to include in the final answer. 85% of retrieved pages are never cited. Retrieval is a ranking problem. Citation is a content structure problem. Most AI SEO advice conflates both.

Soaring Above Search

Weekly AI search insights from the front line. One newsletter. Six sections. Everything that actually moved this week, with a practitioner's take.

Lawrence Hitches | AI SEO Consultant, Melbourne

Chief of Staff at StudioHawk, Australia's largest dedicated SEO agency. Specialising in AI search visibility, technical SEO, and organic growth strategy. Leading a team of 115+ across Melbourne, Sydney, London, and the US. Book a free consultation →