Written by Lawrence Hitches | AI SEO Consultant | June 19, 2026 | 2 min read

Every robots.txt directive plus the index and snippet controls beyond it, in one reference. robots.txt (User-agent, Disallow, Allow, Sitemap, Crawl-delay) controls crawling, not indexing. To remove a page from the index, use a noindex robots meta on a crawlable page. llms.txt and IndexNow are the AI-era additions.

robots.txt directives

robots.txt sits at your site root and controls what crawlers may fetch. It controls crawling, not indexing.

| Directive | What it does | Example |
| --- | --- | --- |
| User-agent | Targets a specific bot (or * for all) | User-agent: GPTBot |
| Disallow | Blocks a path from crawling | Disallow: /private/ |
| Allow | Permits a path (overrides a broader Disallow) | Allow: /private/public/ |
| Sitemap | Points crawlers to your sitemap | Sitemap: https://site.com/sitemap.xml |
| Crawl-delay | Seconds between requests (honored by Bing; ignored by Google) | Crawl-delay: 10 |
| Wildcards | * matches any sequence, $ matches URL end | Disallow: /*.pdf$ |
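
To make the Allow-over-Disallow and wildcard rows concrete, here is a minimal Python sketch of how a crawler might evaluate rules. It assumes the longest-match convention from RFC 9309 (the most specific rule wins, and Allow wins a tie). The function names `pattern_to_regex` and `is_allowed` are made up for illustration, and real crawlers differ in the details.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (illustrative helper)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal characters, then let each '*' match any sequence.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules is a list of ("allow" | "disallow", pattern) pairs.

    Assumption: the longest matching pattern wins, and Allow wins a tie.
    A path that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing in this sketch
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
                best_length = len(pattern)
                allowed = is_allow
    return allowed


# The example rules from the table above.
RULES = [
    ("disallow", "/private/"),
    ("allow", "/private/public/"),
    ("disallow", "/*.pdf$"),
]

for path in ["/private/report.html", "/private/public/page.html", "/docs/guide.pdf"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Because `$` anchors the pattern to the end of the URL, `/docs/guide.pdf?download=1` is not caught by `/*.pdf$` in this sketch.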

Index and snippet control (beyond robots.txt)

robots.txt cannot remove a page from the index. To deindex, the page must be crawlable and carry a noindex.

| Signal | Where it lives | What it does |
| --- | --- | --- |
| noindex robots meta | <head> | Keeps the page out of the index (must be crawlable to be seen) |
| nofollow robots meta | <head> | Tells engines not to follow links on the page |
| X-Robots-Tag | HTTP response header | noindex or nofollow for non-HTML files like PDFs |
| rel=canonical | <head> or header | Consolidates duplicate URLs to one preferred version |
| data-nosnippet | Inline attribute | Excludes that text from search snippets |
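
As a rough way to audit these signals, the sketch below collects robots directives from both places a crawler would look: the robots meta tag and the X-Robots-Tag header. It uses only Python's standard library. The names `RobotsMetaParser`, `split_directives` and `robots_directives` are hypothetical, and it ignores bot-specific forms such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


def split_directives(value):
    """Split a comma-separated directive string into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.directives.update(split_directives(attributes.get("content", "")))


def robots_directives(html, headers):
    """Merge directives from the HTML meta tag and the X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    return parser.directives | split_directives(header_value)


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(page, {})))
print(sorted(robots_directives("", {"X-Robots-Tag": "noindex"})))
```

The second call mimics a PDF response: there is no HTML to parse, so the header is the only place the noindex can live.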

AI-era files

| File | Purpose |
| --- | --- |
| llms.txt | A plain-markdown guide at root that tells LLMs who you are and what to read |
| llms-full.txt | The full-content version of llms.txt |
| IndexNow key file | Lets you instantly notify Bing and Yandex of new or changed URLs |
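
For the IndexNow row, a hedged sketch of what a submission might look like in Python. It assumes the key file is hosted at the site root as `<key>.txt` and that URLs are posted as JSON to the shared `api.indexnow.org` endpoint. The host, key and URL below are placeholders, and `build_indexnow_request` is a made-up helper. The request is built but not sent.

```python
import json
import urllib.request


def build_indexnow_request(host, key, urls):
    """Build (but do not send) an IndexNow submission for a batch of URLs."""
    payload = {
        "host": host,
        "key": key,
        # Assumption: the key file sits at the site root and is named after the key.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        "https://api.indexnow.org/indexnow",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_indexnow_request(
    "site.com", "0123456789abcdef", ["https://site.com/new-page"]
)
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request)
```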

llms.txt format: markdown, an H1 with your name, a blockquote summary, then sections of linked pages. See the truth about llms.txt and the AI crawler cheatsheet for allow/block guidance.
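
The format described above can be sketched as a small Python generator. The function name `build_llms_txt`, the H2 section headings, the `- [title](url): note` link style and all sample values are assumptions for illustration. Check the current llms.txt proposal before relying on the exact layout.

```python
def build_llms_txt(name, summary, sections):
    """Render an llms.txt body: H1 name, blockquote summary, then linked sections.

    sections maps a section heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site details.
llms_txt = build_llms_txt(
    "Example Co",
    "Example Co publishes technical SEO guides.",
    {
        "Guides": [
            ("Robots cheatsheet", "https://example.com/robots", "Crawl and index controls"),
        ],
    },
)
print(llms_txt)
```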

Frequently asked questions

Does robots.txt stop a page from being indexed?

No. robots.txt stops crawling, not indexing. A blocked URL can still be indexed from external links, shown without a snippet. To deindex, let the page be crawled and add a noindex robots meta, or use X-Robots-Tag.
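
The interaction between the two signals is easy to get wrong, so here is a small Python sketch that combines them. It uses the standard library's `urllib.robotparser`, which is a simplified parser (no wildcard support, first matching rule wins). The function name `deindex_outlook` and the verdict strings are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def deindex_outlook(robots_txt, user_agent, url, has_noindex):
    """Describe what is likely to happen to a URL given both signals."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch(user_agent, url)
    if crawlable and has_noindex:
        return "drops out: the crawler can fetch the page and see the noindex"
    if not crawlable and has_noindex:
        return "stuck: robots.txt hides the noindex from the crawler"
    if not crawlable:
        return "may stay indexed from external links, shown without a snippet"
    return "indexable"


BLOCKING = "User-agent: *\nDisallow: /private/\n"
OPEN = "User-agent: *\nAllow: /\n"
URL = "https://site.com/private/old.html"

print(deindex_outlook(BLOCKING, "Googlebot", URL, has_noindex=True))
print(deindex_outlook(OPEN, "Googlebot", URL, has_noindex=True))
```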

What is llms.txt?

A proposed standard: a plain-markdown file at your site root that tells large language models who you are and which pages to read. It does not replace robots.txt; it guides LLMs to your best content.

How do I block AI crawlers in robots.txt?

Target the bot by user-agent and disallow everything, for example User-agent: GPTBot followed by Disallow: /. Keep search and assistant bots allowed, because blocking them can remove you from AI answers.
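
Before deploying a rule like this, it can help to test it locally. The sketch below uses Python's built-in `urllib.robotparser` to confirm that GPTBot is blocked while other bots stay allowed. The URL is a placeholder, Googlebot and OAI-SearchBot are just example user-agent names, and Python's parser is simpler than a production crawler, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Block one AI crawler, leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

TEST_URL = "https://site.com/blog/post"  # placeholder URL
for agent in ["GPTBot", "Googlebot", "OAI-SearchBot"]:
    verdict = "allowed" if parser.can_fetch(agent, TEST_URL) else "blocked"
    print(f"{agent}: {verdict}")
```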

Does Google respect Crawl-delay?

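No. Google ignores Crawl-delay and adjusts its crawl rate automatically; the Search Console crawl-rate limiter was retired in January 2024. Bing does respect it. Yandex stopped honoring it in 2018 in favor of the crawl-rate setting in Yandex Webmaster.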

Lawrence Hitches AI SEO Consultant, Melbourne

Chief of Staff at StudioHawk, Australia's largest dedicated SEO agency. Specialising in AI search visibility, technical SEO, and organic growth strategy. Leading a team of 120+ across Melbourne, Sydney, London, and the US.