robots.txt & llms.txt Cheatsheet (Directives + Examples)

Every robots.txt directive plus the index and snippet controls beyond it, in one reference. robots.txt (User-agent, Disallow, Allow, Sitemap, Crawl-delay) controls crawling, not indexing. To remove a page from the index, use a noindex robots meta on a crawlable page. llms.txt and IndexNow are the AI-era additions.

robots.txt directives

robots.txt sits at your site root and controls what crawlers may fetch. It controls crawling, not indexing.

Directive	What it does	Example
`User-agent`	Targets a specific bot (or * for all)	User-agent: GPTBot
`Disallow`	Blocks a path from crawling	Disallow: /private/
`Allow`	Permits a path inside a disallowed one (the most specific matching rule wins)	Allow: /private/public/
`Sitemap`	Points crawlers to your sitemap	Sitemap: https://site.com/sitemap.xml
`Crawl-delay`	Seconds between requests (Bing/Yandex; Google ignores it)	Crawl-delay: 10
Wildcards	* matches any sequence, $ matches URL end	Disallow: /*.pdf$
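The wildcard and Allow/Disallow rows above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: it assumes the rules for one user-agent group are already extracted, that the longest matching pattern wins, and that Allow wins a tie. The names rule_to_regex and is_allowed are hypothetical helpers, not part of any library.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, then join the pieces with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules holds ("allow" | "disallow", pattern) pairs for one user-agent group."""
    best_len = -1
    allowed = True  # a path that matches no rule may be crawled
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if rule_to_regex(pattern).match(path):
            length = len(pattern)
            # Assumption: the longest pattern wins, and Allow wins a tie.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed

rules = [
    ("disallow", "/private/"),
    ("allow", "/private/public/"),
    ("disallow", "/*.pdf$"),
]
print(is_allowed("/private/notes.html", rules))
print(is_allowed("/private/public/index.html", rules))
print(is_allowed("/docs/guide.pdf", rules))
```

Because $ anchors the end of the URL, /docs/guide.pdf?download=1 is not caught by Disallow: /*.pdf$ in this sketch; drop the $ to block those URLs too.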

Index and snippet control (beyond robots.txt)

robots.txt cannot remove a page from the index. To deindex a page, leave it crawlable and add a noindex directive, because a crawler that cannot fetch the page never sees the noindex.

Signal	Where it lives	What it does
`noindex` robots meta	<head>	Keeps the page out of the index (must be crawlable to be seen)
`nofollow` robots meta	<head>	Tells engines not to follow links on the page
`X-Robots-Tag`	HTTP response header	noindex or nofollow for non-HTML files like PDFs
`rel=canonical`	<head> or header	Consolidates duplicate URLs to one preferred version
`data-nosnippet`	Inline attribute	Excludes that text from search snippets
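To audit which of these signals a page actually sends, a rough Python sketch can collect directives from both the robots meta tag and the X-Robots-Tag header. It uses only the standard library; RobotsMetaParser and robots_directives are hypothetical names, and the sketch ignores bot-specific forms such as a header value prefixed with a crawler name.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def robots_directives(html: str, headers: dict[str, str]) -> set[str]:
    """Merge directives from the HTML head and the X-Robots-Tag response header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = set(parser.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found.update(d.strip().lower() for d in value.split(",") if d.strip())
    return found

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print("noindex" in robots_directives(page, {}))
print("noindex" in robots_directives("", {"X-Robots-Tag": "noindex"}))
```

The second call models a PDF or other non-HTML file, where the header is the only place a noindex can live.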

AI-era files

File	Purpose
`llms.txt`	A plain-markdown guide at root that tells LLMs who you are and what to read
`llms-full.txt`	The full-content version of llms.txt
`IndexNow` key file	Lets you instantly notify Bing and Yandex of new or changed URLs
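As an illustration of how the IndexNow key file is used, the sketch below builds (but does not send) a notification request in Python. The endpoint and JSON field names reflect the public IndexNow protocol as commonly documented and should be checked against the current documentation; the host, key, and URL are placeholders, and build_indexnow_request is a hypothetical helper.

```python
import json
from urllib.request import Request

# Assumption: shared IndexNow endpoint; individual engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_request(host: str, key: str, urls: list[str]) -> Request:
    """Build a POST request announcing new or changed URLs."""
    payload = {
        "host": host,
        "key": key,
        # The key file is a text file at the site root containing only the key.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
    return Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# Placeholder key for illustration only; generate your own.
req = build_indexnow_request(
    "site.com", "0123456789abcdef", ["https://site.com/new-page"]
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would submit it; that step is left out here so the example has no network side effects.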

llms.txt format: markdown, an H1 with your name, a blockquote summary, then sections of linked pages. See the truth about llms.txt and the AI crawler cheatsheet for allow/block guidance.
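The format described above can be generated with a short Python sketch. It assumes each section is an H2 heading followed by a bullet list of links with an optional note, which is how the llms.txt proposal is usually presented; build_llms_txt and the example.com URLs are hypothetical.

```python
def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, linked sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"

text = build_llms_txt(
    "Example Co",
    "We sell widgets.",
    {"Docs": [("Guide", "https://example.com/guide.md", "Start here")]},
)
print(text)
```

Writing the returned string to /llms.txt at the site root gives a file in the shape the paragraph describes.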

Frequently asked questions

Does robots.txt stop a page from being indexed?

No. robots.txt stops crawling, not indexing. A blocked URL can still be indexed if other pages link to it, in which case it appears without a snippet. To deindex, let the page be crawled and add a noindex robots meta tag, or send an X-Robots-Tag header.

What is llms.txt?

A proposed standard: a plain-markdown file at your site root that tells large language models who you are and which pages to read. It does not replace robots.txt; it guides LLMs to your best content.

How do I block AI crawlers in robots.txt?

Target the bot by user-agent and disallow everything: a User-agent: GPTBot line followed by Disallow: /. Keep search and assistant bots allowed, because blocking them can remove you from AI answers.
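One way to confirm that such a block does what you expect is to run the rules through Python's standard urllib.robotparser. The robots.txt content below is an illustrative sample, Bingbot stands in for any search bot you want to keep allowed, and can_crawl is a hypothetical helper. Note that this parser applies rules in file order and does not understand * or $ inside paths, so it suits simple rule sets like this one.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: block GPTBot everywhere, keep other bots out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def can_crawl(agent: str, path: str) -> bool:
    """Return True if the sample robots.txt lets this agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "https://site.com" + path)

print(can_crawl("GPTBot", "/blog/post"))
print(can_crawl("Bingbot", "/blog/post"))
print(can_crawl("Bingbot", "/private/report"))
```

The first call shows the AI crawler blocked site-wide, while the second and third show a search bot still reaching public pages but not /private/.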

Does Google respect Crawl-delay?

No. Google ignores Crawl-delay and adjusts its crawl rate automatically based on how your server responds. Bing and Yandex do respect it.
