AI Crawler Cheatsheet: User-Agents & Whether to Block

This is a quick reference to the AI crawlers and assistant bots hitting your site in 2026: who they are, the exact user-agent, what each one actually does, and whether you should block it. It is built from the bots I see in my own logs, where AI bots made up 26.3% of all requests in one sampled week.

The major AI crawlers in 2026 are GPTBot, OAI-SearchBot and ChatGPT-User (OpenAI), ClaudeBot and Claude-User (Anthropic), PerplexityBot (Perplexity), Googlebot and Google-Extended (Google), and Bingbot (Microsoft, which feeds both Copilot and ChatGPT Search). Allow the search and assistant bots: blocking them removes you from AI answers. Blocking pure training crawlers is optional and has little citation downside.

The flip from Crawler to Assistant is the indexed-to-cited conversion.

The AI crawler reference table

Every bot worth knowing, what it does, and the one-word verdict. "Allow" means blocking it removes you from live AI answers or citations. "Optional" means it is a training crawler: a values and IP choice with little citation downside.

Bot (user-agent)	Operator	What it does	Respects robots.txt?	Verdict
`GPTBot`	OpenAI	Trains GPT models	Yes	Optional
`OAI-SearchBot`	OpenAI	Indexes for ChatGPT Search	Yes	Allow
`ChatGPT-User`	OpenAI	Live fetch when a user asks ChatGPT	Yes	Allow
`ClaudeBot`	Anthropic	Trains Claude models	Yes	Optional
`Claude-User`	Anthropic	Live fetch for a Claude user	Yes	Allow
`PerplexityBot`	Perplexity	Indexes for Perplexity answers	Yes	Allow
`Googlebot`	Google	Search index; also powers AI Overviews	Yes	Allow
`Google-Extended`	Google	Robots token to opt out of Gemini training	n/a (token)	Optional
`Bingbot`	Microsoft	Bing index; feeds Copilot + ChatGPT Search	Yes	Allow
`Applebot` / `Applebot-Extended`	Apple	Siri + Apple Intelligence; Applebot-Extended = training opt-out token	Yes	Optional
`Amazonbot`	Amazon	Alexa + Amazon AI answers	Yes	Optional
`meta-externalagent`	Meta	Trains Meta AI	Yes	Optional
`PetalBot`	Huawei	Huawei search + AI	Yes	Optional
`Bytespider`	ByteDance	Trains TikTok / Doubao models	Historically ignored it	Block
`CCBot`	Common Crawl	Open dataset many LLMs train on	Yes	Optional
`DuckAssistBot`	DuckDuckGo	DuckAssist answers	Yes	Allow
`Diffbot`	Diffbot	Structured-data extraction	Yes	Optional
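The table can be turned into a small lookup for tagging log entries. This is a minimal sketch, not a definitive matcher: the verdicts are copied from the table above, while the substring matching, the function name `classify_user_agent` and the sample user-agent string in the usage note are my own assumptions. Google-Extended and Applebot-Extended are left out because they are robots.txt tokens and do not appear as request user-agents.

```python
from typing import Optional, Tuple

# Token -> (operator, verdict), copied from the reference table above.
# Google-Extended and Applebot-Extended are robots.txt tokens only, so they
# never show up in a request's User-Agent header and are omitted here.
AI_BOTS = {
    "GPTBot": ("OpenAI", "Optional"),
    "OAI-SearchBot": ("OpenAI", "Allow"),
    "ChatGPT-User": ("OpenAI", "Allow"),
    "ClaudeBot": ("Anthropic", "Optional"),
    "Claude-User": ("Anthropic", "Allow"),
    "PerplexityBot": ("Perplexity", "Allow"),
    "Googlebot": ("Google", "Allow"),
    "Bingbot": ("Microsoft", "Allow"),
    "Applebot": ("Apple", "Optional"),
    "Amazonbot": ("Amazon", "Optional"),
    "meta-externalagent": ("Meta", "Optional"),
    "PetalBot": ("Huawei", "Optional"),
    "Bytespider": ("ByteDance", "Block"),
    "CCBot": ("Common Crawl", "Optional"),
    "DuckAssistBot": ("DuckDuckGo", "Allow"),
    "Diffbot": ("Diffbot", "Optional"),
}


def classify_user_agent(user_agent: str) -> Optional[Tuple[str, str, str]]:
    """Return (token, operator, verdict) for a known AI bot, else None.

    Matching is a case-insensitive substring test (an assumption: real
    user-agent strings vary, and spoofed ones will match too).
    """
    ua = user_agent.lower()
    for token, (operator, verdict) in AI_BOTS.items():
        if token.lower() in ua:
            return token, operator, verdict
    return None
```

For example, `classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.1)")` returns `("GPTBot", "OpenAI", "Optional")`, and an ordinary browser string returns `None`. The sample string is illustrative only; check your own logs for the exact strings you receive.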

Who is actually crawling (first-party data)

Crawler lists are easy to find. Real crawl volume is not. Here is the actual split from one week of my own logs, where AI bots made up 26.3% of all requests.

Who is actually crawling: share of AI bot requests to this site (one week)

Operator	Share of AI bot requests
OpenAI	61.2%
Meta	15.9%
Huawei (PetalBot)	10.4%
Amazon	5.9%
Apple	2.3%
Anthropic	1.7%

AI bots were 26.3% of all requests to lawrencehitches.com that week. OpenAI alone was 61% of the bot activity.
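If you want the same split for your own site, a rough tally over access logs is enough. The sketch below assumes logs in the common "combined" format, where the user-agent is the last double-quoted field; the token-to-operator map, the function name `operator_shares` and the sample lines in the usage note are hypothetical and are not the script behind the numbers above.

```python
import re
from collections import Counter

# Lowercase user-agent token -> operator. Assumption: limited to the
# operators that appear in the breakdown above; extend it for your own logs.
OPERATOR_TOKENS = {
    "gptbot": "OpenAI",
    "oai-searchbot": "OpenAI",
    "chatgpt-user": "OpenAI",
    "meta-externalagent": "Meta",
    "petalbot": "Huawei",
    "amazonbot": "Amazon",
    "applebot": "Apple",
    "claudebot": "Anthropic",
    "claude-user": "Anthropic",
}

# In combined log format the user-agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def operator_shares(log_lines):
    """Return (share of AI bot requests per operator in %, AI share of all requests in %)."""
    counts = Counter()
    total = 0
    for line in log_lines:
        total += 1
        match = UA_PATTERN.search(line)
        ua = match.group(1).lower() if match else ""
        for token, operator in OPERATOR_TOKENS.items():
            if token in ua:
                counts[operator] += 1
                break  # count each request once
    ai_total = sum(counts.values())
    shares = {op: round(100 * n / ai_total, 1) for op, n in counts.items()} if ai_total else {}
    ai_share_of_all = round(100 * ai_total / total, 1) if total else 0.0
    return shares, ai_share_of_all
```

Usage: pass any iterable of log lines, for example `operator_shares(open("access.log"))`, where `access.log` is a placeholder path. User-agents can be spoofed, so treat the result as an estimate unless you also verify source IPs.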

What they take vs what they send back

The fair criticism of AI crawlers is that they take a lot and return little. The industry measures this as a crawl-to-referral ratio: pages fetched per visitor sent back. The published numbers are brutal, but mine are far better than average, because content built to be cited gets cited.

Crawler	Scrapes per referral
Google (Search)	~5 : 1
PerplexityBot	~195 : 1
GPTBot (OpenAI, industry)	~1,091 : 1
ClaudeBot (Anthropic, industry)	~23,951 : 1
This site, overall	46 : 1
This site, OpenAI	7 : 1

My overall ratio is 46 to 1, and 7 to 1 for OpenAI, against an industry GPTBot average near 1,091 to 1. The absolute numbers are small, but that is roughly 24 times fewer fetches per referral overall, and about 156 times fewer for OpenAI, than the industry GPTBot average. More on this in clicks to citations.
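The arithmetic behind these comparisons is simple enough to check by hand. A minimal sketch: the ratios passed in the usage note are the ones quoted in the table above, while the function names are my own.

```python
def crawl_to_referral(pages_fetched: int, referrals: int) -> float:
    """Pages fetched by a crawler per visitor it sent back."""
    if referrals <= 0:
        raise ValueError("need at least one referral to compute a ratio")
    return pages_fetched / referrals


def efficiency_vs_baseline(baseline_ratio: float, own_ratio: float) -> float:
    """How many times fewer fetches per referral than the baseline."""
    return baseline_ratio / own_ratio


# Ratios quoted in the table above.
INDUSTRY_GPTBOT = 1091
SITE_OVERALL = 46
SITE_OPENAI = 7

overall_gain = round(efficiency_vs_baseline(INDUSTRY_GPTBOT, SITE_OVERALL), 1)
openai_gain = round(efficiency_vs_baseline(INDUSTRY_GPTBOT, SITE_OPENAI), 1)
```

Here `overall_gain` works out to 23.7 and `openai_gain` to 155.9, so "an order of magnitude" is a conservative description. To get your own ratio, feed `crawl_to_referral` the bot's fetch count from your logs and the referral count from your analytics for the same period.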

Should you block AI crawlers?

Short answer: do not block the search and assistant bots. Here is the split that matters.

Search and assistant bots (allow these): OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot, Bingbot, Googlebot, DuckAssistBot. These power live AI answers and citations. Block them and you delete yourself from ChatGPT Search, Copilot, Perplexity and AI Overviews.
Training crawlers (optional): GPTBot, ClaudeBot, CCBot, meta-externalagent, Google-Extended, Applebot-Extended. These build models, not live answers. Blocking is a legitimate IP choice with little direct citation cost, though training presence still helps brand familiarity.
Bad actors (block): Bytespider has a history of ignoring robots.txt. Block it if you do not want ByteDance training on your content.
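The three groups above map directly onto a robots.txt policy. Here is a hedged sketch that generates one: the bot lists are copied from the split above, while `build_robots_txt` and its `block_training` flag are hypothetical names of mine. Adjust the lists to your own policy before using the output.

```python
# Groups copied from the split above.
SEARCH_AND_ASSISTANT = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-User", "PerplexityBot",
    "Bingbot", "Googlebot", "DuckAssistBot",
]
TRAINING = [
    "GPTBot", "ClaudeBot", "CCBot", "meta-externalagent",
    "Google-Extended", "Applebot-Extended",
]
BAD_ACTORS = ["Bytespider"]


def build_robots_txt(block_training: bool = False) -> str:
    """Build robots.txt text: always block bad actors and allow search bots.

    Training crawlers are disallowed only when block_training is True.
    Otherwise they are simply not mentioned, which leaves them allowed
    by default.
    """
    blocked = BAD_ACTORS + (TRAINING if block_training else [])
    sections = [f"User-agent: {token}\nDisallow: /" for token in blocked]
    sections += [f"User-agent: {token}\nAllow: /" for token in SEARCH_AND_ASSISTANT]
    return "\n\n".join(sections) + "\n"
```

Calling `build_robots_txt()` matches my own "allow almost everything" stance, while `build_robots_txt(block_training=True)` opts out of training and keeps search and assistant access. The explicit `Allow: /` groups are redundant for bots with no other rules, but they document intent.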

My take after watching the data: the citation upside outweighs the scraping cost. I allow almost everything. See where ChatGPT gets its data for why Bing access matters most.

How to allow or block a bot

Control AI crawlers in robots.txt at your site root. Block a training crawler while keeping search access:

# Block OpenAI training, keep ChatGPT Search + live fetch
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Opt out of Google AI training without touching Search
User-agent: Google-Extended
Disallow: /

# Block a known bad actor
User-agent: Bytespider
Disallow: /

Note: Google-Extended and Applebot-Extended are robots tokens for training opt-out, not separate crawlers. Disallowing them does not affect Google Search or Siri indexing. To verify who is hitting you, check server or CDN logs by user-agent. To measure the payoff, see how to measure AI search traffic and how to track AI search rankings.
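Before deploying rules like the ones above, it is worth checking that they do what you expect. The sketch below uses Python's standard-library `urllib.robotparser` against the example rules from this section. The helper name `is_allowed` and the `example.com` URL are placeholders, and the standard-library parser is only an approximation of how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this section, inlined so the check is self-contained.
ROBOTS_TXT = """\
# Block OpenAI training, keep ChatGPT Search + live fetch
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Opt out of Google AI training without touching Search
User-agent: Google-Extended
Disallow: /

# Block a known bad actor
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/some-page") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Print a verdict for each token of interest.
for bot in ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
            "Google-Extended", "Googlebot", "Bytespider"]:
    print(bot, "allowed" if is_allowed(ROBOTS_TXT, bot) else "blocked")
```

With these rules GPTBot, Google-Extended and Bytespider come back blocked, while OAI-SearchBot, ChatGPT-User and Googlebot stay allowed. Googlebot has no matching group and the file has no `*` group, so it is allowed by default.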

Frequently asked questions

What is the GPTBot user agent?

GPTBot is OpenAI's training crawler. It collects web pages to train future GPT models. It respects robots.txt. It is separate from OAI-SearchBot (which indexes for ChatGPT Search) and ChatGPT-User (which fetches a page live when a user asks).

What is the difference between GPTBot and OAI-SearchBot?

GPTBot trains the model. OAI-SearchBot builds the index ChatGPT Search uses to answer with citations. Blocking GPTBot only stops training; blocking OAI-SearchBot removes you from ChatGPT Search results.

Does ChatGPT respect robots.txt?

Yes. OpenAI's GPTBot, OAI-SearchBot and ChatGPT-User all document that they obey robots.txt directives. You can allow or disallow each one independently.

Should I block AI crawlers from my website?

Do not block the search and assistant bots (OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot, Bingbot, Googlebot): blocking them removes you from AI answers and citations. Blocking pure training crawlers (GPTBot, ClaudeBot, CCBot) is optional and has little citation downside.

Which AI bot crawls the most?

In my logs over one week, OpenAI was 61% of all AI bot activity, followed by Meta at 16% and Huawei's PetalBot at 10%. AI bots were 26.3% of total requests to the site.

Does Bing matter for AI search crawling?

Yes, more than most realise. Bingbot's index feeds both Microsoft Copilot and ChatGPT Search, so a Bing block can quietly remove you from ChatGPT's answers. Allow Bingbot.
