This is a quick reference to the AI crawlers and assistant bots hitting your site in 2026: who they are, the exact user-agent, what each one actually does, and whether you should block it. Built from the bots I see in my own logs, where AI bots now account for 26.3% of all requests.
The major AI crawlers in 2026 are GPTBot, OAI-SearchBot and ChatGPT-User (OpenAI), ClaudeBot and Claude-User (Anthropic), PerplexityBot (Perplexity), Googlebot and Google-Extended (Google), and Bingbot (Microsoft, which feeds both Copilot and ChatGPT Search). Allow the search and assistant bots: blocking them removes you from AI answers. Blocking pure training crawlers is optional and has little citation downside.
The AI crawler reference table
Every bot worth knowing, what it does, and the one-word verdict. "Allow" means blocking it removes you from live AI answers or citations. "Optional" means it is a training crawler: a values and IP choice with little citation downside. "Block" is reserved for a bot with a history of ignoring robots.txt.
| Bot (user-agent) | Operator | What it does | Respects robots.txt? | Block? |
|---|---|---|---|---|
| GPTBot | OpenAI | Trains GPT models | Yes | Optional |
| OAI-SearchBot | OpenAI | Indexes for ChatGPT Search | Yes | Allow |
| ChatGPT-User | OpenAI | Live fetch when a user asks ChatGPT | Yes | Allow |
| ClaudeBot | Anthropic | Trains Claude models | Yes | Optional |
| Claude-User | Anthropic | Live fetch for a Claude user | Yes | Allow |
| PerplexityBot | Perplexity | Indexes for Perplexity answers | Yes | Allow |
| Googlebot | Google | Search index; also powers AI Overviews | Yes | Allow |
| Google-Extended | Google | Robots token to opt out of Gemini training | n/a (token) | Optional |
| Bingbot | Microsoft | Bing index; feeds Copilot + ChatGPT Search | Yes | Allow |
| Applebot / -Extended | Apple | Siri + Apple Intelligence; -Extended = training opt-out | Yes | Optional |
| Amazonbot | Amazon | Alexa + Amazon AI answers | Yes | Optional |
| meta-externalagent | Meta | Trains Meta AI | Yes | Optional |
| PetalBot | Huawei | Huawei search + AI | Yes | Optional |
| Bytespider | ByteDance | Trains TikTok / Doubao models | Historically ignored it | Block |
| CCBot | Common Crawl | Open dataset many LLMs train on | Yes | Optional |
| DuckAssistBot | DuckDuckGo | DuckAssist answers | Yes | Allow |
| Diffbot | Diffbot | Structured-data extraction | Yes | Optional |
Who is actually crawling (first-party data)
Crawler lists are easy to find. Real crawl volume is not. In one week of my own logs for lawrencehitches.com, AI bots made up 26.3% of all requests. OpenAI alone was 61% of that bot activity, followed by Meta at 16% and Huawei's PetalBot at 10%.
What they take vs what they send back
The fair criticism of AI crawlers is that they take a lot and return little. The industry measures this as a crawl-to-referral ratio: pages fetched per visitor sent back. The published numbers are brutal, but mine are far better than average, because content built to be cited gets cited.
| Crawler | Scrapes per referral |
|---|---|
| Google (Search) | ~5 : 1 |
| PerplexityBot | ~195 : 1 |
| GPTBot (OpenAI, industry) | ~1,091 : 1 |
| ClaudeBot (Anthropic, industry) | ~23,951 : 1 |
| This site, overall | 46 : 1 |
| This site, OpenAI | 7 : 1 |
My overall ratio is 46 to 1, and 7 to 1 for OpenAI, against an industry GPTBot average near 1,091 to 1. The absolute numbers are small, but that is roughly 24 times better than the industry GPTBot figure overall, and about 156 times better for OpenAI alone. More on this in clicks to citations.
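To make the arithmetic concrete, here is a minimal Python sketch of the crawl-to-referral ratio and the comparison against the industry figure. The function names are my own for illustration, and the raw counts in the last example (920 fetches, 20 referrals) are hypothetical, chosen only to reproduce a 46 : 1 ratio; the ratios themselves come from the table above.

```python
def crawl_to_referral(pages_fetched: int, referrals: int) -> float:
    """Pages fetched per visitor sent back. Lower is better for the publisher."""
    if referrals <= 0:
        raise ValueError("need at least one referral to form a ratio")
    return pages_fetched / referrals


def times_better(baseline_ratio: float, observed_ratio: float) -> float:
    """How many times lower the observed ratio is than the baseline."""
    return baseline_ratio / observed_ratio


# Ratios quoted in the table above (crawls per one referral).
INDUSTRY_GPTBOT = 1091
SITE_OVERALL = 46
SITE_OPENAI = 7

print(round(times_better(INDUSTRY_GPTBOT, SITE_OVERALL), 1))  # 23.7
print(round(times_better(INDUSTRY_GPTBOT, SITE_OPENAI), 1))   # 155.9

# Hypothetical raw counts, only to show how a ratio is formed.
print(crawl_to_referral(pages_fetched=920, referrals=20))     # 46.0
```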
Should you block AI crawlers?
Short answer: do not block the search and assistant bots. Here is the split that matters.
- Search and assistant bots (allow these): OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot, Bingbot, Googlebot, DuckAssistBot. These power live AI answers and citations. Block them and you delete yourself from ChatGPT Search, Copilot, Perplexity and AI Overviews.
- Training crawlers (optional): GPTBot, ClaudeBot, CCBot, meta-externalagent, Google-Extended, Applebot-Extended. These build models, not live answers. Blocking is a legitimate IP choice with little direct citation cost, though training presence still helps brand familiarity.
- Bad actors (block): Bytespider has a history of ignoring robots.txt. Block it if you do not want ByteDance training on your content.
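The three groups above can be turned into a robots.txt mechanically. This is a rough sketch, not a recommended tool: `build_robots_txt` and its `block_training` flag are hypothetical names, and the group lists simply copy the verdicts from the list above.

```python
# Groups copied from the split above; the verdicts are this article's opinion.
ALLOW = ["OAI-SearchBot", "ChatGPT-User", "Claude-User", "PerplexityBot",
         "Bingbot", "Googlebot", "DuckAssistBot"]
OPTIONAL = ["GPTBot", "ClaudeBot", "CCBot", "meta-externalagent",
            "Google-Extended", "Applebot-Extended"]
BLOCK = ["Bytespider"]


def build_robots_txt(block_training: bool = False) -> str:
    """Return robots.txt text: allow search bots, block bad actors,
    and block training crawlers only when block_training is True."""
    disallowed = BLOCK + (OPTIONAL if block_training else [])
    groups = [f"User-agent: {agent}\nAllow: /" for agent in ALLOW]
    groups += [f"User-agent: {agent}\nDisallow: /" for agent in disallowed]
    return "\n\n".join(groups) + "\n"


print(build_robots_txt(block_training=True))
```

With `block_training=False` the training crawlers are left out of the file entirely, so they fall back to the default of being allowed.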
My take after watching the data: the citation upside outweighs the scraping cost. I allow almost everything. See where ChatGPT gets its data for why Bing access matters most.
How to allow or block a bot
Control AI crawlers in robots.txt at your site root. Block a training crawler while keeping search access:
# Block OpenAI training, keep ChatGPT Search + live fetch
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
# Opt out of Google AI training without touching Search
User-agent: Google-Extended
Disallow: /
# Block a known bad actor
User-agent: Bytespider
Disallow: /
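Before deploying rules like these, it can help to sanity-check them locally. The sketch below uses Python's standard-library `urllib.robotparser` to ask which user-agents may fetch a page under the rules above. The helper name `allowed_agents` and the `https://example.com/any-page` URL are placeholders of mine, and Python's matching is only an approximation of how each real crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, with comments dropped and groups separated.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user-agent token to whether the rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Googlebot", "Bytespider"],
    "https://example.com/any-page",  # placeholder URL
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Googlebot has no group of its own here and there is no `*` group, so it comes back as allowed, which is the intended outcome of blocking only Google-Extended.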
Note: Google-Extended and Applebot-Extended are robots tokens for training opt-out, not separate crawlers. Disallowing them does not affect Google Search or Siri indexing. To verify who is hitting you, check server or CDN logs by user-agent. To measure the payoff, see how to measure AI search traffic and how to track AI search rankings.
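As a starting point for checking logs by user-agent, here is a small Python sketch that tallies known bot tokens and computes their share of requests. It assumes you have already extracted the user-agent string from each log line. `classify` and `ai_bot_share` are hypothetical helper names, the token list is taken from the reference table (which counts Googlebot and Bingbot alongside the AI-only bots), and the sample strings are illustrative rather than real log entries. User-agents can be spoofed, so treat the counts as an estimate.

```python
from collections import Counter
from typing import Iterable, Optional

# Crawler tokens from the reference table. Google-Extended and
# Applebot-Extended are robots.txt tokens, not user-agents, so they are omitted.
AI_BOT_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "Claude-User",
    "PerplexityBot", "Googlebot", "Bingbot", "Applebot", "Amazonbot",
    "meta-externalagent", "PetalBot", "Bytespider", "CCBot",
    "DuckAssistBot", "Diffbot",
]


def classify(user_agent: str) -> Optional[str]:
    """Return the first known bot token found in the user-agent, else None."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


def ai_bot_share(user_agents: Iterable[str]) -> tuple[Counter, float]:
    """Count requests per bot and return (counts, bot share of all requests)."""
    counts: Counter = Counter()
    total = 0
    for ua in user_agents:
        total += 1
        bot = classify(ua)
        if bot is not None:
            counts[bot] += 1
    share = sum(counts.values()) / total if total else 0.0
    return counts, share


# Illustrative sample strings, not copied from real logs.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/130.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) Safari/605.1.15",
]
counts, share = ai_bot_share(sample)
print(counts.most_common(), f"{share:.1%}")
```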
Frequently asked questions
What is the GPTBot user agent?
GPTBot is OpenAI's training crawler. It collects web pages to train future GPT models. It respects robots.txt. It is separate from OAI-SearchBot (which indexes for ChatGPT Search) and ChatGPT-User (which fetches a page live when a user asks).
What is the difference between GPTBot and OAI-SearchBot?
GPTBot trains the model. OAI-SearchBot builds the index ChatGPT Search uses to answer with citations. Blocking GPTBot only stops training; blocking OAI-SearchBot removes you from ChatGPT Search results.
Does ChatGPT respect robots.txt?
Yes. OpenAI's GPTBot, OAI-SearchBot and ChatGPT-User all document that they obey robots.txt directives. You can allow or disallow each one independently.
Should I block AI crawlers from my website?
Do not block the search and assistant bots (OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot, Bingbot, Googlebot): blocking them removes you from AI answers and citations. Blocking pure training crawlers (GPTBot, ClaudeBot, CCBot) is optional and has little citation downside.
Which AI bot crawls the most?
In my logs over one week, OpenAI was 61% of all AI bot activity, followed by Meta at 16% and Huawei's PetalBot at 10%. AI bots were 26.3% of total requests to the site.
Does Bing matter for AI search crawling?
Yes, more than most realise. Bingbot's index feeds both Microsoft Copilot and ChatGPT Search, so a Bing block can quietly remove you from ChatGPT's answers. Allow Bingbot.