Written by Lawrence Hitches | AI SEO Consultant | June 18, 2026 | 4 min read

This is a quick reference to the AI crawlers and assistant bots hitting your site in 2026: who they are, the exact user-agent, what each one actually does, and whether you should block it. It is built from the bots I see in my own logs, where AI bots made up 26.3% of all requests in a one-week sample.

The major AI crawlers in 2026 are GPTBot, OAI-SearchBot and ChatGPT-User (OpenAI), ClaudeBot and Claude-User (Anthropic), PerplexityBot (Perplexity), Googlebot and Google-Extended (Google), and Bingbot (Microsoft, which feeds both Copilot and ChatGPT Search). Allow the search and assistant bots: blocking them removes you from AI answers. Blocking pure training crawlers is optional and has little citation downside.

The AI crawler reference table

Every bot worth knowing, what it does, and the one-word verdict. "Allow" means blocking it removes you from live AI answers or citations. "Optional" means it is a training crawler: a values and IP choice with little citation downside.

Bot (user-agent) | Operator | What it does | Respects robots.txt | Block?
GPTBot | OpenAI | Trains GPT models | Yes | Optional
OAI-SearchBot | OpenAI | Indexes for ChatGPT Search | Yes | Allow
ChatGPT-User | OpenAI | Live fetch when a user asks ChatGPT | Yes | Allow
ClaudeBot | Anthropic | Trains Claude models | Yes | Optional
Claude-User | Anthropic | Live fetch for a Claude user | Yes | Allow
PerplexityBot | Perplexity | Indexes for Perplexity answers | Yes | Allow
Googlebot | Google | Search index; also powers AI Overviews | Yes | Allow
Google-Extended | Google | Robots token to opt out of Gemini training | n/a (token) | Optional
Bingbot | Microsoft | Bing index; feeds Copilot + ChatGPT Search | Yes | Allow
Applebot / -Extended | Apple | Siri + Apple Intelligence; -Extended = training opt-out | Yes | Optional
Amazonbot | Amazon | Alexa + Amazon AI answers | Yes | Optional
meta-externalagent | Meta | Trains Meta AI | Yes | Optional
PetalBot | Huawei | Huawei search + AI | Yes | Optional
Bytespider | ByteDance | Trains TikTok / Doubao models | Historically ignored it | Block
CCBot | Common Crawl | Open dataset many LLMs train on | Yes | Optional
DuckAssistBot | DuckDuckGo | DuckAssist answers | Yes | Allow
Diffbot | Diffbot | Structured-data extraction | Yes | Optional
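To use the table programmatically, here is a minimal sketch that maps a raw user-agent string to the operator and verdict listed above. The substring matching and the function name `classify_user_agent` are my assumptions, not anything the operators publish, and user-agent strings can be spoofed, so treat the result as a first-pass label only. Google-Extended and Applebot-Extended are left out because they are robots tokens and do not appear in request headers.

```python
# Operators and verdicts copied from the reference table above.
# Matching by case-insensitive substring is an assumption for illustration.
BOT_VERDICTS = {
    "GPTBot": ("OpenAI", "Optional"),
    "OAI-SearchBot": ("OpenAI", "Allow"),
    "ChatGPT-User": ("OpenAI", "Allow"),
    "ClaudeBot": ("Anthropic", "Optional"),
    "Claude-User": ("Anthropic", "Allow"),
    "PerplexityBot": ("Perplexity", "Allow"),
    "Googlebot": ("Google", "Allow"),
    "Bingbot": ("Microsoft", "Allow"),
    "Applebot": ("Apple", "Optional"),
    "Amazonbot": ("Amazon", "Optional"),
    "meta-externalagent": ("Meta", "Optional"),
    "PetalBot": ("Huawei", "Optional"),
    "Bytespider": ("ByteDance", "Block"),
    "CCBot": ("Common Crawl", "Optional"),
    "DuckAssistBot": ("DuckDuckGo", "Allow"),
    "Diffbot": ("Diffbot", "Optional"),
}


def classify_user_agent(user_agent: str):
    """Return (token, operator, verdict) for the first table token found, else None."""
    lowered = user_agent.lower()
    for token, (operator, verdict) in BOT_VERDICTS.items():
        if token.lower() in lowered:
            return token, operator, verdict
    return None


# Hypothetical, simplified user-agent string (real ones are longer).
print(classify_user_agent("Mozilla/5.0 (compatible; Bytespider)"))
```

Anything that returns `None` is either a human browser or a bot not covered by the table.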

Who is actually crawling (first-party data)

Crawler lists are easy to find. Real crawl volume is not. Here is the actual split from one week of my own logs, where AI bots made up 26.3% of all requests.

Who is actually crawling: AI bot requests to this site (one week)

Operator | Share of AI bot requests
OpenAI | 61.2%
Meta | 15.9%
Huawei (PetalBot) | 10.4%
Amazon | 5.9%
Apple | 2.3%
Anthropic | 1.7%

AI bots were 26.3% of all requests to lawrencehitches.com that week. OpenAI alone was 61% of the bot activity.
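If you want to produce the same kind of split from your own logs, a rough sketch follows. It assumes you have already extracted one user-agent string per request from your server or CDN logs; the token map, the function name `ai_bot_shares` and the sample data are all hypothetical, and this is not the exact method behind the numbers above.

```python
from collections import Counter

# Hypothetical token-to-operator map covering the operators in the chart above.
AI_TOKENS = {
    "gptbot": "OpenAI",
    "oai-searchbot": "OpenAI",
    "chatgpt-user": "OpenAI",
    "meta-externalagent": "Meta",
    "petalbot": "Huawei",
    "amazonbot": "Amazon",
    "applebot": "Apple",
    "claudebot": "Anthropic",
    "claude-user": "Anthropic",
}


def ai_bot_shares(user_agents):
    """Return (AI share of all requests, {operator: share of AI requests}) in percent."""
    total = 0
    per_operator = Counter()
    for ua in user_agents:
        total += 1
        lowered = ua.lower()
        for token, operator in AI_TOKENS.items():
            if token in lowered:
                per_operator[operator] += 1
                break  # count each request once
    ai_total = sum(per_operator.values())
    if total == 0 or ai_total == 0:
        return 0.0, {}
    ai_share = 100 * ai_total / total
    split = {op: 100 * n / ai_total for op, n in per_operator.items()}
    return ai_share, split


# Made-up sample: 8 requests, 4 of them from AI bots.
sample = ["GPTBot/1.0"] * 3 + ["meta-externalagent/1.1"] + ["Mozilla/5.0 Firefox"] * 4
ai_share, split = ai_bot_shares(sample)
print(ai_share, split)
```

The first number is the AI share of all requests (the 26.3% figure for this site); the dictionary is the per-operator split of the AI bot requests only.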

What they take vs what they send back

The fair criticism of AI crawlers is that they take a lot and return little. The industry measures this as a crawl-to-referral ratio: pages fetched per visitor sent back. The published numbers are brutal, but mine are far better than average, because content built to be cited gets cited.

Crawler | Scrapes per referral
Google (Search) | ~5 : 1
PerplexityBot | ~195 : 1
GPTBot (OpenAI, industry) | ~1,091 : 1
ClaudeBot (Anthropic, industry) | ~23,951 : 1
This site, overall | 46 : 1
This site, OpenAI | 7 : 1

My overall ratio is 46 to 1, and 7 to 1 for OpenAI, against an industry GPTBot average near 1,091 to 1. Small absolute numbers, but that is roughly 24 times better than the GPTBot benchmark overall, and about 156 times better for OpenAI specifically. More on this in clicks to citations.
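For anyone who wants to run the same comparison on their own numbers, the arithmetic is simple enough to sketch. The function names are mine, and the inputs are the ratios quoted in the table above; comparing a whole-site ratio against the GPTBot industry figure is a rough benchmark, not a like-for-like measurement.

```python
def crawl_to_referral(fetches: int, referrals: int) -> float:
    """Pages fetched per visitor sent back; infinite when nothing is sent back."""
    if referrals == 0:
        return float("inf")
    return fetches / referrals


def times_more_efficient(baseline_ratio: float, site_ratio: float) -> float:
    """How many times fewer fetches per referral the site has than the baseline."""
    return baseline_ratio / site_ratio


GPTBOT_INDUSTRY = 1091  # ~1,091 : 1 from the table above

overall_gain = times_more_efficient(GPTBOT_INDUSTRY, 46)  # about 23.7
openai_gain = times_more_efficient(GPTBOT_INDUSTRY, 7)    # about 155.9
print(round(overall_gain, 1), round(openai_gain, 1))
```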

Should you block AI crawlers?

Short answer: do not block the search and assistant bots. Here is the split that matters.

  • Search and assistant bots (allow these): OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot, Bingbot, Googlebot, DuckAssistBot. These power live AI answers and citations. Block them and you delete yourself from ChatGPT Search, Copilot, Perplexity and AI Overviews.
  • Training crawlers (optional): GPTBot, ClaudeBot, CCBot, meta-externalagent, Google-Extended, Applebot-Extended. These build models, not live answers. Blocking is a legitimate IP choice with little direct citation cost, though training presence still helps brand familiarity.
  • Bad actors (block): Bytespider has a history of ignoring robots.txt. Block it if you do not want ByteDance training on your content.

My take after watching the data: the citation upside outweighs the scraping cost. I allow almost everything. See where ChatGPT gets its data for why Bing access matters most.
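The three-way split above can also be kept as data and rendered into robots.txt rules, which makes it easy to flip the training tier on or off. This is a sketch under my own assumptions: the `POLICY` structure, the `render_robots_txt` function and the `block_training` flag are hypothetical names, and the bot lists are copied from the bullets above.

```python
# Tiers copied from the list above. Google-Extended and Applebot-Extended
# are robots tokens rather than crawlers, but they use the same syntax.
POLICY = {
    "allow": ["OAI-SearchBot", "ChatGPT-User", "Claude-User", "PerplexityBot",
              "Bingbot", "Googlebot", "DuckAssistBot"],
    "training": ["GPTBot", "ClaudeBot", "CCBot", "meta-externalagent",
                 "Google-Extended", "Applebot-Extended"],
    "block": ["Bytespider"],
}


def render_robots_txt(policy, block_training=False):
    """Render one User-agent group per bot; training bots are blocked only on request."""
    lines = []
    for tier, agents in policy.items():
        blocked = tier == "block" or (tier == "training" and block_training)
        for agent in agents:
            lines.append(f"User-agent: {agent}")
            lines.append("Disallow: /" if blocked else "Allow: /")
            lines.append("")  # blank line separates groups
    return "\n".join(lines)


print(render_robots_txt(POLICY, block_training=True))
```

With `block_training=False` (the default) only Bytespider is disallowed, which matches the "allow almost everything" stance.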

How to allow or block a bot

Control AI crawlers in robots.txt at your site root. Block a training crawler while keeping search access:

# Block OpenAI training, keep ChatGPT Search + live fetch
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Opt out of Google AI training without touching Search
User-agent: Google-Extended
Disallow: /

# Block a known bad actor
User-agent: Bytespider
Disallow: /

Note: Google-Extended and Applebot-Extended are robots tokens for training opt-out, not separate crawlers. Disallowing them does not affect Google Search or Siri indexing. To verify who is hitting you, check server or CDN logs by user-agent. To measure the payoff, see how to measure AI search traffic and how to track AI search rankings.
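Before deploying rules like the ones above, it can help to check how a standard parser reads them. The sketch below feeds the same rules to Python's built-in `urllib.robotparser` and asks whether each bot may fetch a page. The URL and the `allowed_agents` helper are placeholders of mine, and each crawler's own parser may interpret edge cases differently, so this is a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/any-page"):
    """Return {agent: True/False} for whether each agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "Google-Extended", "Bytespider", "Googlebot"],
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Googlebot has no group of its own in these rules and there is no wildcard group, so the parser treats it as allowed, which is why disallowing Google-Extended leaves Search untouched.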

Frequently asked questions

What is the GPTBot user agent?

GPTBot is OpenAI's training crawler. It collects web pages to train future GPT models. It respects robots.txt. It is separate from OAI-SearchBot (which indexes for ChatGPT Search) and ChatGPT-User (which fetches a page live when a user asks).

What is the difference between GPTBot and OAI-SearchBot?

GPTBot trains the model. OAI-SearchBot builds the index ChatGPT Search uses to answer with citations. Blocking GPTBot only stops training; blocking OAI-SearchBot removes you from ChatGPT Search results.

Does ChatGPT respect robots.txt?

Yes. OpenAI's GPTBot, OAI-SearchBot and ChatGPT-User all document that they obey robots.txt directives. You can allow or disallow each one independently.

Should I block AI crawlers from my website?

Do not block the search and assistant bots (OAI-SearchBot, ChatGPT-User, Claude-User, PerplexityBot, Bingbot, Googlebot): blocking them removes you from AI answers and citations. Blocking pure training crawlers (GPTBot, ClaudeBot, CCBot) is optional and has little citation downside.

Which AI bot crawls the most?

In my logs over one week, OpenAI was 61% of all AI bot activity, followed by Meta at 16% and Huawei's PetalBot at 10%. AI bots were 26.3% of total requests to the site.

Does Bing matter for AI search crawling?

Yes, more than most realise. Bingbot's index feeds both Microsoft Copilot and ChatGPT Search, so a Bing block can quietly remove you from ChatGPT's answers. Allow Bingbot.

Lawrence Hitches, AI SEO Consultant, Melbourne

Chief of Staff at StudioHawk, Australia's largest dedicated SEO agency. Specialising in AI search visibility, technical SEO, and organic growth strategy. Leading a team of 120+ across Melbourne, Sydney, London, and the US.