AI-readiness
AI crawler permissions
Explicit allow/disallow rules for GPTBot, ClaudeBot, PerplexityBot, and friends. Default-deny means forgoing AI citations; default-allow means giving away free training data.
What it is
User-agent–specific rules in robots.txt that grant or deny access to known AI crawlers: OpenAI's GPTBot, Anthropic's ClaudeBot, Common Crawl's CCBot, Google's Google-Extended (a robots.txt control token honored by Googlebot rather than a separate crawler), and others.
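A minimal robots.txt sketch of an explicit per-crawler policy. The user-agent tokens are the ones each vendor documents; the particular allow/deny split is illustrative, not a recommendation.

```txt
# Illustrative AI crawler policy; set each rule to match your own stance.

# Permit Anthropic's crawler site-wide.
User-agent: ClaudeBot
Allow: /

# Refuse OpenAI's training crawler.
User-agent: GPTBot
Disallow: /

# Refuse Common Crawl, whose corpus feeds many training sets.
User-agent: CCBot
Disallow: /

# Google-Extended is a control token read by Googlebot: disallowing it
# opts out of Gemini training without affecting Google Search indexing.
User-agent: Google-Extended
Disallow: /

# All other crawlers: normal access.
User-agent: *
Allow: /
```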
Why it matters
A robots.txt that never mentions AI crawlers is ambiguous in 2026: some bots treat silence as permission, others do not. Be explicit, and decide whether you want to be in the AI corpus.
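A quick way to audit how a live robots.txt answers for each bot, using Python's standard-library urllib.robotparser; the example.com URLs are placeholders for your own site.

```python
from urllib.robotparser import RobotFileParser

# Placeholders: point these at your own site.
ROBOTS_URL = "https://example.com/robots.txt"
PAGE_URL = "https://example.com/"

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()  # fetch and parse the live file

# User-agent tokens documented by each vendor.
for bot in ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot"):
    verdict = "allowed" if rp.can_fetch(bot, PAGE_URL) else "blocked"
    print(f"{bot}: {verdict}")
```

If every bot falls through to the same `User-agent: *` group, the file is still in the ambiguous default; each token above deserves its own entry.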
Who it applies to
Every site that has an opinion about AI training and citation.
How WQI scores it
Web Quality Index considers this standard satisfied when the supporting factor passes.
| # | Factor | Status |
|---|---|---|
| 16 | AI crawler robots.txt directives | live |
Related standards
- See also: llms.txt, ai.txt, robots/sitemap, AI Preferences
Other references
- Guidance: Anthropic, ClaudeBot crawler info
- Guidance: Google, Google-Extended user agent
- Guidance: Common Crawl, CCBot