AI-readiness
AI crawler permissions
Explicit allow/disallow rules for GPTBot, ClaudeBot, PerplexityBot, and friends. Default-deny means forgoing AI citations; default-allow means giving away free training data.
What it is
User-agent–specific rules in robots.txt that grant or deny access to known AI crawlers: OpenAI's GPTBot, Anthropic's ClaudeBot, Common Crawl's CCBot, Google's Google-Extended (a robots.txt control token honored by Googlebot rather than a separate crawler), and others.
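A minimal robots.txt sketch of an explicit per-crawler policy. The user-agent tokens are the ones each vendor documents; the particular allow/deny split is illustrative, not a recommendation.

```txt
# Illustrative AI crawler policy; set each rule to match your own stance.

# Permit Anthropic's crawler site-wide.
User-agent: ClaudeBot
Allow: /

# Refuse OpenAI's training crawler.
User-agent: GPTBot
Disallow: /

# Refuse Common Crawl, whose corpus feeds many training sets.
User-agent: CCBot
Disallow: /

# Google-Extended is a control token read by Googlebot: disallowing it
# opts out of Gemini training without affecting Google Search indexing.
User-agent: Google-Extended
Disallow: /

# All other crawlers: normal access.
User-agent: *
Allow: /
```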
Why it matters
A robots.txt that never mentions AI crawlers is ambiguous in 2026: some bots treat silence as permission, others do not. Be explicit, and decide whether you want to be in the AI corpus.
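A quick way to audit how a live robots.txt answers for each bot, using Python's standard-library urllib.robotparser; the example.com URLs are placeholders for your own site.

```python
from urllib.robotparser import RobotFileParser

# Placeholders: point these at your own site.
ROBOTS_URL = "https://example.com/robots.txt"
PAGE_URL = "https://example.com/"

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()  # fetch and parse the live file

# User-agent tokens documented by each vendor.
for bot in ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "PerplexityBot"):
    verdict = "allowed" if rp.can_fetch(bot, PAGE_URL) else "blocked"
    print(f"{bot}: {verdict}")
```

If every bot falls through to the same `User-agent: *` group, the file is still in the ambiguous default; each token above deserves its own entry.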
Who it applies to
Every site that has an opinion about AI training and citation.
How WQI scores it
Web Quality Index considers this standard satisfied when the supporting factor passes.
| # | Factor | Status |
|---|---|---|
| 16 | AI crawler robots.txt directives | live |
Related standards
- See also: llms.txt, ai.txt, robots/sitemap, AI Preferences
Other references
- Guidance: Anthropic, ClaudeBot crawler info
- Guidance: Google, Google-Extended user agent
- Guidance: Common Crawl, CCBot