AI-readiness
ai.txt
Site-level opt-out signal for AI training, distinct from llms.txt. Where llms.txt is a positive content map for AI consumption, ai.txt is `do not train on this`.
What it is
An emerging well-known file (`/ai.txt`) that proposes per-asset opt-out rules for AI training datasets. It was originally proposed by Spawning and now overlaps with the output of the IETF AI Preferences working group. It is often confused with llms.txt, but the two are orthogonal: ai.txt restricts training-time use, while llms.txt advertises content for inference-time consumption.
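A minimal sketch of the file, assuming the robots.txt-like syntax used by Spawning's ai.txt generator (the specific patterns are illustrative, not normative):

```
# Illustrative /ai.txt: opt images and text out of AI training datasets
User-Agent: *
Disallow: *.jpg
Disallow: *.png
Disallow: *.txt
```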
Why it matters
robots.txt plus AI-bot user-agents covers crawler access but not downstream dataset use; once your content is in Common Crawl or LAION, blocking GPTBot doesn't claw it back. ai.txt (and the IETF AI Preferences successor) is the policy-layer signal that says `even if you ingested this, don't train on it`. It is honored by Spawning's data-diligence pipeline and a growing list of ML training shops.
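For contrast, the crawler-access layer mentioned above looks like this: a robots.txt sketch blocking well-known AI training crawlers (GPTBot, CCBot, and Google-Extended are real user-agent tokens; the blanket block is illustrative):

```
# robots.txt: stops future crawling, but not use of already-collected data
User-Agent: GPTBot
Disallow: /

User-Agent: CCBot
Disallow: /

User-Agent: Google-Extended
Disallow: /
```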
Who it applies to
Publishers who care about how their content is used in AI training, not just whether it's crawled.
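For a dataset builder honoring the signal, a compliance check might look like the sketch below. It assumes the robots.txt-like `Disallow` syntax from Spawning's generator and ignores `User-Agent` scoping for brevity; a real parser would group rules per agent.

```python
import fnmatch

def parse_ai_txt(text):
    """Collect Disallow patterns from robots.txt-style ai.txt content."""
    disallowed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            pattern = line.split(":", 1)[1].strip()
            if pattern:
                disallowed.append(pattern)
    return disallowed

def is_training_allowed(path, disallowed):
    """Return False if any Disallow pattern matches the asset path."""
    for pattern in disallowed:
        # "/" opts out the whole site; "*.jpg"-style patterns match by glob
        if pattern == "/" or fnmatch.fnmatch(path, pattern):
            return False
    return True

sample = """# illustrative ai.txt
User-Agent: *
Disallow: *.jpg
Disallow: *.png
"""

rules = parse_ai_txt(sample)
print(is_training_allowed("/images/cat.jpg", rules))      # -> False
print(is_training_allowed("/articles/post.html", rules))  # -> True
```

A pipeline would run this check per asset before admitting it to a training set, in addition to (not instead of) whatever robots.txt filtering happened at crawl time.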
How WQI scores it
Web Quality Index considers this standard satisfied when the supporting factor passes.
| # | Factor | Status |
|---|---|---|
| 16 | AI crawler robots.txt directives | live |
Related standards
- llms.txt
- AI crawlers
- AI Preferences
- C2PA