WQI.webqualityindex

AI-readiness

ai.txt

Site-level opt-out signal for AI training, distinct from llms.txt. Where llms.txt is a positive content map for AI consumption, ai.txt says `do not train on this`.

Authority: Spawning / community
Version: Draft (Spawning)
Jurisdiction: Global
Source: github.com
Last reviewed: 2026-04-28
Last verified: pending

What it is

An emerging well-known file (`/ai.txt`) that proposes per-asset opt-out rules for AI training datasets. It was originally proposed by Spawning and now overlaps with the output of the IETF AI Preferences working group. It is often confused with llms.txt, but the two are orthogonal: ai.txt restricts training-time use, while llms.txt advertises content for inference-time consumption.
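To make the per-asset idea concrete, here is a minimal Python sketch of how a dataset builder might evaluate such a file. It assumes a robots.txt-like syntax (`User-Agent`, `Allow`, `Disallow` lines with `*` wildcards) and a longest-match-wins precedence; both are assumptions made for illustration, not something this page or the draft confirms. The sample file contents and the names `parse_ai_txt` and `training_allowed` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical ai.txt contents; the syntax is assumed to mirror robots.txt.
SAMPLE_AI_TXT = """\
# Example opt-out file (illustrative only)
User-Agent: *
Disallow: *.jpg
Disallow: *.png
Allow: *.txt
"""


def parse_ai_txt(text):
    """Collect (directive, pattern) pairs from the wildcard user-agent group."""
    rules = []
    applies = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            applies = value == "*"
        elif key in ("allow", "disallow") and applies and value:
            rules.append((key, value))
    return rules


def training_allowed(rules, path):
    """Return True if the asset at `path` may be used for training.

    Assumption: the longest matching pattern wins and ties favour Allow,
    borrowing the robots.txt convention. No matching rule means allowed.
    """
    matches = [
        (len(pattern), directive == "allow")
        for directive, pattern in rules
        # Trailing "*" gives the pattern prefix semantics, as in robots.txt.
        if fnmatchcase(path, pattern + "*")
    ]
    if not matches:
        return True
    return max(matches)[1]


rules = parse_ai_txt(SAMPLE_AI_TXT)
print(training_allowed(rules, "/img/cat.jpg"))       # False
print(training_allowed(rules, "/notes/readme.txt"))  # True
```

The point of the sketch is the shape of the decision, which is made per asset at dataset-build time rather than per crawler at fetch time. A real consumer should follow whatever the current draft specifies.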

Why it matters

Blocking AI-bot user agents in robots.txt controls crawler access but not downstream dataset use: once your content is in Common Crawl or LAION, blocking GPTBot doesn't claw it back. ai.txt (and its IETF AI Preferences successor) is the policy-layer signal that says `even if you ingested this, don't train on it`. It is honored by Spawning's data-diligence pipeline and a growing list of ML training shops.

Who it applies to

Publishers who care about how their content is used in AI training, not just whether it's crawled.

How WQI scores it

Web Quality Index considers this standard satisfied when the supporting factor passes.

#    Factor                              Status
16   AI crawler robots.txt directives    live
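How WQI evaluates factor 16 is not described here. As a rough approximation of what a check for AI crawler directives in robots.txt could look at, the sketch below uses Python's standard `urllib.robotparser` to report whether named AI bots may fetch a path. The bot list, the sample robots.txt, and the function name `ai_bot_access` are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list; GPTBot is OpenAI's crawler, CCBot is Common Crawl's.
AI_BOTS = ("GPTBot", "CCBot")

# Hypothetical robots.txt that blocks GPTBot but leaves other agents open.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def ai_bot_access(robots_txt, bots=AI_BOTS, path="/"):
    """Map each bot name to True if robots.txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


print(ai_bot_access(SAMPLE_ROBOTS_TXT))
# {'GPTBot': False, 'CCBot': True}
```

Note that this only inspects the crawl-layer signal. A site can pass such a check and still have no training-time opt-out, which is the gap ai.txt is meant to cover.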

Related standards

See also
llms.txt, AI crawlers, AI Preferences, C2PA