AI-readiness
ai.txt
Site-level opt-out signal for AI training, distinct from llms.txt. Where llms.txt is a positive content map for AI consumption, ai.txt is `do not train on this`.
What it is
An emerging well-known file (`/ai.txt`) that proposes per-asset opt-out rules for AI training datasets. It was originally proposed by Spawning and now overlaps with the output of the IETF AI Preferences working group. It is often confused with llms.txt, but the two are orthogonal: ai.txt restricts training-time use, while llms.txt advertises content for inference-time consumption.
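A minimal sketch of the file, assuming the robots.txt-like syntax used by Spawning's ai.txt generator (the specific patterns are illustrative, not normative):

```
# Illustrative /ai.txt: opt images and text out of AI training datasets
User-Agent: *
Disallow: *.jpg
Disallow: *.png
Disallow: *.txt
```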
Why it matters
robots.txt plus AI-bot user-agents covers crawler access but not downstream dataset use; once your content is in Common Crawl or LAION, blocking GPTBot doesn't claw it back. ai.txt (and the IETF AI Preferences successor) is the policy-layer signal that says `even if you ingested this, don't train on it`. It is honored by Spawning's data-diligence pipeline and a growing list of ML training shops.
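For contrast, the crawler-access layer mentioned above looks like this: a robots.txt sketch blocking well-known AI training crawlers (GPTBot, CCBot, and Google-Extended are real user-agent tokens; the blanket block is illustrative):

```
# robots.txt: stops future crawling, but not use of already-collected data
User-Agent: GPTBot
Disallow: /

User-Agent: CCBot
Disallow: /

User-Agent: Google-Extended
Disallow: /
```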
Who it applies to
Publishers who care about how their content is used in AI training, not just whether it's crawled.
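For a dataset builder honoring the signal, a compliance check might look like the sketch below. It assumes the robots.txt-like `Disallow` syntax from Spawning's generator and ignores `User-Agent` scoping for brevity; a real parser would group rules per agent.

```python
import fnmatch

def parse_ai_txt(text):
    """Collect Disallow patterns from robots.txt-style ai.txt content."""
    disallowed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.lower().startswith("disallow:"):
            pattern = line.split(":", 1)[1].strip()
            if pattern:
                disallowed.append(pattern)
    return disallowed

def is_training_allowed(path, disallowed):
    """Return False if any Disallow pattern matches the asset path."""
    for pattern in disallowed:
        # "/" opts out the whole site; "*.jpg"-style patterns match by glob
        if pattern == "/" or fnmatch.fnmatch(path, pattern):
            return False
    return True

sample = """# illustrative ai.txt
User-Agent: *
Disallow: *.jpg
Disallow: *.png
"""

rules = parse_ai_txt(sample)
print(is_training_allowed("/images/cat.jpg", rules))      # -> False
print(is_training_allowed("/articles/post.html", rules))  # -> True
```

A pipeline would run this check per asset before admitting it to a training set, in addition to (not instead of) whatever robots.txt filtering happened at crawl time.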
How WQI scores it
Web Quality Index considers this standard satisfied when the supporting factor passes.
| # | Factor | Status |
|---|---|---|
| 16 | AI crawler robots.txt directives | live |
Related standards
- llms.txt
- AI crawlers
- AI Preferences
- C2PA