WQI.webqualityindex
Method v1.2.0 · 86 live / 86 total factors · methodology


AI crawler robots.txt directives

#16 · Variable · Web Quality · weighted · AI Readiness · weight 1.3% · impl implemented · method v1.2.0

Web Quality factor

This factor is part of Web Quality — the weighted 0..100 score that sits above Web Standards. Its weight depends on what kind of site is being measured. Web Standards items take priority; this factor only enters the score once Web Standards passes.

Base weight
0.4, applied to every site type unless overridden below.
Why this weight
Explicit AI-crawler directives, whether allow or disallow, are the citizenship signal: the operator has thought about the question.

Per-site-type overrides

Site type           Weight   Δ vs base
Blog                 0.3      -0.1
Corporate / B2B      0.5      +0.1
News / Publisher     1.0      +0.6
Personal site        0.2      -0.2
SaaS / Product       0.6      +0.2
Media / Streaming    0.9      +0.5

Site types not listed inherit the base weight.
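The override table above amounts to a simple lookup. A minimal sketch, with illustrative names (the registry's actual API is not shown here):

```python
# Per-site-type weight resolution for this factor.
# BASE_WEIGHT and SITE_TYPE_OVERRIDES mirror the table above;
# factor_weight is a hypothetical helper, not the registry's real API.
BASE_WEIGHT = 0.4

SITE_TYPE_OVERRIDES = {
    "Blog": 0.3,
    "Corporate / B2B": 0.5,
    "News / Publisher": 1.0,
    "Personal site": 0.2,
    "SaaS / Product": 0.6,
    "Media / Streaming": 0.9,
}

def factor_weight(site_type: str) -> float:
    # Site types not listed inherit the base weight.
    return SITE_TYPE_OVERRIDES.get(site_type, BASE_WEIGHT)
```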


What we measure

AI search engines respect robots.txt. Blocking their crawlers entirely hides your site from AI answers; not addressing them at all is acceptable but leaves you without an explicit policy. Sites that allow AI crawlers are more discoverable in AI search.

How to improve your score

Decide your policy. To be discoverable in AI search, don't block (or explicitly allow) `GPTBot`, `ClaudeBot`, `PerplexityBot`, `Google-Extended`, and `CCBot`. To opt out, add a `User-agent: GPTBot` group with `Disallow: /`, and likewise for the other crawlers.
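As an illustration (these directives are examples of the two explicit forms, not a recommendation), a robots.txt that opts out of one AI crawler while explicitly allowing another might look like:

```text
# Explicit opt-out for OpenAI's crawler
User-agent: GPTBot
Disallow: /

# Explicit allow for Anthropic's crawler
User-agent: ClaudeBot
Allow: /
```

Either form counts as an explicit directive under this factor's rubric.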

Implementation

stale · v1 · seeded — no connector publish yet · source: freshcoat-discovery/src/connectors/legacy-audit.ts:scoreAiCrawlerDirectives

Detection method

Reads `robots_ai_blocked_count` from the audit endpoint's robots.txt parse. The rubric is inverted: any explicit AI-crawler directive (allow or disallow) is the citizenship signal, because it shows the operator has thought about it. No explicit directives scores warn.

Detection sources

  1. Audit endpoint robots.txt parser

Scoring bands · soft ladder

Score   Condition
100     ≥1 explicit AI-crawler directive, allow or disallow (GPTBot, ClaudeBot, PerplexityBot, etc.)
80      No explicit AI directives (the silent majority: implicit allow)
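A minimal sketch of this band mapping, assuming a simple line-based robots.txt parse. The function and agent list are illustrative; this is not the connector's actual `scoreAiCrawlerDirectives` implementation:

```python
# Known AI crawler user-agent tokens (lowercased for comparison).
AI_CRAWLER_AGENTS = {
    "gptbot", "claudebot", "perplexitybot", "google-extended", "ccbot",
}

def score_ai_crawler_directives(robots_txt: str) -> tuple[int, str]:
    """Count User-agent groups naming known AI crawlers, then map to a band."""
    explicit = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("user-agent:"):
            agent = line.split(":", 1)[1].strip().lower()
            if agent in AI_CRAWLER_AGENTS:
                explicit.add(agent)
    n = len(explicit)
    if n >= 1:
        # Allow or disallow both count: the directive itself is the signal.
        return 100, f"{n}_explicit_ai_directives"
    return 80, "no_explicit_ai_directives"  # implicit allow
```

The returned notes string follows the evidence-key shapes documented below.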

Evidence-key dictionary

What each notes string emitted by the connector means. These surface in the per-domain dossier's evidence column.

no_explicit_ai_directives
robots.txt has no specific User-agent rules for known AI crawlers.
N_explicit_ai_directives
robots.txt explicitly handles N AI crawlers (allow or disallow).

Applicability

Variable tier. Blocking GPTBot is a legitimate editorial and IP stance (NYT, WSJ, and Reuters do it). The signal is 'has thought about it', not 'permits everyone'.

Changelog

  • 2026-04-29 · seed · Initial seed from MethodologyRegistry bootstrap.

Facts

Ticket
WEBQ-16
Category
AI Readiness
Status
live
Weight
1.3%
Data source
Service cost
Free — robots.txt parsing
Scoring impl
implemented
Method version
v1.2.0

When this applies

AI crawler directives live inside robots.txt, which this platform doesn't let site owners edit.

Scoring

Scoring formulas are versioned with the methodology. The current method (v1.2.0) maps raw measurements to pass, warn, fail. Factor weights determine how much each contributes to the composite — see the methodology index for the full table.

Cited by these standards

Standards in the Standards Library whose `satisfiedBy` requirement tree references this factor. Each link goes to the standard's full entry — methodology, scope, and the other factors it relies on.

Version history

Version   Change                                                        Date
v1.2.0    Factor introduced. Status: live. Scoring impl: implemented.   2026-04-25
