AI crawler robots.txt directives
#16 · Variable · Web Quality · weighted · AI Readiness · weight 1.3% · impl implemented · method v1.2.0
Web Quality factor
This factor is part of Web Quality — the weighted 0..100 score that sits above Web Standards. Its weight depends on what kind of site is being measured. Web Standards items take priority; this factor only enters the score once Web Standards passes.
- Base weight: 0.4, applied to every site type unless overridden below
- Why this weight: having explicit AI-crawler directives (allow OR disallow) is the citizenship signal; the site has thought about it.
Per-site-type overrides
| Site type | Weight | Δ vs base |
|---|---|---|
| Blog | 0.3 | -0.1 |
| Corporate / B2B | 0.5 | +0.1 |
| News / Publisher | 1.0 | +0.6 |
| Personal site | 0.2 | -0.2 |
| SaaS / Product | 0.6 | +0.2 |
| Media / Streaming | 0.9 | +0.5 |
Site types not listed inherit the base weight.
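As a sketch only, the lookup above might resolve in code like this; the site-type keys and function name are illustrative assumptions, not the connector's actual API:

```ts
// Illustrative weight resolution for this factor.
// Keys mirror the override table above; names are assumed, not real API.
const BASE_WEIGHT = 0.4;

const WEIGHT_OVERRIDES: Record<string, number> = {
  blog: 0.3,
  corporate_b2b: 0.5,
  news_publisher: 1.0,
  personal: 0.2,
  saas_product: 0.6,
  media_streaming: 0.9,
};

// Site types not listed inherit the base weight.
function factorWeight(siteType: string): number {
  return WEIGHT_OVERRIDES[siteType] ?? BASE_WEIGHT;
}
```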
What this means for your business
Your site can quietly tell ChatGPT, Claude, and Google's AI to stay out — or to come in. If you're blocking them by accident, you're invisible when customers ask AI for a recommendation in your category.
Plain title: Whether you're letting AI assistants read your site
What we measure
Major AI crawlers respect robots.txt. Blocking them entirely hides you from AI answers; leaving them unaddressed is acceptable but means you have no explicit policy. Sites that explicitly ALLOW AI crawlers are more discoverable in AI search.
How to improve your score
Decide your policy. To be discoverable in AI search, don't block (or explicitly allow) `GPTBot`, `ClaudeBot`, `PerplexityBot`, `Google-Extended`, and `CCBot`. To opt out, add a `User-agent` group with `Disallow: /` for each crawler, as in the sketch below.
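A minimal robots.txt sketch of both policies; either form counts as an explicit directive for this factor, and the remaining crawler tokens follow the same pattern:

```
# Opt out: block an AI crawler explicitly
User-agent: GPTBot
Disallow: /

# Opt in: explicitly allow another
User-agent: ClaudeBot
Allow: /
```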
Implementation
stale · v1 · seeded — no connector publish yet · source: freshcoat-discovery/src/connectors/legacy-audit.ts:scoreAiCrawlerDirectives
Detection method
Reads `robots_ai_blocked_count` from the audit endpoint's robots.txt parse. The rubric is INVERTED relative to a naive "blocking is bad" reading: any explicit AI-crawler directive (allow OR disallow) is the citizenship signal, because the operator has thought about it. No explicit directives = warn. A sketch of the count follows the source list below.
Detection sources
- Audit endpoint robots.txt parser
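A minimal sketch of that count, assuming the parser receives raw robots.txt text; the token list and helper name are illustrative, and the real logic lives in `scoreAiCrawlerDirectives`:

```ts
// Count User-agent groups that name a known AI crawler.
// Token list and function name are assumptions, not the connector's code.
const AI_CRAWLER_TOKENS = [
  "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot",
];

function countExplicitAiDirectives(robotsTxt: string): number {
  const seen = new Set<string>();
  for (const line of robotsTxt.split(/\r?\n/)) {
    const match = line.match(/^\s*user-agent\s*:\s*(\S+)/i);
    if (!match) continue;
    const token = match[1].toLowerCase();
    if (AI_CRAWLER_TOKENS.some((t) => t.toLowerCase() === token)) {
      seen.add(token); // an allow group counts the same as a disallow group
    }
  }
  return seen.size;
}
```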
Scoring bands · soft ladder
| Score | Condition |
|---|---|
| 100 | ≥1 explicit AI-crawler directive, allow or disallow (GPTBot, ClaudeBot, PerplexityBot, etc.) |
| 80 | No explicit AI directives (the silent majority — implicit allow) |
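Mapped to this ladder, the band logic reduces to roughly the sketch below; the return shape is an assumption, and the notes strings follow the evidence-key dictionary in the next section:

```ts
// Soft-ladder mapping from explicit-directive count to band score.
// Result shape is assumed; notes strings match the evidence keys below.
function bandForDirectiveCount(count: number): { score: number; notes: string } {
  if (count >= 1) {
    return { score: 100, notes: `${count}_explicit_ai_directives` };
  }
  return { score: 80, notes: "no_explicit_ai_directives" };
}
```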
Evidence-key dictionary
What each notes string the connector emits means; these surface in the per-domain dossier's evidence column.
- `no_explicit_ai_directives`: robots.txt has no specific `User-agent` rules for known AI crawlers.
- `N_explicit_ai_directives`: robots.txt explicitly handles N AI crawlers (allow or disallow).
Applicability
Variable tier. Editorial choice — blocking GPTBot is a legitimate IP stance (NYT, WSJ, Reuters do it). The signal is 'has thought about it,' not 'permits everyone'.
Changelog
- 2026-04-29 · seed · Initial seed from MethodologyRegistry bootstrap.
Facts
When this applies
AI crawler directives live inside robots.txt, so this factor only applies when the platform lets site owners edit that file.
- Marked n/a when the detected platform doesn't support `canControlRobotsTxt` (hosted builders that don't expose robots.txt editing), as sketched below.
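A hedged sketch of that gate, assuming a capability flag shaped like the `canControlRobotsTxt` mentioned above:

```ts
// Applicability gate: skip the factor when the detected platform
// doesn't let site owners edit robots.txt. The shape is assumed.
interface PlatformCapabilities {
  canControlRobotsTxt: boolean;
}

function isApplicable(platform: PlatformCapabilities): boolean {
  return platform.canControlRobotsTxt; // false → factor marked n/a
}
```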
Scoring
Scoring formulas are versioned with the methodology. The current method (v1.2.0) maps raw measurements to pass, warn, fail. Factor weights determine how much each contributes to the composite — see the methodology index for the full table.
Cited by these standards
Standards in the Standards Library whose satisfiedBy requirement tree references this factor. Each link goes to the standard's full entry — methodology, scope, and the other factors it relies on.
Version history
| Version | Change | Date |
|---|---|---|
| v1.2.0 | Factor introduced. Status: live. Scoring impl: implemented. | 2026-04-25 |