WQI.webqualityindex
Method v1.2.0 · 86 live / 86 total factors

Web Quality

86 factors · 77 matrix rows · 14 scored in Web Standards · 11 site types · weight range 0..1.5

Once a site clears Web Standards, every remaining factor enters a weighted 0..100 score: Web Quality. The weight on each factor depends on what kind of site it is — schema.org structured data is critical for a news publisher and nice-to-have for a personal blog; a Google Business Profile is essential for a local business and irrelevant to a SaaS company.

The heuristic for every cell: "If a [site type] does this factor well, does that meaningfully tell us they're a positive contributor to the web for their visitors?" When the answer is "yes, definitively," the weight is 1.0 or higher. When it's "not really, that's not what makes this kind of site good," the weight is 0.

Reading the matrix

Weight matrix · factor × site type

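Factors without a matrix row default to weight 1.0 across every site type (a neutral baseline: new factors are weighted 1.0 until the methodology is tuned). Each factor below is listed with its rationale line, then its weights in column order. Factors marked web-standards are scored in Web Standards and carry no Web Quality weights.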
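The default-to-1.0 rule can be sketched as a lookup. This is a minimal illustration, not the project's actual code: the factor keys, the `MATRIX` map, and the `weightFor` helper are hypothetical names, and only the weights themselves are copied from the matrix.

```typescript
// The 11 site types, in the matrix's column order (identifiers are hypothetical).
const SITE_TYPES = [
  "blog", "ecommerce", "corporate", "news", "local", "personal",
  "education", "government", "nonprofit", "saas", "media",
] as const;
type SiteType = (typeof SITE_TYPES)[number];

// Neutral baseline for factors that have no matrix row yet.
const DEFAULT_WEIGHT = 1.0;

// Two rows copied from the matrix; the keys are made-up identifiers.
const MATRIX = new Map<string, readonly number[]>([
  ["schema-org-presence", [0.5, 1.0, 0.8, 1.2, 1.0, 0.3, 0.8, 0.8, 0.8, 0.8, 1.0]],
  ["google-business-profile", [0.0, 0.5, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0]],
]);

// Look up w(factor, T), falling back to the default when no row exists.
function weightFor(factor: string, siteType: SiteType): number {
  const row = MATRIX.get(factor);
  if (row === undefined) return DEFAULT_WEIGHT;
  return row[SITE_TYPES.indexOf(siteType)] ?? DEFAULT_WEIGHT;
}

console.log(weightFor("schema-org-presence", "news"));       // 1.2
console.log(weightFor("google-business-profile", "local"));  // 1.5
console.log(weightFor("post-quantum-key-exchange", "blog")); // 1 (no row)
```

Storing one array per factor mirrors how the rows are printed here; a keyed object per site type would work equally well.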

# · Factor · Blog · E-commerce · Corporate / B2B · News / Publisher · Local business · Personal site · Education · Government · Nonprofit · SaaS / Product · Media / Streaming
1 DMARC enforcement
web-standards
2 DKIM signing
base 1.0 · DKIM signing is required infrastructure for any domain that sends mail. Personal sites usually don't run their own mail.
0.5 1.0 1.0 1.0 1.0 0.4 1.0 1.0 1.0 1.0 1.0
3 SPF record present and valid
web-standards
4 Security headers (HSTS, CSP, X-Frame-Options, Referrer-Policy, Permissions-Policy, X-Content-Type-Options)
base 1.0 · Security headers (HSTS, CSP, XFO) are how a site protects its own users from clickjacking, MITM downgrade, and XSS. Lower for personal/blog because most managed platforms can't set them.
0.5 1.0 1.0 1.0 1.0 0.4 1.0 1.0 1.0 1.0 1.0
5 SSL certificate validity & expiration window
web-standards
6 WordPress REST API user enumeration exposure
web-standards
7 Sensitive path exposure (.git, .env, /admin, xmlrpc.php, wp-login.php)
web-standards
8 Mobile PageSpeed score + Core Web Vitals (LCP, FCP, CLS)
base 1.0 · Mobile PageSpeed reflects how the site performs for the median visitor. Lower for gov/edu where legacy infrastructure is the norm.
1.0 1.0 1.0 1.0 1.0 0.5 0.7 0.7 1.0 1.0 1.0
9 HTTP/2 support
base 1.0 · HTTP/2 is table stakes for modern infrastructure — every CDN gives it free.
1.0 1.0 1.0 1.0 1.0 0.5 1.0 1.0 1.0 1.0 1.0
10 Compression (Brotli / gzip)
base 1.0 · Compression (Brotli/gzip) is universally available and saves real bytes for real users.
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
11 Title, meta description, OG, Twitter cards, canonical
base 1.0 · Title/meta/OG/Twitter/canonical is how the site shows up everywhere it gets shared. Lower for personal sites that don't expect to be discovered.
1.0 1.0 1.0 1.0 1.0 0.4 1.0 0.7 1.0 1.0 1.0
12 Schema.org structured data presence
base 0.8 · Schema.org structured data is how AI assistants and search engines understand pages. Critical for news (Article) and local biz (LocalBusiness).
0.5 1.0 0.8 1.2 1.0 0.3 0.8 0.8 0.8 0.8 1.0
13 H1 tag presence
base 0.7 · An H1 anchors the page for screen readers and search bots. Modest signal — most sites have one.
0.7 0.9 0.7 0.9 0.7 0.3 0.7 0.7 0.7 0.7 0.7
14 Sitemap.xml + robots.txt presence
base 1.0 · robots.txt + sitemap are how the site instructs crawlers. Polite and trivial to provide.
1.0 1.0 1.0 1.0 1.0 0.5 1.0 1.0 1.0 1.0 1.0
15 llms.txt presence
base 0.4 · llms.txt is an emerging norm with sub-1% adoption. High-value for content-rich and reference sites; aspirational elsewhere.
0.3 0.4 0.5 1.0 0.4 0.2 0.8 0.8 0.4 0.6 0.9
16 AI crawler robots.txt directives
base 0.4 · Having explicit AI-crawler directives — allow OR disallow — is the citizenship signal. The site has thought about it.
0.3 0.4 0.5 1.0 0.4 0.2 0.4 0.4 0.4 0.6 0.9
17 Domain age (RDAP / WHOIS)
base 0.3 · Domain age is a weak signal at best — penalises legitimate rebrands. Held low intentionally.
0.3 0.5 0.3 0.4 0.3 0.3 0.3 0.3 0.3 0.3 0.3
18 Wayback Machine site age & last snapshot
base 0.4 · Wayback history shows the site has existed and been crawled. Same caveat as domain age.
0.4 0.4 0.4 0.7 0.4 0.2 0.4 0.6 0.4 0.4 0.6
19 Google Business Profile presence + rating
base 0.0 · Google Business Profile — critical for local biz; mostly irrelevant elsewhere. Already gated to n/a for non-local in applicability.ts.
0.0 0.5 0.0 0.0 1.5 0.0 0.0 0.0 0.5 0.0 0.0
20 News mentions in last 30 days
base 0.3 · News mentions — high signal for media/news. Bumped corporate from 0.5 to 0.7 after the panel showed established brands (Stripe, NYT, Apple) routinely have legitimate press coverage and the original weight was under-crediting them.
0.2 0.4 0.7 1.0 0.3 0.0 0.3 0.3 0.5 0.3 0.8
21 Wikipedia entity
base 0.2 · Wikipedia entity — notability correlates with size more than quality, so held under news mentions. Corporate bumped 0.5 → 0.7 because in practice an established corporate having a Wikipedia entry is a real legitimacy signal that legacy scoring captured and the matrix shouldn't fully neutralise.
0.0 0.2 0.7 0.7 0.2 0.0 0.6 0.6 0.4 0.2 0.7
22 DNSSEC validation
base 0.7 · DNSSEC stops cache-poisoning attacks. Higher relevance for institutions whose visitors are most targeted by spoofing.
0.7 1.0 0.7 0.7 0.7 0.7 1.0 1.2 0.7 0.7 0.7
23 CAA records
base 0.6 · CAA records limit which CAs can issue certs for the domain. Modest baseline; matters more for high-trust sites.
0.6 0.9 0.6 0.6 0.6 0.6 0.6 1.0 0.6 0.8 0.6
24 MTA-STS & TLS-RPT
base 0.6 · MTA-STS + TLS-RPT enforce TLS for inbound mail. Moderate baseline.
0.6 0.9 0.6 0.8 0.6 0.6 0.6 1.0 0.6 0.6 0.6
25 BIMI + VMC
base 0.4 · BIMI is a brand-display feature in mail clients. Aspirational — only worth it once DMARC is enforced.
0.4 0.7 0.4 0.6 0.4 0.1 0.4 0.4 0.4 0.4 0.6
26 HSTS preload list inclusion
base 0.5 · HSTS preload list inclusion is a strong signal but requires committing to HTTPS-only forever.
0.5 0.8 0.5 0.5 0.5 0.5 0.5 1.0 0.5 0.7 0.5
27 TLS minimum version & cipher suite quality
base 1.0 · TLS 1.0/1.1 leaves visitors vulnerable to known attacks. A modern minimum version is required basically everywhere.
1.0 1.0 1.0 1.0 1.0 0.5 1.0 1.0 1.0 1.0 1.0
28 Subdomain takeover surface
base 1.0 · Subdomain takeover lets an attacker serve content from your domain. Universally relevant.
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
29 Spam / phishing blocklist presence
web-standards
30 HTTP/3 support
base 0.4 · HTTP/3 — aspirational, real benefits mostly on lossy mobile.
0.4 0.6 0.4 0.5 0.4 0.2 0.4 0.4 0.4 0.4 0.6
31 IPv6 support
base 0.4 · IPv6 — civic infrastructure choice; low immediate user impact.
0.4 0.4 0.4 0.4 0.4 0.2 0.6 0.7 0.4 0.4 0.4
32 Image optimization (WebP/AVIF)
base 0.7 · WebP/AVIF saves bytes on image-heavy sites.
0.7 1.0 0.7 0.9 0.7 0.4 0.7 0.7 0.7 0.7 1.0
33 Desktop PageSpeed score
base 0.7 · Desktop PSI — secondary to mobile but still relevant for SaaS / B2B traffic.
0.7 0.8 0.7 0.7 0.7 0.4 0.7 0.5 0.7 0.9 0.7
34 Core Web Vitals from CrUX (Real User Monitoring)
base 0.9 · Real-user CrUX field data is a truer signal than lab Lighthouse results.
0.9 0.9 0.9 0.9 0.9 0.4 0.9 0.9 0.9 0.9 0.9
35 Lazy loading on below-fold images
base 0.7 · Lazy loading — high impact on long, image-heavy pages.
0.7 1.0 0.7 0.9 0.7 0.4 0.7 0.7 0.7 0.7 1.0
36 Font loading strategy (FOUT/FOIT/swap)
base 0.7 · Font loading strategy — FOIT punishes readers; swap is the right default.
0.7 0.7 0.7 0.9 0.7 0.4 0.7 0.7 0.7 0.7 0.9
37 Total homepage byte weight
base 0.8 · Total byte weight — first-load size matters most for content-heavy sites.
0.8 0.8 0.8 1.0 0.8 0.4 0.8 0.8 0.8 0.8 1.0
38 Largest unused JavaScript bundle
base 0.7 · Unused JavaScript bloat — main contributor to slow first-loads.
0.7 0.7 0.7 0.9 0.7 0.4 0.7 0.7 0.7 0.7 0.9
39 Schema.org type validity (parsed JSON-LD)
base 0.7 · Schema.org type validity — invalid JSON-LD is silently broken.
0.7 0.9 0.7 1.0 0.7 0.3 0.7 0.7 0.7 0.7 0.7
40 Breadcrumb schema
base 0.5 · Breadcrumb schema helps Google + AI parse site hierarchy. More valuable on deep sites.
0.6 0.8 0.5 0.7 0.5 0.2 0.5 0.5 0.5 0.5 0.5
41 FAQ / HowTo schema (where applicable)
base 0.4 · FAQ/HowTo schema — applicable only when content matches. Connector emits n/a when no candidate.
0.5 0.4 0.4 0.7 0.6 0.0 0.4 0.4 0.4 0.3 0.5
42 hreflang for multi-language sites
base 0.5 · hreflang — only matters for multi-language sites. Connector emits n/a when monolingual.
0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
43 Internal link depth (clicks from homepage to deepest content)
base 0.6 · Internal link depth — readers shouldn't need 6 clicks to reach old content.
0.6 0.8 0.6 0.9 0.6 0.3 0.6 0.6 0.6 0.6 0.8
44 AI plugin manifest (.well-known/ai-plugin.json)
base 0.2 · AI plugin manifest — niche emerging spec. Low weight until adoption tips.
0.2 0.2 0.3 0.4 0.2 0.2 0.2 0.2 0.2 0.6 0.2
45 JSON-LD richness score for LLMs
base 0.5 · JSON-LD richness — how completely the site describes itself for LLMs. Maps to citizenship for content sites.
0.5 0.8 0.5 1.0 0.5 0.2 0.8 0.8 0.5 0.5 0.8
46 Cookie banner presence + CMP detection
web-standards
47 Privacy policy page presence
web-standards
48 Terms of service page presence
web-standards
49 Third-party tracker count
base 0.7 · Third-party tracker count — a heavy ad/tracking stack hurts visitors regardless of policy disclosure.
0.7 0.9 0.7 1.0 0.7 0.3 0.7 0.7 0.7 0.7 1.0
50 CCPA "Do Not Sell or Share My Personal Information" link
web-standards
51 Cookie scan — actual cookies set on first load
base 0.6 · Cookie scan: actual cookies set on first load (vs declared in policy).
0.6 0.8 0.6 0.9 0.6 0.2 0.6 0.6 0.6 0.6 0.9
52 Accessibility statement page
base 0.5 · Accessibility statement — strong for institutions; weak for personal sites.
0.5 0.7 0.5 0.5 0.5 0.2 0.9 1.0 0.5 0.5 0.5
53 axe-core / WAVE accessibility scan
base 0.8 · axe-core full audit — beyond the bar. The bar catches the show-stoppers; this catches the long tail.
0.8 0.9 0.8 0.8 0.8 0.4 1.0 1.2 0.8 0.8 0.8
54 Image alt text coverage
web-standards
55 Heading hierarchy validity
web-standards
56 Color contrast (WCAG AA)
web-standards
57 ARIA labels presence and validity
base 0.7 · ARIA labels matter most on app-like and form-heavy sites.
0.7 0.8 0.7 0.7 0.7 0.7 0.9 1.0 0.7 0.9 0.7
58 Skip-to-content link
base 0.6 · Skip-to-content link — keyboard navigation accommodation.
0.6 0.6 0.6 0.8 0.6 0.2 0.8 0.9 0.6 0.6 0.6
59 Yelp presence + rating + review count
base 0.0 · Yelp — local-business surface only.
0.0 0.3 0.0 0.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0
60 Trustpilot presence + rating
base 0.0 · Trustpilot — ecommerce + consumer-services.
0.0 0.9 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.6 0.0
61 Better Business Bureau accreditation
base 0.0 · BBB accreditation — US/CA local biz; jurisdictionally gated in applicability.ts.
0.0 0.4 0.0 0.0 0.6 0.0 0.0 0.0 0.0 0.0 0.0
62 LinkedIn Company Page (presence + employee count + follower count)
base 0.3 · LinkedIn page — strong signal for organisations, useless for personal sites.
0.0 0.5 1.0 0.5 0.5 0.0 0.3 0.3 0.7 1.0 0.5
63 Bing Places
base 0.0 · Bing Places — local-only, low even there.
0.0 0.0 0.0 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0
64 Apple Maps presence (Apple Business Connect)
base 0.0 · Apple Maps — local-only, low even there.
0.0 0.0 0.0 0.0 0.4 0.0 0.0 0.0 0.0 0.0 0.0
65 Facebook Page presence
base 0.0 · Facebook page presence is not a web-quality signal — it's a marketing channel choice. Detected and surfaced as descriptive site_facts; not weighted in the composite.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
66 Instagram presence (link from site → IG profile)
base 0.0 · Instagram presence — same reasoning as Facebook. Not a web standard; informational only.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
67 Web App Manifest (manifest.json)
base 0.3 · Web App Manifest — only relevant if the site is app-like or installable. Corporate dropped to 0.1 — most corporate marketing sites are read-only landing pages where a manifest adds nothing for the visitor.
0.2 0.7 0.1 0.5 0.3 0.1 0.3 0.3 0.3 0.9 0.5
68 Service Worker / PWA capability
base 0.3 · Service Worker / PWA — same shape as manifest. Corporate dropped to 0.1 for the same reason.
0.2 0.7 0.1 0.4 0.3 0.1 0.3 0.3 0.3 0.9 0.4
69 Analytics tools detected
base 0.0 · Analytics tools detected — pure tech-stack inventory. Whether a site uses GA4, Plausible, or no analytics at all isn't a web-quality signal. Surfaced as descriptive site_facts; not weighted.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
70 Payment processors detected
base 0.0 · Payment processors — only meaningful where money moves.
0.0 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.7 0.0
71 Marketing automation tools detected
base 0.0 · Marketing automation tools detected — inventory only. Running HubSpot/Marketo isn't a web-quality signal; it's a marketing-stack choice. Informational, not weighted.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
72 Customer support tools detected
base 0.0 · Customer support tools detected — inventory only. Live-chat presence is captured by the contact-channel floor; whether the site uses Intercom vs Zendesk isn't a quality grade.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
73 Tag manager presence
base 0.0 · Tag manager presence — inventory only. GTM/Segment usage is a deployment choice, not a quality signal.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
74 Ad networks detected
base 0.0 · Ad networks detected — descriptive only. Whether a site monetizes via ads doesn't grade web quality in either direction.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
75 Branded domain email address (vs free Gmail/Yahoo)
web-standards
76 Email provider class (Workspace / 365 / Zoho / self-hosted / shared)
base 0.6 · Email provider class — Workspace/365 is a citizenship signal, free providers less so.
0.6 0.9 0.6 0.8 0.6 0.3 0.6 0.6 0.6 0.8 0.6
77 DMARC aggregate reporting enabled (rua=)
base 0.5 · DMARC rua= reporting is good operational hygiene but most legit deployments forgo it without a 3rd-party processor. Bonus, not gate.
0.5 0.8 0.5 0.7 0.5 0.2 0.5 0.5 0.5 0.7 0.7
78 Free-email exposure on contact page (gmail/yahoo/outlook visible)
base 0.0 · Free-email exposure on contact page — borderline professionalism check, not a web-standards grade. A site that uses gmail/yahoo for support contact isn't violating any web standard. Informational only.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
79 Newsletter signup form detected
base 0.0 · Newsletter signup detection — marketing tactic, not web quality. Penalizing news/SaaS sites for not running an email list is off-mission. The fact remains in the breakdown for descriptive purposes; not weighted in the composite.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
80 Email Service Provider (ESP) detected
base 0.4 · ESP detection — only meaningful where a newsletter exists.
0.4 0.6 0.4 0.7 0.4 0.0 0.4 0.4 0.4 0.6 0.7
81 Transactional email provider detected (from SPF includes)
base 0.5 · Transactional email provider — high relevance for any site that emails users.
0.5 0.9 0.5 0.5 0.5 0.2 0.5 0.5 0.5 0.8 0.5
82 SPF lookup count (10-limit deliverability check)
base 1.0 · SPF flat-lookup count > 10 silently breaks deliverability. High signal anywhere mail is sent.
1.0 1.0 1.0 1.0 1.0 0.3 1.0 1.0 1.0 1.0 1.0
83 Visible contact form on site
base 0.6 · Visible contact form — overlaps the identity-floor synthetic check, but a thoughtful contact form is a genuine quality signal.
0.6 0.6 0.8 0.6 1.0 0.3 0.6 0.6 0.6 0.9 0.6
84 Mailto: direct contact link present
base 0.5 · mailto: link — most direct contact channel; civic-style for institutions.
0.5 0.5 0.5 0.5 0.5 0.4 0.7 0.8 0.7 0.5 0.5
85 Email forwarding service detected (improvmx, forwardemail, etc.)
base 0.2 · Email forwarding service detection — descriptive; slight positive for solo operators.
0.3 0.2 0.2 0.2 0.2 0.4 0.2 0.2 0.2 0.2 0.2
86 Lead magnet / signup incentive detected (free download, ebook, etc.)
base 0.0 · Lead magnet / signup incentive — pure conversion-funnel marketing tactic. Penalizing nytimes, anthropic, stripe, etc. for not running a 'free guide' funnel grades marketing strategy, not web quality. Not weighted; informational only if surfaced.
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
87 Modern cipher suite preference
base 0.7 · Modern cipher preference — moderate baseline, higher for high-trust sites.
0.7 0.9 0.7 0.7 0.7 0.3 0.7 1.0 0.7 0.7 0.7
88 Forward secrecy
base 0.8 · Forward secrecy protects past traffic from future key compromise.
0.8 1.0 0.8 0.8 0.8 0.3 0.8 1.1 0.8 0.8 0.8
89 Certificate key strength and signature algorithm
base 0.7 · Cert key strength + signature algorithm — defends against deprecated SHA-1, weak RSA.
0.7 0.7 0.7 0.7 0.7 0.3 0.7 0.7 0.7 0.7 0.7
90 Certificate chain completeness
base 1.0 · Cert chain completeness — incomplete chains break clients silently.
1.0 1.0 1.0 1.0 1.0 0.5 1.0 1.0 1.0 1.0 1.0
91 OCSP stapling
base 0.6 · OCSP stapling is a polish item — improves connection time + privacy.
0.6 0.8 0.6 0.6 0.6 0.2 0.6 0.9 0.6 0.6 0.6
92 Embedded SCT count (Certificate Transparency)
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
93 Encrypted Client Hello
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
94 Post-quantum key exchange
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
95 Certificate validity-period brevity
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
96 OCSP Must-Staple
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
97 Issuer reputation tier
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
98 TLS handshake latency
no row · default 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

How the weighted score works

For a site of type T:

  1. For every factor not in Web Standards, the connector emits a 0..100 factor score (or n/a).
  2. Each scored factor is multiplied by its weight w(factor, T) from this matrix.
  3. The Web Quality score is the weighted average sum(score × w) / sum(w), rounded and reported on the 0..100 scale.
  4. Factors with weight 0, factors marked n/a by applicability, and Web Standards items are excluded entirely.
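The four steps above can be sketched as a small function. This is an illustration under stated assumptions rather than the production scorer: the `FactorResult` shape and the `webQualityScore` name are hypothetical, n/a is modelled as `null`, rounding is assumed to be to the nearest whole number, and returning `null` when nothing is left to average is a choice made here, not something the methodology specifies. Web Standards items are assumed to have been filtered out before this point.

```typescript
// One factor as the steps describe it: a 0..100 score (null for n/a)
// paired with its weight w(factor, T) for the site's type.
type FactorResult = { score: number | null; weight: number };

function webQualityScore(results: FactorResult[]): number | null {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const { score, weight } of results) {
    // Step 4: n/a factors and zero-weight factors are excluded entirely.
    if (score === null || weight === 0) continue;
    weightedSum += score * weight; // step 2
    weightTotal += weight;
  }
  // Assumption: with no scored factors there is nothing to report.
  if (weightTotal === 0) return null;
  return Math.round(weightedSum / weightTotal); // step 3
}

// News / Publisher weights taken from the matrix; the scores are invented.
const example: FactorResult[] = [
  { score: 80, weight: 1.2 },   // Schema.org structured data presence
  { score: 100, weight: 0.9 },  // H1 tag presence
  { score: 0, weight: 1.0 },    // llms.txt presence
  { score: null, weight: 0.5 }, // hreflang, n/a on a monolingual site
  { score: 100, weight: 0.0 },  // Facebook Page presence, weight 0
];

// (80 × 1.2 + 100 × 0.9 + 0 × 1.0) / (1.2 + 0.9 + 1.0) = 186 / 3.1
console.log(webQualityScore(example)); // 60
```

Because excluded factors drop out of both the numerator and the denominator, a zero-weight or n/a factor neither helps nor hurts the result, whereas a scored 0 on a weighted factor (llms.txt in this example) pulls the average down.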

The score is suppressed when Web Standards fails — a site that hasn't cleared the binary check doesn't get a Web Quality number, only a list of what to fix.
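The suppression rule can be expressed as a report type that cannot carry a Web Quality number when Web Standards fails. The `Report` type and `buildReport` helper are hypothetical names used only for illustration.

```typescript
// A passing site gets a score; a failing site gets only a fix list.
type Report =
  | { webStandards: "pass"; webQuality: number }
  | { webStandards: "fail"; toFix: string[] };

// failedStandards: names of the Web Standards checks the site did not clear.
function buildReport(failedStandards: string[], webQuality: number): Report {
  if (failedStandards.length > 0) {
    // The computed score is dropped on purpose: no number is shown.
    return { webStandards: "fail", toFix: failedStandards };
  }
  return { webStandards: "pass", webQuality };
}

console.log(buildReport([], 72));
console.log(buildReport(["SPF record present and valid"], 72));
```

Modelling the two outcomes as a tagged union means code that reads `webQuality` has to check `webStandards === "pass"` first, so a suppressed score cannot be displayed by accident.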
