🧭 Universal Diagnostic Model (UDM) — v2 (0–4 scale)

A vendor-neutral baseline describing how intelligent Ops/Observability systems detect, enrich, correlate, explain (RCA), and recommend/remediate incidents — with verifiable evidence.

Last updated: 2025-10-12T12:28:58Z

What changed in v2

Maturity scale normalized to 0–4 (sources previously mixed 0–4 and 0–5 scales), with clear acceptance gates per level.
Precise phase names: Signal Detection · Context Enrichment · Event Correlation & Classification · Root Cause Analysis · Recommendation/Remediation.
Atlas alignment: Token→numeric mapping for the Atlas 🔍 Diagnostics capability.
Operational metrics: TTFC/TTRC, Verified-RCA rate, false-positive rate added as non-functional targets.
Exports: Scorecard CSV and machine-readable rubric JSON.
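
The token→numeric mapping and the scorecard export can be sketched together in a few lines. This is a minimal illustration, not the published Atlas mapping or CSV schema: it assumes the Atlas tokens are simply the UDM level labels (None, Reactive, Correlated, Intelligent, Autonomous), and the names `UDM_TOKEN_TO_LEVEL`, `token_to_level`, `scorecard_csv` and the column headers are hypothetical.

```python
import csv
import io

# Hypothetical token set: assumes Atlas tokens equal the UDM level labels.
UDM_TOKEN_TO_LEVEL = {
    "none": 0,
    "reactive": 1,
    "correlated": 2,
    "intelligent": 3,
    "autonomous": 4,
}


def token_to_level(token: str) -> int:
    """Map a maturity token to its 0–4 UDM level (case-insensitive)."""
    key = token.strip().lower()
    if key not in UDM_TOKEN_TO_LEVEL:
        raise ValueError(f"unknown UDM token: {token!r}")
    return UDM_TOKEN_TO_LEVEL[key]


def scorecard_csv(ratings: dict[str, str]) -> str:
    """Render {product: token} as CSV with a numeric level column.

    Column names are illustrative, not a published scorecard schema.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["product", "udm_token", "udm_level"])
    for product, token in sorted(ratings.items()):
        writer.writerow([product, token, token_to_level(token)])
    return buf.getvalue()


print(scorecard_csv({"product-a": "Correlated", "product-b": "Intelligent"}))
```

Rejecting unknown tokens, rather than defaulting them to 0, keeps a typo from silently lowering a product's score.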

⚙️ Five Diagnostic Phases

#	Phase	Definition	Typical Data	Expected Capability
1	Signal Detection	Identify anomalies or deviations from expected behavior	Metrics (CPU/latency/errors), logs, traces, alerts	Thresholds, anomaly detectors, drift-aware baselines; capture detection reason & thresholds
2	Context Enrichment	Link signals with entities, ownership, deploy/change context	Service maps, k8s/CMDB, deploy metadata	Stable IDs, dependency graph, change/owner joins, SLO context
3	Event Correlation & Classification	Group related signals and classify the probable domain/cause family	Multi-signal events across time windows	Correlation windows, clustering/causal hints, change-aware grouping
4	Root Cause Analysis (RCA)	Produce a testable hypothesis, backed by evidence, explaining why the incident occurred	Enriched telemetry + historical baselines + change diffs	Structured hypothesis + verification plan; confidence; negative evidence considered
5	Recommendation / Remediation	Propose (or execute under guardrails) a mitigation with verification	Runbooks, IaC diffs, workflows	Risk-aware plan, preflight checks, approvals; rollback & post-verify steps
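
Phases 1 and 3 are the easiest to make concrete. The sketch below assumes a plain static-threshold detector that records its reason and threshold as evidence, and a simple per-service session window for correlation (a new group starts when consecutive signals of one service are more than `window_s` apart). It has no clustering, causal hints, or change-aware grouping, and the names `Signal`, `detect`, `correlate`, and the sample metrics are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A detected anomaly; reason and threshold are kept as evidence (Phase 1)."""
    ts: float        # seconds since epoch
    service: str     # stable entity ID, assumed to come from Phase 2 enrichment
    metric: str
    value: float
    threshold: float
    reason: str


def detect(samples, thresholds):
    """Static-threshold detector: one Signal per sample above its metric's limit."""
    signals = []
    for ts, service, metric, value in samples:
        limit = thresholds.get(metric)
        if limit is not None and value > limit:
            reason = f"{metric}={value} > {limit}"
            signals.append(Signal(ts, service, metric, value, limit, reason))
    return signals


def correlate(signals, window_s=300.0):
    """Group signals per service; a gap longer than window_s between consecutive
    signals of the same service starts a new group (Phase 3)."""
    groups = []
    open_groups = {}  # service -> most recent group for that service
    for sig in sorted(signals, key=lambda s: s.ts):
        current = open_groups.get(sig.service)
        if current is not None and sig.ts - current[-1].ts <= window_s:
            current.append(sig)
        else:
            current = [sig]
            open_groups[sig.service] = current
            groups.append(current)
    return groups


samples = [
    (0, "checkout", "latency_ms", 950),
    (60, "checkout", "error_rate", 0.07),
    (90, "search", "latency_ms", 120),
    (900, "checkout", "latency_ms", 1200),
]
signals = detect(samples, {"latency_ms": 500, "error_rate": 0.05})
groups = correlate(signals)
print([len(g) for g in groups])  # [2, 1]
print(groups[0][0].reason)       # latency_ms=950 > 500
```

The latency and error-rate breaches on `checkout` one minute apart land in one group; the breach 14 minutes later opens a new one.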

🧩 UDM Maturity Scale (0–4)

Level	Label	Acceptance (must satisfy this level and all lower levels)
0	None	No diagnostics beyond raw alerts/logs; no context; no evidence export.
1	Reactive	L1 provides detection via manual/threshold alerts, minimal labeling, and ad-hoc triage notes.
2	Correlated	L2 adds multi-signal correlation or rule-based classification, entity/service mapping, and links to evidence (queries/logs/traces).
3	Intelligent	L3 adds structured RCA with verification steps (counter-tests), confidence scoring, change awareness, and explainable evidence (permalinks/queries included).
4	Autonomous	L4 adds causal reasoning/graphs, automated counter-tests, early-finalize on high confidence, and guardrailed remediation (approvals/rollback), with tracked quality metrics (Verified-RCA rate, FP rate).
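
The Level 4 quality metrics need a working definition before they can be tracked. This sketch assumes Verified-RCA rate = verified RCAs ÷ proposed RCAs, and FP rate = detections that were not real issues ÷ all detections (strictly a false-discovery proportion; a classic false-positive rate would also need true negatives). The `IncidentReview` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncidentReview:
    """Post-incident review of one detection (field names are illustrative)."""
    real_issue: bool     # did the detection correspond to a real problem?
    rca_proposed: bool   # did the system emit an RCA hypothesis?
    rca_verified: bool   # was that hypothesis later confirmed (e.g. by a counter-test)?


def verified_rca_rate(reviews) -> Optional[float]:
    """Verified RCAs divided by proposed RCAs; None when nothing was proposed."""
    proposed = [r for r in reviews if r.rca_proposed]
    if not proposed:
        return None
    return sum(r.rca_verified for r in proposed) / len(proposed)


def false_positive_rate(reviews) -> Optional[float]:
    """Detections that were not real issues divided by all detections."""
    if not reviews:
        return None
    return sum(not r.real_issue for r in reviews) / len(reviews)


reviews = [
    IncidentReview(real_issue=True, rca_proposed=True, rca_verified=True),
    IncidentReview(real_issue=True, rca_proposed=True, rca_verified=False),
    IncidentReview(real_issue=False, rca_proposed=False, rca_verified=False),
    IncidentReview(real_issue=True, rca_proposed=True, rca_verified=True),
]
print(verified_rca_rate(reviews))    # 0.6666666666666666
print(false_positive_rate(reviews))  # 0.25
```

Returning `None` for an empty denominator keeps "no data" distinct from a genuine 0% rate.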

Gating rule: A product’s UDM level is the highest level whose acceptance gates (and all below) are met. Any missing gate caps the level.
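
The gating rule amounts to a scan from Level 1 upward that stops at the first level with a missing or failed gate. In this sketch the gate names are hypothetical labels paraphrased from the acceptance table, and a level with no recorded checks is treated as a missing gate.

```python
MAX_LEVEL = 4


def udm_level(gates: dict[int, dict[str, bool]]) -> int:
    """Highest level L such that every gate at levels 1..L is met.

    A failed or unassessed level k caps the result at k - 1, even if
    higher levels pass on their own.
    """
    level = 0
    for candidate in range(1, MAX_LEVEL + 1):
        checks = gates.get(candidate, {})
        # No recorded checks counts as a missing gate.
        if not checks or not all(checks.values()):
            break
        level = candidate
    return level


# Hypothetical assessment: Level 4 passes on its own, but a failed
# Level 3 gate (confidence scoring) caps the product at Level 2.
assessment = {
    1: {"threshold_alerts": True},
    2: {"multi_signal_correlation": True, "entity_mapping": True, "evidence_links": True},
    3: {"structured_rca": True, "confidence_scoring": False,
        "change_awareness": True, "explainable_evidence": True},
    4: {"causal_graphs": True, "automated_counter_tests": True,
        "early_finalize": True, "guardrailed_remediation": True, "quality_metrics": True},
}
print(udm_level(assessment))  # 2
```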

