🧠 Universal Knowledge Model (UKM) — v2 (0–4 scale)

A vendor-neutral baseline describing how platforms ingest, normalize, index, retrieve, and govern organizational knowledge.

Last updated: 2025-10-12T13:16:15Z


What changed in v2

  • Maturity scale normalized to 0–4 with explicit acceptance gates per phase.
  • Typed corpora clarified: capability, knowledge, story, ontology, profile.
  • Evidence discipline: exportable citations/provenance, schemas/shapes, retrieval traces, and visibility tests are required for higher levels.
  • Gating rules added so keyword-only search or undocumented RAG can’t be scored like ontology-linked, governed systems.
  • Quality KPIs added (Precision@k, citation rate, freshness, latency, SHACL violations, governance pass rate).
  • Atlas alignment: token→numeric mapping for the Atlas UKM snapshot components.

⚙️ The Five Knowledge Phases

| # | Phase | Definition | Typical Data | Expected Capability |
|---|-------|------------|--------------|---------------------|
| 1 | Ingestion & Validation | Bring knowledge into the system and ensure it conforms to schema | KB JSON, runbooks, retros, capability cards, profiles, ontology events | Source adapters, schema validation, dedupe, provenance capture |
| 2 | Normalization & Enrichment | Add structure and context | Owner/visibility, tags, entities (service/team/env), versions | Metadata normalization, ontology tagging, PII redaction, versioning |
| 3 | Indexing & Linking | Make it searchable and connected | Embeddings, scalar metadata, RDF triples, shapes | Vector index (Milvus-class), metadata filters/expr, graph links (RDF/SHACL), shapes→vectors |
| 4 | Retrieval & Composition | Answer questions and assemble context | RAG queries, typed retrieval (capability/knowledge/story), profile hints | Similarity + filters + routing, snippet assembly, Mermaid diagrams, profile-aware results |
| 5 | Governance & Evolution | Control access, explain results, and improve quality | Visibility scopes, citations, feedback, freshness | Enforced visibility, citations/provenance, feedback loops, recency windows, deprecation lifecycle |

🧩 UKM Maturity Scale (0–4)

| Level | Label | Acceptance (must satisfy all lower levels) |
|-------|-------|--------------------------------------------|
| 0 | None | Ad-hoc docs/wikis; no embeddings or schema; no provenance. |
| 1 | Indexed | Keyword search across files/pages; basic metadata (owner or tags); no vector index; no graph. |
| 2 | Semantic | Vector index + metadata filters; basic provenance (source, owner, updated_at); recency windows; no graph requirements. |
| 3 | Ontology-linked | Typed corpora + RDF/SHACL graph; schema-validated ingest; governed retrieval (visibility enforcement); cross-domain joins; diagram assembly supported. |
| 4 | Contextual Intelligence | Profile-aware retrieval & story composition; citations on every answer; continuous freshness & feedback; explainable snippet assembly with diagrams; versioned ontology & query packs. |

Exportability gate: If citations/provenance cannot be exported with retrieval results, cap at Level 2 (Semantic).

Graph gate: If there is no RDF/SHACL graph with mappings (or no typed retrieval across corpora), cap at Level 2 (Semantic).

Governance gate: If visibility/tenant enforcement cannot be proven with tests (no cross-tenant bleed), cap at Level 2 (Semantic).

Overall rule: the overall UKM level is the highest level for which all acceptance clauses at that level and below hold.
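
A minimal sketch of how the overall rule and the three gates could be scored; the per-phase inputs, flag names, and the min-over-phases interpretation are assumptions for illustration, not part of the model.

def overall_ukm_level(phase_levels: dict[str, int],
                      citations_exportable: bool,
                      graph_with_mappings: bool,
                      visibility_tests_pass: bool) -> int:
    """Overall UKM level = highest level whose acceptance clauses (and all lower ones) hold."""
    # One simple reading: a platform can only claim a level that every phase reaches.
    level = min(phase_levels.values())
    # Exportability, graph, and governance gates each cap the score at Level 2 (Semantic).
    if not (citations_exportable and graph_with_mappings and visibility_tests_pass):
        level = min(level, 2)
    return level

# Example: strong phases but unproven visibility enforcement -> capped at Level 2.
print(overall_ukm_level(
    {"ingest": 3, "normalize": 3, "index": 3, "retrieve": 4, "govern": 3},
    citations_exportable=True, graph_with_mappings=True, visibility_tests_pass=False,
))  # -> 2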


🔎 Per-phase expectations by level (condensed)

1) Ingestion & Validation

  • L0: Manual uploads; no validation.
  • L1: Keyword ingestors; minimal metadata; no schema errors surfaced.
  • L2: Schema validation with actionable errors; dedupe; provenance recorded.
  • L3: Validated multi-corpus ingest; SHACL validation for ontology items; DLQ/Bloom dedupe.
  • L4: Event-driven re-ingest; freshness SLOs; backfill/migrations with change logs.
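
As a sketch of the L2 "actionable errors" expectation (and the F1 acceptance criterion later on), the snippet below validates an incoming event against a JSON Schema and emits structured error records with JSON pointers; the schema, field names, and the jsonschema dependency are assumptions.

from jsonschema import Draft202012Validator  # assumes the `jsonschema` package

EVENT_SCHEMA = {
    "type": "object",
    "required": ["id", "phase", "corpus", "item", "metadata"],
    "properties": {
        "phase": {"enum": ["ingest", "normalize", "index", "retrieve", "govern"]},
        "corpus": {"enum": ["capability", "knowledge", "story", "ontology", "profile"]},
        "item": {"type": "object", "required": ["key", "title", "text"]},
        "metadata": {"type": "object", "required": ["owner", "visibility", "source"]},
    },
}

def validate_event(event: dict) -> list[dict]:
    """Return structured error records; an empty list means the event passes the gate."""
    validator = Draft202012Validator(EVENT_SCHEMA)
    return [
        {"json_pointer": "/" + "/".join(str(p) for p in err.absolute_path),
         "message": err.message}
        for err in validator.iter_errors(event)
    ]

# Invalid rows would be logged with these records and routed to a DLQ rather than indexed.
print(validate_event({"id": "x", "phase": "ingest", "corpus": "capability", "item": {}}))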

2) Normalization & Enrichment

  • L0: None.
  • L1: Owner or tag only.
  • L2: Owner/visibility/tags; entity hints (service/team/env); PII redaction.
  • L3: Ontology tagging; versions; alias policy; normalization of units/formats.
  • L4: Auto-enrichment from telemetry/ontology; confidence scores.
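
A minimal sketch of the L2 normalization step: default owner/visibility, naive entity hints, and email-only PII masking. The patterns and defaults are illustrative; real pipelines use proper PII detectors and an alias policy.

import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def normalize(item: dict, default_owner: str = "platform",
              default_visibility: str = "team") -> dict:
    meta = dict(item.get("metadata", {}))
    meta.setdefault("owner", default_owner)
    meta.setdefault("visibility", default_visibility)

    redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", item.get("text", ""))

    # Entity hints (service/team/env) from keyword matching; ontology tagging arrives at L3.
    entities = sorted({w for w in ("slack", "incident", "runbook") if w in redacted.lower()})

    return {**item, "text": redacted, "metadata": meta, "entities": entities,
            "governance": {"pii_redacted": redacted != item.get("text", "")}}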

3) Indexing & Linking

  • L0: Flat files only.
  • L1: Keyword index only.
  • L2: Vector + metadata index; hybrid search (vector ∪ keyword).
  • L3: RDF graph; SHACL shapes; shapes→vectors; linkable nodes/edges; query library.
  • L4: RAG-ready embeddings keyed by URIs; cross-domain joins at SLO.
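
A sketch of L2-style retrieval against a Milvus-class vector store: similarity search constrained by a metadata filter expression. The collection and field names, metric, and connection details are assumptions for illustration.

from pymilvus import Collection, connections  # assumes the `pymilvus` package

connections.connect(alias="default", host="localhost", port="19530")
col = Collection("devops_capabilities")  # hypothetical collection

def semantic_search(query_vec: list, visibility: str = "team", k: int = 5):
    return col.search(
        data=[query_vec],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=k,
        expr=f'visibility == "{visibility}"',      # metadata filter / expr
        output_fields=["key", "owner", "source"],  # enough to build citations
    )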

4) Retrieval & Composition

  • L0: Manual browsing.
  • L1: Keyword only; no citations.
  • L2: Similarity + filters; snippet assembly; optional citations.
  • L3: Typed retrieval (capability/knowledge/story/profile/ontology), cite-back; diagram (Mermaid) support.
  • L4: Profile-aware retrieval (skills/ownership), scenario/story composition; answer plans with citations & diagrams.
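
A sketch of typed retrieval routing: classify the query into a corpus, search only that corpus, and return snippets with cite-back. The keyword heuristics and the search_fn hook are illustrative; production routers are usually model-based.

CORPUS_HINTS = {
    "story": ("incident", "postmortem", "retro"),
    "capability": ("runbook", "handoff", "how do we"),
    "profile": ("who owns", "who is on-call"),
}

def route(query: str) -> str:
    q = query.lower()
    for corpus, hints in CORPUS_HINTS.items():
        if any(h in q for h in hints):
            return corpus
    return "knowledge"  # default corpus

def answer(query: str, search_fn) -> dict:
    corpus = route(query)
    hits = search_fn(corpus, query)  # e.g. the vector search sketched earlier
    return {
        "corpus": corpus,
        "snippets": [h["text"] for h in hits],
        "citations": [h["source"] for h in hits],  # cite-back on every answer
    }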

5) Governance & Evolution

  • L0: None.
  • L1: Best-effort RBAC.
  • L2: Enforced visibility/tenant scopes; audit of queries.
  • L3: Provenance on every item; redaction; per-tenant graphs; feedback loops.
  • L4: Quality dashboards; deprecation lifecycle; continuous evals; SLA/SLO for search.
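
A sketch of the governance bleed test implied by the levels above, written pytest-style; `search` is assumed to be a fixture wrapping the platform's retrieval entry point, and the record fields are illustrative.

def test_no_cross_visibility_bleed(search):
    caller = {"tenant": "acme", "scopes": ["team"]}
    hits = search("on-call handoff", caller=caller)
    for hit in hits:
        assert hit["tenant"] == caller["tenant"], "cross-tenant bleed"
        assert hit["visibility"] in caller["scopes"], "visibility escalation"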

🧠 Neutral Knowledge Event (JSON Example)

{
  "knowledge_event": {
    "id": "2f8f6e9b-0b09-4c3e-8d0a-1209bb0a75f1",
    "timestamp": "2025-10-09T12:00:00Z",
    "phase": "ingest",
    "corpus": "capability",
    "item": {
      "key": "oncall_handoff_slack_v1",
      "title": "On-call Handoff via Slack",
      "text": "Standard handoff format and Slack workflow steps…",
      "entities": ["slack","incident","runbook"],
      "tags": ["oncall","handoff"]
    },
    "metadata": {
      "owner": "platform",
      "visibility": "team",
      "source": "repo://kb/capabilities/oncall.json",
      "revision": "a1b2c3"
    },
    "provenance": {
      "created_at": "2025-09-01T10:00:00Z",
      "updated_at": "2025-10-01T09:30:00Z"
    },
    "embedding_ref": "milvus://collections/devops_capabilities/ids/…",
    "graph_refs": ["graphdb://iac#Capability/oncall_handoff"],
    "governance": { "pii_redacted": true, "scopes": ["team"] }
  }
}

✅ Feature Requirements (FR) & Acceptance Criteria (AC)

F1. Schema discipline & ingest logs — Validates and rejects invalid rows with actionable errors.
AC: 100% of ingestors produce structured error records with JSON pointers.

F2. Normalization & enrichment — Owner/visibility/tags; entity mapping; PII redaction.
AC: All items carry owner, visibility and entity metadata; PII masked where applicable.

F3. Indexing & linking — Vector + metadata index; RDF/SHACL graph; shapes→vectors.
AC: Items resolvable by URI; SHACL violations logged; embeddings retrievable by node URI.
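
A sketch of the SHACL part of the F3 acceptance check: validate instance data against shapes and log violations. The file paths are illustrative; assumes the `pyshacl` and `rdflib` packages.

from pyshacl import validate
from rdflib import Graph

data = Graph().parse("capability_oncall.ttl")    # hypothetical instance data
shapes = Graph().parse("capability_shapes.ttl")  # hypothetical SHACL shapes

conforms, report_graph, report_text = validate(data, shacl_graph=shapes, inference="rdfs")
if not conforms:
    print(report_text)  # violations are logged and the item routed to the DLQ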

F4. Retrieval & composition — Typed retrieval with routing; snippet assembly; diagrams.
AC: Queries route to the correct corpus; results include citations; Mermaid render succeeds or returns diagnostics.

F5. Governance & evolution — Visibility enforced; citations/provenance included; freshness windows; feedback loop.
AC: Cross-tenant/visibility bleed tests pass; answers carry citations; freshness metrics available; feedback captured.


📈 Quality KPIs & Target Bands (guide)

  • Precision@5: ≥ 0.60 (L3), ≥ 0.75 (L4).
  • Citation_Rate_% (answers with citations): ≥ 95% (L3), ≥ 99% (L4).
  • Search_Latency_P50/P95 (ms): ≤ 800/1500 (L3), ≤ 500/900 (L4).
  • Freshness_P95_Age_days (capability corpus): ≤ 14 (L3), ≤ 7 (L4).
  • Governance_Pass_Rate_% (visibility tests): ≥ 99.0% (L3), ≥ 99.9% (L4).
  • SHACL_Violation_Rate (per 1k inserts): ≤ 5 (L3), ≤ 1 (L4).
  • Index_Coverage_% (required corpora present): ≥ 60% (L3, ≥ 3/5), ≥ 80% (L4, ≥ 4/5).
  • Broken_Link_Rate_% (evidence/citations): ≤ 1.0% (L3), ≤ 0.2% (L4).
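
A sketch of how two of these KPIs can be computed from retrieval logs; the log record shape is an assumption.

def precision_at_k(retrieved_ids: list, relevant_ids: set, k: int = 5) -> float:
    return sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids) / k

def citation_rate(answers: list) -> float:
    return sum(1 for a in answers if a.get("citations")) / max(len(answers), 1)

print(precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "x"}))  # -> 0.4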

Use the same query set and capture retrieval logs + citations across vendors for fair bake-offs.


🔁 Atlas alignment (UKM token → UKM level)

| Atlas token | UKM level |
|-------------|-----------|
| N/L | 0 |
| P/L | 1 |
| P/M | 2 |
| Y/M | 3 |
| Y/H | 4 |
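
The mapping above expressed as a lookup table, for scripting Atlas snapshot conversions (sketch).

ATLAS_TO_UKM = {"N/L": 0, "P/L": 1, "P/M": 2, "Y/M": 3, "Y/H": 4}

def ukm_level_from_atlas(token: str) -> int:
    return ATLAS_TO_UKM[token]  # KeyError signals an unknown snapshot token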

🔍 Comparison Template (vs. UKM)

| Platform | Ingest&Validate (0–4) | Normalize&Enrich (0–4) | Index&Link (0–4) | Retrieve&Compose (0–4) | Govern&Evolve (0–4) | Overall UKM (0–4) | Precision@5 | Citation_Rate_% | Search_P50_ms | Search_P95_ms | Freshness_P95_days | Governance_Pass_% | SHACL_Viol/1k | Index_Coverage_% | Broken_Link_% | Evidence Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Your Stack | | | | | | | | | | | | | | | | |
| Competitor A | | | | | | | | | | | | | | | | |
| Competitor B | | | | | | | | | | | | | | | | |

Attach retrieval logs, citations, schema/shape files, and governance test scripts as evidence.


📝 Conformance Checklist

  • Schema-validated ingest with actionable errors; DLQ/Dedupe.
  • Normalization adds owner, visibility, entities, versions; PII masking applied.
  • Vector + metadata index and RDF/SHACL graph with shapes→vectors.
  • Typed retrieval (capability/knowledge/story/profile/ontology) with citations and diagram assembly.
  • Governance & freshness enforced; query audit; feedback loops; deprecation lifecycle.

📦 Appendix — Minimal Acceptance Examples

  • Schema rejection — malformed capability JSON rejected; error returns JSON-pointer to failing field.
  • Typed routing — corpus=story query returns only stories; score threshold applied.
  • SHACL validation — invalid resource violates a shape; violation logged and routed to DLQ.
  • Mermaid rendering — valid code produces PNG/SVG; errors return a diagnostic message.
  • Governance filter — team-scoped query does not return org/private items; tests included.
  • Precision@k — for a standard query set, Precision@5 ≥ target with citations present.
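
A sketch of the Mermaid acceptance example: render the diagram and return either the artifact or a diagnostic. Assumes the mermaid-cli binary `mmdc` is installed; paths and the return shape are illustrative.

import pathlib
import subprocess
import tempfile

def render_mermaid(code: str) -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "diagram.mmd")
        out = pathlib.Path(tmp, "diagram.svg")
        src.write_text(code)
        result = subprocess.run(["mmdc", "-i", str(src), "-o", str(out)],
                                capture_output=True, text=True)
        if result.returncode != 0:
            return {"ok": False, "diagnostic": result.stderr.strip()}
        return {"ok": True, "svg": out.read_text()}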

🗂️ Field Canonicals (Knowledge)

  • phase: {ingest, normalize, index, retrieve, govern}
  • corpus: {capability, knowledge, story, ontology, profile}
  • visibility: {team, org, private}
  • provenance: {source, owner, created_at, updated_at, revision}
  • links: graph_refs[], embedding_ref
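
The canonicals above expressed as typed constants, handy for validators and tests (sketch; names are illustrative).

from typing import Literal

Phase = Literal["ingest", "normalize", "index", "retrieve", "govern"]
Corpus = Literal["capability", "knowledge", "story", "ontology", "profile"]
Visibility = Literal["team", "org", "private"]

PROVENANCE_FIELDS = ("source", "owner", "created_at", "updated_at", "revision")
LINK_FIELDS = ("graph_refs", "embedding_ref")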
