🧠 Universal Knowledge Model (UKM) — v2 (0–4 scale)
A vendor-neutral baseline describing how platforms ingest, normalize, index, retrieve, and govern organizational knowledge.
Last updated: 2025-10-12T13:16:15Z
What changed in v2
- Maturity scale normalized to 0–4 with explicit acceptance gates per phase.
- Typed corpora clarified:
capability,knowledge,story,ontology,profile. - Evidence discipline: exportable citations/provenance, schemas/shapes, retrieval traces, and visibility tests are required for higher levels.
- Gating rules added so keyword-only search or undocumented RAG can’t be scored like ontology-linked, governed systems.
- Quality KPIs added (Precision@k, citation rate, freshness, latency, SHACL violations, governance pass rate).
- Atlas alignment: token→numeric mapping for the Atlas UKM snapshot components.
⚙️ The Five Knowledge Phases
| # | Phase | Definition | Typical Data | Expected Capability |
|---|---|---|---|---|
| 1 | Ingestion & Validation | Bring knowledge into the system and ensure it conforms to schema | KB JSON, runbooks, retros, capability cards, profiles, ontology events | Source adapters, schema validation, dedupe, provenance capture |
| 2 | Normalization & Enrichment | Add structure and context | Owner/visibility, tags, entities (service/team/env), versions | Metadata normalization, ontology tagging, PII redaction, versioning |
| 3 | Indexing & Linking | Make it searchable and connected | Embeddings, scalar metadata, RDF triples, shapes | Vector index (Milvus-class), metadata filters/expr, graph links (RDF/SHACL), shapes→vectors |
| 4 | Retrieval & Composition | Answer questions and assemble context | RAG queries, typed retrieval (capability/knowledge/story), profile hints | Similarity + filters + routing, snippet assembly, Mermaid diagrams, profile-aware results |
| 5 | Governance & Evolution | Control access, explain results, and improve quality | Visibility scopes, citations, feedback, freshness | Enforced visibility, citations/provenance, feedback loops, recency windows, deprecation lifecycle |
🧩 UKM Maturity Scale (0–4)
| Level | Label | Acceptance (must satisfy all lower levels) |
|---|---|---|
| 0 | None | Ad-hoc docs/wikis; no embeddings or schema; no provenance. |
| 1 | Indexed | Keyword search across files/pages; basic metadata (owner or tags); no vector index; no graph. |
| 2 | Semantic | Vector index + metadata filters; basic provenance (source, owner, updated_at); recency windows; no graph requirements. |
| 3 | Ontology-linked | Typed corpora + RDF/SHACL graph; schema-validated ingest; governed retrieval (visibility enforcement); cross-domain joins; diagram assembly supported. |
| 4 | Contextual Intelligence | Profile-aware retrieval & story composition; citations on every answer; continuous freshness & feedback; explainable snippet assembly with diagrams; versioned ontology & query packs. |
Exportability gate: If citations/provenance cannot be exported with retrieval results, cap at Level 2 (Semantic).
Graph gate: If there is no RDF/SHACL graph with mappings (or no typed retrieval across corpora), cap at Level 2 (Semantic).
Governance gate: If visibility/tenant enforcement cannot be proven with tests (no cross-tenant bleed), cap at Level 2 (Semantic).
Overall rule: Overall UKM level = highest level where all acceptance clauses for that level and below hold.
🔎 Per-phase expectations by level (condensed)
1) Ingestion & Validation
- L0: Manual uploads; no validation.
- L1: Keyword ingestors; minimal metadata; no schema errors surfaced.
- L2: Schema validation with actionable errors; dedupe; provenance recorded.
- L3: Validated multi-corpus ingest; SHACL validation for ontology items; DLQ/Bloom dedupe.
- L4: Event-driven re-ingest; freshness SLOs; backfill/migrations with change logs.
2) Normalization & Enrichment
- L0: None.
- L1: Owner or tag only.
- L2: Owner/visibility/tags; entity hints (service/team/env); PII redaction.
- L3: Ontology tagging; versions; alias policy; normalization of units/formats.
- L4: Auto-enrichment from telemetry/ontology; confidence scores.
3) Indexing & Linking
- L0: Flat files only.
- L1: Keyword index only.
- L2: Vector + metadata index; hybrid search (vector ∪ keyword).
- L3: RDF graph; SHACL shapes; shapes→vectors; linkable nodes/edges; query library.
- L4: RAG-ready embeddings keyed by URIs; cross-domain joins at SLO.
4) Retrieval & Composition
- L0: Manual browsing.
- L1: Keyword only; no citations.
- L2: Similarity + filters; snippet assembly; optional citations.
- L3: Typed retrieval (capability/knowledge/story/profile/ontology), cite-back; diagram (Mermaid) support.
- L4: Profile-aware retrieval (skills/ownership), scenario/story composition; answer plans with citations & diagrams.
5) Governance & Evolution
- L0: None.
- L1: Best-effort RBAC.
- L2: Enforced visibility/tenant scopes; audit of queries.
- L3: Provenance on every item; redaction; per-tenant graphs; feedback loops.
- L4: Quality dashboards; deprecation lifecycle; continuous evals; SLA/SLO for search.
🧠 Neutral Knowledge Event (JSON Example)
{
"knowledge_event": {
"id": "2f8f6e9b-0b09-4c3e-8d0a-1209bb0a75f1",
"timestamp": "2025-10-09T12:00:00Z",
"phase": "ingest",
"corpus": "capability",
"item": {
"key": "oncall_handoff_slack_v1",
"title": "On-call Handoff via Slack",
"text": "Standard handoff format and Slack workflow steps…",
"entities": ["slack","incident","runbook"],
"tags": ["oncall","handoff"]
},
"metadata": {
"owner": "platform",
"visibility": "team",
"source": "repo://kb/capabilities/oncall.json",
"revision": "a1b2c3"
},
"provenance": {
"created_at": "2025-09-01T10:00:00Z",
"updated_at": "2025-10-01T09:30:00Z"
},
"embedding_ref": "milvus://collections/devops_capabilities/ids/…",
"graph_refs": ["graphdb://iac#Capability/oncall_handoff"],
"governance": { "pii_redacted": true, "scopes": ["team"] }
}
}
✅ Feature Requirements (FR) & Acceptance Criteria (AC)
F1. Schema discipline & ingest logs — Validates and rejects invalid rows with actionable errors.
AC: 100% of ingestors produce structured error records with JSON pointers.
F2. Normalization & enrichment — Owner/visibility/tags; entity mapping; PII redaction.
AC: All items carry owner, visibility and entity metadata; PII masked where applicable.
F3. Indexing & linking — Vector + metadata index; RDF/SHACL graph; shapes→vectors.
AC: Items resolvable by URI; SHACL violations logged; embeddings retrievable by node URI.
F4. Retrieval & composition — Typed retrieval with routing; snippet assembly; diagrams.
AC: Queries route to the correct corpus; results include citations; Mermaid render succeeds or returns diagnostics.
F5. Governance & evolution — Visibility enforced; citations/provenance included; freshness windows; feedback loop.
AC: Cross-tenant/visibility bleed tests pass; answers carry citations; freshness metrics available; feedback captured.
📈 Quality KPIs & Target Bands (guide)
- Precision@5: ≥ 0.60 (L3), ≥ 0.75 (L4).
- Citation_Rate% (answers with citations): ≥ 95% (L3), ≥ 99% (L4).
- Search_Latency_P50/P95 (ms): ≤ 800/1500 (L3), ≤ 500/900 (L4).
- Freshness_P95_Age_days (capability corpus): ≤ 14 (L3), ≤ 7 (L4).
- Governance_Pass_Rate% (visibility tests): ≥ 99.0% (L3), ≥ 99.9% (L4).
- SHACL_Violation_Rate per 1k inserts: ≤ 5 (L3), ≤ 1 (L4).
- Index_Coverage% (required corpora present): ≥ 60% (L3, ≥3/5), ≥ 80% (L4, ≥4/5).
- Broken_Link_Rate% (evidence/citations): ≤ 1.0% (L3), ≤ 0.2% (L4).
Use the same query set and capture retrieval logs + citations across vendors for fair bake-offs.
🔁 Atlas alignment (UKM token → UKM level)
| Atlas token | UKM level |
|---|---|
| N/L | 0 |
| P/L | 1 |
| P/M | 2 |
| Y/M | 3 |
| Y/H | 4 |
🔍 Comparison Template (vs. UKM)
| Platform | Ingest&Validate (0–4) | Normalize&Enrich (0–4) | Index&Link (0–4) | Retrieve&Compose (0–4) | Govern&Evolve (0–4) | Overall UKM (0–4) | Precision@5 | Citation_Rate_% | Search_P50_ms | Search_P95_ms | Freshness_P95_days | Governance_Pass_% | SHACL_Viol/1k | Index_Coverage_% | Broken_Link_% | Evidence Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Your Stack | ||||||||||||||||
| Competitor A | ||||||||||||||||
| Competitor B |
Attach retrieval logs, citations, schema/shape files, and governance test scripts as evidence.
📝 Conformance Checklist
- Schema-validated ingest with actionable errors; DLQ/Dedupe.
- Normalization adds owner, visibility, entities, versions; PII masking applied.
- Vector + metadata index and RDF/SHACL graph with shapes→vectors.
- Typed retrieval (capability/knowledge/story/profile/ontology) with citations and diagram assembly.
- Governance & freshness enforced; query audit; feedback loops; deprecation lifecycle.
📦 Appendix — Minimal Acceptance Examples
- Schema rejection — malformed
capabilityJSON rejected; error returns JSON-pointer to failing field. - Typed routing —
corpus=storyquery returns only stories; score threshold applied. - SHACL validation — invalid resource violates a shape; violation logged and routed to DLQ.
- Mermaid rendering — valid code produces PNG/SVG; errors return a diagnostic message.
- Governance filter — team-scoped query does not return org/private items; tests included.
- Precision@k — for a standard query set, Precision@5 ≥ target with citations present.
🗂️ Field Canonicals (Knowledge)
- phase ∈
{ingest, normalize, index, retrieve, govern} - corpus ∈
{capability, knowledge, story, ontology, profile} - visibility ∈
{team, org, private} - provenance:
{source, owner, created_at, updated_at, revision} - links:
graph_refs[],embedding_ref