Stop Chunking Documents: The Open Knowledge Format (OKF) for Enterprise AI
Chunk-and-embed RAG hits a wall at scale. Open Knowledge Format (OKF) feeds a context engine structured, governed knowledge — the Mattrx rebuild with code.
- Author: Randhir Jassal
- Reading time: 18 min read
Everyone's first RAG pipeline is the same four boxes: documents, chunk, vector DB, LLM. It demos in an afternoon and then quietly betrays you in production — stale answers, no relationships, no governance, and a model guessing from fragments. The fix is not a bigger vector index. It is to stop storing documents and start storing knowledge. That is what Open Knowledge Format (OKF) is.
To be clear up front, because the title is deliberately provocative: OKF does not kill embeddings. Vectors still do the recall. What OKF kills is blind chunking — slicing opaque documents into context-free fragments and hoping cosine similarity reassembles meaning. On Mattrx, our multi-tenant marketing-analytics SaaS, replacing that with OKF + a Context Engine took our assistant's hallucination rate from 18% to 3% and our stale-answer rate from 11% to 1.5%. This post is how, with the real format and real code.
TL;DR
| Dimension | Documents → chunk → vector DB (before) | OKF + Context Engine (after) |
|---|---|---|
| Unit of knowledge | Opaque chunk of text | Typed, governed knowledge unit |
| Structure | None — chunks are islands | Metadata + relationships + schemas |
| Freshness | Snapshot, rots silently | valid_until + live API refs |
| Rules | Buried in prose, ignorable | First-class data the engine enforces |
| Retrieval | Top-k cosine | Hybrid + vector + graph |
| Multi-hop questions | Unanswerable | Answered via relationships |
- Knowledge base restructured from raw docs into ~11,000 OKF units (Markdown + metadata + relationships + APIs + schemas + business rules).
- Hallucination rate 18% → 3%; faithfulness 0.96; answer-relevance 0.91.
- Context tokens per call 14k → 3.5k — structure lets the engine attach the right thing, not everything.
- Outdated-answer rate 11% → 1.5% (metadata freshness + valid_until).
- Multi-hop questions became answerable via graph retrieval over OKF relationships.
- Business rules enforced as data, not buried in prose (e.g., "never recommend a deprecated plan").
- Live data via API references — answers reflect the current catalog, not a stale snapshot.
- "Mattrx Help" deflects ~520 support tickets/month.
- The Context Engine = hybrid search + vector retrieval + graph retrieval + memory + tool calls + prompt assembly.
- Vectors still carry recall; OKF adds the precision, governance, and graph that vectors alone cannot.
The one mental shift: a chunk is a fragment of text with no identity, no owner, and no expiry. An OKF unit is a governed, typed, related piece of knowledge your context engine can reason about. Stop indexing text. Start indexing knowledge.
The running example: Mattrx
Mattrx is a real system — Angular 19 front end, .NET 9 / ASP.NET Core back end (Clean Architecture + CQRS), Azure SQL, Azure App Service, Kafka ingestion, Service Bus for report commands. The AI surface is Mattrx Help (support assistant) and Mattrx Insights (an agentic analyst), with C# owning orchestration and a Python FastAPI service owning embeddings, retrieval, and agents.
Mattrx Help launched on the textbook pipeline: dump product docs, runbooks, and billing policies into a store, chunk them, embed them, retrieve top-k, stuff the prompt. It was confidently wrong 18% of the time. It recommended a plan we had deprecated six months earlier (the deprecation lived in a doc it never retrieved). It could not answer "if a Growth-plan customer downgrades mid-cycle, how is it prorated?" because the answer spanned three documents that chunking had shredded into unrelated fragments.
None of that is a model problem. It is a knowledge representation problem. Here is the before and the after — the exact shape we run now:
BEFORE (naive RAG)        AFTER (OKF + Context Engine)
Documents                 OKF Knowledge Base
    |                       |-- Markdown
    v                       |-- Metadata
  Chunk                     |-- Relationships
    |                       |-- APIs
    v                       |-- Schemas
Vector DB                   +-- Business Rules
    |                               |
    v                               v
   LLM                      Context Engine
                              |-- Hybrid Search
                              |-- Vector Retrieval
                              |-- Graph Retrieval
                              |-- Memory
                              |-- Tool Calls
                              +-- Prompt Assembly
                                      |
                                      v
                                     LLM
Two halves: OKF is how knowledge is stored; the Context Engine is how it is assembled into a prompt. Let's build both, with the before that broke and the after that holds.
Part 1 — The OKF Knowledge Base
1. From documents to knowledge units (Markdown + Metadata)
Before
A document went in as a blob and came out as chunks with no identity. A 30-page PDF became 60 fragments, each indistinguishable from any other text.
// BEFORE: a document is just text to be sliced.
var text = await ExtractText(pdf);
foreach (var chunk in SlidingWindow(text, size: 800, overlap: 100))
    await vectorStore.UpsertAsync(Embed(chunk), ct); // no id, no owner, no type, no expiry
Diagnostic: a chunk has no identity. You cannot say who owns it, when it expires, what it relates to, or whether it's even still true. It is text ripped from context, and the model inherits all of that missing context as uncertainty.
After
The atomic unit is an OKF document: a Markdown body plus structured frontmatter (metadata). Every unit has an id, a type, an owner, a version, and a validity window.
---
id: kb/billing/proration-policy
title: How mid-cycle plan changes are prorated
type: policy
owner: billing-team
version: 4
valid_from: 2026-01-01
valid_until: null # null = current; set a date to auto-expire
visibility: tenant-safe # contains no tenant-specific data
relationships:
  - relates_to: kb/billing/plan-tiers
  - supersedes: kb/billing/proration-policy@3
  - governed_by: rule/billing/no-deprecated-plans
apis:
  - GET /v2/billing/plans # live plan catalog, resolved at query time
schemas:
  - schema/billing/invoice-line
tags: [billing, proration, plans]
---
When a customer changes plan mid-cycle, Mattrx prorates to the day. The
unused portion of the old plan is credited and the new plan is charged
pro-rata from the change date. Credits never exceed the original charge...
The body is still text we embed. But now the unit carries metadata the engine can filter and rank on, and a typed identity it can reason about.
// AFTER: the knowledge unit is a typed, governed record — not a chunk.
public sealed record OkfDocument
{
    public required string Id { get; init; } // "kb/billing/proration-policy"
    public required string Title { get; init; }
    public required string Type { get; init; } // policy | runbook | playbook | reference
    public required string Markdown { get; init; } // the body we still embed
    public required OkfMetadata Metadata { get; init; } // owner, version, valid_until, visibility
    public IReadOnlyList<OkfRelationship> Relationships { get; init; } = [];
    public IReadOnlyList<string> ApiRefs { get; init; } = [];
    public IReadOnlyList<string> SchemaRefs { get; init; } = [];
    public IReadOnlyList<string> BusinessRuleIds { get; init; } = [];
}
Diagnostic: the moment a unit has valid_until, the engine can refuse to ground an answer in expired knowledge. The moment it has visibility: tenant-safe, you have a governance hook. Identity is what makes knowledge manageable.
Mattrx metric: giving units a validity window dropped our outdated-answer rate from 11% to 1.5% — the assistant stopped citing superseded policies because the engine filters them out by metadata before the model ever sees them.
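The metadata filter behind that number can be sketched in a few lines. This is a minimal illustration in Python (the language the post assigns to the retrieval service), not the Mattrx implementation: `OkfUnit`, `is_current`, and `only_valid` are hypothetical names, the dates on the superseded unit are invented sample data, and treating `valid_until` as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OkfUnit:
    """Hypothetical, pared-down stand-in for an OKF unit's validity metadata."""
    id: str
    valid_from: date
    valid_until: Optional[date] = None  # None mirrors `valid_until: null` (current)


def is_current(unit: OkfUnit, today: date) -> bool:
    """True when `today` falls inside the unit's validity window.

    Assumption: valid_until is inclusive (the unit is still usable on that date).
    """
    if today < unit.valid_from:
        return False
    return unit.valid_until is None or today <= unit.valid_until


def only_valid(units: list[OkfUnit], today: date) -> list[OkfUnit]:
    """Drop expired or not-yet-valid units before any ranking happens."""
    return [u for u in units if is_current(u, today)]


units = [
    OkfUnit("kb/billing/proration-policy", date(2026, 1, 1)),
    # Sample dates for the superseded version; not taken from the post.
    OkfUnit("kb/billing/proration-policy@3", date(2025, 1, 1), date(2025, 12, 31)),
]
current_ids = [u.id for u in only_valid(units, date(2026, 3, 1))]
```

With these sample dates, only the current policy survives the filter, so the superseded version never reaches the ranker or the model.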
2. Relationships and Schemas: the knowledge graph
Before
Chunks were islands. The proration policy, the plan tiers, and the downgrade flow lived in three documents with no link between them — so a question that needed all three never retrieved all three.
Diagnostic: vector similarity finds chunks that sound alike, not chunks that are connected. "Proration" and "downgrade flow" may not be similar enough in embedding space to co-retrieve, even though answering one requires the other. Cosine has no concept of "related to."
After
OKF makes relationships first-class. Each unit declares its edges (relates_to, supersedes, governed_by, depends_on), and a schema defines the shape of structured units. Together they form a knowledge graph over the corpus.
public sealed record OkfRelationship(string Kind, string TargetId); // ("relates_to", "kb/billing/plan-tiers")
// Schemas keep structured units consistent and machine-checkable.
public sealed record OkfSchema(
string Id, // "schema/billing/invoice-line"
IReadOnlyDictionary<string, string> Fields); // { "amount": "decimal", "currency": "iso-4217", ... }
// Graph retrieval: start from semantic seeds, then expand along OKF relationships.
public sealed class KnowledgeGraph(IOkfStore store)
{
    public async Task<IReadOnlyList<OkfDocument>> ExpandAsync(
        IReadOnlyList<OkfDocument> seeds, int maxHops, CancellationToken ct)
    {
        var seen = seeds.ToDictionary(d => d.Id);
        var frontier = new Queue<OkfDocument>(seeds);
        for (var hop = 0; hop < maxHops && frontier.Count > 0; hop++)
        {
            foreach (var doc in DrainLevel(frontier))
                foreach (var rel in doc.Relationships)
                    if (await store.TryGetAsync(rel.TargetId, ct) is { } related && seen.TryAdd(related.Id, related))
                        frontier.Enqueue(related);
        }
        return seen.Values.ToList();
    }

    // Snapshot the current level so units enqueued during this hop wait for the next one.
    private static List<OkfDocument> DrainLevel(Queue<OkfDocument> frontier)
    {
        var level = new List<OkfDocument>(frontier);
        frontier.Clear();
        return level;
    }
}
Diagnostic: this is why "which campaigns are affected by the new attribution policy?" became answerable. Hybrid search finds the policy unit; graph retrieval follows relates_to and governed_by edges to the campaigns and rules that depend on it. Vectors find the door; the graph walks the building.
Mattrx metric: multi-hop questions went from unanswerable to routinely answered. The single biggest quality jump after the format change came from graph retrieval pulling in related units that pure vector search consistently missed.
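The schema half of this section, keeping structured units machine-checkable, might look like the following. It is a hedged sketch, not the OKF validator: the field types `decimal` and `iso-4217` come from the schema comment above, while `CHECKS`, `validate`, and the sample records are hypothetical, and the currency check only tests the three-uppercase-letter shape rather than membership in the ISO 4217 list.

```python
import re
from decimal import Decimal, InvalidOperation


def _is_decimal(value) -> bool:
    """Accept anything that parses as a finite decimal number."""
    try:
        return Decimal(str(value)).is_finite()
    except InvalidOperation:
        return False


def _is_iso4217(value) -> bool:
    # Shape check only: three uppercase letters, not a lookup in the ISO list.
    return isinstance(value, str) and re.fullmatch(r"[A-Z]{3}", value) is not None


# Hypothetical registry mapping schema type names to checks.
CHECKS = {"decimal": _is_decimal, "iso-4217": _is_iso4217}


def validate(fields: dict[str, str], record: dict) -> list[str]:
    """Return one message per problem; an empty list means the record conforms."""
    errors = []
    for name, type_name in fields.items():
        if name not in record:
            errors.append(f"{name}: missing")
        elif type_name not in CHECKS:
            errors.append(f"{name}: unknown type '{type_name}'")
        elif not CHECKS[type_name](record[name]):
            errors.append(f"{name}: not a valid {type_name}")
    return errors


invoice_line = {"amount": "decimal", "currency": "iso-4217"}
ok = validate(invoice_line, {"amount": "12.50", "currency": "USD"})
bad = validate(invoice_line, {"amount": "twelve", "currency": "usd"})
```

Returning a list of messages instead of raising on the first failure lets an authoring pipeline report every problem in a unit in one pass.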
3. APIs and Business Rules: live data and governance as data
Before
Knowledge was a frozen snapshot, and rules were sentences buried in prose that the model was free to ignore.
// BEFORE: the plan list is whatever was true the day someone wrote the doc.
var chunk = "Our plans are Starter, Growth, Scale, and Enterprise..."; // already wrong
Diagnostic: two failures. First, the moment the catalog changes, the snapshot is wrong and nothing flags it. Second, "don't recommend deprecated plans" written in a paragraph is a suggestion — the model may or may not honor it, and you cannot audit whether it did.
After
OKF units reference APIs for live data and link to business rules that the Context Engine enforces as constraints, not prose.
# rule/billing/no-deprecated-plans — a governed unit, versioned and owned
id: rule/billing/no-deprecated-plans
type: business-rule
owner: billing-team
version: 2
applies_to: [billing]
statement: >
  Never recommend a plan whose status is 'deprecated'. Offer the successor
  plan from GET /v2/billing/plans instead.
enforcement: hard # hard = injected as a constraint the model must obey
// At query time, API refs resolve to LIVE data; rules become hard constraints.
public sealed class ContextConstraints(IBusinessRules rules, IToolBroker tools)
{
    public async Task<EngineInputs> ResolveAsync(
        AiPrincipal p, IReadOnlyList<OkfDocument> units, CancellationToken ct)
    {
        // Live data: resolve each unit's apis: refs through the governed tool layer.
        var live = await tools.ResolveApiRefsAsync(p, units, ct); // e.g. current plan catalog
        // Governance: gather the business rules governing these units.
        var ruleIds = units.SelectMany(u => u.BusinessRuleIds).Distinct();
        var constraints = await rules.LoadAsync(ruleIds, ct);
        return new EngineInputs(live, constraints);
    }
}
Diagnostic: a business rule expressed as data is enforceable and auditable. The engine injects it ahead of the knowledge, and every answer's audit entry records which rules applied. API refs mean the plan catalog in the answer is the one returned by /v2/billing/plans now, not the one someone typed into a doc last quarter.
Mattrx metric: "recommended a deprecated plan" — once a recurring complaint — dropped to zero after the deprecation rule became a hard constraint instead of a sentence. This is the same governed-tool boundary from our AI-Native Architecture post, now feeding the knowledge layer.
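To make "a rule as data" concrete, here is one way the deprecated-plan rule could be checked against a live catalog. This is an illustrative sketch only: the post does not show the response shape of GET /v2/billing/plans, so the `status` and `successor` fields, the `Plan` type, the `enforce_no_deprecated` function, and the sample catalog (including the "Legacy" plan) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    status: str                      # assumed values: "active" or "deprecated"
    successor: Optional[str] = None  # assumed field naming the replacement plan


def enforce_no_deprecated(recommended: str, catalog: list[Plan]) -> str:
    """Return a plan name that satisfies the rule, following successor links."""
    by_name = {p.name: p for p in catalog}
    seen: set[str] = set()
    current = recommended
    while True:
        plan = by_name.get(current)
        if plan is None:
            raise ValueError(f"unknown plan: {current}")
        if plan.status != "deprecated":
            return plan.name
        # Stop on a dead end or a cycle instead of looping forever.
        if plan.successor is None or current in seen:
            raise ValueError(f"no active successor for: {recommended}")
        seen.add(current)
        current = plan.successor


# Invented sample catalog, standing in for the live API response.
catalog = [
    Plan("Legacy", "deprecated", successor="Growth"),
    Plan("Growth", "active"),
]
safe = enforce_no_deprecated("Legacy", catalog)
```

Because the check runs over data rather than prose, the same function can also run after generation as an audit step, recording whether the rule fired for a given answer.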
Part 2 — The Context Engine
OKF is how knowledge is stored. The Context Engine is how it becomes a prompt. It is, in effect, the Knowledge Base + Context Layer + Memory + tool layer from our AI-Native Architecture post — but fed structured OKF units instead of blind chunks.
4. Retrieval: Hybrid + Vector + Graph
Before
One retrieval strategy: embed the question, top-k cosine, done.
// BEFORE: a single, blunt retrieval step.
var chunks = await vectorStore.SearchAsync(Embed(question), k: 8, ct);
var prompt = string.Join("\n\n", chunks) + "\n\nQ: " + question; // islands, no rules, no graph
Diagnostic: top-k cosine is one signal. It misses exact-term matches (a campaign id, an error code), it misses related-but-not-similar units, and it has no idea what's authoritative versus stale.
After
Three retrieval modes feed one ranker: hybrid search (lexical BM25 + vector) for recall, vector retrieval for semantic nuance, and graph retrieval for connected units — all tenant-scoped and metadata-filtered.
public sealed class ContextEngine(
    IHybridSearch hybrid, // BM25 + vector
    IKnowledgeGraph graph, // OKF relationships
    IMemoryStore memory,
    ContextConstraints constraints,
    IPromptAssembler assembler)
{
    public async Task<AssembledContext> BuildAsync(
        AiPrincipal p, string question, int tokenBudget, CancellationToken ct)
    {
        // 1. Hybrid recall over OKF units: tenant-scoped, only currently valid units.
        var seeds = await hybrid.SearchAsync(p, question, k: 8, onlyValid: true, ct);
        // 2. Graph retrieval: expand along relationships the seeds declare.
        var expanded = await graph.ExpandAsync(seeds, maxHops: 2, ct);
        // 3. Memory: what we already know about this user/tenant.
        var recall = await memory.GetSalientAsync(p, question, ct);
        // 4. Live data + governing rules for the retrieved units.
        var inputs = await constraints.ResolveAsync(p, expanded, ct);
        // 5. Assemble to a hard token budget (rules first; see section 5 below).
        return assembler.Pack(tokenBudget, inputs.Rules, recall, expanded, inputs.Live);
    }
}
Diagnostic: the onlyValid: true filter is metadata doing quality work for free, because expired units never enter the candidate set, while the tenant scoping on the same call does the security work. Hybrid catches the exact-term matches vectors miss; the graph catches the connected units vectors miss. No single mode is enough; the combination is.
Mattrx metric: hybrid + graph retrieval over OKF is the core of the 18% → 3% hallucination drop and 0.96 faithfulness — the model is grounded in the right, current, connected units instead of a bag of plausible fragments.
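The post does not say how the lexical and vector rankings are merged inside hybrid search. One common option is reciprocal rank fusion (RRF), sketched below as an assumption and not as the Mattrx ranker. The function name, the conventional constant `k = 60`, and the two sample rankings are illustrative; only the unit ids come from the post.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists: each list contributes 1 / (k + rank) per id."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Invented sample rankings for one question.
lexical = ["kb/billing/plan-tiers", "kb/billing/proration-policy"]
semantic = ["kb/billing/proration-policy", "kb/billing/downgrade-flow"]
fused = reciprocal_rank_fusion([lexical, semantic])
```

RRF uses only rank positions, so BM25 scores and cosine similarities never need to be put on a common scale. Here the unit that appears in both lists rises to the top.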
5. Memory, Tool Calls, and Prompt Assembly
Before
Stateless, and the prompt was a bucket: dump every chunk and hope.
// BEFORE: no memory, no budget, rules nowhere.
var prompt = systemText + "\n" + string.Join("\n", chunks) + "\n" + question; // ~14k tokens, unordered
Diagnostic: no memory means re-asking what the user already told you. No budget means paying frontier prices to confuse the model with a wall of text. Rules nowhere means they're optional.
After
Memory supplies what we already know; tool calls fetch live data the OKF units reference; prompt assembly packs everything to a hard token budget in priority order — constraints first, then memory, then ranked knowledge, then live data.
public sealed class PromptAssembler : IPromptAssembler
{
    public AssembledContext Pack(
        int tokenBudget,
        IReadOnlyList<BusinessRule> rules, // highest priority, never trimmed
        IReadOnlyList<MemoryItem> memory,
        IReadOnlyList<OkfDocument> knowledge, // ranked; trimmed to fit
        IReadOnlyList<LiveValue> live)
    {
        var packer = new TokenBudgetPacker(tokenBudget);
        packer.Add(Section.Constraints, rules); // rules first, always present
        packer.Add(Section.Memory, memory);
        packer.Add(Section.Knowledge, knowledge); // OKF units, highest-ranked first
        packer.Add(Section.LiveData, live);
        return packer.Pack(); // guaranteed within budget
    }
}
Diagnostic: ordering is a design decision, not an accident. Business rules go first and are never trimmed, so a budget squeeze never silently drops the constraint that keeps the answer compliant. Knowledge is ranked, so when the budget is tight the lowest-ranked units are the first to be cut. The model receives a small, ordered, governed context instead of a heap.
Mattrx metric: prompt assembly over OKF is how we hold context at 3.5k tokens (down from 14k) while improving accuracy — the engine attaches the right units, the governing rules, and the live data, and nothing else.
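The post names a TokenBudgetPacker but does not show its internals. The sketch below is one plausible reading of the priority order described above, not the actual class. `pack`, `Packed`, and the section names are hypothetical; whitespace word counts stand in for a real tokenizer; and the sample strings are invented.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count approximates a real tokenizer.
    return len(text.split())


@dataclass
class Packed:
    sections: dict[str, list[str]] = field(default_factory=dict)
    used: int = 0


def pack(budget: int, rules: list[str], memory: list[str],
         knowledge: list[str], live: list[str]) -> Packed:
    """Fill the budget in priority order; rules are never trimmed."""
    rule_cost = sum(count_tokens(r) for r in rules)
    if rule_cost > budget:
        # Fail loudly instead of silently dropping a constraint.
        raise ValueError("business rules alone exceed the token budget")
    packed = Packed(sections={"constraints": list(rules)}, used=rule_cost)
    for name, items in (("memory", memory), ("knowledge", knowledge), ("live_data", live)):
        kept = []
        for item in items:  # items arrive ranked, best first
            cost = count_tokens(item)
            if packed.used + cost > budget:
                break  # everything after this item is lower priority
            kept.append(item)
            packed.used += cost
        packed.sections[name] = kept
    return packed


rules = ["Never recommend a deprecated plan."]
memory = ["Tenant is on Growth, billed annually."]
knowledge = [
    "Mid-cycle plan changes are prorated to the day.",
    "Credits never exceed the original charge.",
]
live = ["catalog fetched at query time"]
result = pack(24, rules, memory, knowledge, live)
```

With a budget of 24 word-tokens, the rule and the memory item are kept, the second knowledge unit is cut, and the live data still fits. A strictly greedy order like this can starve later sections when knowledge is large, so a production packer would probably reserve a share of the budget for each section.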
A query through the Context Engine
Q: "Can a Growth-plan customer downgrade mid-cycle, and how is it prorated?"
Context Engine
  1. Hybrid Search    -> seeds: [kb/billing/proration-policy, kb/billing/plan-tiers]
  2. Graph Retrieval  -> follow OKF relationships:
                           proration-policy --governed_by--> rule/billing/no-deprecated-plans
                           plan-tiers       --relates_to---> kb/billing/downgrade-flow
  3. Memory           -> "this tenant is on Growth, billed annually"
  4. Tool Calls       -> GET /v2/billing/plans (live, current catalog)
  5. Prompt Assembly  -> [rules] + [tenant memory] + [ranked OKF units] + [live plans]
                         packed to 3.5k tokens, constraints first
        |
        v
  LLM -> grounded, current, rule-compliant answer
Every box maps to a piece of the diagram you started with. The model at the end is the least interesting part — it succeeds because the Context Engine handed it exactly the right knowledge, current data, and the rules it must obey.
The numbers, in one place
| Metric | Before (chunk → vector) | After (OKF + Context Engine) |
|---|---|---|
| Hallucination rate | 18% | 3% |
| Faithfulness (eval) | — | 0.96 |
| Answer-relevance (eval) | — | 0.91 |
| Context tokens / call | ~14,000 | ~3,500 |
| Outdated-answer rate | 11% | 1.5% |
| Multi-hop questions | unanswerable | answered via graph |
| Knowledge units | opaque chunks | ~11,000 OKF units |
| Deprecated-plan recommendations | recurring | 0 |
| Tickets deflected / month | — | ~520 |
Migration checklist
- Define the OKF unit: Markdown body + frontmatter (id, type, owner, version, validity, visibility).
- Give every unit an identity and an expiry; valid_until is your freshness control.
- Model relationships explicitly (relates_to, supersedes, governed_by, depends_on).
- Define schemas for structured unit types so they stay machine-checkable.
- Replace embedded snapshots with API references resolved at query time.
- Extract rules from prose into business-rule units with enforcement: hard.
- Build the Context Engine: hybrid + vector + graph retrieval, metadata-filtered to valid units.
- Assemble prompts to a token budget, constraints first and never trimmed.
- Migrate high-value, high-churn knowledge first; leave the long tail as plain chunks.
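Several of these checklist items can be enforced mechanically at authoring time. The sketch below lints frontmatter that has already been parsed into a dictionary. It is an illustration, not OKF tooling: the required-field list is inferred from the example unit earlier in the post, and `REQUIRED`, `EDGE_KINDS`, `lint`, and the draft unit are hypothetical.

```python
# Fields inferred from the example frontmatter; treat the list as an assumption.
REQUIRED = ("id", "type", "owner", "version", "valid_from", "valid_until", "visibility")
EDGE_KINDS = {"relates_to", "supersedes", "governed_by", "depends_on"}


def lint(meta: dict) -> list[str]:
    """Return one message per problem found in a unit's parsed frontmatter."""
    # valid_until must be present even when its value is None (meaning "current").
    problems = [f"missing field: {key}" for key in REQUIRED if key not in meta]
    # Relationships are a list of single-key mappings, e.g. {"relates_to": "<id>"}.
    for edge in meta.get("relationships", []):
        for kind in edge:
            if kind not in EDGE_KINDS:
                problems.append(f"unknown relationship kind: {kind}")
    return problems


good = {
    "id": "kb/billing/proration-policy",
    "type": "policy",
    "owner": "billing-team",
    "version": 4,
    "valid_from": "2026-01-01",
    "valid_until": None,
    "visibility": "tenant-safe",
    "relationships": [{"relates_to": "kb/billing/plan-tiers"}],
}
# Invented example of an incomplete unit.
draft = {"id": "kb/billing/draft", "relationships": [{"see_also": "kb/billing/plan-tiers"}]}
good_problems = lint(good)
draft_problems = lint(draft)
```

Running a check like this in review or CI is one way to keep ownership and expiry from eroding as the corpus grows.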
The honest stuff: when NOT to adopt OKF
OKF is an investment in structure. Pay it where structure exists and matters:
- Small or static corpus. A few hundred stable documents? Naive hybrid RAG is plenty. OKF's authoring overhead won't pay back.
- Genuinely unstructured knowledge. If your content has no relationships, no rules, and no freshness concerns, you're paying for a graph and metadata you'll never query.
- No one to own authoring. OKF lives or dies on metadata discipline. Without owners and review, valid_until and relationships rot, and rotten metadata is worse than none.
- You haven't nailed vector RAG yet. Get hybrid retrieval + a cross-encoder rerank working first. OKF is the next layer, not the first; don't skip the fundamentals.
- Latency-critical, single-shot queries. Graph expansion adds hops. If a question never needs related units, skip graph retrieval for that path.
- Prototype / throwaway. Don't design a knowledge format before you've validated the use case with the cheap pipeline.
- "Big bang" migration. Converting the entire corpus at once is how this stalls. We converted billing, then product, then runbooks — high-value domains first — and left the long tail as plain chunks indefinitely.
And the honest framing of the title: OKF does not replace embeddings. Vector retrieval is still inside the Context Engine doing recall. OKF replaces blind chunking and adds the structure, governance, and graph that embeddings alone cannot provide. If someone sells you "vectors are dead," walk away.
The model to carry forward
Documents are what you have; knowledge is what the model needs. OKF is the conversion — typed, related, governed units — and the Context Engine is what turns those units back into exactly the right prompt: the relevant knowledge, the current data, and the rules that must be obeyed.
Three habits that make it work:
- Give every knowledge unit an identity, an owner, and an expiry. Metadata is what makes knowledge governable; a chunk has none of it.
- Model relationships explicitly. The graph answers the multi-hop questions vectors structurally cannot.
- Encode rules as data, not prose. A business rule the engine enforces beats a sentence the model is free to ignore.
The four boxes — documents, chunk, vector DB, LLM — were never wrong, just incomplete. OKF and a Context Engine are what the picture looks like once you've felt the limits of the first version in production.
Further reading
- Context Engineering for Enterprise AI, Part 1: Context Management (Why RAG Alone Isn't Enough)
- Context Engineering for Enterprise AI, Part 6: AI & Data Governance
- AI-Native Architecture: The 9-Layer Blueprint Every Enterprise Will Adopt by 2027
Restructuring a knowledge base into something your context engine can actually reason about? I'm always happy to compare notes — reach me at randhir.jassal@gmail.com.