Traditional RAG does retrieval — dump the top-k and hope the answer is in the noise. Coalent does context engineering — the minimum sufficient signal for the decision, with the raw on tap.
Key idea. get() returns ctx.context — the decision-relevant slice for this query — while ctx.evidence / ctx.raw_text keep the full source reachable. Minimal payload, nothing lost.
The coverage gate (auto-escalation)
Every read scores how well the cached unit covers the query. A hit that under-covers automatically pulls fresh raw for that query — no LLM call, no manual signal:
ctx = cache.get("how many leave days for a 5-year employee?")
ctx.coverage # 0.0–1.0: how well the unit covers the query
ctx.escalated # True if it had to fetch fresh raw to cover the question
So a cached unit that's broadly right but missing a specific number doesn't serve a thin answer — it escalates to fetch that detail. The specifics are always there when the model needs them.
Minimum-context projection
ctx.context is a compact, query-shaped payload:
ctx.context["understanding"] # summary + only the query-relevant claims/facts
ctx.context["raw"] # raw included only when needed (see strategies)
Irrelevant claims and facts are trimmed for this query — less noise to the LLM, which is both cheaper and higher quality.
Strategies
Choose how much raw rides along (the full raw is always reachable regardless):
from coalent import ContextStrategy
SemanticCache(retriever, synth, strategy=ContextStrategy.CONTEXT_FIRST) # default
| Strategy | ctx.context["raw"] |
|---|---|
CONTEXT_FIRST (default) | raw only when the read escalated |
CONTEXT_RAW | raw always |
CONTEXT_ONLY | never (understanding only) |
Override per call: cache.get(query, strategy=ContextStrategy.CONTEXT_RAW).
Related units (cross-unit reuse)
get() folds in related cognition units — ones sharing an entity or a source with the match, ranked by relevance to your query:
ctx = cache.get("our leave policy", related=3)
for r in ctx.related:
r.unit_id, r.relation, r.understanding # "shared_entity" | "shared_source"
This is light, lazy multi-hop — enough for cross-document reuse, not a graph engine.
Agent affordances
For agent loops, escalation is also explicit:
cache.drill(ctx.unit_id) # the full raw evidence behind a unit
cache.widen("our leave policy") # a fresh retrieval for a query
Next
- get() & Result — every field on the result.
- The Synthesizer — the structured understanding that context is projected from.