Context intelligence

Retrieval ≠ context. Coalent serves the minimum decision-relevant slice, auto-escalates when it's thin, and folds in related units.

Traditional RAG does retrieval — dump the top-k and hope the answer is in the noise. Coalent does context engineering — the minimum sufficient signal for the decision, with the raw on tap.

Key idea. get() returns ctx.context — the decision-relevant slice for this query — while ctx.evidence / ctx.raw_text keep the full source reachable. Minimal payload, nothing lost.

The coverage gate (auto-escalation)

Every read scores how well the cached unit covers the query. A hit that under-covers automatically pulls fresh raw for that query — no LLM call, no manual signal:

ctx = cache.get("how many leave days for a 5-year employee?")
ctx.coverage     # 0.0–1.0: how well the unit covers the query
ctx.escalated    # True if it had to fetch fresh raw to cover the question

So a cached unit that's broadly right but missing a specific number doesn't serve a thin answer — it escalates to fetch that detail. The specifics are always there when the model needs them.

Minimum-context projection

ctx.context is a compact, query-shaped payload:

ctx.context["understanding"]   # summary + only the query-relevant claims/facts
ctx.context["raw"]             # raw included only when needed (see strategies)

Irrelevant claims and facts are trimmed for this query — less noise to the LLM, which is both cheaper and higher quality.

Strategies

Choose how much raw rides along (the full raw is always reachable regardless):

from coalent import ContextStrategy

SemanticCache(retriever, synth, strategy=ContextStrategy.CONTEXT_FIRST)  # default
Strategyctx.context["raw"]
CONTEXT_FIRST (default)raw only when the read escalated
CONTEXT_RAWraw always
CONTEXT_ONLYnever (understanding only)

Override per call: cache.get(query, strategy=ContextStrategy.CONTEXT_RAW).

get() folds in related cognition units — ones sharing an entity or a source with the match, ranked by relevance to your query:

ctx = cache.get("our leave policy", related=3)
for r in ctx.related:
    r.unit_id, r.relation, r.understanding   # "shared_entity" | "shared_source"

This is light, lazy multi-hop — enough for cross-document reuse, not a graph engine.

Agent affordances

For agent loops, escalation is also explicit:

cache.drill(ctx.unit_id)            # the full raw evidence behind a unit
cache.widen("our leave policy")     # a fresh retrieval for a query

Next