Coalent has a small number of moving parts. Let's meet them in plain language, then see how they fit together.

The four pieces

Retriever — how Coalent fetches your information. You point it at wherever your knowledge lives — a vector database, a set of documents, an API, or a tool. Given a question, it returns the relevant raw pieces (each one is a Chunk). This is the only part most teams write, and it's a single method. → The Retriever
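To make "a single method" concrete, here is a minimal sketch of what a retriever could look like. The `Chunk` fields, the `KeywordRetriever` class, and the `retrieve` signature are assumptions made for illustration, not Coalent's documented interface (The Retriever page has the real one), and keyword overlap stands in for a real vector search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical shape of one raw piece of evidence."""
    text: str
    source_id: str  # where the text came from, so provenance can be tracked


class KeywordRetriever:
    """Toy retriever over an in-memory dict; a real one would query a vector DB or an API."""

    def __init__(self, documents: dict[str, str]) -> None:
        self.documents = documents

    @staticmethod
    def _words(text: str) -> set[str]:
        # Lowercase and strip basic punctuation so "slow?" matches "slow".
        words = {word.strip("?.,!").lower() for word in text.split()}
        words.discard("")
        return words

    def retrieve(self, query: str) -> list[Chunk]:
        # Assumed single-method contract: question in, relevant chunks out.
        terms = self._words(query)
        return [
            Chunk(text=text, source_id=source_id)
            for source_id, text in self.documents.items()
            if terms & self._words(text)  # crude relevance: any shared word
        ]


docs = {
    "runbook.md": "The search service slows down when the index is rebuilding.",
    "hr-policy.md": "Employees receive 25 days of annual leave.",
}
retriever = KeywordRetriever(docs)
chunks = retriever.retrieve("why is search slow?")
```

Each returned chunk carries its `source_id`, which is what lets later stages record which sources a unit depends on.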

Synthesizer — how raw information becomes understanding. It takes those raw chunks and produces a structured understanding using your LLM. As of v0.4 that understanding is extractive: a query-independent list of atomic, source-grounded claims — not a question-shaped prose summary. Because it isn't written to answer one specific question, a single unit answers many later ones, and it keeps every number (prose summaries dropped ~40% of them in testing). It also notes which sources it actually used. → The Synthesizer
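The shape of an extractive result can be sketched as follows. Everything here is hypothetical: the `Claim` type, the `synthesize` function, and the sentence splitter that stands in for the LLM call are illustrations of "atomic, source-grounded claims", not Coalent's actual types.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Claim:
    text: str       # one atomic statement
    source_id: str  # the source the statement is grounded in


def synthesize(
    chunks: list[tuple[str, str]],
    extract: Callable[[str], list[str]],
) -> tuple[list[Claim], set[str]]:
    """Turn (source_id, text) chunks into claims; `extract` stands in for the LLM."""
    claims = [
        Claim(statement, source_id)
        for source_id, text in chunks
        for statement in extract(text)
    ]
    # Only sources that actually produced a claim count as "used".
    sources_used = {claim.source_id for claim in claims}
    return claims, sources_used


def split_sentences(text: str) -> list[str]:
    # Stand-in extractor: one claim per sentence (a real one would call your LLM).
    return [part.strip() + "." for part in text.split(".") if part.strip()]


chunks = [
    ("pricing.md", "The Pro plan costs 49 USD per month. It includes 5 seats."),
    ("empty.md", ""),
]
claims, sources_used = synthesize(chunks, split_sentences)
```

Note that the claims are not shaped by any particular question, and that `empty.md` was retrieved but contributed nothing, so it is not recorded as a source.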

Cognition unit — one cached understanding. The result of the above: a small briefing built from one slice of your knowledge. It keeps the understanding — the list of claims — the raw evidence it was built from, and its sources (so Coalent knows what it depends on). Units are lightweight and independent, built lazily only when a query needs one — the opposite of paying to build a whole graph upfront.

Invalidation — keeping it correct. When one of those sources changes, Coalent marks only the cognition units that used it as stale. They rebuild on the next read — nothing else is touched. → Provenance & freshness
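A rough sketch of how provenance makes invalidation selective. The `CognitionUnit` fields and the `mark_stale` helper are assumed names for illustration; the point is only that a unit records its sources, so a change can be traced to exactly the units that depend on it.

```python
from dataclasses import dataclass


@dataclass
class CognitionUnit:
    """Hypothetical shape: understanding, retained raw evidence, and provenance."""
    claims: list[str]
    raw: list[str]
    sources: set[str]
    stale: bool = False


def mark_stale(units: dict[str, CognitionUnit], changed_source: str) -> list[str]:
    """Flag only the units built from `changed_source`; return their keys."""
    affected = []
    for key, unit in units.items():
        if changed_source in unit.sources:
            unit.stale = True  # no rebuild here; that waits for the next read
            affected.append(key)
    return affected


units = {
    "leave-policy": CognitionUnit(
        claims=["Employees receive 25 days of annual leave."],
        raw=["...full policy text..."],
        sources={"hr-policy.md"},
    ),
    "search-latency": CognitionUnit(
        claims=["Search slows down while the index is rebuilding."],
        raw=["...full runbook text..."],
        sources={"runbook.md", "dashboards/search"},
    ),
}
affected = mark_stale(units, "runbook.md")
```

Here a change to `runbook.md` flags only the `search-latency` unit; `leave-policy` stays fresh and keeps serving.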

The big picture

You ask: cache.get("why is the search service slow?")

Inside CognitiveCache:

1. Embed & match. Embed the query; reuse a cached unit when the meaning matches.
2. Retriever (your vector DB · GraphRAG · tools · APIs). On a miss, fetch the evidence for the query (your retrieval).
3. Synthesizer (your LLM). Build structured understanding and cite the sources it used.
4. Cache the unit (memory · SQLite). Store understanding + retained raw + provenance.
5. Return. Hand back the minimum decision-relevant context (raw on tap).

⚡ Source change: ingestion · webhook · CDC · TTL revalidate

A change marks only the cognition units that used that source stale (via provenance). The next matching read rebuilds just those, lazily — nothing else.

First time you ask (a miss): Coalent retrieves, synthesizes a cognition unit, and caches it.
Ask something similar (a hit): it serves the cached understanding instantly — matched by meaning, so a rephrase still hits.
Ask something that spans two units (multi-hop): when the matched unit doesn't fully cover the question, Coalent pools the per-claim memory across all fresh units and surfaces the bridging facts, with zero extra LLM calls.
A source changes: only the units that used it rebuild, lazily, on the next read.
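The miss, hit, and source-change cases above can be traced with a toy stand-in. This is a sketch under loud assumptions: `ToyCache` is not Coalent's `CognitiveCache`, the word-set key replaces real embedding matching, and the `builds` counter exists only to show when synthesis runs.

```python
from typing import Callable


class ToyCache:
    """Illustrative read path: match, rebuild on a miss or a stale unit, then serve."""

    def __init__(
        self,
        retrieve: Callable[[str], dict[str, str]],
        synthesize: Callable[[dict[str, str]], list[str]],
    ) -> None:
        self.retrieve = retrieve      # query -> {source_id: text}
        self.synthesize = synthesize  # evidence -> claims (the LLM step)
        self.units: dict[frozenset, dict] = {}
        self.builds = 0               # how many times synthesis ran

    @staticmethod
    def _key(query: str) -> frozenset:
        # Stand-in for embedding match: the same set of words maps to the same unit.
        return frozenset(word.strip("?.,!").lower() for word in query.split())

    def get(self, query: str) -> list[str]:
        key = self._key(query)
        unit = self.units.get(key)
        if unit is None or unit["stale"]:
            evidence = self.retrieve(query)
            unit = {
                "claims": self.synthesize(evidence),
                "sources": set(evidence),
                "stale": False,
            }
            self.units[key] = unit
            self.builds += 1
        return unit["claims"]

    def source_changed(self, source_id: str) -> None:
        for unit in self.units.values():
            if source_id in unit["sources"]:
                unit["stale"] = True  # marking only; the rebuild is deferred


docs = {"runbook.md": "Search is slow while the index rebuilds."}
cache = ToyCache(retrieve=lambda q: dict(docs), synthesize=lambda ev: list(ev.values()))

cache.get("why is the search service slow?")  # miss: builds the unit
cache.get("Why is the search service slow")   # hit: served from the cache
builds_before_change = cache.builds

docs["runbook.md"] = "Search is slow while shards rebalance."
cache.source_changed("runbook.md")            # marks the unit stale, builds nothing
builds_after_change = cache.builds

answer = cache.get("why is the search service slow?")  # lazy rebuild on this read
```

Synthesis runs once for the first two reads, not at all when the source changes, and once more on the read that follows the change.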

The full read path is a ladder of gates — match, freshness, coverage, cross-unit recall, and the RAG floor — each one a knob with a plain cosine default. See The gate ladder for the exact firing order and settings.

Matched by meaning

You never label your questions or configure routing. Coalent matches a question to cached understanding by its embedding — so "how much annual leave?" and "what's the leave allowance?" land on the same unit. The cache organizes itself around what questions mean.
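A minimal sketch of matching by meaning with cosine similarity. The three-dimensional vectors are hand-written stand-ins for real embeddings, and `best_match` and the 0.8 threshold are assumptions for illustration rather than Coalent's defaults.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vec: list[float], unit_vecs: dict[str, list[float]], threshold: float):
    """Return the key of the most similar unit, or None if nothing reaches the threshold."""
    best_key, best_score = None, threshold
    for key, vec in unit_vecs.items():
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_key, best_score = key, score
    return best_key


# Toy embeddings: in practice these come from your embedding model.
unit_vecs = {
    "annual-leave": [0.9, 0.1, 0.0],    # built from "how much annual leave?"
    "search-latency": [0.0, 0.2, 0.9],  # built from "why is search slow?"
}
rephrased = [0.8, 0.3, 0.1]  # stands for "what's the leave allowance?"
unrelated = [0.1, 0.9, 0.1]  # stands for a question no unit covers

hit = best_match(rephrased, unit_vecs, threshold=0.8)
miss = best_match(unrelated, unit_vecs, threshold=0.8)
```

The rephrased question lands on the existing `annual-leave` unit, while the unrelated one falls below the threshold and would go to retrieval instead.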

Multi-hop, for free

Some questions can't be answered from any single unit because the answer is split across two. As of v0.4, Coalent handles these with cross-unit claim recall: when the matched unit under-covers the question, it pools the atomic claims from every fresh unit (a MaxSim over their per-claim memory) and surfaces the ones that bridge the gap. This answers multi-hop questions that naive top-k retrieval structurally can't, and it costs zero extra LLM calls. Recall stays dormant on single-hop questions and appears as result.recalled when it fires. It is on by default; set cross_unit_recall=False for exact v0.3 behaviour.

New in v0.5: preset="multi_hop" arms the full multi-hop setup (recall plus a hop-2 bridge) in one argument, with calibrated thresholds. serve="pool" previews the v0.6 read path: instead of one anchored unit, the cache serves a token-budgeted pool of globally ranked fresh claims, with stale units' claims masked the moment a source changes. See Context intelligence.
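The idea of a token-budgeted pool can be sketched as a greedy selection. This is an illustration only: `serve_pool`, the tuple layout, and counting whitespace-separated words as tokens are assumptions, and the real pool serving in Coalent may rank and budget differently.

```python
def serve_pool(scored_claims, budget_tokens):
    """Pick the highest-ranked fresh claims that fit within a token budget.

    scored_claims: (score, text, unit_is_stale) tuples pooled from all units.
    """
    pool, used = [], 0
    for score, text, unit_is_stale in sorted(scored_claims, key=lambda c: c[0], reverse=True):
        if unit_is_stale:
            continue  # claims from stale units are masked, however well they score
        cost = len(text.split())  # crude token estimate: whitespace-separated words
        if used + cost > budget_tokens:
            continue  # too big for what is left; a shorter claim may still fit
        pool.append(text)
        used += cost
    return pool


scored_claims = [
    (0.95, "The Platform team is managed by Lee.", True),  # stale unit
    (0.90, "The search service is owned by the Platform team.", False),
    (0.85, "The Platform team is managed by Dana.", False),
    (0.40, "Invoices are issued on the first business day of each month.", False),
]
pool = serve_pool(scored_claims, budget_tokens=16)
```

With a budget of 16, the two relevant fresh claims fit exactly; the top-scoring claim is skipped because its unit is stale, and the low-ranked invoice claim no longer fits.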

Decision-ready, not a data dump

A cognition unit isn't a pile of chunks — it's distilled, structured understanding, and on each read Coalent hands back the minimum slice relevant to your question (with the raw always reachable if the model needs a specific detail). Less noise to the LLM means better answers and fewer tokens. → Context intelligence

Next

The Retriever: connect your data.
The Synthesizer: build understanding with your LLM.
The gate ladder: the full read path, gate by gate.
Examples: wire it to real systems.
