How it works

The whole system in plain words — what each piece is, and one diagram of the loop.

Coalent has a small number of moving parts. Let's meet them in plain language, then see how they fit together.

The four pieces

Retriever — how Coalent fetches your information. You point it at wherever your knowledge lives — a vector database, a set of documents, an API, or a tool. Given a question, it returns the relevant raw pieces (each one is a Chunk). This is the only part most teams write, and it's a single method. → The Retriever
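To make "a single method" concrete, here is a minimal Python sketch of a retriever. The `Chunk` fields (`source_id`, `text`), the `Retriever` protocol and its `retrieve(query)` method are assumptions for illustration, not Coalent's documented API. `KeywordRetriever` is a toy word-overlap search that stands in for a vector database or API call.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Chunk:
    """One raw piece of evidence (field names are assumptions)."""
    source_id: str  # where the text came from; used later for provenance
    text: str


class Retriever(Protocol):
    """Hypothetical interface: a single method from question to raw chunks."""

    def retrieve(self, query: str) -> list[Chunk]: ...


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KeywordRetriever:
    """Toy retriever over an in-memory dict, standing in for a vector DB or API."""

    def __init__(self, docs: dict[str, str], top_k: int = 3) -> None:
        self.docs = docs
        self.top_k = top_k

    def retrieve(self, query: str) -> list[Chunk]:
        terms = _words(query)
        # Score each document by how many query words it shares.
        scored = [(len(terms & _words(text)), sid, text) for sid, text in self.docs.items()]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [Chunk(sid, text) for score, sid, text in scored[: self.top_k] if score > 0]


retriever: Retriever = KeywordRetriever({
    "runbook/search": "The search service is slow while the index is rebuilding.",
    "hr/leave": "Full-time staff get 25 days of annual leave.",
})
chunks = retriever.retrieve("why is the search service slow?")
print([c.source_id for c in chunks])  # ['runbook/search']
```

Swapping the toy search for a real vector-database query would change only the body of `retrieve`.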

Synthesizer — how raw information becomes understanding. It takes those raw chunks and produces a short, structured understanding — a summary plus the key facts and claims — using your LLM. It also notes which sources it actually used. → The Synthesizer
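Below is a minimal sketch of the synthesis step. It assumes the LLM is any callable that takes a prompt string and returns JSON text. `synthesize`, `Understanding` and `fake_llm` are hypothetical names, and the canned reply stands in for a real model call. The final filter shows one way to record "which sources it actually used": it keeps only cited ids that match evidence the model was actually given.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    source_id: str
    text: str


@dataclass
class Understanding:
    summary: str
    facts: list[str]
    sources_used: list[str]  # only the sources the model actually cited


def synthesize(query: str, chunks: list[Chunk], llm: Callable[[str], str]) -> Understanding:
    # Tag each chunk with its source id so the model can cite what it relied on.
    evidence = "\n".join(f"[{c.source_id}] {c.text}" for c in chunks)
    prompt = (
        f"Question: {query}\nEvidence:\n{evidence}\n"
        'Answer as JSON with keys "summary", "facts" (list) and "sources_used" (list of ids).'
    )
    data = json.loads(llm(prompt))
    # Drop any citation that does not match evidence we actually supplied.
    supplied = {c.source_id for c in chunks}
    used = [s for s in data["sources_used"] if s in supplied]
    return Understanding(data["summary"], list(data["facts"]), used)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, returning a canned JSON reply."""
    return json.dumps({
        "summary": "Search is slow while the index rebuilds.",
        "facts": ["Latency rises during index rebuilds."],
        "sources_used": ["runbook/search", "not/supplied"],
    })


chunks = [
    Chunk("runbook/search", "The search service is slow while the index is rebuilding."),
    Chunk("hr/leave", "Full-time staff get 25 days of annual leave."),
]
u = synthesize("why is the search service slow?", chunks, fake_llm)
print(u.sources_used)  # ['runbook/search']
```

In this example, `hr/leave` was supplied but not cited, so it does not become a dependency of the resulting understanding.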

Cognition unit — one cached understanding. The result of the above: a small briefing for one kind of question. It keeps the understanding, the raw evidence it was built from, and its sources (so Coalent knows what it depends on).
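One way to picture a cognition unit is as a plain data structure. The shape below is hypothetical: field names such as `embedding`, `raw`, `sources` and `stale` are assumptions. It only shows what a unit has to carry in order to be matched, answered from, and invalidated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_id: str
    text: str


@dataclass
class CognitionUnit:
    """Hypothetical shape of one cached understanding."""
    embedding: list[float]   # meaning of the question this unit answers
    summary: str             # the distilled understanding
    facts: list[str]
    raw: list[Chunk]         # retained evidence, kept for specific details
    sources: frozenset[str]  # provenance: what this unit depends on
    stale: bool = False      # set when a source changes; cleared on rebuild

    def depends_on(self, source_id: str) -> bool:
        return source_id in self.sources


unit = CognitionUnit(
    embedding=[0.0, 0.2, 0.95],
    summary="Search is slow while the index rebuilds.",
    facts=["Latency rises during index rebuilds."],
    raw=[Chunk("runbook/search", "The search service is slow while the index is rebuilding.")],
    sources=frozenset({"runbook/search"}),
)
print(unit.depends_on("runbook/search"), unit.depends_on("hr/leave"))  # True False
```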

Invalidation — keeping it correct. When one of those sources changes, Coalent marks only the cognition units that used it as stale. They rebuild on the next read — nothing else is touched. → Provenance & freshness
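Selective invalidation like this can be built on a reverse index from each source to the units that were built from it. The sketch below is minimal, and `ProvenanceIndex`, `record` and `on_source_change` are hypothetical names. A change only marks dependents stale and rebuilds nothing; the rebuild is left to the next read.

```python
from collections import defaultdict


class ProvenanceIndex:
    """Hypothetical reverse index from source id to the units built from it."""

    def __init__(self) -> None:
        self._dependents: dict[str, set[str]] = defaultdict(set)
        self._sources_of: dict[str, set[str]] = {}
        self.stale: set[str] = set()

    def record(self, unit_id: str, sources: set[str]) -> None:
        # Called whenever a unit is (re)built: forget its old sources, store the new ones.
        for old in self._sources_of.get(unit_id, set()):
            self._dependents[old].discard(unit_id)
        self._sources_of[unit_id] = set(sources)
        for source_id in sources:
            self._dependents[source_id].add(unit_id)
        self.stale.discard(unit_id)

    def on_source_change(self, source_id: str) -> set[str]:
        # Mark only this source's dependents stale; rebuilding waits for the next read.
        affected = set(self._dependents.get(source_id, set()))
        self.stale |= affected
        return affected


index = ProvenanceIndex()
index.record("search-latency", {"runbook/search", "dash/latency"})
index.record("leave-policy", {"hr/leave"})
print(index.on_source_change("runbook/search"))  # {'search-latency'}
print(index.stale)                                # {'search-latency'}
```

Dropping a unit's old sources when it is re-recorded prevents the index from invalidating that unit for a source it no longer uses.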

The big picture

You ask: cache.get("why is the search service slow?")

CognitiveCache:
  1. Embed & match: embed the query; reuse a cached unit when the meaning matches.
  2. Retriever: on a miss, fetch the evidence for the query (your retrieval: your vector DB · GraphRAG · tools · APIs).
  3. Synthesizer: build structured understanding and cite the sources it used (your LLM).
  4. Cache the unit: store understanding + retained raw + provenance (memory · SQLite).
  5. Return: hand back the minimum decision-relevant context (raw on tap).

Source change: ingestion · webhook · CDC · TTL revalidate

A change marks only the cognition units that used that source stale (via provenance). The next matching read rebuilds just those, lazily — nothing else.

  • First time you ask (a miss): Coalent retrieves, synthesizes a cognition unit, and caches it.
  • Ask something similar (a hit): it serves the cached understanding, skipping retrieval and synthesis. Matching is by meaning, so a rephrase still hits.
  • A source changes: only the units that used it rebuild, lazily, on the next read.
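The sketch below puts the loop together, following the five steps and the three cases above (miss, hit, source change). It is a hypothetical toy, not Coalent's implementation. `toy_embed` is a bag-of-words stand-in for a real embedding model, the 0.75 threshold is arbitrary, and the retriever and synthesizer are plain lambdas.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag of lowercase words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Unit:
    embedding: Counter
    understanding: str
    sources: set[str]
    stale: bool = False


class CognitiveCache:
    """Hypothetical sketch of the read loop, not Coalent's actual implementation."""

    def __init__(self, retrieve, synthesize, threshold: float = 0.75) -> None:
        self.retrieve = retrieve      # query -> list of (source_id, text)
        self.synthesize = synthesize  # (query, chunks) -> (understanding, sources_used)
        self.threshold = threshold
        self.units: list[Unit] = []
        self.builds = 0               # how many times we paid for retrieval + LLM

    def get(self, query: str) -> str:
        q = toy_embed(query)
        # 1. Embed & match: pick the closest cached unit by meaning.
        score, match = max(((cosine(q, u.embedding), u) for u in self.units),
                           key=lambda pair: pair[0], default=(0.0, None))
        if match is not None and score >= self.threshold and not match.stale:
            return match.understanding  # hit: no retrieval, no LLM call
        # 2-3. Miss or stale match: retrieve evidence and synthesize.
        self.builds += 1
        chunks = self.retrieve(query)
        understanding, sources = self.synthesize(query, chunks)
        # 4. Cache the unit: rebuild a stale match in place, else add a new one.
        if match is not None and score >= self.threshold:
            match.understanding, match.sources, match.stale = understanding, set(sources), False
        else:
            self.units.append(Unit(q, understanding, set(sources)))
        return understanding  # 5. Return

    def on_source_change(self, source_id: str) -> None:
        # Mark only units whose provenance includes this source; rebuild lazily.
        for unit in self.units:
            if source_id in unit.sources:
                unit.stale = True


docs = {"runbook/search": "Search is slow while the index rebuilds."}
cache = CognitiveCache(
    retrieve=lambda q: [("runbook/search", docs["runbook/search"])],
    synthesize=lambda q, chunks: (" ".join(t for _, t in chunks), [s for s, _ in chunks]),
)
first = cache.get("why is the search service slow?")   # miss: builds the unit
second = cache.get("why is search slow?")              # rephrase: hit
docs["runbook/search"] = "Search recovered once the rebuild finished."
cache.on_source_change("runbook/search")               # marks the unit stale
third = cache.get("why is search slow?")               # lazy rebuild on this read
print(cache.builds)  # 2
```

The word-overlap stand-in only catches rephrasings that reuse words. A real embedding model would also place rephrasings that share few words close together.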

Matched by meaning

You never label your questions or configure routing. Coalent matches a question to cached understanding by its embedding — so "how much annual leave?" and "what's the leave allowance?" land on the same unit. The cache organizes itself around what questions mean.
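Matching by meaning comes down to a nearest-neighbor lookup with a similarity cutoff. The minimal sketch below uses cosine similarity. The vectors are invented to stand in for embedding-model output, and `match` and the 0.9 threshold are assumptions, not Coalent settings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def match(query_vec: list[float], units: dict[str, list[float]], threshold: float = 0.9):
    """Return the id of the most similar cached unit, or None if none clears the threshold."""
    best_id, best_score = None, threshold
    for unit_id, vec in units.items():
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_id, best_score = unit_id, score
    return best_id


# Invented 3-d vectors; real embedding models use many more dimensions.
units = {"leave-allowance": [0.9, 0.1, 0.0], "search-latency": [0.0, 0.2, 0.95]}
print(match([0.88, 0.15, 0.02], units))  # "how much annual leave?"       -> leave-allowance
print(match([0.92, 0.08, 0.01], units))  # "what's the leave allowance?" -> leave-allowance
print(match([0.3, 0.9, 0.1], units))     # an unrelated question          -> None
```

No labels or routing rules are involved. A new question either lands near an existing unit or becomes a miss that builds a new one.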

Decision-ready, not a data dump

A cognition unit isn't a pile of chunks — it's distilled, structured understanding, and on each read Coalent hands back the minimum slice relevant to your question (with the raw always reachable if the model needs a specific detail). Less noise to the LLM means better answers and fewer tokens. → Context intelligence
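The toy example below hands back a small slice of a unit instead of the whole thing. Relevance here is plain word overlap, a stand-in for whatever scoring Coalent actually uses. `context_for`, `raw_detail` and the sample policy text are made up for illustration.

```python
import re
from dataclasses import dataclass


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Unit:
    summary: str
    facts: list[str]
    raw: dict[str, str]  # source_id -> retained raw text


def context_for(unit: Unit, question: str, max_facts: int = 2) -> str:
    """Summary plus only the facts that share words with the question (toy relevance)."""
    q = words(question)
    ranked = sorted(unit.facts, key=lambda f: len(q & words(f)), reverse=True)
    relevant = [f for f in ranked[:max_facts] if q & words(f)]
    lines = [unit.summary] + [f"- {f}" for f in relevant]
    lines.append("(raw evidence available: " + ", ".join(sorted(unit.raw)) + ")")
    return "\n".join(lines)


def raw_detail(unit: Unit, source_id: str) -> str:
    """Fetch retained raw text when the model needs a specific detail."""
    return unit.raw[source_id]


unit = Unit(
    summary="Annual leave is 25 days, with limited carry-over.",
    facts=[
        "Full-time staff get 25 days of annual leave.",
        "Up to 5 unused days carry over to next year.",
        "Leave requests need manager approval.",
    ],
    raw={"hr/handbook#leave": "Full leave policy text goes here."},
)
print(context_for(unit, "how many days carry over?"))
```

The unrelated approval fact is left out of the context, while the raw policy stays one lookup away.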
