A Synthesizer turns raw chunks into understanding. Where the Retriever fetches the source material, the Synthesizer reads it and writes a short, structured briefing — a summary plus the key facts and claims — using your LLM. It also notes which chunks it actually used; that record is the lineage that drives surgical freshness.
The contract
def synthesize(self, query: str, chunks: list[Chunk]) -> Synthesis:
    ...
Synthesis carries the understanding, the indices of the chunks it used, and an ok flag:
from coalent import Synthesis
Synthesis(
    understanding={"summary": "...", "claims": [...], "facts": {...}},
    used=[0, 2],  # only these chunks become provenance -> precise invalidation
    ok=True,      # False = degrade, don't cache fabricated content
)
LLMSynthesizer (recommended)
The built-in choice for real understanding. It owns the envelope: it presents the candidate sources, requires strict JSON, and reads the model's own used citations to build precise provenance.
from coalent import SemanticCache, LLMSynthesizer, OpenAIProvider
cache = SemanticCache(retriever, LLMSynthesizer(OpenAIProvider(), model="gpt-4o-mini"))
What you get:
- Structured understanding: summary, claims, entities, facts.
- Precise provenance: only sources the model cited can invalidate the unit (a change to a retrieved-but-uncited source touches nothing).
- No garbage: a parse failure degrades (ok=False). The unit keeps its raw evidence and is flagged, never serving fabricated text.
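The degrade-on-parse-failure behaviour described above can be sketched in plain Python. This is an illustration only, not Coalent's implementation: the function name parse_reply, the "used" key in the model's reply, and the dictionary it returns are all assumptions made for the example.

```python
import json
from typing import Any


def parse_reply(reply: str, n_sources: int) -> dict[str, Any]:
    """Illustrative only: turn a model reply into understanding + citations."""
    try:
        data = json.loads(reply)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        # Degrade: no understanding, no citations, flagged as not ok.
        return {"understanding": {}, "used": [], "ok": False}

    cited = data.pop("used", [])  # assumed key for the model's citations
    if not isinstance(cited, list):
        cited = []
    # Keep only citations that point at a source that was actually shown.
    used = [i for i in cited if isinstance(i, int) and 0 <= i < n_sources]
    return {"understanding": data, "used": used, "ok": True}


good = parse_reply('{"summary": "Leave is 12 days", "used": [0, 2, 9]}', n_sources=3)
bad = parse_reply("Sure! Here is the summary...", n_sources=3)
```

Here good keeps citations 0 and 2 and drops the out-of-range 9, while bad comes back with ok set to False and nothing to cache as understanding.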
Your prompt, our envelope
You decide what understanding to produce; Coalent always wraps it with the sources, a strict-JSON contract, and the citation list. Pass your own instruction (a string, or a query -> str function) and the fields you want back:
synth = LLMSynthesizer(
    OpenAIProvider(),
    instruction=(
        "Summarize the incident: likely root cause, blast radius, and the next "
        "action to take. Be specific and cite the runbook steps you used."
    ),
    fields=["summary", "root_cause", "blast_radius", "next_action"],
)
Coalent still injects the candidate sources and requires the model to cite the sources it relied on — so provenance is captured no matter what you ask for. The built-in default produces a general summary / claims / entities / facts; override it to fit your domain.
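The text above mentions that the instruction can also be a query -> str function. A minimal sketch of such a function follows; the name incident_instruction and its wording are hypothetical, and the commented-out constructor call only restates the parameters named on this page.

```python
def incident_instruction(query: str) -> str:
    """Hypothetical instruction builder: tailors the prompt to the query."""
    # Pick a focus from the query text; this rule is purely illustrative.
    focus = "rollback steps" if "rollback" in query.lower() else "likely root cause"
    return (
        f"Summarize the incident for the question {query!r}. "
        f"Focus on the {focus} and cite the runbook steps you used."
    )


# With Coalent installed, this would presumably be passed in place of a string:
#   LLMSynthesizer(OpenAIProvider(), instruction=incident_instruction,
#                  fields=["summary", "next_action"])
prompt = incident_instruction("How do I rollback the deploy?")
```

A function is useful when different kinds of questions need different emphasis; a plain string is simpler when one instruction fits every query.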
Providers
A provider is a thin object with one method, generate(*, model, system, user, max_tokens, temperature) -> str:
| Provider | Use | Notes |
|---|---|---|
| StubProvider | dev & tests | deterministic, no network |
| OpenAIProvider | production | coalent[openai], reads OPENAI_API_KEY |
| AnthropicProvider | production | coalent[anthropic], reads ANTHROPIC_API_KEY |
from coalent import LLMSynthesizer, AnthropicProvider
synth = LLMSynthesizer(AnthropicProvider(), model="claude-haiku-4-5")
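Since a provider is described as just a generate method with that keyword-only signature, a hand-rolled one might look like the sketch below. EchoProvider is a hypothetical class, the canned JSON shape is invented for the example, and whether Coalent accepts any duck-typed object here is an assumption based on the description above.

```python
class EchoProvider:
    """Hypothetical provider: returns a canned reply, no network."""

    def __init__(self, canned: str) -> None:
        self.canned = canned
        self.calls: list[dict] = []

    def generate(self, *, model: str, system: str, user: str,
                 max_tokens: int, temperature: float) -> str:
        # Record the call so a test can inspect what was sent.
        self.calls.append({
            "model": model,
            "system": system,
            "user": user,
            "max_tokens": max_tokens,
            "temperature": temperature,
        })
        return self.canned


provider = EchoProvider('{"summary": "ok", "used": [0]}')
reply = provider.generate(
    model="local-test",
    system="Return strict JSON.",
    user="Sources: ...",
    max_tokens=256,
    temperature=0.0,
)
```

The same shape would fit a wrapper around a local model: accept the five keyword arguments and return the model's text as a string.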
JSONPassthroughSynthesizer — structured data, no LLM
When your source is already structured — a REST/MCP tool returning JSON — there's nothing to summarize. JSONPassthroughSynthesizer treats the JSON as the understanding: no model call, no latency, no key. Objects become facts, other values become claims, and every chunk is cited so provenance and freshness still work exactly as with an LLM.
from coalent import SemanticCache, JSONPassthroughSynthesizer
cache = SemanticCache(tool_retriever, JSONPassthroughSynthesizer())
# a result like {"employee": "A", "annual_leave": 12} is cached as-is —
# and invalidated like a document when the tool result changes.
Reach for it when the data is the answer; use LLMSynthesizer when raw text needs to be understood. See the MCP & tool results example.
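The mapping described above (objects become facts, other values become claims, every chunk is cited) can be sketched without Coalent. The function passthrough and its return shape are assumptions for illustration; in particular, merging several objects into one facts dictionary is a guess at one reasonable behaviour, not the library's documented one.

```python
import json
from typing import Any


def passthrough(raw_chunks: list[str]) -> dict[str, Any]:
    """Illustrative only: mimic the described mapping, not Coalent's code."""
    facts: dict[str, Any] = {}
    claims: list[Any] = []
    for raw in raw_chunks:
        value = json.loads(raw)
        if isinstance(value, dict):
            facts.update(value)   # JSON objects are merged into facts
        else:
            claims.append(value)  # arrays and scalars are kept as claims
    return {
        "understanding": {"facts": facts, "claims": claims},
        "used": list(range(len(raw_chunks))),  # every chunk is cited
        "ok": True,
    }


result = passthrough(['{"employee": "A", "annual_leave": 12}', '"on probation"'])
```

Because every chunk index lands in used, a change to any tool result would invalidate the cached unit, which matches the behaviour described for this synthesizer.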
StubSynthesizer (no key)
For dev and tests, StubSynthesizer() returns the same structured shape deterministically, so the whole loop runs without an API key (the detail lives in the retained raw evidence).
Custom synthesizers
Any object with synthesize(query, chunks) -> Synthesis works — wrap a local model, or return used=[i for i, _ in enumerate(chunks)] to cite everything. Set ok=False on failure to make Coalent degrade instead of caching bad output.
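As a rough sketch of that contract, here is a toy synthesizer that cites only the chunks sharing a word with the query and sets ok=False when none match. Chunk and Synthesis are local stand-ins so the example runs without Coalent installed; the text attribute on Chunk and the KeywordSynthesizer class are assumptions, and real code would import Synthesis from coalent.

```python
from dataclasses import dataclass, field
from typing import Any


# Stand-ins for illustration; real code would use Coalent's own types.
@dataclass
class Chunk:
    text: str  # assumed attribute name


@dataclass
class Synthesis:
    understanding: dict[str, Any]
    used: list[int] = field(default_factory=list)
    ok: bool = True


class KeywordSynthesizer:
    """Hypothetical synthesizer: keeps chunks that share a word with the query."""

    def synthesize(self, query: str, chunks: list[Chunk]) -> Synthesis:
        terms = {word.lower() for word in query.split()}
        # Record the index of every chunk that mentions a query term.
        used = [
            i for i, chunk in enumerate(chunks)
            if terms & {word.lower() for word in chunk.text.split()}
        ]
        if not used:
            # Nothing relevant: signal failure rather than invent a summary.
            return Synthesis(understanding={}, used=[], ok=False)
        summary = " ".join(chunks[i].text for i in used)
        return Synthesis(understanding={"summary": summary}, used=used, ok=True)


result = KeywordSynthesizer().synthesize(
    "leave policy",
    [
        Chunk("Annual leave is 12 days"),
        Chunk("Office hours are 9 to 5"),
        Chunk("The policy applies to all staff"),
    ],
)
```

Only chunks 0 and 2 end up in used, so under the provenance model described on this page a later change to the office-hours chunk would not invalidate the result.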
Next
- Context intelligence — how the structured understanding is projected to minimum context.
- Provenance & freshness — what the citations drive.