This walks through the full loop with zero services and no API key (the in-memory retriever + stub synthesizer). The shape is identical when you swap in a real vector DB and LLM.

1. Create a cache

Coalent needs two things: a retriever (where your sources live) and a synthesizer (how raw source text becomes understanding).

from coalent import SemanticCache, InMemoryRetriever, StubSynthesizer

retriever = InMemoryRetriever()
retriever.add("confluence:98231", "Leave policy: full-time staff get 21 days of annual leave.")
retriever.add("confluence:44120", "Remote work: up to 3 days per week with manager approval.")

cache = SemanticCache(retriever, StubSynthesizer())
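Here is a minimal, dependency-free sketch of what each role does. It is not Coalent's implementation. ToyRetriever, its search() method and toy_synthesize are hypothetical names. The retriever ranks sources by crude word overlap, where a real one would use vector search. The synthesizer only tags each hit with its source id, where a real one would call an LLM.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words, so 'leave.' and 'Leave' both become 'leave'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyRetriever:
    """Hypothetical retriever: raw text stored under each source's natural id."""

    docs: dict[str, str] = field(default_factory=dict)

    def add(self, artifact_id: str, text: str) -> None:
        self.docs[artifact_id] = text

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        # Score each source by words shared with the query; keep the top k non-zero hits.
        q = tokens(query)
        scored = sorted(
            ((len(q & tokens(text)), aid, text) for aid, text in self.docs.items()),
            key=lambda t: (-t[0], t[1]),
        )
        return [(aid, text) for score, aid, text in scored[:k] if score > 0]


def toy_synthesize(hits: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Hypothetical synthesizer: one claim per hit, grounded by its source id."""
    return {"claims": [f"{text} [{aid}]" for aid, text in hits]}


toy_retriever = ToyRetriever()
toy_retriever.add("confluence:98231", "Leave policy: full-time staff get 21 days of annual leave.")
toy_retriever.add("confluence:44120", "Remote work: up to 3 days per week with manager approval.")
understanding = toy_synthesize(toy_retriever.search("how much annual leave do we get?"))
print(understanding["claims"])
```

The split matters because the two halves change independently: you can point the retriever at a different store without touching how understanding is built, and vice versa.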

2. Get understanding

There is one read method: get() — just pass the question.

ctx = cache.get("how much annual leave do we get?")

print(ctx.cache_hit)             # False the first time (cold build)
print(ctx.understanding["claims"])  # atomic, source-grounded claims — the substance, not a prose blob
print(ctx.context)               # the minimum decision-relevant payload
print(ctx.raw_text)              # the retained raw evidence — has the "21 days"

Ask it again, phrased differently, and it's a semantic hit — same meaning, served from cache:

again = cache.get("what is the leave allowance?")
print(again.cache_hit)  # True — matched by meaning, not by keywords
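A semantic hit means the new question is compared by meaning against questions already cached, not by exact string. The sketch below shows that mechanism under stated assumptions. embed() is a toy that maps a few hand-picked words onto shared concepts; a real cache would call a sentence-embedding model. Cosine similarity is the comparison. The 0.8 threshold is an arbitrary illustrative value, not one Coalent uses. ToySemanticCache is a hypothetical name.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Callable

# Toy "embedding" (an assumption, not a real model): a few words map onto shared
# concepts, so differently worded questions can land on the same vector.
CONCEPTS = {
    "leave": "leave", "vacation": "leave",
    "much": "amount", "many": "amount", "allowance": "amount",
    "remote": "remote", "home": "remote",
}


def embed(text: str) -> Counter[str]:
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a: Counter[str], b: Counter[str]) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """Hypothetical semantic cache: a hit is any stored question close enough in meaning."""

    def __init__(self, build: Callable[[str], str], threshold: float = 0.8) -> None:
        self.build = build            # called only on a miss (the cold build)
        self.threshold = threshold    # illustrative value, not Coalent's
        self.entries: list[tuple[Counter[str], str]] = []

    def get(self, question: str) -> tuple[str, bool]:
        vec = embed(question)
        scored = [(cosine(vec, key), value) for key, value in self.entries]
        best = max(scored, key=lambda s: s[0], default=(0.0, ""))
        if best[0] >= self.threshold:
            return best[1], True      # semantic hit: served from cache
        value = self.build(question)
        self.entries.append((vec, value))
        return value, False           # miss: built and stored


builds: list[str] = []


def build(question: str) -> str:
    builds.append(question)
    return f"understanding built for {question!r}"


toy_cache = ToySemanticCache(build)
first, hit1 = toy_cache.get("how much annual leave do we get?")   # miss: cold build
again, hit2 = toy_cache.get("what is the leave allowance?")       # hit: same concepts
other, hit3 = toy_cache.get("can I work from home?")              # miss: different meaning
```

With this toy embedding, both leave questions reduce to the same {leave, amount} concept vector, so the second call is a hit. "can I work from home?" shares no concept with them and triggers a fresh build.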

3. Keep it fresh

When a source changes, Coalent marks as stale only the units that used it. The artifact_id is the source's natural id: the same one your retriever stamped, here the id passed to retriever.add().

result = cache.source_changed("confluence:98231", text="Leave policy: now 25 days of annual leave.")
print(result.dirtied)   # the units that used confluence:98231

The next matching get() rebuilds that unit lazily — everything else stays cached.
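The staleness behaviour amounts to a reverse index from each artifact_id to the units built from it. A change marks only those units stale, and the rebuild waits until the next read. This is a hypothetical sketch, not Coalent's code. To keep it short, get() here takes the unit key and its source ids explicitly, where the real get() resolves them through semantic matching and retrieval. ToyDependencyCache and Unit are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Unit:
    key: str
    sources: set[str]   # artifact_ids this unit was built from
    value: str
    stale: bool = False


class ToyDependencyCache:
    """Hypothetical sketch: dirty only the units that used a changed source."""

    def __init__(self, sources: dict[str, str]) -> None:
        self.sources = sources                    # artifact_id -> current text
        self.units: dict[str, Unit] = {}
        self.used_by: dict[str, set[str]] = {}    # reverse index: artifact_id -> unit keys
        self.builds = 0

    def get(self, key: str, artifact_ids: set[str]) -> str:
        unit = self.units.get(key)
        if unit is None or unit.stale:            # cold or dirtied: (re)build now, lazily
            self.builds += 1
            text = " ".join(self.sources[a] for a in sorted(artifact_ids))
            unit = Unit(key, set(artifact_ids), text)
            self.units[key] = unit
            for a in unit.sources:
                self.used_by.setdefault(a, set()).add(key)
        return unit.value

    def source_changed(self, artifact_id: str, text: str) -> set[str]:
        self.sources[artifact_id] = text
        dirtied = set(self.used_by.get(artifact_id, ()))
        for key in dirtied:
            self.units[key].stale = True          # mark only; nothing is rebuilt here
        return dirtied


toy = ToyDependencyCache({
    "confluence:98231": "Leave policy: full-time staff get 21 days of annual leave.",
    "confluence:44120": "Remote work: up to 3 days per week with manager approval.",
})
toy.get("leave", {"confluence:98231"})
toy.get("remote", {"confluence:44120"})
dirtied = toy.source_changed("confluence:98231", "Leave policy: now 25 days of annual leave.")
```

Changing confluence:98231 dirties only the "leave" unit. Reading "remote" afterwards costs no rebuild, and the next read of "leave" rebuilds it from the new text.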

You rarely call source_changed by hand. In production it is wired to your ingestion pipeline, a webhook/CDC feed, or, for APIs with no change feed, a FreshnessPolicy. The id always comes from the source itself; see the Examples.
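For a source with no change feed, one common approach is to poll it on an interval and compare a content hash, reporting a change only when the hash differs. The sketch assumes that approach. It does not describe FreshnessPolicy's actual behaviour or API. ToyPollingFreshness, fetch, on_change and min_interval_s are hypothetical names. In a real setup, on_change would forward to cache.source_changed(artifact_id, text=...). The first poll of a source only records a baseline.

```python
from __future__ import annotations

import hashlib
import time
from typing import Callable


class ToyPollingFreshness:
    """Hypothetical freshness check for a source with no change feed."""

    def __init__(
        self,
        fetch: Callable[[str], str],              # returns the source's current text
        on_change: Callable[[str, str], None],    # e.g. forwards to cache.source_changed
        min_interval_s: float = 300.0,            # illustrative polling interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch
        self.on_change = on_change
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_poll: dict[str, float] = {}
        self.last_hash: dict[str, str] = {}

    def poll(self, artifact_id: str) -> bool:
        """Return True only when the source's content changed since the last fetch."""
        now = self.clock()
        last = self.last_poll.get(artifact_id)
        if last is not None and now - last < self.min_interval_s:
            return False                          # polled too recently: skip the fetch
        self.last_poll[artifact_id] = now
        text = self.fetch(artifact_id)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self.last_hash.get(artifact_id)
        self.last_hash[artifact_id] = digest
        if previous is not None and previous != digest:
            self.on_change(artifact_id, text)
            return True
        return False                              # first poll (baseline) or unchanged


now = [0.0]                                       # fake clock so the example is deterministic
store = {"api:leave-policy": "21 days"}
changes: list[tuple[str, str]] = []
policy = ToyPollingFreshness(
    store.__getitem__,
    lambda artifact_id, text: changes.append((artifact_id, text)),
    min_interval_s=60,
    clock=lambda: now[0],
)
```

Injecting the clock keeps the sketch testable. Hashing, rather than storing full text, keeps per-source state small when sources are large.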

4. Inspect

print(cache.stats())    # {'units': 1, 'tracked_artifacts': 2}

5. Go real — extractive claims + multi-hop (v0.4)

Swap the stub for a real LLM and the loop is identical, but the understanding gets sharper. LLMSynthesizer now builds extractive understanding by default (extract=True): a query-independent list of atomic, source-grounded claims. Because the claims don't depend on the question that triggered the build, one unit can answer many later questions, and every number in the sources is kept. The understanding["summary"] field may be terse or empty; the claims are the substance.

from coalent import SemanticCache, LLMSynthesizer, OpenAIProvider

cache = SemanticCache(retriever, LLMSynthesizer(OpenAIProvider()))

ctx = cache.get("how much annual leave do we get?")
print(ctx.understanding["claims"])   # ['Full-time staff get 21 days of annual leave.', ...]
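"Extractive" and "query-independent" can be illustrated without an LLM. In the sketch below, toy_extract_claims (a hypothetical name) splits each source into sentence-level claims tagged with their artifact_id. It never sees a question, so the same claims can serve any later query. keeps_every_number is an equally hypothetical check of the property described above: every number in the raw sources appears verbatim in some claim. An LLM-based extractor would produce finer-grained claims than sentence splitting does.

```python
from __future__ import annotations

import re


def toy_extract_claims(sources: dict[str, str]) -> list[dict[str, str]]:
    """Hypothetical extractor: one claim per sentence, tagged with its artifact_id.
    No question is passed in, so the same claims can serve any later query."""
    claims = []
    for artifact_id, text in sources.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                claims.append({"claim": sentence, "source": artifact_id})
    return claims


def numbers(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def keeps_every_number(sources: dict[str, str], claims: list[dict[str, str]]) -> bool:
    """True if every number in the raw sources appears verbatim in some claim."""
    in_sources = set().union(*(numbers(t) for t in sources.values()))
    in_claims = set().union(*(numbers(c["claim"]) for c in claims))
    return in_sources <= in_claims


sources = {
    "confluence:98231": "Leave policy: full-time staff get 21 days of annual leave.",
    "confluence:44120": "Remote work: up to 3 days per week with manager approval.",
}
claims = toy_extract_claims(sources)
```

A prose summary such as "Staff get generous leave and some remote days." fails keeps_every_number, which is the failure mode extractive claims are meant to avoid.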


Both v0.4 defaults are on: extract=True (claims, not a question-shaped prose summary) and cross_unit_recall=True (bridge facts pooled across units). To restore v0.3 behaviour, turn both off: LLMSynthesizer(OpenAIProvider(), extract=False) and SemanticCache(..., cross_unit_recall=False).

Multi-hop answers

When the matched unit only partly covers a query, the cache pools per-claim memory across all fresh units (MaxSim scoring) and surfaces the bridge facts. This answers multi-hop questions that naive retrieval structurally cannot, with zero extra LLM calls:

cache = SemanticCache(
    retriever,
    LLMSynthesizer(OpenAIProvider()),
    preset="multi_hop",     # v0.5: arms recall + the hop-2 bridge with calibrated thresholds
)

ans = cache.get("can a remote employee still take their full annual leave?")
for r in ans.recalled:      # RecalledClaim(claim, score, unit_id) — claims pulled from OTHER units
    print(r.unit_id, round(r.score, 2), r.claim)
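The scoring behind MaxSim isn't spelled out here, so the following is one plausible reading, offered as an assumption rather than Coalent's algorithm. The query is represented by several facet vectors, here "annual leave" and "remote employee". Each claim in another fresh unit is scored by its maximum cosine similarity over those facets. Recall fires only when the matched unit leaves some facet uncovered. The 3-dimensional toy vectors, the sample expenses claim, the 0.5 threshold and recall_across_units are all hypothetical. RecalledClaim merely mirrors the fields printed above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RecalledClaim:
    """Stand-in mirroring the fields printed above: claim, score, unit_id."""
    claim: str
    score: float
    unit_id: str


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_across_units(
    query_facets: list[list[float]],
    units: dict[str, list[tuple[str, list[float]]]],   # fresh units only: unit_id -> claims
    matched_unit: str,
    threshold: float = 0.5,                             # illustrative, not calibrated
) -> list[RecalledClaim]:
    # Coverage: every query facet should be matched by some claim in the matched unit.
    matched = units.get(matched_unit, [])
    coverage = min(max((cosine(q, v) for _, v in matched), default=0.0) for q in query_facets)
    if coverage >= threshold:
        return []                                       # matched unit suffices; no bridge needed
    recalled = []
    for unit_id, claims in units.items():
        if unit_id == matched_unit:
            continue
        for text, vec in claims:
            score = max(cosine(q, vec) for q in query_facets)   # MaxSim over query facets
            if score >= threshold:
                recalled.append(RecalledClaim(text, score, unit_id))
    return sorted(recalled, key=lambda r: r.score, reverse=True)


# Toy vectors on three axes: (leave, remote work, expenses).
units = {
    "unit-leave": [("Full-time staff get 21 days of annual leave.", [1.0, 0.0, 0.0])],
    "unit-remote": [("Remote work: up to 3 days per week with manager approval.", [0.1, 1.0, 0.0])],
    "unit-expenses": [("Expense reports are filed monthly.", [0.0, 0.0, 1.0])],
}
# "can a remote employee still take their full annual leave?" as two facets.
facets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
for r in recall_across_units(facets, units, matched_unit="unit-leave"):
    print(r.unit_id, round(r.score, 2), r.claim)
```

The step only compares stored claim vectors and generates no text, which is how pooling of this kind can avoid extra LLM calls.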

New in v0.5: preset="multi_hop" is the one-argument way to arm cross-unit recall. It turns on recall and the hop-2 bridge with calibrated thresholds, replacing the old guidance to set recall_threshold=0.7 manually (explicit kwargs still override the preset). Also new in v0.5: serve="pool", an experimental preview of the next read path that serves a token-budgeted pool of globally ranked fresh claims instead of one anchored unit. See Context intelligence.
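serve="pool" is described only at a high level, so here is a hypothetical sketch of what a token-budgeted pool of globally ranked fresh claims could look like. It drops claims from stale units, ranks the rest by score across all units, and packs them greedily until the budget is spent. Counting tokens as whitespace-separated words is a stand-in for a real tokenizer. ScoredClaim and pool_claims are illustrative names, not Coalent APIs.

```python
from __future__ import annotations

from typing import Callable, Iterable, NamedTuple


class ScoredClaim(NamedTuple):
    score: float     # global relevance score to the query
    claim: str
    unit_id: str
    fresh: bool      # False if the claim's unit has been dirtied


def word_count(text: str) -> int:
    return len(text.split())    # crude stand-in for a real tokenizer


def pool_claims(
    claims: Iterable[ScoredClaim],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = word_count,
) -> tuple[list[ScoredClaim], int]:
    ranked = sorted((c for c in claims if c.fresh), key=lambda c: c.score, reverse=True)
    pool: list[ScoredClaim] = []
    used = 0
    for c in ranked:
        cost = count_tokens(c.claim)
        if used + cost > budget_tokens:
            continue            # does not fit; a shorter, lower-ranked claim still might
        pool.append(c)
        used += cost
    return pool, used


claims = [
    ScoredClaim(0.95, "Leave policy: full-time staff get 21 days of annual leave.", "unit-a", False),
    ScoredClaim(0.90, "Leave policy: now 25 days of annual leave.", "unit-b", True),
    ScoredClaim(0.80, "Remote work: up to 3 days per week with manager approval.", "unit-c", True),
]
pool, used = pool_claims(claims, budget_tokens=12)
```

The loop skips an oversized claim instead of stopping, so a shorter, lower-ranked claim can still use the remaining budget. Stopping at the first misfit would be the stricter alternative.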

Where to go next

How it works — one diagram, the whole system.
The Retriever — connect your real data.
The Synthesizer — swap the stub for a real LLM.
Examples — vector search, MCP/tools, Confluence/Jira.