Coalent caches understanding. Your retriever can also cache retrieval — the two layers compose cleanly. A common pattern: inside retrieve(), check a fast cache first (an embedding/result cache or a warm vector mirror), and fall back to the real vector query on a miss.
The artifact_id is derived from the source document, so it's identical whether a chunk came from the cache or a live query — invalidation stays consistent across both paths.
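To make the shared-id idea concrete, here's a minimal sketch in which indexing and invalidation both call the same derivation function. SourceDoc, artifact_id_for and payloads_for are hypothetical names for illustration, and the "confluence:" prefix follows the invalidation handler later on this page. The payload fields mirror the ones CachingRetriever reads.

```python
from dataclasses import dataclass


@dataclass
class SourceDoc:
    # Hypothetical stand-in for a document from your source system.
    id: str
    body: str
    version: int


def artifact_id_for(doc: SourceDoc) -> str:
    """Hypothetical helper: one deterministic id per source document."""
    return f"confluence:{doc.id}"


def payloads_for(doc: SourceDoc, chunk_texts: list[str]) -> list[dict]:
    """Build vector-store payloads; every chunk carries its document's id."""
    aid = artifact_id_for(doc)
    return [
        {"artifact_id": aid, "text": text, "version": doc.version}
        for text in chunk_texts
    ]


doc = SourceDoc(id="12345", body="Full page text", version=3)
payloads = payloads_for(doc, ["first chunk", "second chunk"])

# The invalidation path derives the id the same way, so both sides agree.
assert all(p["artifact_id"] == artifact_id_for(doc) for p in payloads)
```

Because the id depends only on the document, a chunk carries the same artifact_id whether it's served from a cache or from a live query.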
A caching retriever
Because caching logic wraps the lookup, we implement the Retriever interface directly. It's just a class with a retrieve method; no base class is needed, because Retriever is a structural protocol.
```python
import time

from coalent import Chunk


class CachingRetriever:
    """Serve from a result cache when present, else do a live vector query."""

    def __init__(self, vector_client, embed, *, ttl=60):
        self._vector = vector_client  # e.g. a qdrant_client.QdrantClient
        self._embed = embed           # callable: query string -> embedding vector
        self._ttl = ttl               # seconds a cached result stays valid
        # (namespace, query) -> (timestamp, chunks)
        self._cache: dict[tuple[str | None, str], tuple[float, list[Chunk]]] = {}

    def retrieve(self, query, *, namespace=None):
        # Tuple key: no collisions when namespace or query contains ":".
        # Note: namespace only partitions the cache; the live query ignores it.
        key = (namespace, query)
        hit = self._cache.get(key)
        if hit and time.monotonic() - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip embedding and vector search
        chunks = self._live_query(query)  # miss or expired: run the real query
        self._cache[key] = (time.monotonic(), chunks)
        return chunks

    def _live_query(self, query):
        response = self._vector.query_points(
            collection_name="docs",
            query=self._embed(query),
            limit=6,
            with_payload=True,
        )
        return [
            Chunk(
                artifact_id=point.payload["artifact_id"],  # dynamic: from the document
                text=point.payload["text"],
                version=str(point.payload.get("version", "")),
            )
            for point in response.points
        ]
```
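CachingRetriever never subclasses anything; it qualifies as a Retriever purely by its shape. The sketch below shows how that works with Python's typing.Protocol. RetrieverLike and the Chunk dataclass here are hypothetical mirrors written for illustration, not Coalent's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass
class Chunk:
    # Hypothetical stand-in for coalent.Chunk with the fields used above.
    artifact_id: str
    text: str
    version: str = ""


@runtime_checkable
class RetrieverLike(Protocol):
    # Hypothetical mirror of the retriever contract: only the shape matters.
    def retrieve(self, query: str, *, namespace: Optional[str] = None) -> list[Chunk]: ...


class StaticRetriever:
    """No base class: it matches the protocol just by defining retrieve()."""

    def retrieve(self, query: str, *, namespace: Optional[str] = None) -> list[Chunk]:
        return [Chunk(artifact_id="confluence:1", text=f"answer for {query}")]


class NotARetriever:
    """Has a lookup method, but not one named retrieve."""

    def search(self, query: str) -> list[Chunk]:
        return []


# A runtime_checkable isinstance() check only confirms the method exists;
# a static type checker (e.g. mypy) verifies the full signature.
assert isinstance(StaticRetriever(), RetrieverLike)
assert not isinstance(NotARetriever(), RetrieverLike)
```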
Wire it
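```python
from coalent import SemanticCache, LLMSynthesizer, OpenAIProvider

# qdrant: your QdrantClient instance; embed: your query-embedding function.
# Keep a reference to the retriever so you can invalidate its cache later.
retriever = CachingRetriever(qdrant, embed, ttl=120)
cache = SemanticCache(
    retriever,
    LLMSynthesizer(OpenAIProvider()),
)
```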
Two caches, one coherent story
- Your retrieval cache (inside retrieve()) saves embedding and vector-search cost on Coalent's miss path, when your retriever actually gets called.
- Coalent's cognition cache saves synthesis cost and keeps understanding fresh by provenance.
When a document changes, invalidate both layers in one handler. The artifact_id you pass to Coalent must match the one stored in the chunk payloads:
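```python
def on_document_changed(doc):
    # Must match the artifact_id written to the vector payloads at index time.
    artifact_id = f"confluence:{doc.id}"
    retriever._cache.clear()  # or evict just the queries that returned this doc
    cache.source_changed(artifact_id, text=doc.body)
```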
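Clearing the whole retrieval cache is the simplest option. For targeted eviction, a sketch like the one below drops only the cached results that contain a chunk from the changed document. evict_artifact is a hypothetical helper; it assumes each cache value is a (timestamp, chunks) pair, as in CachingRetriever above, and works regardless of the key shape. The Chunk dataclass is a stand-in for coalent.Chunk.

```python
import time
from dataclasses import dataclass


@dataclass
class Chunk:
    # Stand-in for coalent.Chunk, with the fields CachingRetriever fills in.
    artifact_id: str
    text: str
    version: str = ""


def evict_artifact(cache: dict, artifact_id: str) -> int:
    """Drop cached results containing any chunk from artifact_id.

    Returns the number of evicted entries.
    """
    # Collect keys first so the dict isn't mutated while iterating over it.
    stale = [
        key
        for key, (_, chunks) in cache.items()
        if any(chunk.artifact_id == artifact_id for chunk in chunks)
    ]
    for key in stale:
        del cache[key]
    return len(stale)


now = time.monotonic()
results = {
    (None, "vacation policy"): (now, [Chunk("confluence:1", "PTO rules")]),
    (None, "expense limits"): (now, [Chunk("confluence:2", "Meal caps")]),
}
evicted = evict_artifact(results, "confluence:1")
# Only the query whose results included confluence:1 is gone.
```

In on_document_changed, you would call evict_artifact(retriever._cache, artifact_id) in place of clear(). The scan is linear in the number of cached queries, which is fine for a small, short-TTL cache.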
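Keep the retrieval cache's TTL short and let Coalent own correctness via provenance. The retrieval cache does not track provenance, so if an invalidation is ever missed, a stale result can be served until its entry expires. Treat it as a speed optimization; the cognition cache is the freshness guarantee.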
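To see what the TTL does, here's a self-contained sketch that mirrors CachingRetriever's hit/expiry check, with an injectable clock and a stub backend that counts live queries. TTLResultCache, FakeClock and stub_live_query are illustrative names, not part of Coalent.

```python
from typing import Callable


class FakeClock:
    """Manually advanced clock so TTL expiry is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class TTLResultCache:
    """Same hit/expiry check as CachingRetriever.retrieve, clock injected."""

    def __init__(self, live_query: Callable[[str], list], *, ttl: float,
                 clock: Callable[[], float]) -> None:
        self._live_query = live_query
        self._ttl = ttl
        self._clock = clock
        self._cache: dict = {}

    def retrieve(self, query, *, namespace=None):
        key = (namespace, query)
        hit = self._cache.get(key)
        if hit and self._clock() - hit[0] < self._ttl:
            return hit[1]  # fresh: no live query
        chunks = self._live_query(query)
        self._cache[key] = (self._clock(), chunks)
        return chunks


calls = []


def stub_live_query(query):
    calls.append(query)  # stands in for "embed + vector search"
    return [f"chunk for {query}"]


clock = FakeClock()
demo = TTLResultCache(stub_live_query, ttl=120, clock=clock)
demo.retrieve("vacation policy")  # miss: live query
clock.now = 60
demo.retrieve("vacation policy")  # hit: served from cache
clock.now = 121
demo.retrieve("vacation policy")  # expired: live query again
print(len(calls))  # 2
```

Injecting the clock, rather than calling time.monotonic() directly, is what makes the expiry testable without sleeping.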
Next
- MCP & tool results — caching volatile, feed-less sources.
- The Retriever — the contract you're implementing.