Example — Vector search

Integrate a vector DB end-to-end — define your own retrieve(), derive artifact_ids dynamically, and invalidate on re-ingestion.

The most common setup: your documents are chunked into a vector DB, and you want cached, always-fresh understanding on top. Coalent sits above your retrieval — you keep your vector DB and wire it in through one method.

1. Ingestion — stamp natural ids

When you embed and upsert, store each document's natural id and revision in the chunk metadata. These become the artifact_id and version Coalent tracks. (Here, a Confluence page id and version — but any stable id works.)

def ingest(page):
    artifact_id = f"confluence:{page.id}"          # derived from the source
    for chunk_text in chunk(page.body):
        qdrant.upsert(points=[Point(
            vector=embed(chunk_text),
            payload={"artifact_id": artifact_id, "version": str(page.version), "text": chunk_text},
        )])

2. Wire the cache with the shipped adapter

Because step 1 stored artifact_id, version, and text in the payload — the adapter's default field names — the shipped QdrantRetriever maps each hit automatically. No custom class needed:

from coalent import SemanticCache, QdrantRetriever, LLMSynthesizer, OpenAIProvider

cache = SemanticCache(
    QdrantRetriever(client=qdrant, collection="docs", embed=embed),
    LLMSynthesizer(OpenAIProvider(), model="gpt-4o-mini"),
)

ctx = cache.get("what is our parental leave policy?")
print(ctx.context)     # decision-ready, minimum context
print(ctx.raw_text)    # the retrieved source passages

Need hybrid search or reranking? Pass your own search= callable — the adapter is pass-through. For a vector DB without a shipped adapter, extend BaseVectorRetriever.

4. Stay fresh — invalidate from the same place you ingest

A vector DB has no change feed, so the re-ingestion job is the natural trigger. Derive the same artifact_id from the page and fire the change:

def reingest(page):
    artifact_id = f"confluence:{page.id}"          # SAME derivation as ingest()
    ingest(page)                                    # re-embed + upsert
    cache.source_changed(artifact_id, text=page.body)   # only units that used it rebuild

Because step 1 and step 4 derive artifact_id identically from the page, invalidation lines up automatically — no id to track by hand.

i

Use the document id as artifact_id (not the per-chunk point uuid), so all chunks of a page share one id and one change invalidates the unit cleanly.

Next