Example — End-to-end

A complete, runnable app — retrieve, cache, read (cold → warm), invalidate by source, and persist across a restart.

This is a single paste-and-run script (no API key, no services) that exercises the whole loop: build, semantic hit, source change → surgical invalidation, and durable persistence. Swap the in-memory retriever and the stub synthesizer for your real ones; the cache calls stay the same.

The script

```python
from coalent import (
    SemanticCache, InMemoryRetriever, StubSynthesizer, SQLiteCognitionStore,
)

# 1. Sources. In production this is your vector DB / tools; here, in-memory.
#    artifact_id is the document's natural id (derived from the source, not hardcoded).
def doc_id(page) -> str:
    return f"confluence:{page['id']}"

pages = [
    {"id": "98231", "body": "Leave policy: full-time staff get 21 days of annual leave."},
    {"id": "44120", "body": "Remote work: up to 3 days per week with manager approval."},
]

retriever = InMemoryRetriever()
for page in pages:
    retriever.add(doc_id(page), page["body"], version="1")

# 2. The cache: durable, so it survives a restart.
store = SQLiteCognitionStore("coalent.db")
cache = SemanticCache(retriever, StubSynthesizer(), store=store)

# 3. Cold read: builds and caches a unit.
ctx = cache.get("how much annual leave do we get?")
print("hit:", ctx.cache_hit)        # False on a fresh coalent.db (True on a rerun)
print("raw:", ctx.raw_text)         # has "21 days"

# 4. Warm read: a rephrase hits the same unit by meaning.
print("hit:", cache.get("what is the leave allowance?").cache_hit)   # True

# 5. A source changes: derive the SAME id and invalidate.
changed_page = {"id": "98231", "body": "Leave policy: now 25 days of annual leave."}
retriever.add(doc_id(changed_page), changed_page["body"], version="2")   # re-ingest
result = cache.source_changed(doc_id(changed_page), text=changed_page["body"])
print("dirtied:", result.dirtied)   # the unit that used confluence:98231

# 6. The next read rebuilds just that unit, with fresh content.
fresh = cache.get("how much annual leave do we get?")
print("hit:", fresh.cache_hit, "| raw:", fresh.raw_text)   # rebuilt, now "25 days"

print(cache.stats())                # {'units': 1, 'tracked_artifacts': 2}
```

What just happened

  1. Sources carry natural ids (confluence:<id>) derived by doc_id() rather than hardcoded as literals, so a changed page maps back to the same id.
  2. The cache is backed by SQLite, so units persist and the invalidation graph rebuilds on restart.
  3. A cold read builds a unit; a rephrase is a semantic hit (no rebuild).
  4. A source change dirties only the unit that used it; the next read rebuilds just that one, with the new value.
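The mechanism behind points 3 and 4 can be illustrated without coalent. The sketch below is a hypothetical toy, not coalent's implementation: `ToyCache` matches queries by Jaccard overlap of content words (a crude stand-in for embedding similarity, with an arbitrary 0.3 threshold and a hand-picked stopword list), records which artifact ids each unit was built from, and keeps a reverse index from artifact id to units so that a source change dirties only those units.

```python
import re
from dataclasses import dataclass

# Hypothetical, hand-picked stopword list for this toy only.
STOPWORDS = {"how", "much", "do", "we", "get", "what", "is", "the", "a", "of"}


def content_words(text: str) -> frozenset:
    """Lower-cased alphanumeric tokens, minus stopwords."""
    return frozenset(w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS)


def similarity(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap: a crude stand-in for embedding similarity."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Unit:
    key: frozenset    # content words of the query that first built the unit
    answer: str       # cached text (here: the concatenated source texts)
    artifacts: set    # artifact ids this unit was built from
    dirty: bool = False


class ToyCache:
    def __init__(self, sources: dict, threshold: float = 0.3):
        self.sources = sources            # artifact_id -> current text
        self.threshold = threshold
        self.units: list = []
        self.by_artifact: dict = {}       # reverse index: artifact_id -> {unit slot}

    def _build(self, key: frozenset, slot=None) -> Unit:
        # "Retrieve": every source sharing at least one content word with the key.
        used = {aid for aid, text in self.sources.items() if key & content_words(text)}
        unit = Unit(key, " ".join(self.sources[a] for a in sorted(used)), used)
        if slot is None:
            self.units.append(unit)
            slot = len(self.units) - 1
        else:
            for a in self.units[slot].artifacts:   # drop stale reverse-index entries
                self.by_artifact[a].discard(slot)
            self.units[slot] = unit
        for a in used:
            self.by_artifact.setdefault(a, set()).add(slot)
        return unit

    def get(self, query: str):
        """Return (cache_hit, answer)."""
        words = content_words(query)
        for slot, unit in enumerate(self.units):
            if similarity(words, unit.key) >= self.threshold:
                if unit.dirty:
                    return False, self._build(unit.key, slot).answer  # rebuild this unit only
                return True, unit.answer
        return False, self._build(words).answer  # cold: build a new unit

    def source_changed(self, artifact_id: str, text: str) -> list:
        """Update a source and dirty only the units that used it."""
        self.sources[artifact_id] = text
        dirtied = sorted(self.by_artifact.get(artifact_id, set()))
        for slot in dirtied:
            self.units[slot].dirty = True
        return dirtied


cache = ToyCache({
    "confluence:98231": "Leave policy: full-time staff get 21 days of annual leave.",
    "confluence:44120": "Remote work: up to 3 days per week with manager approval.",
})
print(cache.get("how much annual leave do we get?"))  # (False, 'Leave policy: full-time staff get 21 days of annual leave.')
print(cache.get("what is the leave allowance?")[0])   # True: the rephrase overlaps enough
cache.get("can I work remotely?")                      # builds a second unit from confluence:44120
print(cache.source_changed("confluence:98231", "Leave policy: now 25 days of annual leave."))  # [0]
print(cache.get("how much annual leave do we get?"))  # (False, 'Leave policy: now 25 days of annual leave.')
```

In this toy, the reverse index (`by_artifact`) is what makes invalidation surgical: `source_changed` is a dictionary lookup instead of a scan over every unit, and the remote-work unit is never touched.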

Persistence across a restart

Run the script again (or start a new process pointing at coalent.db) and the read in step 3 is served from disk as a hit. source_changed still finds and dirties the right unit, because the indexes are rebuilt on load.
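One way to get this behaviour is to persist each unit together with the artifact ids it used, and to derive the reverse index whenever the store is opened. The sketch below shows that idea with Python's standard-library `sqlite3` module; the `units` table, its columns, and the helper functions are hypothetical and say nothing about how `SQLiteCognitionStore` actually lays out `coalent.db`.

```python
import json
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a toy unit store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS units ("
        " id INTEGER PRIMARY KEY,"
        " query TEXT NOT NULL,"
        " answer TEXT NOT NULL,"
        " artifacts TEXT NOT NULL,"          # JSON list of artifact ids the unit used
        " dirty INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def save_unit(conn: sqlite3.Connection, query: str, answer: str, artifacts: set) -> int:
    cur = conn.execute(
        "INSERT INTO units (query, answer, artifacts) VALUES (?, ?, ?)",
        (query, answer, json.dumps(sorted(artifacts))),
    )
    conn.commit()
    return cur.lastrowid


def load_index(conn: sqlite3.Connection) -> dict:
    """Rebuild the reverse index (artifact_id -> unit ids) from what is on disk."""
    index = {}
    for unit_id, artifacts in conn.execute("SELECT id, artifacts FROM units"):
        for aid in json.loads(artifacts):
            index.setdefault(aid, set()).add(unit_id)
    return index


def source_changed(conn: sqlite3.Connection, index: dict, artifact_id: str) -> list:
    """Mark only the units that used artifact_id as dirty, on disk."""
    dirtied = sorted(index.get(artifact_id, set()))
    conn.executemany("UPDATE units SET dirty = 1 WHERE id = ?", [(u,) for u in dirtied])
    conn.commit()
    return dirtied


# Simulate a restart: write with one connection, reopen, rebuild the index, invalidate.
path = os.path.join(tempfile.mkdtemp(), "toy_units.db")
conn = open_store(path)
leave_id = save_unit(conn, "how much annual leave do we get?", "21 days", {"confluence:98231"})
save_unit(conn, "can I work remotely?", "up to 3 days", {"confluence:44120"})
conn.close()

conn = open_store(path)              # stands in for a new process
index = load_index(conn)             # nothing in memory survived; rebuilt from the table
print(source_changed(conn, index, "confluence:98231") == [leave_id])  # True
conn.close()
```

Storing only each unit's artifact list and recomputing the index on load keeps a single source of truth on disk, so the index cannot drift out of sync with the stored units.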

Taking it to production

Swap one piece at a time — the rest is unchanged:
