This is what makes Coalent more than a semantic answer cache: it knows exactly which cached understanding a source change affects, and rebuilds only that.
Key idea. When the synthesizer builds a unit, it records exactly which sources it used (its citations). A change to a source marks stale only the units whose provenance includes that source. Invalidation is surgical, not timer-based.
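The sketch below shows the bookkeeping this idea implies, assuming a simple in-memory design. Each unit remembers the artifact_ids it cited, and a reverse index turns a source change into the exact set of units to mark stale. ProvenanceIndex, record, and mark_changed are hypothetical names used only for illustration. They are not Coalent's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProvenanceIndex:
    """Hypothetical sketch: which sources each cached unit cited, and the reverse."""

    # unit_id -> artifact_ids the synthesizer cited when it built the unit
    sources_of: dict[str, set[str]] = field(default_factory=dict)
    # artifact_id -> unit_ids whose provenance includes it (reverse index)
    units_of: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # units marked stale and awaiting a rebuild
    stale: set[str] = field(default_factory=set)

    def record(self, unit_id: str, cited: list[str]) -> None:
        """Called when a unit is (re)built: remember exactly which sources it used."""
        for old in self.sources_of.get(unit_id, set()):
            self.units_of[old].discard(unit_id)  # forget citations from a previous build
        self.sources_of[unit_id] = set(cited)
        for artifact_id in cited:
            self.units_of[artifact_id].add(unit_id)
        self.stale.discard(unit_id)  # freshly built, so no longer stale

    def mark_changed(self, artifact_id: str) -> set[str]:
        """Mark stale only the units that cited this source, and return them."""
        hit = set(self.units_of.get(artifact_id, set()))
        self.stale |= hit
        return hit


idx = ProvenanceIndex()
idx.record("unit:refund-policy", cited=["confluence:98231", "jira:PAY-12"])
idx.record("unit:oncall", cited=["confluence:11111"])
print(sorted(idx.mark_changed("confluence:98231")))  # ['unit:refund-policy']
```

The reverse index keeps each change surgical. Handling a change takes one dictionary lookup, with no scan over every cached unit and no wait for a timer.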
The one rule: matching artifact_ids
artifact_id is the source's natural id — a document id, row key, URL, or tool+args identity. The contract is simple:
The artifact_id your Retriever stamps on a Chunk must equal the artifact_id you fire on change.
Because both are derived from the same source, they line up on their own, so never hardcode the id. If a change matches nothing, Coalent logs a warning and reports matched_units == 0, which means a mismatch is caught immediately.
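One way to honor this rule is to send every id through a single helper that both sides call. The sketch assumes a kind:key id format, matching the confluence:98231 example on this page. The names artifact_id_for, tool_artifact_id, to_chunk, and on_page_updated are hypothetical, and a plain dict stands in for a Chunk. For tool calls, the args are serialized as canonical JSON with sorted keys, so the id stays the same regardless of argument order.

```python
import hashlib
import json


def artifact_id_for(kind: str, natural_key: str) -> str:
    """Hypothetical helper: the one place a natural key becomes an artifact_id."""
    return f"{kind}:{natural_key}"


def tool_artifact_id(tool: str, args: dict) -> str:
    """Stable tool+args identity: canonical JSON, so key order does not matter."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"tool:{tool}:{digest}"


# Retrieval side: stamp the id on what you return (a dict stands in for a Chunk).
def to_chunk(page: dict) -> dict:
    return {"text": page["body"], "artifact_id": artifact_id_for("confluence", str(page["id"]))}


# Change side: derive the id the same way from the change event.
def on_page_updated(event: dict) -> str:
    return artifact_id_for("confluence", str(event["page_id"]))


page = {"id": 98231, "body": "Refunds within 30 days"}
assert to_chunk(page)["artifact_id"] == on_page_updated({"page_id": 98231})
```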
Reporting a change
```python
result = cache.source_changed("confluence:98231", text=new_text)
result.dirtied            # units re-marked stale
result.skipped_unchanged  # tracked, but content didn't actually change
```
Pass either the new text (Coalent hashes it) or a version. If the content is byte-identical to what was cached, the unit is not dirtied. The no-op is skipped and reported under skipped_unchanged, which is where content_hash earns its keep.
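You can think of the no-op check as one fingerprint stored per artifact_id. This is a sketch of the idea, not Coalent's implementation. ContentHashes and observe are hypothetical names, and SHA-256 is an assumed choice of hash. The sketch also assumes that an explicit version, when given, takes precedence over hashing the text.

```python
import hashlib


class ContentHashes:
    """Hypothetical sketch: one fingerprint per artifact_id, used to skip no-op changes."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def observe(self, artifact_id: str, text: str = "", version: str = "") -> bool:
        """Record the latest fingerprint; True means a real change, False a no-op."""
        if version:
            fingerprint = "v:" + version  # caller-supplied version wins (assumption)
        else:
            fingerprint = "h:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self._seen.get(artifact_id)
        self._seen[artifact_id] = fingerprint
        return previous != fingerprint  # a first sighting counts as a change


hashes = ContentHashes()
hashes.observe("confluence:98231", text="Refunds within 30 days")  # True: first sighting
hashes.observe("confluence:98231", text="Refunds within 30 days")  # False: identical, skip
```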
Three ways a change reaches the cache
Different sources expose change differently. Pick what fits:
| Source | How change arrives |
|---|---|
| Docs in a vector DB (no feed) | your re-ingestion job calls source_changed (it already knows what changed) |
| App-owned data (DB rows) | invalidate at the write site — right where you mutate |
| Webhook sources (Confluence, Jira, git) | a connector / EventDispatcher with sink=cache.invalidate |
| Feed-less APIs / tools | a FreshnessPolicy — TTL + revalidate-by-hash (below) |
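For app-owned data, invalidating at the write site can look like the sketch below. It uses the standard-library sqlite3 module. A hypothetical RecordingCache with a source_changed method stands in for the real cache, so the sketch runs without Coalent. The policies table and the policies:<id> artifact_id format are illustrative assumptions.

```python
import sqlite3


class RecordingCache:
    """Hypothetical stand-in for the cache; only source_changed is modeled."""

    def __init__(self) -> None:
        self.changes: list[tuple[str, str]] = []

    def source_changed(self, artifact_id: str, text: str = "") -> None:
        self.changes.append((artifact_id, text))


def update_policy(conn: sqlite3.Connection, cache: RecordingCache, policy_id: int, body: str) -> None:
    """Mutate the row, then invalidate right at the write site."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("UPDATE policies SET body = ? WHERE id = ?", (body, policy_id))
    # Only reached after a successful commit; the id format is an assumption.
    cache.source_changed(f"policies:{policy_id}", text=body)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO policies VALUES (1, 'Refunds within 14 days')")
conn.commit()
cache = RecordingCache()
update_policy(conn, cache, 1, "Refunds within 30 days")
```

The cache is notified only after the `with conn:` block exits, so a write that rolls back never dirties the cache.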
Feed-less sources: FreshnessPolicy
When there's no change feed (a typical API or MCP tool), revalidate on a TTL:
```python
from coalent import SemanticCache, FreshnessPolicy

def revalidate(artifact_id: str) -> tuple[str, str]:
    text = refetch(artifact_id)  # re-call the API / tool
    return text, ""              # (text, optional version)

cache = SemanticCache(
    retriever, synthesizer,
    freshness=FreshnessPolicy(max_age=300, revalidate=revalidate),  # seconds
)
```
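When a read hits a unit older than max_age, Coalent calls revalidate and hashes the result. If the content is unchanged, the unit stays fresh and is not rebuilt. If it changed, the unit is re-materialized. Without a revalidate callback, expiry conservatively triggers a rebuild.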
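The read-time decision described above reduces to a small function. This sketch assumes each unit has an Entry that holds the hash of its source text and the time that source was last checked. Entry, sha, and on_read are hypothetical names. The optional version is ignored for brevity. The current time is passed in as now (in practice, time.time()) so the logic is easy to test.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    content_hash: str  # hash of the source text the unit was built from
    checked_at: float  # when the source was last fetched or revalidated


def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def on_read(
    entry: Entry,
    artifact_id: str,
    max_age: float,
    revalidate: Optional[Callable[[str], tuple[str, str]]],
    now: float,
) -> str:
    """Return 'fresh', 'revalidated', or 'rebuild' for a read at time `now`."""
    if now - entry.checked_at <= max_age:
        return "fresh"  # within the TTL: serve as-is, no fetch
    if revalidate is None:
        return "rebuild"  # no way to check the source: conservatively rebuild
    text, _version = revalidate(artifact_id)  # version ignored in this sketch
    if sha(text) == entry.content_hash:
        entry.checked_at = now  # unchanged: reset the clock, no rebuild
        return "revalidated"
    return "rebuild"  # changed: re-materialize the unit
```

This is why revalidate-by-hash is cheap for a source that is stable but has no change feed. Each expiry costs one fetch and one hash, and the expensive synthesis step is skipped.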
Deletions
```python
cache.source_deleted("confluence:98231")  # evicts units that depended on it
```
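A deletion differs from a change because there is nothing left to rebuild from, so the dependent units are dropped. Here is a minimal sketch. It assumes units are stored as a dict from unit id to the set of artifact_ids they cited. evict_dependents is a hypothetical helper, and its linear scan stands in for the reverse index a real cache would more likely use.

```python
def evict_dependents(sources_of: dict[str, set[str]], artifact_id: str) -> list[str]:
    """Hypothetical sketch: drop every unit whose provenance includes the deleted source."""
    # Materialize the list first so the dict is not mutated while iterating it.
    doomed = sorted(u for u, cited in sources_of.items() if artifact_id in cited)
    for unit_id in doomed:
        del sources_of[unit_id]  # evict rather than mark stale: the source is gone
    return doomed


units = {"u1": {"confluence:98231", "jira:PAY-12"}, "u2": {"jira:PAY-12"}}
evict_dependents(units, "confluence:98231")  # ['u1']; u2 is untouched
```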
Optional: webhook connectors
The bundled event layer parses native payloads into change events and feeds them to the cache:
```python
from coalent import EventDispatcher, JiraConnector

dispatcher = EventDispatcher(sink=cache.invalidate, connectors=[JiraConnector()])
dispatcher.dispatch("jira", webhook_payload)  # emits jira:<key> -> invalidate
```
See the Confluence & Jira example for the full wiring.
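Here is a self-contained sketch of the parse-then-forward flow a connector performs. It assumes Jira's issue webhook shape: a webhookEvent string plus an issue object that carries a key. ChangeEvent, parse_jira, and MiniDispatcher are hypothetical stand-ins. They are not the bundled EventDispatcher or JiraConnector, whose actual interfaces may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    artifact_id: str
    deleted: bool = False


def parse_jira(payload: dict) -> Optional[ChangeEvent]:
    """Map a Jira issue webhook to a change event keyed as jira:<key>."""
    issue = payload.get("issue") or {}
    key = issue.get("key")
    if not key:
        return None  # not an issue event: nothing to invalidate
    deleted = payload.get("webhookEvent") == "jira:issue_deleted"
    return ChangeEvent(artifact_id=f"jira:{key}", deleted=deleted)


class MiniDispatcher:
    """Routes a raw payload to the parser for its source, then to the sink."""

    def __init__(self, sink: Callable[[ChangeEvent], None]) -> None:
        self.sink = sink
        self.parsers: dict[str, Callable[[dict], Optional[ChangeEvent]]] = {"jira": parse_jira}

    def dispatch(self, source: str, payload: dict) -> Optional[ChangeEvent]:
        event = self.parsers[source](payload)
        if event is not None:
            self.sink(event)  # e.g. a cache's invalidate method
        return event


seen: list[ChangeEvent] = []
dispatcher = MiniDispatcher(sink=seen.append)
dispatcher.dispatch("jira", {"webhookEvent": "jira:issue_updated", "issue": {"key": "PAY-12"}})
```

The id is built in one place (jira:<key>), so it agrees with whatever the Retriever stamps for the same issue, provided the Retriever uses the same format.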
Next
- The Retriever: where artifact_ids are stamped.
- Persistence: invalidation that survives a restart.