Provenance & freshness

The moat — how Coalent knows exactly what to forget, and the three ways a change reaches the cache.

This is what makes Coalent more than a semantic answer cache: it knows exactly which cached understanding a source change affects, and rebuilds only that.

Key idea. When the synthesizer builds a unit, it records the exact sources it cited. A change to a source marks stale only the units whose provenance includes that source. Invalidation is surgical, not timer-based.
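To make the key idea concrete, here is a minimal sketch of a provenance index. It is not Coalent's implementation: the ProvenanceIndex class, its method names, and the unit ids are hypothetical and only illustrate the reverse lookup from a source to the units that cited it.

```python
from collections import defaultdict


class ProvenanceIndex:
    """Toy reverse index: artifact_id -> ids of the units built from it."""

    def __init__(self) -> None:
        self._units_by_artifact: dict[str, set[str]] = defaultdict(set)
        self.stale: set[str] = set()

    def record(self, unit_id: str, cited_artifact_ids: list[str]) -> None:
        # Called when a unit is built: remember every source it cited.
        for artifact_id in cited_artifact_ids:
            self._units_by_artifact[artifact_id].add(unit_id)

    def source_changed(self, artifact_id: str) -> set[str]:
        # Mark only the units whose provenance includes this source.
        affected = set(self._units_by_artifact.get(artifact_id, set()))
        self.stale |= affected
        return affected


index = ProvenanceIndex()
index.record("unit-onboarding", ["confluence:98231", "confluence:555"])
index.record("unit-billing", ["confluence:555"])
index.record("unit-oncall", ["jira:OPS-7"])

# Only the unit that cited confluence:98231 is affected.
affected = index.source_changed("confluence:98231")
```

The other two units stay untouched, which is the difference from expiring everything on a timer.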

The one rule: matching artifact_ids

artifact_id is the source's natural id — a document id, row key, URL, or tool+args identity. The contract is simple:

The artifact_id your Retriever stamps on a Chunk must equal the artifact_id you fire on change.

Because both ids are derived from the same source record, they line up on their own, so never hardcode an id at either end. If a change matches nothing, Coalent logs a warning and reports matched_units == 0, so a mismatch shows up right away.
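One way to keep the two ids aligned is to route both sides through a single helper. The sketch below assumes a Confluence-style page record held in a plain dict; page_artifact_id, to_chunk, and change_id are hypothetical names, and the dict stands in for a real Chunk.

```python
def page_artifact_id(page_id: int) -> str:
    # Single source of truth for the id format. The "confluence:" prefix
    # mirrors the example id used elsewhere on this page.
    return f"confluence:{page_id}"


def to_chunk(page: dict) -> dict:
    # Retriever side: stamp the id on the chunk (a dict stands in for a Chunk).
    return {"artifact_id": page_artifact_id(page["id"]), "text": page["body"]}


def change_id(page: dict) -> str:
    # Change side: derive the id from the same record, never from a literal.
    return page_artifact_id(page["id"])


page = {"id": 98231, "body": "Runbook v2"}
chunk = to_chunk(page)
```

Because both paths call the same function, a change to the id format cannot leave the retriever and the change reporter out of sync.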

Reporting a change

result = cache.source_changed("confluence:98231", text=new_text)
result.dirtied            # units re-marked stale
result.skipped_unchanged  # tracked, but content didn't actually change

Pass either the new text (Coalent hashes it) or a version. If the content is byte-identical to what Coalent last saw, no unit is dirtied and the no-op is skipped; this is where content_hash earns its keep.
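The skip-if-unchanged behavior can be sketched with a content hash per source. This is an illustration only: the HashTracker class is hypothetical, SHA-256 is an assumed choice (the hash Coalent uses is not stated here), and treating a first sighting as a change is also an assumption.

```python
import hashlib


def content_hash(text: str) -> str:
    # Hash the exact bytes, so only byte-identical content compares equal.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class HashTracker:
    """Remembers the last content hash seen for each artifact_id."""

    def __init__(self) -> None:
        self._last_hash: dict[str, str] = {}

    def has_changed(self, artifact_id: str, new_text: str) -> bool:
        new_hash = content_hash(new_text)
        if self._last_hash.get(artifact_id) == new_hash:
            return False  # byte-identical: nothing to dirty
        self._last_hash[artifact_id] = new_hash
        return True


tracker = HashTracker()
first = tracker.has_changed("confluence:98231", "Runbook v1")   # first sighting
repeat = tracker.has_changed("confluence:98231", "Runbook v1")  # same bytes
edited = tracker.has_changed("confluence:98231", "Runbook v2")  # real edit
```

A re-ingestion job that re-sends every document on each run would only dirty the ones whose text actually differs.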

Three ways a change reaches the cache

Different sources expose change differently. Pick what fits:

| Source | How change arrives |
| --- | --- |
| Docs in a vector DB (no feed) | your re-ingestion job calls source_changed (it already knows what changed) |
| App-owned data (DB rows) | invalidate at the write site, right where you mutate |
| Webhook sources (Confluence, Jira, git) | a connector / EventDispatcher with sink=cache.invalidate |
| Feed-less APIs / tools | a FreshnessPolicy: TTL + revalidate-by-hash (below) |
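For app-owned data, invalidating at the write site could look like the sketch below. RowStore, its on_change callback, and the "table:key" id format are all hypothetical; the point is only that the change report lives in the same code path as the mutation.

```python
from typing import Callable


class RowStore:
    """Toy data-access layer that reports every write it performs."""

    def __init__(self, on_change: Callable[[str, str], None]) -> None:
        self._rows: dict[tuple[str, str], str] = {}
        self._on_change = on_change

    def upsert(self, table: str, key: str, value: str) -> None:
        self._rows[(table, key)] = value          # 1. mutate
        self._on_change(f"{table}:{key}", value)  # 2. report, same code path


# A list stands in for the cache so the sketch runs on its own.
changes: list[tuple[str, str]] = []
store = RowStore(on_change=lambda artifact_id, text: changes.append((artifact_id, text)))
store.upsert("customers", "42", "plan=pro")
```

With a real cache, the callback would presumably forward to the call shown under "Reporting a change", for example lambda artifact_id, text: cache.source_changed(artifact_id, text=text).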

Feed-less sources: FreshnessPolicy

When there's no change feed (a typical API or MCP tool), revalidate on a TTL:

from coalent import SemanticCache, FreshnessPolicy

def revalidate(artifact_id: str) -> tuple[str, str]:
    text = refetch(artifact_id)        # re-call the API / tool
    return text, ""                    # (text, optional version)

cache = SemanticCache(
    retriever, synthesizer,
    freshness=FreshnessPolicy(max_age=300, revalidate=revalidate),  # seconds
)

When a read hits a unit older than max_age, Coalent re-fetches the source and hashes it. If the hash is unchanged, the unit stays fresh and is not rebuilt; if it changed, the unit is re-materialized. Without a revalidate callback, expiry conservatively triggers a rebuild.

Deletions

cache.source_deleted("confluence:98231")   # evicts units that depended on it

Optional: webhook connectors

The bundled event layer parses native payloads into change events and feeds them to the cache:

from coalent import EventDispatcher, JiraConnector

dispatcher = EventDispatcher(sink=cache.invalidate, connectors=[JiraConnector()])
dispatcher.dispatch("jira", webhook_payload)   # emits jira:<key> -> invalidate

See the Confluence & Jira example for the full wiring.
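As a rough picture of what a connector does, the sketch below turns a Jira issue webhook payload into a jira:<key> id and hands it to a sink. The function names are hypothetical and this is not the bundled JiraConnector; it assumes the sink accepts a single artifact_id and that the payload carries the issue key under issue.key, as Jira issue events do.

```python
from typing import Callable, Optional


def jira_artifact_id(payload: dict) -> Optional[str]:
    # Issue events carry the issue under "issue", with its key in "key".
    issue = payload.get("issue") or {}
    key = issue.get("key")
    return f"jira:{key}" if key else None


def dispatch(payload: dict, sink: Callable[[str], None]) -> bool:
    """Forward the change to the sink; return False if no issue key was found."""
    artifact_id = jira_artifact_id(payload)
    if artifact_id is None:
        return False
    sink(artifact_id)
    return True


# A list's append method stands in for the cache's invalidate sink.
received: list[str] = []
handled = dispatch(
    {"webhookEvent": "jira:issue_updated", "issue": {"key": "OPS-7"}},
    received.append,
)
ignored = dispatch({"webhookEvent": "project_created"}, received.append)
```

Payloads without an issue key are dropped rather than guessed at, so unrelated events never invalidate anything.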
