Caching Strategy

Category: Architecture · Areas: data, api

Description

Boundary

This concern owns the deliberate use of a cache — the read/write pattern (cache-aside / read-through / write-through / write-behind), the invalidation and TTL policy, the consistency/staleness trade-off a cached read accepts, the protection against stampede / thundering-herd, and the explicit decision of what must not be cached. It owns how a copy of data is kept closer/faster and kept correct-enough. It does not own the performance target the cache serves, the failure-handling discipline a cache may participate in, or the store the cache sits in front of. Three neighbors must stay distinct:

  • The performance NFR (a requirement in the PRD, not a concern) owns the target — the latency/throughput budget the system must meet. Caching is one means to that end, not the end itself. The NFR says p95 read latency ≤ X / sustain Y req/s; this concern says a read-through cache with a Z-second TTL is how we hit it, accepting up-to-Z staleness. Reference the NFR as the thing being satisfied; do not restate the target here, and do not add a cache with no NFR to point at (that is premature caching — see Drift Signals).
  • resilience owns graceful degradation and failure isolation — timeouts, retries, circuit breakers, bulkheads, fallbacks. A cache can act as a fallback (serve last-known-good when the origin is down), and a cache miss storm can threaten resilience (stampede onto a struggling origin), so the two compose — but caching is not resilience. This concern owns the cache’s read/write/invalidation behavior; whether a stale-serve-on-origin-failure is an accepted degradation mode is a resilience decision. Name the overlap; do not fold failure-handling policy in here.
  • relational-data-modeling (and the datastore slot it runs on) owns the system of record — the authoritative store the cache sits in front of. The cache holds a derived, disposable copy; the row in the store is the truth. This concern never makes the cache authoritative; on any doubt the origin wins. Reference the store as the source of truth; do not own the schema.

This concern owns the one thing those do not state: a cache is a deliberate, bounded copy with an explicit invalidation/TTL policy and a named staleness budget — every cache has an answer for how it goes stale, how it is invalidated, what happens on a miss storm, and what is too correctness-sensitive to cache at all.

Components

  • Cache-aside (lazy) — the application checks the cache, and on a miss loads from the origin and populates the cache itself. The cache holds only requested keys; the application owns read-population and invalidation. The common default; the miss path is where stampede risk lives.
  • Read-through — reads go through the cache, which loads from the origin on a miss transparently. The cache (or its client library) owns population, so the application’s read path is uniform. Behaviorally close to cache-aside; the difference is who populates.
  • Write-through — writes go through the cache to the origin synchronously; the cache is updated in the same operation as the store. Reads after a write see fresh data; write latency includes both hops. Strong cache↔store consistency, slower writes.
  • Write-behind (write-back) — writes hit the cache and are flushed to the origin asynchronously later. Fast writes and batched origin load, at the cost of a durability/consistency window in which the cache holds writes the store has not yet received (data-loss risk if the cache dies before the flush). This is the pattern with the highest consistency cost; select it only with that window understood.
  • TTL (time-to-live) — an expiry on each entry that bounds staleness without explicit invalidation: the entry is treated as gone after its TTL and reloaded. The simplest staleness control; the TTL value is the staleness budget.
  • Explicit invalidation — evicting or updating a key when the underlying data changes (on write, or via an event). More precise than TTL but requires knowing every write path that affects the key — the hard part of caching (“there are only two hard things…”).
  • Stampede / thundering-herd protection — preventing many concurrent misses for the same hot key from all hitting the origin at once (on expiry or cold start). Mitigations: single-flight / request coalescing (one loader, others wait), early/probabilistic refresh (refresh before expiry), jittered TTLs (spread expiries), and a brief negative cache for known-absent keys.
  • What NOT to cache — data that must be always-fresh (authorization decisions, balances/limits used to gate an action, anything where a stale read causes an incorrect side effect), per-request unique data with no reuse (caching it just wastes memory), and highly volatile data whose TTL would be so short the cache never pays back.
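
As a minimal sketch of the cache-aside and TTL components above, the following in-process example shows the application checking the cache, loading from the origin on a miss or expiry, and populating the cache itself. The class name `CacheAside`, the `load_from_origin` callback, and the injectable `clock` are hypothetical names chosen for illustration; a real deployment would more likely sit in front of a shared cache service.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class CacheAside:
    """Cache-aside read path with a per-entry TTL (illustrative sketch only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # key -> (expires_at, value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable,
            load_from_origin: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the value may be up to ttl_seconds stale
        # Miss or expired entry: the origin is the truth, so reload from it.
        value = load_from_origin(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Explicit invalidation, called from a write path that changes `key`."""
        self._entries.pop(key, None)
```

In this sketch the TTL is the only staleness control unless a write path calls `invalidate`, so `ttl_seconds` is effectively the staleness budget. Note that the miss branch has no stampede protection: concurrent misses for the same key would each call the origin.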

Constraints

A cache serves a stated performance NFR — no premature caching

  • A cache is added to meet a named performance target (a latency/throughput NFR in the PRD) against a real read-heavy hot path or an expensive computation. The decision records which path/computation and which NFR it serves.
  • Caching where load is trivial or no NFR is at risk is premature optimization and drift — it adds an invalidation/staleness problem for no measured gain. The cache must point at a target; absent one, do not cache (KISS/YAGNI).

Every cache has an explicit invalidation/TTL and staleness policy

  • Each cached dataset has a stated invalidation policy — a TTL, explicit invalidation on write, or both — and a named staleness budget: the maximum staleness a read may serve, justified as acceptable for that data.
  • The consistency trade-off is explicit: the read/write pattern chosen (cache-aside / read-through / write-through / write-behind) implies a consistency level, and that level is recorded against the data’s tolerance for staleness. Write-behind’s durability/consistency window in particular is named and accepted, never stumbled into.
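
One way to make these constraints checkable is to record each cached dataset as a small policy object that refuses to exist without an NFR reference, an invalidation policy, and a staleness budget. This is only a sketch: `CachePolicy`, its field names, and the NFR identifier used in the usage note are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional

PATTERNS = {"cache-aside", "read-through", "write-through", "write-behind"}


@dataclass(frozen=True)
class CachePolicy:
    """One record per cached dataset (all field names are hypothetical)."""
    dataset: str
    serves_nfr: str                 # reference to the PRD NFR, not a restated target
    pattern: str
    staleness_budget_s: float       # maximum staleness a read may serve
    ttl_s: Optional[float] = None
    invalidate_on_write: bool = False
    write_behind_window_accepted: bool = False

    def __post_init__(self) -> None:
        if not self.serves_nfr:
            raise ValueError(f"{self.dataset}: no NFR to point at (premature caching)")
        if self.pattern not in PATTERNS:
            raise ValueError(f"{self.dataset}: unknown pattern {self.pattern!r}")
        if self.ttl_s is None and not self.invalidate_on_write:
            raise ValueError(f"{self.dataset}: no TTL and no explicit invalidation")
        # When TTL is the only control, the TTL value is the staleness budget.
        if (self.ttl_s is not None and not self.invalidate_on_write
                and self.ttl_s > self.staleness_budget_s):
            raise ValueError(f"{self.dataset}: TTL exceeds the staleness budget")
        if self.pattern == "write-behind" and not self.write_behind_window_accepted:
            raise ValueError(f"{self.dataset}: write-behind window not named and accepted")
```

For example, `CachePolicy("product-catalog", "NFR-PERF-1", "cache-aside", staleness_budget_s=60, ttl_s=60)` constructs successfully, while the same record with `ttl_s=None` raises `ValueError`.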

Correctness-sensitive reads are not cached behind a stale copy

  • Data where a stale read causes an incorrect decision or side effect — authorization/permission checks, balances or quotas that gate an action, anything requiring read-your-write correctness — is not served from a cache that can be stale. Either it is not cached, or it uses a strongly-consistent pattern (write-through with synchronous invalidation) whose freshness is proven.
  • The cache is never the system of record: it holds a derived, disposable copy, and on eviction/failure the origin is the truth. Nothing of record lives only in the cache.

Hot keys are protected from stampede

  • A hot key’s expiry or a cold start does not let many concurrent misses dogpile the origin. A stampede mitigation is in place for hot keys — single-flight/coalescing, early/probabilistic refresh, jittered TTLs, and/or a negative cache for known-absent keys — so cache behavior does not become an origin-overload (and, by extension, a resilience) problem.
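
Single-flight / request coalescing can be sketched as follows, assuming a threaded, single-process service: the first caller for a key becomes the loader and every concurrent caller for the same key waits for its result instead of hitting the origin. The names `SingleFlight`, `_Call`, and `do` are hypothetical; a multi-process or multi-node deployment would need a shared coordination mechanism instead.

```python
import threading
from typing import Any, Callable, Dict, Hashable, Optional


class _Call:
    """State shared between the one loader and the callers waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Coalesce concurrent loads of the same key into one origin call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[Hashable, _Call] = {}

    def do(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.value = loader(key)      # the only origin call for this key
            except BaseException as exc:
                call.error = exc              # waiters see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()
        else:
            call.done.wait()                  # follower: wait for the leader
        if call.error is not None:
            raise call.error
        return call.value
```

A cache miss path would wrap its origin load as `flight.do(key, load_from_origin)` and then populate the cache with the returned value. This sketch only coalesces calls that overlap in time; it does not cache anything itself.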

Drift Signals (anti-patterns to reject in review)

  • A cache added with no performance NFR it serves and no measured hot path → premature caching; remove it or tie it to a target (point at the NFR)
  • A cache with no invalidation policy and no TTL (entries never expire and are never refreshed, so they can stay stale indefinitely) → state a TTL and/or explicit invalidation + a staleness budget
  • Stale-read-sensitive data (authz decision, balance/quota gating an action, read-your-write requirement) served from a cache that can be stale → do not cache it, or use a proven strongly-consistent pattern
  • The cache treated as the source of truth (data of record lives only in the cache; origin not authoritative on a miss/eviction) → the store is the truth; the cache is a disposable copy
  • Write-behind chosen with its durability/consistency window unstated / unaccepted (silent data-loss risk if the cache dies before flush) → name and accept the window, or choose write-through
  • A hot key with no stampede protection (every concurrent miss hits the origin) → add single-flight/coalescing, early refresh, jittered TTL, or a negative cache
  • Caching per-request-unique or trivially-cheap data → no reuse and no payback; do not cache
  • A cache positioned as the project’s resilience mechanism (rather than a performance means) → caching ≠ resilience; record the failure-handling decision under resilience and let the cache compose as an optional fallback

When to use

A product with a real read-heavy hot path or an expensive computation where some staleness is tolerable: a frequently-read dataset behind a clear latency/throughput NFR, an expensive aggregate/derived view recomputed on every request, or a hot lookup that dominates load. High autonomy auto-selects this concern for such products (see workflows/references/concern-resolution.md). It is composable (no slot); its areas (data, api) scope its practices to the data-access and service layers. Compose with the performance NFR (the target the cache serves), resilience (the cache may act as a fallback; failure policy lives there), and relational-data-modeling / the datastore slot (the system of record the cache sits in front of).

Do NOT select it when correctness needs always-fresh reads (a domain dominated by authorization decisions, balances/quotas gating actions, or strict read-your-write requirements where a stale read is a bug), or when load is trivial and no performance NFR is at risk. Adding a cache there buys an invalidation/staleness problem for no gain; premature caching is drift (KISS/YAGNI).

Artifact Impact

Selecting this concern requires these artifacts to change (a selected concern absent from them is drift):

  • ADR: cache pattern + invalidation/TTL policy + named staleness budget + consistency trade-off + what is not cached
  • TD: read/write pattern (cache-aside/read-through/write-through/write-behind), stampede protection on hot keys
  • TEST_PLAN: invalidation/staleness behavior + stampede protection (single-flight) on a hot key

ADR References

Record an ADR when introducing a cache: the read-heavy hot path or expensive computation and the performance NFR it serves; the pattern chosen (cache-aside / read-through / write-through / write-behind) and why; the invalidation policy + TTL and the named staleness budget the data tolerates; the consistency trade-off accepted (and, for write-behind, the durability/consistency window); the stampede protection for hot keys; and what is deliberately not cached because it must stay fresh. A material uncertainty about whether the hot path truly needs a cache to hit the NFR is a tech-spike (measure first), not a silent assumption (see workflows/references/concern-resolution.md).

Practices by activity

Agents working in any of these activities inherit the practices below via the bead’s context digest.

These practices govern the deliberate use of a cache — its read/write pattern, invalidation/TTL policy, staleness budget, stampede protection, and the explicit decision of what not to cache. They do not restate the performance target (that is a PRD performance NFR — the cache serves it), the failure-handling policy a cache may participate in (resilience), or the schema of the store the cache fronts (relational-data-modeling / the datastore slot). A cache is always a derived, disposable copy; the origin is the truth.

Discover

  • Add a cache only to meet a named performance NFR against a real read-heavy hot path or an expensive computation. Record which path / computation and which NFR the cache serves. No NFR and no measured hot path means no cache — premature caching buys an invalidation/staleness problem for no gain (KISS/YAGNI).
  • Before caching an expensive computation, confirm the cost is real (measured), not assumed; a material uncertainty is a tech-spike, not a silent cache.

Frame

  • Choose the read/write pattern deliberately and record why in the ADR:
    • cache-aside (lazy) / read-through for read-heavy data tolerant of bounded staleness (the difference is who populates on a miss);
    • write-through when reads after a write must be fresh (strong cache↔store consistency, slower writes);
    • write-behind only when write latency dominates AND the durability/consistency window (the cache holds writes the store has not yet received, so data is lost if the cache dies before the flush) is named and accepted.
  • Record the consistency trade-off the chosen pattern implies against the data’s tolerance for staleness — never stumble into write-behind’s window.
  • Record in the ADR what is deliberately not cached (and why it must stay fresh).
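
The difference between write-through and write-behind that the ADR has to record can be illustrated with a toy sketch in which plain dictionaries stand in for the store and the cache. The class names and the explicit `flush` method are hypothetical; writing the store before the cache in the write-through case is one possible ordering, chosen here so a failed store write never leaves the cache ahead of the origin.

```python
from typing import Any, Dict, List, Tuple


class WriteThroughCache:
    """Writes reach the store synchronously, then the cache is updated."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self.store = store
        self.cache: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.store[key] = value   # synchronous hop: write latency includes it
        self.cache[key] = value   # a read after the write sees fresh data


class WriteBehindCache:
    """Writes land in the cache and are flushed to the store later."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self.store = store
        self.cache: Dict[str, Any] = {}
        self.pending: List[Tuple[str, Any]] = []

    def put(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.pending.append((key, value))  # the store does NOT have this yet

    def flush(self) -> int:
        """Apply pending writes to the store in order; return how many."""
        flushed = len(self.pending)
        for key, value in self.pending:
            self.store[key] = value
        self.pending.clear()
        return flushed
```

Everything in `pending` between `put` and `flush` is the durability/consistency window: if the process holding `WriteBehindCache` dies in that interval, those writes never reach the store.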

Design

  • Each cached dataset has an explicit invalidation policy — a TTL, explicit invalidation on write, or both — and a named staleness budget (the maximum staleness a read may serve), justified as acceptable for that data. The TTL value is the staleness budget when TTL is the only control.
  • When using explicit invalidation, identify every write path that affects a cached key and invalidate/update on each; a missed write path is a stale-read bug. Record the cache keys and their TTLs in the technical-design.
  • Data where a stale read causes an incorrect decision or side effect — authorization/permission checks, balances or quotas that gate an action, anything needing read-your-write — is not served from a cache that can be stale. Either it is not cached, or it uses a strongly-consistent pattern (write-through with synchronous invalidation) whose freshness is proven.
  • The cache is never the system of record: nothing of record lives only in the cache; on a miss, eviction, or cache failure the origin is the truth.
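
To illustrate covering every write path with explicit invalidation, the sketch below marks each function that mutates a cached entity with a decorator that evicts the affected key after the origin write. The `invalidates` decorator, the `user:{id}` key scheme, and the in-memory `db` and `cache` dictionaries are hypothetical stand-ins for illustration.

```python
import functools
from typing import Any, Callable, Dict

db: Dict[int, Dict[str, Any]] = {1: {"name": "Ada", "active": True}}  # stand-in system of record
cache: Dict[str, Dict[str, Any]] = {}                                   # derived, disposable copy


def user_key(user_id: int, *_args: Any, **_kwargs: Any) -> str:
    return f"user:{user_id}"


def invalidates(key_for: Callable[..., str]):
    """Decorator for write paths: write the origin, then evict the affected key."""
    def decorator(write_fn):
        @functools.wraps(write_fn)
        def wrapper(*args, **kwargs):
            result = write_fn(*args, **kwargs)         # origin first
            cache.pop(key_for(*args, **kwargs), None)  # then evict the stale copy
            return result
        return wrapper
    return decorator


def get_user(user_id: int) -> Dict[str, Any]:
    key = user_key(user_id)
    if key not in cache:
        cache[key] = dict(db[user_id])  # cache-aside population with a copy
    return cache[key]


@invalidates(user_key)
def rename_user(user_id: int, name: str) -> None:
    db[user_id]["name"] = name


@invalidates(user_key)
def deactivate_user(user_id: int) -> None:
    db[user_id]["active"] = False
```

A write path added later without the decorator would be exactly the missed-write-path bug described above, which is one reason to pair explicit invalidation with a TTL as a backstop.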

Build

  • For each hot key, ensure expiry or cold start does not dogpile the origin: apply single-flight / request coalescing (one loader, others wait), early / probabilistic refresh (refresh before expiry), jittered TTLs (spread expiries), and/or a brief negative cache for known-absent keys.
  • Treat a miss storm as a real failure mode: where serving stale-on-origin-down is desired, record that as a resilience degradation decision — the cache composes as a fallback, it is not itself the resilience mechanism.
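
Two of the mitigations above, jittered TTLs and a brief negative cache, can be sketched together. The names `jittered_ttl` and `NegativeCachingCache` are hypothetical, and the sketch assumes the origin loader returns `None` for a known-absent key. The jitter is applied downward only, so an entry never outlives the base TTL when that TTL is the staleness budget.

```python
import random
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


def jittered_ttl(base_s: float, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return a TTL in [base_s * (1 - spread), base_s] to spread expiries."""
    source = rng or random
    return base_s * (1.0 - source.uniform(0.0, spread))


class NegativeCachingCache:
    """TTL cache that also remembers 'not found' for a short time."""

    def __init__(self, ttl_s: float, negative_ttl_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None) -> None:
        self._ttl = ttl_s
        self._negative_ttl = negative_ttl_s
        self._clock = clock
        self._rng = rng
        # key -> (expires_at, value); a stored None means "known absent"
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable,
            load_from_origin: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                    # hit, possibly a negative one
        value = load_from_origin(key)          # assumed to return None if absent
        if value is None:
            ttl = self._negative_ttl           # keep negative entries brief
        else:
            ttl = jittered_ttl(self._ttl, rng=self._rng)
        self._entries[key] = (now + ttl, value)
        return value
```

The negative TTL is deliberately separate from the main TTL: it bounds how long a newly created key can be reported as absent.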

Test

  • Every cache traces to a performance NFR and a real read-heavy hot path or expensive computation; no cache exists without a target it serves (no premature caching).
  • Each cached dataset has a stated invalidation policy (TTL and/or explicit invalidation) and a named staleness budget; no cache holds entries that never expire or are never refreshed.
  • The read/write pattern is recorded with its consistency trade-off; if write-behind, its durability/consistency window is named and accepted.
  • Correctness-sensitive reads (authz, balances/quotas gating an action, read-your-write) are not served from a stale-capable cache; it is verifiable that such data is either uncached or served through a strongly consistent pattern.
  • The cache is not the system of record — on miss/eviction/failure the origin is authoritative; nothing of record lives only in the cache.
  • Hot keys have stampede protection (single-flight, early refresh, jittered TTL, or negative cache); a hot-key expiry does not dogpile the origin.
  • What is deliberately not cached (and why it must stay fresh) is recorded in the ADR.

Cross-cutting

Boundary with neighbors

See concern.md for the canonical Boundary (vs the performance NFR, resilience, relational-data-modeling / datastore). The cache is a performance/staleness mechanism — defer failure-handling policy to resilience and the schema of the system of record to the data-modeling / store neighbors.