
Caching: Where, What, When

The single highest-leverage tool in performance work — and the easiest place to introduce subtle bugs.

Why caching works

Most workloads have skew: a small fraction of items account for most of the traffic. A cache stores those hot items closer to the requester or in faster storage. Hits are nearly free; misses fall through to the original source.

Even a 90% hit rate divides the load on your origin by 10: only the 10% of requests that miss ever reach it. That's the leverage.
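A quick sketch of the arithmetic (both latency figures are illustrative assumptions, not measurements):

    # Effective latency and origin load as a function of hit rate.
    CACHE_MS = 1    # assumed in-memory hit
    ORIGIN_MS = 50  # assumed database round trip on a miss

    for hit_rate in (0.0, 0.90, 0.99):
        miss_rate = 1 - hit_rate
        avg_ms = hit_rate * CACHE_MS + miss_rate * ORIGIN_MS
        print(f"hit rate {hit_rate:.0%}: avg {avg_ms:.1f}ms, origin sees {miss_rate:.0%} of traffic")
    # hit rate 0%: avg 50.0ms, origin sees 100% of traffic
    # hit rate 90%: avg 5.9ms, origin sees 10% of traffic
    # hit rate 99%: avg 1.5ms, origin sees 1% of traffic

Note how going from 90% to 99% takes another 10x off the origin; hit-rate improvements compound near the top.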

🧠
Real-life analogy — Like your short-term memory
When you read a friend's phone number, you keep it in your head for a minute or two — that's the cache. After a while, you forget (eviction). If you needed it again, you'd have to look it up in your phone (the database). Speed matters; memory is finite; staleness happens.
[Interactive demo] LRU cache (capacity 4): on each access, the key moves to the FRONT (most recently used); when the cache is full, the element at the back is evicted. Example step: GET A → MISS, insert A at the front.

Where to put a cache

  • Browser cache — closest to the user. Free. Controlled by HTTP headers like Cache-Control and ETag.
  • CDN — caches static assets (JS, CSS, images) and cacheable HTML at edge POPs around the world. Can cut the round trip to your origin from roughly 200ms to 20ms.
  • Reverse proxy cache (Nginx, Varnish) — same datacenter as the app, in front of the app servers.
  • Application cache (in-process) — a Map/dict in the app server's memory. Fastest, but per-instance and doesn't survive restarts (see the sketch after this list).
  • Distributed cache (Redis, Memcached) — shared across app instances; survives restarts; can hold gigabytes.
  • Database query cache / materialized views — cache the result of an expensive query.
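As a sketch of the in-process layer, Python's standard library already ships an LRU-bounded memoizer. The load_user body here is a hypothetical stand-in for a real database read:

    from functools import lru_cache

    @lru_cache(maxsize=4096)  # per-process; the cache is lost on restart
    def load_user(user_id: int) -> dict:
        # Hypothetical stand-in for a real database query.
        return {"id": user_id, "name": f"user-{user_id}"}

    load_user(42)                  # miss: runs the body, stores the result
    load_user(42)                  # hit: served from memory
    print(load_user.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=4096, currsize=1)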

Cache write strategies

  • Cache-aside (lazy loading) — app reads from cache; if miss, read the DB and populate the cache. The most common pattern (sketched after this list).
  • Read-through — cache itself loads from the DB on miss. Hides the DB behind the cache.
  • Write-through — every write goes through the cache, then to the DB. Cache is always in sync.
  • Write-behind — write to cache first, flush to DB asynchronously. Fast writes, risk of data loss on cache failure.
  • Write-around — writes go directly to the DB, bypassing the cache. Cache only fills on subsequent reads.
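A minimal cache-aside sketch, using a plain dict as a stand-in for Redis/Memcached; db_fetch_user and the database write in update_user are hypothetical:

    import json

    cache: dict[str, str] = {}  # stand-in for a shared cache like Redis

    def db_fetch_user(user_id: int) -> dict:
        # Hypothetical database read.
        return {"id": user_id, "name": f"user-{user_id}"}

    def get_user(user_id: int) -> dict:
        key = f"user:{user_id}"
        hit = cache.get(key)
        if hit is not None:            # hit: skip the database entirely
            return json.loads(hit)
        user = db_fetch_user(user_id)  # miss: fall through to the source
        cache[key] = json.dumps(user)  # populate so the next read is a hit
        return user

    def update_user(user_id: int, fields: dict) -> None:
        # ... hypothetical database UPDATE would go here ...
        cache.pop(f"user:{user_id}", None)  # invalidate so readers don't see stale data

Deleting on write, rather than re-setting, is the safer default: the next reader repopulates from the authoritative row.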

Eviction policies

A cache is finite. When it's full and you need to add an entry, you must evict an existing one. The choice of WHICH to evict is the eviction policy.

  • LRU — Least Recently Used. The default in most caches. Works well when access patterns have temporal locality (a minimal implementation follows this list).
  • LFU — Least Frequently Used. Better when some items are persistently popular but accessed irregularly.
  • FIFO — First In, First Out. Rarely the best choice, but simple.
  • TTL (time-to-live) — every entry expires after a fixed duration. Often combined with LRU.
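A minimal LRU sketch, assuming nothing beyond the standard library; OrderedDict keeps keys in access order, mirroring the move-to-front behavior in the demo above:

    from collections import OrderedDict

    class LRUCache:
        """Evicts the least recently used key once capacity is exceeded."""

        def __init__(self, capacity: int) -> None:
            self.capacity = capacity
            self._data: OrderedDict = OrderedDict()

        def get(self, key):
            if key not in self._data:
                return None                    # miss
            self._data.move_to_end(key)        # mark as most recently used
            return self._data[key]

        def put(self, key, value) -> None:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self.capacity:
                self._data.popitem(last=False) # evict the least recently used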

Cache invalidation

There's a famous quote attributed to Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things." When the source data changes, your cache becomes stale. The strategies:

  • Short TTLs — accept staleness up to T seconds. Easiest, works for most things.
  • Explicit invalidation on writes — when you update X, also delete cache[X]. Reliable for single-value caches; tricky for multi-key views.
  • Versioned keys — embed a version in the cache key. Bumping the version invalidates everything atomically (sketched below).
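A sketch of versioned keys, again with a dict standing in for the shared cache; the catalog names here are hypothetical:

    cache: dict[str, str] = {}  # stand-in for a shared cache
    catalog_version = 7         # in practice, stored in config or in the cache itself

    def product_key(product_id: int) -> str:
        return f"catalog:v{catalog_version}:product:{product_id}"

    def republish_catalog() -> None:
        global catalog_version
        catalog_version += 1  # every catalog:v7:* key is now unreachable;
                              # the orphaned entries simply age out via eviction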

The thundering herd

When a hot cache key expires, every concurrent request misses simultaneously and stampedes the database. Defenses: stagger TTLs, use a mutex so only one fetch repopulates while others wait, or refresh-ahead (re-fetch slightly before expiry).
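A sketch of the mutex defense for a single process; fetch_from_db is a hypothetical loader. The first thread to miss does the fetch, and the others wait on the lock and then find the value already repopulated:

    import threading

    cache: dict[str, object] = {}
    lock = threading.Lock()

    def fetch_from_db(key: str):
        # Hypothetical expensive origin fetch.
        return f"value-for-{key}"

    def get(key: str):
        value = cache.get(key)
        if value is not None:
            return value            # fast path: no locking on a hit
        with lock:
            value = cache.get(key)  # re-check: another thread may have
            if value is None:       # repopulated while we waited
                value = fetch_from_db(key)
                cache[key] = value
        return value

A single global lock serializes all misses; a real version would use one lock per key, or a distributed lock (e.g. Redis SET with NX) when the herd spans processes.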

💡 Tip
When in doubt, START WITHOUT A CACHE. Add caching only after measuring; many performance problems are solved by adding a missing index, not a cache.