
Caching: Where, What, When

The single highest-leverage tool in performance work — and the easiest place to introduce subtle bugs.

Why caching works

Most workloads have skew: a small fraction of items account for most of the traffic. A cache stores those hot items closer to the requester or in faster storage. Hits are nearly free; misses fall through to the original source.

Even a 90% hit rate divides the load on your origin by 10: only the 10% of requests that miss ever reach it. That's the leverage.
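A quick sketch of the arithmetic (both latency figures are illustrative assumptions, not measurements):

    # Effective latency and origin load as a function of hit rate.
    CACHE_MS = 1    # assumed in-memory hit
    ORIGIN_MS = 50  # assumed database round trip on a miss

    for hit_rate in (0.0, 0.90, 0.99):
        miss_rate = 1 - hit_rate
        avg_ms = hit_rate * CACHE_MS + miss_rate * ORIGIN_MS
        print(f"hit rate {hit_rate:.0%}: avg {avg_ms:.1f}ms, origin sees {miss_rate:.0%} of traffic")
    # hit rate 0%: avg 50.0ms, origin sees 100% of traffic
    # hit rate 90%: avg 5.9ms, origin sees 10% of traffic
    # hit rate 99%: avg 1.5ms, origin sees 1% of traffic

Note how going from 90% to 99% takes another 10x off the origin; hit-rate improvements compound near the top.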

🧠
Real-life analogy — Like your short-term memory
When you read a friend's phone number, you keep it in your head for a minute or two — that's the cache. After a while, you forget (eviction). If you needed it again, you'd have to look it up in your phone (the database). Speed matters; memory is finite; staleness happens.
[Interactive demo] LRU cache (capacity 4): on each access, the key moves to the FRONT (most recently used); when the cache is full, the element at the back is evicted. Example step: GET A → MISS, insert A at the front.

Where to put a cache

  • Browser cache — closest to the user. Free. Controlled by HTTP headers like Cache-Control and ETag.
  • CDN — caches static assets (JS, CSS, images) and cacheable HTML at edge POPs around the world. Can cut the round trip to your origin from roughly 200ms to 20ms.
  • Reverse proxy cache (Nginx, Varnish) — same datacenter as the app, in front of the app servers.
  • Application cache (in-process) — a Map/dict in the app server's memory. Fastest, but per-instance and doesn't survive restarts (see the sketch after this list).
  • Distributed cache (Redis, Memcached) — shared across app instances; survives restarts; can hold gigabytes.
  • Database query cache / materialized views — cache the result of an expensive query.
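As a sketch of the in-process layer, Python's standard library already ships an LRU-bounded memoizer. The load_user body here is a hypothetical stand-in for a real database read:

    from functools import lru_cache

    @lru_cache(maxsize=4096)  # per-process; the cache is lost on restart
    def load_user(user_id: int) -> dict:
        # Hypothetical stand-in for a real database query.
        return {"id": user_id, "name": f"user-{user_id}"}

    load_user(42)                  # miss: runs the body, stores the result
    load_user(42)                  # hit: served from memory
    print(load_user.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=4096, currsize=1)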

Cache write strategies

  • Cache-aside (lazy loading) — app reads from cache; if miss, read the DB and populate the cache. The most common pattern (sketched after this list).
  • Read-through — cache itself loads from the DB on miss. Hides the DB behind the cache.
  • Write-through — every write goes through the cache, then to the DB. Cache is always in sync.
  • Write-behind — write to cache first, flush to DB asynchronously. Fast writes, risk of data loss on cache failure.
  • Write-around — writes go directly to the DB, bypassing the cache. Cache only fills on subsequent reads.
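A minimal cache-aside sketch, using a plain dict as a stand-in for Redis/Memcached; db_fetch_user and the database write in update_user are hypothetical:

    import json

    cache: dict[str, str] = {}  # stand-in for a shared cache like Redis

    def db_fetch_user(user_id: int) -> dict:
        # Hypothetical database read.
        return {"id": user_id, "name": f"user-{user_id}"}

    def get_user(user_id: int) -> dict:
        key = f"user:{user_id}"
        hit = cache.get(key)
        if hit is not None:            # hit: skip the database entirely
            return json.loads(hit)
        user = db_fetch_user(user_id)  # miss: fall through to the source
        cache[key] = json.dumps(user)  # populate so the next read is a hit
        return user

    def update_user(user_id: int, fields: dict) -> None:
        # ... hypothetical database UPDATE would go here ...
        cache.pop(f"user:{user_id}", None)  # invalidate so readers don't see stale data

Deleting on write, rather than re-setting, is the safer default: the next reader repopulates from the authoritative row.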

Eviction policies

A cache is finite. When it's full and you need to add an entry, you must evict an existing one. The choice of WHICH to evict is the eviction policy.

  • LRU — Least Recently Used. The default in most caches. Works well when access patterns have temporal locality (a minimal implementation follows this list).
  • LFU — Least Frequently Used. Better when some items are persistently popular but accessed irregularly.
  • FIFO — First In, First Out. Rarely the best choice, but simple.
  • TTL (time-to-live) — every entry expires after a fixed duration. Often combined with LRU.
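A minimal LRU sketch, assuming nothing beyond the standard library; OrderedDict keeps keys in access order, mirroring the move-to-front behavior in the demo above:

    from collections import OrderedDict

    class LRUCache:
        """Evicts the least recently used key once capacity is exceeded."""

        def __init__(self, capacity: int) -> None:
            self.capacity = capacity
            self._data: OrderedDict = OrderedDict()

        def get(self, key):
            if key not in self._data:
                return None                    # miss
            self._data.move_to_end(key)        # mark as most recently used
            return self._data[key]

        def put(self, key, value) -> None:
            if key in self._data:
                self._data.move_to_end(key)
            self._data[key] = value
            if len(self._data) > self.capacity:
                self._data.popitem(last=False) # evict the least recently used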

Cache invalidation

There's a famous quote attributed to Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things." When the source data changes, your cache becomes stale. The strategies:

  • Short TTLs — accept staleness up to T seconds. Easiest, works for most things.
  • Explicit invalidation on writes — when you update X, also delete cache[X]. Reliable for single-value caches; tricky for multi-key views.
  • Versioned keys — embed a version in the cache key. Bumping the version invalidates everything atomically (sketched below).
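A sketch of versioned keys, again with a dict standing in for the shared cache; the catalog names here are hypothetical:

    cache: dict[str, str] = {}  # stand-in for a shared cache
    catalog_version = 7         # in practice, stored in config or in the cache itself

    def product_key(product_id: int) -> str:
        return f"catalog:v{catalog_version}:product:{product_id}"

    def republish_catalog() -> None:
        global catalog_version
        catalog_version += 1  # every catalog:v7:* key is now unreachable;
                              # the orphaned entries simply age out via eviction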

The thundering herd

When a hot cache key expires, every concurrent request misses simultaneously and stampedes the database. Defenses: stagger TTLs, use a mutex so only one fetch repopulates while others wait, or refresh-ahead (re-fetch slightly before expiry).
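A sketch of the mutex defense for a single process; fetch_from_db is a hypothetical loader. The first thread to miss does the fetch, and the others wait on the lock and then find the value already repopulated:

    import threading

    cache: dict[str, object] = {}
    lock = threading.Lock()

    def fetch_from_db(key: str):
        # Hypothetical expensive origin fetch.
        return f"value-for-{key}"

    def get(key: str):
        value = cache.get(key)
        if value is not None:
            return value            # fast path: no locking on a hit
        with lock:
            value = cache.get(key)  # re-check: another thread may have
            if value is None:       # repopulated while we waited
                value = fetch_from_db(key)
                cache[key] = value
        return value

A single global lock serializes all misses; a real version would use one lock per key, or a distributed lock (e.g. Redis SET with NX) when the herd spans processes.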

💡 Tip
When in doubt, START WITHOUT A CACHE. Add caching only after measuring; many performance problems are solved by adding a missing index, not a cache.