Caching: Where, What, When
The single highest-leverage tool in performance work — and the easiest place to introduce subtle bugs.
Why caching works
Most workloads have skew: a small fraction of items account for most of the traffic. A cache stores those hot items closer to the requester or in faster storage. Hits are nearly free; misses fall through to the original source.
Even a 90% hit rate divides the load on your origin by 10. That's the leverage.
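That leverage is plain arithmetic; a minimal sketch (the function name is illustrative):

```python
def origin_load(total_qps: float, hit_rate: float) -> float:
    """Requests per second that fall through to the origin."""
    return total_qps * (1.0 - hit_rate)

# 10,000 req/s at a 90% hit rate leaves ~1,000 req/s for the origin;
# pushing the hit rate to 99% leaves ~100 req/s.
```

Note the nonlinearity: going from a 90% to a 99% hit rate cuts origin load by another factor of 10, which is why squeezing out the last few points of hit rate is often worth it.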
Where to put a cache
- Browser cache — closest to the user. Free. Controlled by HTTP headers like Cache-Control and ETag.
- CDN — caches static assets (JS, CSS, images) and cacheable HTML at edge POPs around the world. For a far-away user this can cut the round trip to your origin from roughly 200ms to 20ms.
- Reverse proxy cache (Nginx, Varnish) — same datacenter as the app, in front of the app servers.
- Application cache (in-process) — a Map/dict in the app server's memory. Fastest but per-instance, doesn't survive restarts.
- Distributed cache (Redis, Memcached) — shared across app instances; survives restarts; can hold gigabytes.
- Database query cache / materialized views — cache the result of an expensive query.
Cache write strategies
- Cache-aside (lazy loading) — app reads from cache; if miss, read DB, populate cache. The most common pattern.
- Read-through — cache itself loads from the DB on miss. Hides the DB behind the cache.
- Write-through — every write goes through the cache, then to the DB. Cache is always in sync.
- Write-behind — write to cache first, flush to DB asynchronously. Fast writes, risk of data loss on cache failure.
- Write-around — writes go directly to the DB, bypassing the cache. Cache only fills on subsequent reads.
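Cache-aside, the most common of these, fits in a few lines. This is a sketch assuming an in-process dict in place of Redis, and `load_from_db` is a hypothetical loader, not a real API:

```python
from typing import Any, Callable

cache: dict[str, Any] = {}  # stands in for Redis/Memcached

def get_user(user_id: str, load_from_db: Callable[[str], Any]) -> Any:
    key = f"user:{user_id}"
    if key in cache:                    # hit: serve from cache
        return cache[key]
    value = load_from_db(user_id)       # miss: fall through to the DB
    cache[key] = value                  # populate for the next reader
    return value
```

The write-around counterpart is the mirror image: on update, write to the DB and delete `cache[key]` so the next read repopulates it.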
Eviction policies
A cache is finite. When it's full and you need to add an entry, you must evict an existing one. The choice of WHICH to evict is the eviction policy.
- LRU — Least Recently Used. The default in most caches. Works well when access patterns have temporal locality.
- LFU — Least Frequently Used. Better when some items are persistently popular but accessed irregularly.
- FIFO — first in, first out. Rarely the best choice but simple.
- TTL (time-to-live) — every entry expires after a fixed duration. Strictly an expiration rule rather than an eviction policy, but it bounds staleness and is often combined with LRU.
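LRU is simple enough to sketch in full. One common trick, shown here under the assumption of single-threaded access, is to back the cache with an ordered map and treat one end as "least recently used":

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: OrderedDict remembers insertion order,
    and move_to_end marks an entry as most recently used."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # now most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

Both `get` and `put` are O(1), which is why LRU is the default almost everywhere.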
Cache invalidation
There's a famous quote attributed to Phil Karlton: "There are only two hard things in computer science: cache invalidation and naming things." When the source data changes, your cache becomes stale. The strategies:
- Short TTLs — accept staleness up to T seconds. Easiest, works for most things.
- Explicit invalidation on writes — when you update X, also delete cache[X]. Reliable for single-value caches; tricky for multi-key views.
- Versioned keys — embed a version in the cache key. Bumping the version invalidates everything atomically.
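Versioned keys are easiest to see in code. In this sketch the version counter lives in a plain dict; in practice it would sit next to the cache (e.g. a counter in Redis), and the namespace name is illustrative:

```python
cache: dict = {}
version = {"users": 1}  # hypothetical namespace -> version counter

def versioned_key(namespace: str, key: str) -> str:
    """Every lookup builds its key through the current version."""
    return f"{namespace}:v{version[namespace]}:{key}"

def invalidate_namespace(namespace: str) -> None:
    """Bumping the version makes every old key unreachable at once;
    the orphaned entries are evicted later by LRU or TTL."""
    version[namespace] += 1
```

The cost is that "invalidated" entries still occupy memory until eviction reclaims them, which is the trade you make for an atomic, O(1) flush.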
The thundering herd
When a hot cache key expires, every concurrent request misses simultaneously and stampedes the database. Defenses: stagger TTLs, use a mutex so only one fetch repopulates while others wait, or refresh-ahead (re-fetch slightly before expiry).
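The mutex defense is sometimes called single-flight. A minimal threaded sketch, assuming an in-process dict for the cache and ignoring error handling on the fetch:

```python
import threading
from typing import Any, Callable

cache: dict[str, Any] = {}
_lock = threading.Lock()
_inflight: dict[str, threading.Event] = {}

def get_or_load(key: str, load: Callable[[], Any]) -> Any:
    """Single-flight: the first miss fetches; concurrent misses
    for the same key wait for that fetch instead of stampeding."""
    with _lock:
        if key in cache:
            return cache[key]
        event = _inflight.get(key)
        if event is None:                  # we are the designated fetcher
            event = threading.Event()
            _inflight[key] = event
            fetcher = True
        else:
            fetcher = False
    if fetcher:
        try:
            cache[key] = load()            # only one caller hits the origin
        finally:
            event.set()
            with _lock:
                _inflight.pop(key, None)
    else:
        event.wait()                       # piggyback on the in-flight fetch
    return cache[key]
```

Redis users often get the same effect with a `SET key value NX` lock key; the principle is identical: elect one fetcher, make everyone else wait or serve stale.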