System Design Fundamentals
Latency, throughput, availability, durability, and the CAP theorem — the vocabulary you'll use forever.
What system design actually is
When you build a small app, every component runs on one machine: web server, database, files, cache. As traffic grows, no single machine is big enough — you must split the work across many machines. System design is the discipline of deciding HOW to split it: which pieces talk to which, where data lives, what happens when a piece fails.
System design interviews don't have a single right answer. They test how you reason about tradeoffs: latency vs cost, consistency vs availability, simplicity vs scale. The vocabulary on this page is the vocabulary you'll use to discuss those tradeoffs.
The four numbers that matter
- LATENCY — how long a single request takes to complete. Quoted as percentiles: p50 (median), p95, p99. The p99 is what your slowest 1% of users experience — and on a busy site, that's millions of users.
- THROUGHPUT — how many requests per second the system can handle. Often called QPS (queries per second) or RPS.
- AVAILABILITY — fraction of time the system is up. "Three nines" (99.9%) ≈ 8.76 hours of downtime per year. "Five nines" (99.999%) ≈ 5 minutes per year.
- DURABILITY — probability your stored data survives. Should always be much higher than availability — you can tolerate downtime, you can never tolerate losing data.
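The "nines" arithmetic is worth being able to do on the spot. A minimal sketch (the helper name is illustrative, not from any library):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7

def downtime_per_year(nines: int) -> float:
    """Seconds of downtime per year allowed by an availability of `nines` nines."""
    unavailability = 10 ** (-nines)   # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability

print(f"three nines: {downtime_per_year(3) / 3600:.2f} hours/year")   # ~8.76
print(f"five nines:  {downtime_per_year(5) / 60:.2f} minutes/year")   # ~5.26
```

Each extra nine cuts the downtime budget by 10x, which is why every additional nine costs dramatically more engineering effort.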
Latency numbers every engineer should know
- L1 cache reference: ~0.5 ns
- Main memory reference: ~100 ns
- Read 1 MB sequentially from memory: ~100 µs
- SSD random read: ~150 µs
- Round trip in the same data center: ~500 µs
- Disk seek: ~10 ms
- Round trip across continents: ~150 ms
Memorize these. The order-of-magnitude differences explain almost every system design choice — caching, locality, batching.
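One way to internalize the gaps is to ask how many of each operation fit inside one cross-continent round trip. A quick sketch using the figures above (the numbers are the rough, assumed values from the list, in nanoseconds):

```python
# Rough latency figures from the list above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "main memory reference": 100,
    "SSD random read": 150_000,
    "same-DC round trip": 500_000,
    "disk seek": 10_000_000,
}

# How many of each operation fit in one ~150 ms cross-continent round trip?
budget_ns = 150_000_000
for name, ns in LATENCY_NS.items():
    print(f"{name:>24}: {budget_ns / ns:>13,.0f} per cross-continent RTT")
```

A single cross-continent hop costs as much as roughly 1.5 million memory reads, which is the whole argument for caching close to the user.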
The CAP theorem
When a network partition splits your system in two, each side has to choose: serve requests with possibly stale data (Available), or refuse requests until the partition heals (Consistent). You don't get both at the same time. That's CAP: Consistency, Availability, Partition tolerance — you can't have all three, and since partitions WILL happen, you're really choosing C or A.
- CP systems — refuse to serve when uncertain. Banking ledgers, coordination services like ZooKeeper.
- AP systems — keep serving and accept temporary inconsistency. Social media feeds, DNS, shopping carts.
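The partition-time choice can be sketched with a toy replicated store. Everything here is illustrative, and it's deliberately simplified: a real CP system would use quorum reads and writes rather than requiring every replica to be reachable.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.reachable = True  # set False to simulate a network partition

class Store:
    def __init__(self, mode: str, replicas: list):
        self.mode = mode        # "CP" or "AP"
        self.replicas = replicas

    def write(self, key, value):
        up = [r for r in self.replicas if r.reachable]
        if self.mode == "CP" and len(up) < len(self.replicas):
            # CP: refuse rather than risk divergence
            raise RuntimeError("partition: refusing write")
        for r in up:            # AP: update whoever we can reach
            r.data[key] = value

    def read(self, key):
        up = [r for r in self.replicas if r.reachable]
        if self.mode == "CP" and len(up) < len(self.replicas):
            raise RuntimeError("partition: refusing read")
        return up[0].data.get(key)  # AP: may return stale data

replicas = [Replica(), Replica()]
ap = Store("AP", replicas)
ap.write("cart", ["book"])
replicas[1].reachable = False        # partition!
ap.write("cart", ["book", "pen"])    # only replica 0 gets the update
replicas[1].reachable = True
# Replica 1 is now stale — an AP system must reconcile it in the background.
```

Run the same scenario in "CP" mode and the second write raises instead of diverging: that's the whole tradeoff in two lines of control flow.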
PACELC — the more honest version
CAP only describes behavior during partitions. PACELC adds: even when there's no partition, you trade Latency for Consistency. Strong consistency means waiting for replicas to agree before responding. Eventual consistency means responding immediately and synchronizing in the background.
Consistency models, from strong to weak
- Linearizable — every operation appears to happen at a single instant; reads see the latest write. Most expensive.
- Sequential — every node sees operations in the same order, but not necessarily real time.
- Causal — operations causally related are seen in the same order; unrelated ones can disagree.
- Eventual — reads may be stale, but eventually all replicas converge if writes stop.
Estimation (back-of-the-envelope math)
Interviewers expect you to estimate, in your head, things like: "If we have 100M daily active users posting 1 photo each, how much storage per year?" Practice rounding to powers of ten, knowing seconds per day (86,400 ≈ 10⁵), and being explicit about your assumptions.
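Working that example through, with an assumed average photo size of 2 MB (the size is the kind of assumption you'd state out loud):

```python
DAU = 100e6          # 100M daily active users
PHOTO_MB = 2         # assumed average photo size
DAYS_PER_YEAR = 365

daily_tb = DAU * PHOTO_MB / 1e6             # MB -> TB: 200 TB/day
yearly_pb = daily_tb * DAYS_PER_YEAR / 1e3  # TB -> PB: ~73 PB/year
print(f"{daily_tb:.0f} TB/day, ~{yearly_pb:.0f} PB/year")

# Write rate: 10^8 photos / 10^5 seconds ≈ 1,000 photos/sec
print(f"write rate ≈ {DAU / 86_400:,.0f} photos/sec")
```

In an interview you'd do this as powers of ten: 10⁸ users × 2 × 10⁶ bytes = 2 × 10¹⁴ bytes/day = 200 TB/day, times ~365 days ≈ 73 PB/year.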