System Design Fundamentals
Latency, throughput, availability, durability, and the CAP theorem — the vocabulary you'll use forever.
What system design actually is
When you build a small app, every component runs on one machine: web server, database, files, cache. As traffic grows, no single machine is big enough — you must split the work across many machines. System design is the discipline of deciding HOW to split it: which pieces talk to which, where data lives, what happens when a piece fails.
System design interviews don't have a single right answer. They test how you reason about tradeoffs: latency vs cost, consistency vs availability, simplicity vs scale. The vocabulary on this page is the vocabulary you'll use to discuss those tradeoffs.
The four numbers that matter
- LATENCY — how long a single request takes to complete. Quoted as percentiles: p50 (median), p95, p99. The p99 is what your slowest 1% of users experience — and on a busy site, that's millions of users.
- THROUGHPUT — how many requests per second the system can handle. Often called QPS (queries per second) or RPS.
- AVAILABILITY — fraction of time the system is up. "Three nines" (99.9%) ≈ 8.76 hours of downtime per year. "Five nines" (99.999%) ≈ 5 minutes per year.
- DURABILITY — probability your stored data survives. Should always be much higher than availability — you can tolerate downtime, you can never tolerate losing data.
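The "nines" arithmetic is worth being able to do on the spot. A minimal sketch (the helper name is illustrative, not from any library):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7

def downtime_per_year(nines: int) -> float:
    """Seconds of downtime per year allowed by an availability of `nines` nines."""
    unavailability = 10 ** (-nines)   # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability

print(f"three nines: {downtime_per_year(3) / 3600:.2f} hours/year")   # ~8.76
print(f"five nines:  {downtime_per_year(5) / 60:.2f} minutes/year")   # ~5.26
```

Each extra nine cuts the downtime budget by 10x, which is why every additional nine costs dramatically more engineering effort.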
Latency numbers every engineer should know
- L1 cache reference: ~0.5 ns
- Main memory reference: ~100 ns
- Read 1 MB sequentially from memory: ~100 µs
- SSD random read: ~150 µs
- Round trip in the same data center: ~500 µs
- Disk seek: ~10 ms
- Round trip across continents: ~150 ms
Memorize these. The order-of-magnitude differences explain almost every system design choice — caching, locality, batching.
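One way to internalize the gaps is to ask how many of each operation fit inside one cross-continent round trip. A quick sketch using the figures above (the numbers are the rough, assumed values from the list, in nanoseconds):

```python
# Rough latency figures from the list above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "main memory reference": 100,
    "SSD random read": 150_000,
    "same-DC round trip": 500_000,
    "disk seek": 10_000_000,
}

# How many of each operation fit in one ~150 ms cross-continent round trip?
budget_ns = 150_000_000
for name, ns in LATENCY_NS.items():
    print(f"{name:>24}: {budget_ns / ns:>13,.0f} per cross-continent RTT")
```

A single cross-continent hop costs as much as roughly 1.5 million memory reads, which is the whole argument for caching close to the user.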
The CAP theorem
When a network partition splits your system in two, each side has to choose: serve requests with possibly stale data (Available), or refuse requests until the partition heals (Consistent). You don't get both at the same time. That's CAP: Consistency, Availability, Partition tolerance — you can't have all three, and since partitions WILL happen, you're really choosing C or A.
- CP systems — refuse to serve when uncertain. Banking ledgers, coordination services like ZooKeeper.
- AP systems — keep serving and accept temporary inconsistency. Social media feeds, DNS, shopping carts.
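The partition-time choice can be sketched with a toy replicated store. Everything here is illustrative, and it's deliberately simplified: a real CP system would use quorum reads and writes rather than requiring every replica to be reachable.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.reachable = True  # set False to simulate a network partition

class Store:
    def __init__(self, mode: str, replicas: list):
        self.mode = mode        # "CP" or "AP"
        self.replicas = replicas

    def write(self, key, value):
        up = [r for r in self.replicas if r.reachable]
        if self.mode == "CP" and len(up) < len(self.replicas):
            # CP: refuse rather than risk divergence
            raise RuntimeError("partition: refusing write")
        for r in up:            # AP: update whoever we can reach
            r.data[key] = value

    def read(self, key):
        up = [r for r in self.replicas if r.reachable]
        if self.mode == "CP" and len(up) < len(self.replicas):
            raise RuntimeError("partition: refusing read")
        return up[0].data.get(key)  # AP: may return stale data

replicas = [Replica(), Replica()]
ap = Store("AP", replicas)
ap.write("cart", ["book"])
replicas[1].reachable = False        # partition!
ap.write("cart", ["book", "pen"])    # only replica 0 gets the update
replicas[1].reachable = True
# Replica 1 is now stale — an AP system must reconcile it in the background.
```

Run the same scenario in "CP" mode and the second write raises instead of diverging: that's the whole tradeoff in two lines of control flow.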
PACELC — the more honest version
CAP only describes behavior during partitions. PACELC adds: even when there's no partition, you trade Latency for Consistency. Strong consistency means waiting for replicas to agree before responding. Eventual consistency means responding immediately and synchronizing in the background.
Consistency models, from strong to weak
- Linearizable — every operation appears to happen at a single instant; reads see the latest write. Most expensive.
- Sequential — every node sees operations in the same order, but not necessarily real time.
- Causal — operations causally related are seen in the same order; unrelated ones can disagree.
- Eventual — reads may be stale, but eventually all replicas converge if writes stop.
Estimation (back-of-the-envelope math)
Interviewers expect you to estimate, in your head, things like: "If we have 100M daily active users posting 1 photo each, how much storage per year?" Practice rounding to powers of ten, knowing seconds per day (86,400 ≈ 10⁵), and being explicit about your assumptions.
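Working that example through, with an assumed average photo size of 2 MB (the size is the kind of assumption you'd state out loud):

```python
DAU = 100e6          # 100M daily active users
PHOTO_MB = 2         # assumed average photo size
DAYS_PER_YEAR = 365

daily_tb = DAU * PHOTO_MB / 1e6             # MB -> TB: 200 TB/day
yearly_pb = daily_tb * DAYS_PER_YEAR / 1e3  # TB -> PB: ~73 PB/year
print(f"{daily_tb:.0f} TB/day, ~{yearly_pb:.0f} PB/year")

# Write rate: 10^8 photos / 10^5 seconds ≈ 1,000 photos/sec
print(f"write rate ≈ {DAU / 86_400:,.0f} photos/sec")
```

In an interview you'd do this as powers of ten: 10⁸ users × 2 × 10⁶ bytes = 2 × 10¹⁴ bytes/day = 200 TB/day, times ~365 days ≈ 73 PB/year.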