
Case Study: URL Shortener

The smallest interesting system design problem — and a great showcase of read-heavy optimization.

Requirements

  • POST /shorten { url } → returns a short URL like https://sho.rt/aZ4f9.
  • GET /:code → 302 redirect to the long URL, increment click count.
  • Codes should be ~6-8 chars, URL-safe, hard to guess.
  • 100M new URLs per month. ~10B redirects per month. p99 redirect < 50 ms.

Estimate

  • Writes: 100M/month ÷ ~2.6M seconds ≈ 40/sec average; ~400/sec peak.
  • Reads: 10B/month ÷ ~2.6M seconds ≈ 4,000/sec average; ~40,000/sec peak.
  • Read:write ratio is 100:1 — VERY read-heavy. Optimize the read path.
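These numbers are quick to sanity-check. A sketch of the arithmetic (the 10× peak-to-average factor is an assumption implied by the peak figures, not given in the requirements):

```python
# Back-of-envelope check of the traffic estimates (30-day month).
SECONDS_PER_MONTH = 30 * 24 * 3600              # 2,592,000 ≈ 2.6M

writes_per_sec = 100_000_000 / SECONDS_PER_MONTH      # ≈ 39/sec average
reads_per_sec = 10_000_000_000 / SECONDS_PER_MONTH    # ≈ 3,900/sec average

PEAK_FACTOR = 10                                # assumed peak:average ratio
peak_writes = writes_per_sec * PEAK_FACTOR      # ≈ 400/sec
peak_reads = reads_per_sec * PEAK_FACTOR        # ≈ 39,000/sec
```

The read:write ratio falls straight out: 10B ÷ 100M = 100, independent of how you count seconds.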

Generating short codes

  • Counter + base62 — maintain a global integer counter and encode it in base62 (a-z, A-Z, 0-9). Simple and collision-free, but sequential codes are guessable, which clashes with the "hard to guess" requirement. The other challenge is distributing the counter: use a range allocator where each app server reserves blocks of 1,000 IDs.
  • Hash of long URL — MD5/SHA-1 the long URL and take the first 6-8 characters of the encoded digest. The same URL always yields the same code (free deduplication), but truncation causes collisions, which need check-then-write handling.
  • Random + check — generate a random 6-char string, check if taken, retry on collision. With 62⁶ ≈ 56B codes, collisions are rare.
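The base62 encoding at the heart of the first option is a few lines of code. A minimal sketch (the alphabet order is arbitrary; any fixed 62-character URL-safe alphabet works):

```python
# Base62-encode an integer id from the counter / range allocator.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n):
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)      # peel off the least-significant digit
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars)) # digits were produced low-to-high
```

With this alphabet, counter value 0 maps to "0", 61 to "Z", and 62 rolls over to "10", exactly like ordinary positional notation in base 62.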

Storage

The schema is trivial: code → long_url, plus metadata (created_at, owner, click_count). One row per code. 100M new rows/month × 200 bytes ≈ 20 GB/month — easy for any modern database. Sharding by code (hash) keeps load even.
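Hash-based shard routing can be as simple as this sketch (NUM_SHARDS and the choice of MD5 are illustrative, not prescribed by the design):

```python
# Route a short code to a shard by hashing it.
import hashlib

NUM_SHARDS = 16  # illustrative; real systems pick this with headroom for growth

def shard_for(code):
    """Deterministically map a short code to a shard index."""
    digest = hashlib.md5(code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

Because the hash scrambles the code, even sequential counter-generated codes spread evenly across shards.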

The read path is everything

99% of traffic is GET /:code → 302. Optimize that path mercilessly.

  • Put a CDN in front — cache redirects for popular codes at the edge. Most clicks never hit your origin.
  • Add Redis in front of the database — keep top 1% of codes hot. Cache hit ratio of 95%+ is realistic.
  • Database read replicas — add plenty of them and route reads across them round-robin.
  • Click counts → don't update a row on every click. Buffer increments in a queue, flush in batches.
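The cache-plus-buffered-counts part of this path is classic cache-aside logic. In this sketch, plain dicts stand in for Redis and the database, and `resolve` and the sample data are hypothetical names:

```python
from collections import Counter

# Stand-ins for Redis and the primary DB (hypothetical data).
db = {"aZ4f9": "https://example.com/some/very/long/path"}
cache = {}                  # hot code -> long_url
pending_clicks = Counter()  # buffered increments; a background job flushes these

def resolve(code):
    """Cache-aside lookup: try the cache, fall back to the DB, backfill."""
    url = cache.get(code)
    if url is None:
        url = db.get(code)        # cache miss: one DB read
        if url is not None:
            cache[code] = url     # backfill so the next hit skips the DB
    if url is not None:
        pending_clicks[code] += 1 # buffer instead of an UPDATE per click
    return url                    # caller issues the 302 (or a 404 if None)
```

A real deployment would replace the dicts with redis-py calls and a SQL query, add a TTL on cache entries, and flush `pending_clicks` to the database in periodic batches.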

Other concerns

  • Custom codes — let users pick their own (vanity URLs). Same flow, just check uniqueness.
  • Expiration — TTL field; periodic job deletes expired entries.
  • Abuse — rate-limit POST /shorten by IP; scan for malware/phishing in the long URL.
  • Analytics — separate event pipeline (Kafka → ClickHouse) for per-click data; don't bog down the main DB.
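The per-IP rate limit on POST /shorten is commonly built as a token bucket. A minimal in-process sketch (the rate and capacity values are illustrative; a real deployment would keep the buckets in Redis so all app servers share them):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens/sec."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # ip -> TokenBucket

def allow_request(ip):
    bucket = buckets.setdefault(ip, TokenBucket(rate=1.0, capacity=5))
    return bucket.allow()
```

With these numbers an IP can burst 5 requests, then is throttled to 1/sec until it backs off.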
💡 Tip
Always design the read path first when reads >> writes. Caching, CDNs, and read replicas matter more than write throughput here.