Case Study: URL Shortener
The smallest interesting system design problem — and a great showcase of read-heavy optimization.
Requirements
- POST /shorten { url } → returns a short URL like https://sho.rt/aZ4f9.
- GET /:code → 302 redirect to the long URL, increment click count.
- Codes should be ~6-8 chars, URL-safe, hard to guess.
- 100M new URLs per month. ~10B redirects per month. p99 redirect < 50 ms.
Estimate
- Writes: 100M/month ÷ 2.5M sec ≈ 40/sec average; ~400/sec peak.
- Reads: 10B/month ÷ 2.5M sec ≈ 4,000/sec average; ~40,000/sec peak.
- Read:write ratio is 100:1 — VERY read-heavy. Optimize the read path.
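The arithmetic above can be checked in a few lines. A quick sketch (the 30-day month and the 10x peak factor are assumptions for the estimate, not measured values):

```python
# Back-of-envelope check of the traffic estimates.
SECONDS_PER_MONTH = 30 * 24 * 3600  # ≈ 2.59M seconds

writes_avg = 100_000_000 / SECONDS_PER_MONTH      # new URLs per second
reads_avg = 10_000_000_000 / SECONDS_PER_MONTH    # redirects per second

PEAK_FACTOR = 10  # assumed peak-to-average ratio
print(f"writes: {writes_avg:.0f}/s avg, {writes_avg * PEAK_FACTOR:.0f}/s peak")
print(f"reads:  {reads_avg:.0f}/s avg, {reads_avg * PEAK_FACTOR:.0f}/s peak")
print(f"read:write ratio ≈ {reads_avg / writes_avg:.0f}:1")
```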
Generating short codes
- Counter + base62 — maintain a global integer counter and encode it in base62 (a-z, A-Z, 0-9). Simple, but codes come out sequential and guessable, which is in tension with the "hard to guess" requirement. The real challenge is distributing the counter: use a range allocator, where each app server reserves a block of, say, 1,000 IDs at a time and hands them out locally.
- Hash of long URL — MD5/SHA-1 the URL and take the first 6-8 chars. The same URL always yields the same code (useful for deduplication), but truncating the hash makes collisions between different URLs possible, so you need check-then-write handling on insert.
- Random + check — generate a random 6-char string, check if taken, retry on collision. With 62⁶ ≈ 56B codes, collisions are rare.
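The counter-based and random strategies can be sketched in a few lines (function names and the 6-char default are illustrative; the collision-check retry loop around `random_code` would live in the insert path):

```python
import secrets
import string

# 62-character URL-safe alphabet: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Encode a non-negative counter value as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def random_code(length: int = 6) -> str:
    """Generate a cryptographically random code; caller retries on collision."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Note that `encode_base62` is why counter codes are guessable: consecutive counter values map to adjacent strings.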
Storage
The schema is trivial: code → long_url, plus metadata (created_at, owner, click_count). One row per code. 100M new rows/month × 200 bytes ≈ 20 GB/month — easy for any modern database. Sharding by code (hash) keeps load even.
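Sharding by a hash of the code can be as simple as the sketch below (the shard count is illustrative; a production system would use consistent hashing so that adding shards doesn't remap every key):

```python
import hashlib

NUM_SHARDS = 16  # illustrative fixed shard count

def shard_for(code: str) -> int:
    """Deterministically map a short code to a shard.

    Hashing first (rather than using the code bytes directly) keeps the
    distribution even regardless of how codes are generated.
    """
    digest = hashlib.md5(code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```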
The read path is everything
99% of traffic is GET /:code → 302. Optimize that path mercilessly.
- Put a CDN in front — cache redirects for popular codes at the edge. Most clicks never hit your origin.
- Add Redis in front of the database — keep top 1% of codes hot. Cache hit ratio of 95%+ is realistic.
- Add plenty of database read replicas — and route reads across them (round-robin or least-connections).
- Click counts → don't update a row on every click. Buffer increments in a queue, flush in batches.
Other concerns
- Custom codes — let users pick their own (vanity URLs). Same flow, just check uniqueness.
- Expiration — TTL field; periodic job deletes expired entries.
- Abuse — rate-limit POST /shorten by IP; scan for malware/phishing in the long URL.
- Analytics — separate event pipeline (Kafka → ClickHouse) for per-click data; don't bog down the main DB.
💡 Tip
Always design the read path first when reads >> writes. Caching, CDNs, and read replicas matter more than write throughput here.