Case Study: Design a Twitter-like Feed
An end-to-end walkthrough of one of the most commonly asked system design interview questions.
Step 1: Clarify requirements
Before drawing boxes, agree on what we're building. In an interview, ask questions; in real life, write a one-page spec.
Functional requirements
- Users can post short text (≤280 chars), with optional images.
- Users can follow other users.
- Users can view a HOME timeline (posts from people they follow, newest first).
- Users can view a PROFILE timeline (a single user's posts).
- Users can search posts.
Non-functional requirements
- 100M daily active users.
- p99 home-timeline read < 200 ms.
- Posts must be durable; eventual consistency on timelines is acceptable (a tweet showing up 5 seconds late is fine).
- Read-to-write ratio is roughly 100:1 — reads dominate.
Step 2: Estimate
- 100M DAU × 2 posts/day = 200M posts/day ≈ 2,300 posts/sec average; ~10x peak ≈ 23,000 posts/sec.
- 100M DAU × 50 timeline loads/day = 5B reads/day ≈ 60,000 reads/sec.
- Storage: 200M posts/day × 300 bytes ≈ 60 GB/day ≈ 22 TB/year (text). Media is much larger and goes to object storage.
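These numbers are easy to sanity-check in a few lines. The sketch below just replays the arithmetic above; every constant is one of the stated assumptions, not a measured value.

```python
# Back-of-envelope check of the estimates above. All inputs are assumptions
# from the requirements, not real measurements.
DAU = 100_000_000
POSTS_PER_USER_PER_DAY = 2
TIMELINE_LOADS_PER_USER_PER_DAY = 50
AVG_POST_BYTES = 300            # text + metadata only; media goes to object storage
SECONDS_PER_DAY = 86_400

posts_per_day = DAU * POSTS_PER_USER_PER_DAY                    # 200M
write_qps_avg = posts_per_day / SECONDS_PER_DAY                 # ~2,300/s
write_qps_peak = write_qps_avg * 10                             # ~23,000/s

reads_per_day = DAU * TIMELINE_LOADS_PER_USER_PER_DAY           # 5B
read_qps_avg = reads_per_day / SECONDS_PER_DAY                  # ~58,000/s

text_gb_per_day = posts_per_day * AVG_POST_BYTES / 1e9           # ~60 GB
text_tb_per_year = text_gb_per_day * 365 / 1e3                   # ~22 TB

print(f"writes: {write_qps_avg:,.0f}/s avg, {write_qps_peak:,.0f}/s peak")
print(f"reads:  {read_qps_avg:,.0f}/s avg")
print(f"text storage: {text_gb_per_day:.0f} GB/day, {text_tb_per_year:.0f} TB/year")
```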
Step 3: API surface
POST /tweets -> create a tweet
GET /tweets/{id} -> read a tweet
GET /users/{id}/timeline -> profile timeline
GET /timeline -> home timeline (auth)
POST /follows { user_id } -> follow
DELETE /follows/{user_id} -> unfollow
Step 4: The core architectural choice — fan-out
How does a user's home timeline get built? Two approaches:
Fan-out on read (pull)
When a user opens their timeline, fetch each followed user's recent posts and merge them. Writes are simple: a post is just an insert. Reads are expensive: with 1,000 followees, that's 1,000 lookups per timeline load. This doesn't scale under a read-heavy workload.
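A minimal sketch of the pull approach, assuming two hypothetical data-access helpers, get_followees() and get_recent_posts(), that return posts newest-first; the read itself is just a k-way merge by timestamp.

```python
import heapq
import itertools

# Pull model: build the home timeline at read time.
# get_followees() and get_recent_posts() are hypothetical helpers, not a real API.
def home_timeline_pull(user_id: str, limit: int = 50) -> list[dict]:
    followees = get_followees(user_id)                              # one social-graph lookup
    per_followee = [get_recent_posts(f, limit) for f in followees]  # N post lookups
    merged = heapq.merge(*per_followee,
                         key=lambda p: p["created_at"], reverse=True)
    return list(itertools.islice(merged, limit))                    # newest `limit` overall
```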
Fan-out on write (push)
When a user posts, push the post id into the timeline cache of every follower. Read is now O(1) — just fetch the user's pre-built timeline. Write is expensive: posting forces N inserts where N is the number of followers.
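A minimal sketch of the push path, assuming one Redis list per follower and a hypothetical get_follower_ids() social-graph lookup; the list is trimmed so each cached timeline stays bounded.

```python
import redis

r = redis.Redis(decode_responses=True)   # follower timeline cache (illustrative)
TIMELINE_CAP = 800                        # matches the cap described in Step 5

# Push model: on each new post, prepend the post id to every follower's cached
# timeline. get_follower_ids() is a hypothetical social-graph lookup.
def fan_out_on_write(post_id: str, author_id: str) -> None:
    for follower_id in get_follower_ids(author_id):   # N followers => N cache writes
        key = f"timeline:{follower_id}"
        pipe = r.pipeline()
        pipe.lpush(key, post_id)                       # newest id at the head
        pipe.ltrim(key, 0, TIMELINE_CAP - 1)           # keep the list bounded
        pipe.execute()
```

LPUSH plus LTRIM keeps each cached timeline capped; anything older than the cap falls back to the database on a deep scroll.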
Hybrid (the real answer)
- Use fan-out on write for normal users (most users have <1000 followers).
- Use fan-out on read for celebrities (>1M followers — pushing 1M times per tweet is absurd).
- At read time, merge the pre-built timeline cache with recent posts from followed celebrities.
This is roughly what Twitter actually does, and it's the answer interviewers want.
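A sketch of the hybrid read path under the same assumptions as the earlier snippets (the Redis client r, plus hypothetical fetch_posts(), get_followees(), follower_count(), and get_recent_posts() helpers); the 1M-follower threshold is just the cutoff used in this write-up.

```python
CELEBRITY_THRESHOLD = 1_000_000  # follower count above which we stop fanning out

# Hybrid read: the pre-built Redis timeline covers normal followees; celebrity
# posts are pulled live and merged at read time. Helper functions are hypothetical.
def home_timeline(user_id: str, limit: int = 50) -> list[dict]:
    cached_ids = r.lrange(f"timeline:{user_id}", 0, limit - 1)
    posts = fetch_posts(cached_ids)                          # hydrate ids -> post rows

    for followee in get_followees(user_id):
        if follower_count(followee) >= CELEBRITY_THRESHOLD:
            posts.extend(get_recent_posts(followee, limit))  # pull path for celebrities

    posts.sort(key=lambda p: p["created_at"], reverse=True)  # newest first
    return posts[:limit]
```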
Step 5: Data model
- POSTS — append-only table, sharded by user_id, replicated. Includes id, user_id, text, media_urls, created_at.
- FOLLOWS — sharded social graph (user_id, followed_id). Could also use a graph database.
- TIMELINE CACHE — Redis lists keyed by user_id, capped at ~800 post ids. Materialized via fan-out on write.
- MEDIA — large blobs in object storage (S3); only URLs are stored in the relational layer.
- SEARCH INDEX — separate Elasticsearch cluster, populated asynchronously from a Kafka stream.
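For illustration, the two core record shapes might look like the dataclasses below; field names follow the list above, and nothing here is a fixed schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative record shapes for the POSTS and FOLLOWS tables described above.
@dataclass
class Post:
    id: int                 # globally unique, ideally time-sortable
    user_id: int            # shard key for the POSTS table
    text: str               # <= 280 chars, enforced at the API layer
    media_urls: list[str]   # pointers into object storage, never blobs
    created_at: datetime

@dataclass
class Follow:
    user_id: int            # the follower
    followed_id: int        # the followee; (user_id, followed_id) is the row key
    created_at: datetime
```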
Step 6: Components
- API gateway — auth, rate limit, route to services.
- Tweet service — write path, validates and persists posts.
- Fan-out worker — consumes new posts, writes to follower timeline caches.
- Timeline service — read path, merges cache with celebrity posts.
- User/social-graph service — owns follow relationships.
- Search service — Elasticsearch + ingest pipeline.
- Notification service — push notifications for new mentions/likes.
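To make the write path concrete: one plausible shape for the fan-out worker is a stream consumer that reuses fan_out_on_write() from Step 4. The topic name, message shape, and the kafka-python client are assumptions for illustration.

```python
import json
from kafka import KafkaConsumer  # assumes the kafka-python client

# Fan-out worker sketch: consume newly created posts and push them into
# follower timeline caches. Topic name and message shape are illustrative.
consumer = KafkaConsumer(
    "new-posts",
    bootstrap_servers="localhost:9092",
    group_id="fanout-workers",      # scale out by adding more workers to the group
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    post = message.value            # e.g. {"id": ..., "user_id": ...}
    fan_out_on_write(post["id"], post["user_id"])  # from the Step 4 sketch
```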
Step 7: Bottlenecks and mitigations
- Hot timeline keys (millions of readers loading the same celebrity's profile) — multi-tier cache: per-region Redis + CDN-cached profile pages.
- Hot writer (a celebrity posts) — push to a much smaller "interesting users" list, not all followers.
- Search hotspots — partition Elasticsearch by time + user; route queries by recency.
- Storage growth — tier old posts to cheaper storage (S3 Glacier, archive tables); rehydrate on demand.