System Design28 min readAdvanced

Databases: SQL vs NoSQL, Indexes, Replication

How to pick a database, what an index actually is, and why replication is non-negotiable.

Pick a database on first principles

Choosing a database is one of the most consequential decisions you make on a project — switching later costs months. The two big families are RELATIONAL (SQL) and NoSQL, but \"NoSQL\" really means four very different things.

Relational (Postgres, MySQL, SQLite)

  • Schema-on-write — every row has a fixed shape, validated on insert.
  • Joins across tables — let you store data once and combine flexibly.
  • Transactions with ACID guarantees — Atomicity, Consistency, Isolation, Durability.
  • Mature query optimizer — you write what, the engine decides how.
  • Best for: most business systems. Pick Postgres unless you have a specific reason not to.

Document stores (MongoDB, Couchbase)

  • Schema-on-read — documents (JSON-like) can have any shape.
  • Great for hierarchical data that's read together as one blob.
  • Joins are weaker; you often denormalize.

Wide-column stores (Cassandra, ScyllaDB, HBase)

  • Massive write throughput, linear horizontal scale, partition-tolerant.
  • You design the schema around the queries you'll run — query patterns dictate primary keys.
  • Best for time-series, IoT, event logs at huge scale.

Key-value stores (Redis, DynamoDB, etcd)

  • Simplest data model: blob keyed by string. Often O(1) gets/sets.
  • Used for caching, session storage, leaderboards, queues.
  • DynamoDB also serves as a primary database for some apps.

Indexes — what they actually are

An index is a separate, sorted data structure (usually a B-tree) that maps column values to row pointers. Without an index, finding rows where `email = 'x'` requires a FULL TABLE SCAN — read every row. With an index, the database does an O(log n) search in the index and follows the pointer.

Indexes make reads faster but writes slower (every insert/update must also update the index). They also take disk space. Index the columns you frequently filter on, join on, or sort by.

json
// Without index: full table scan, O(n)
SELECT * FROM users WHERE email = \u0027ada@example.com\u0027;

// CREATE INDEX idx_users_email ON users(email);
// Now: O(log n) lookup

Composite indexes

An index on `(country, city)` can serve queries filtering on `country`, or on both `country AND city`. But NOT on `city` alone — because the index is sorted by country first. Order of columns in a composite index matters.

Transactions and isolation levels

A transaction groups multiple reads/writes into a single atomic unit — all succeed or all fail. The classic example: transfer $100 from A to B. You debit A and credit B; if the second step fails, the first must be rolled back.

  • Read uncommitted — can see uncommitted changes from other transactions. Almost never used.
  • Read committed — sees only committed data. The default in most DBs.
  • Repeatable read — repeated reads of the same row return the same value within a transaction.
  • Serializable — strongest. As if all transactions ran one at a time. Slowest.

Replication and backups

Run at least one replica (read replica or hot standby) of any production database. When the primary fails, you fail over. Take regular backups to separate storage — replicas don't protect you from accidental DELETEs and bad migrations. Test the restore procedure; an untested backup isn't a backup.

💡 Tip
Most companies\u2019 \"we need MongoDB for scale\" stories should\u2019ve been \"we need an index on the right column\". Run EXPLAIN on slow queries before redesigning your stack.