Databases: SQL vs NoSQL, Indexes, Replication
How to pick a database, what an index actually is, and why replication is non-negotiable.
Pick a database on first principles
Choosing a database is one of the most consequential decisions you make on a project — switching later costs months. The two big families are RELATIONAL (SQL) and NoSQL, but \"NoSQL\" really means four very different things.
Relational (Postgres, MySQL, SQLite)
- Schema-on-write — every row has a fixed shape, validated on insert.
- Joins across tables — let you store data once and combine flexibly.
- Transactions with ACID guarantees — Atomicity, Consistency, Isolation, Durability.
- Mature query optimizer — you write what, the engine decides how.
- Best for: most business systems. Pick Postgres unless you have a specific reason not to.
Document stores (MongoDB, Couchbase)
- Schema-on-read — documents (JSON-like) can have any shape.
- Great for hierarchical data that's read together as one blob.
- Joins are weaker; you often denormalize.
Wide-column stores (Cassandra, ScyllaDB, HBase)
- Massive write throughput, linear horizontal scale, partition-tolerant.
- You design the schema around the queries you'll run — query patterns dictate primary keys.
- Best for time-series, IoT, event logs at huge scale.
Key-value stores (Redis, DynamoDB, etcd)
- Simplest data model: blob keyed by string. Often O(1) gets/sets.
- Used for caching, session storage, leaderboards, queues.
- DynamoDB also serves as a primary database for some apps.
Indexes — what they actually are
An index is a separate, sorted data structure (usually a B-tree) that maps column values to row pointers. Without an index, finding rows where `email = 'x'` requires a FULL TABLE SCAN — read every row. With an index, the database does an O(log n) search in the index and follows the pointer.
Indexes make reads faster but writes slower (every insert/update must also update the index). They also take disk space. Index the columns you frequently filter on, join on, or sort by.
// Without index: full table scan, O(n)
SELECT * FROM users WHERE email = \u0027ada@example.com\u0027;
// CREATE INDEX idx_users_email ON users(email);
// Now: O(log n) lookupComposite indexes
An index on `(country, city)` can serve queries filtering on `country`, or on both `country AND city`. But NOT on `city` alone — because the index is sorted by country first. Order of columns in a composite index matters.
Transactions and isolation levels
A transaction groups multiple reads/writes into a single atomic unit — all succeed or all fail. The classic example: transfer $100 from A to B. You debit A and credit B; if the second step fails, the first must be rolled back.
- Read uncommitted — can see uncommitted changes from other transactions. Almost never used.
- Read committed — sees only committed data. The default in most DBs.
- Repeatable read — repeated reads of the same row return the same value within a transaction.
- Serializable — strongest. As if all transactions ran one at a time. Slowest.
Replication and backups
Run at least one replica (read replica or hot standby) of any production database. When the primary fails, you fail over. Take regular backups to separate storage — replicas don't protect you from accidental DELETEs and bad migrations. Test the restore procedure; an untested backup isn't a backup.