Databases22 min readIntermediate

Schema Design & Normalization

1NF, 2NF, 3NF — and the practical guideline that covers 95% of real-world tables.

Why normalize?

Normalization is the process of structuring tables so that each piece of information is stored in EXACTLY ONE PLACE. The benefit: updates only need to happen in one row, and inconsistencies become impossible.

An un-normalized table

json
orders
+----+--------+--------------+----------+----------+
| id | item   | customer_name | email   | city     |
+----+--------+--------------+----------+----------+
|  1 | Apple  | Ada          | a@x.com  | Bristol  |
|  2 | Pear   | Ada          | a@x.com  | Bristol  |
|  3 | Apple  | Bob          | b@x.com  | London   |
+----+--------+--------------+----------+----------+
-- Ada's email lives in 2 rows. Update one and forget the other → inconsistent.
-- Move Ada to a new city? Now you must update every order row.

Normalized into two tables

json
customers
+----+------+---------+----------+
| id | name | email   | city     |
+----+------+---------+----------+
|  1 | Ada  | a@x.com | Bristol  |
|  2 | Bob  | b@x.com | London   |
+----+------+---------+----------+

orders
+----+----------+--------+
| id | cust_id  | item   |
+----+----------+--------+
|  1 | 1        | Apple  |
|  2 | 1        | Pear   |
|  3 | 2        | Apple  |
+----+----------+--------+

The practical rule

If you find yourself updating the same value in multiple rows, you should probably split that data into a separate table linked by a foreign key. That's 90% of normalization in one sentence.

ℹ Note
Sometimes you DENORMALIZE for performance — duplicating data to avoid expensive joins. That's a deliberate, eyes-open tradeoff: faster reads, more write complexity. Don't denormalize before you have a measured performance problem.