Data Structures & Algorithms22 min readBeginner

Hash Maps & Sets

How hash tables work — and why they turn O(n²) problems into O(n) ones.

What is a hash map?

A hash map (hash table, dict, dictionary, associative array) stores key-value pairs and provides expected O(1) insert, lookup, and delete. The trick: a hash function turns each key into an integer, which becomes an index into a backing array.

🎟️
Real-life analogy — A coat-check counter
You hand over your coat (the value) and get a ticket (the key). When you come back, you don't search through every coat — you hand the ticket back and the attendant goes straight to the right hook. The hash function is what tells them WHICH hook.
Hash table — how dicts actually store things
Each key is hashed to a bucket index. Two keys in the same bucket → "collision", resolved by chaining.
Hash function
hash("Ada")
= 65662
% 8
→ bucket 6
0
empty
1
empty
2
empty
3
empty
4
empty
5
empty
6
Ada: 36
7
empty

Under the hood

  • Backing array of buckets, often ~75% full.
  • Insert(k, v): compute hash(k), mod by array size to get a bucket, place (k, v) there.
  • Two keys hash to the same bucket → COLLISION. Resolved by chaining (linked list per bucket) or open addressing (probe to next empty slot).
  • When the array gets too full, allocate a bigger one and rehash all entries — amortized O(1).

When the hash function distributes keys evenly, lookups are O(1). When it doesn't (or an adversary picks pathological keys), they degrade to O(n) — every key in one bucket. Modern languages use cryptographically randomized hashes to prevent this.

The classic pattern: turn O(n²) into O(n)

# O(n^2) brute force
def two_sum_brute(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]

# O(n) with a hash map
def two_sum(nums, target):
    seen = {}                   # value -> index
    for i, n in enumerate(nums):
        complement = target - n
        if complement in seen:
            return [seen[complement], i]
        seen[n] = i
    return []

print(two_sum([2, 7, 11, 15], 9))   # [0, 1]

The pattern: as you scan, store what you've seen; for each new element, check if its complement is already there. One pass, O(n) extra space.

Sets

A set is a hash map without the values — only keys. Use it whenever the question is "have I seen this?" or for set arithmetic.

# Detect duplicates in a list
def has_duplicate(nums):
    return len(set(nums)) != len(nums)

# First non-repeating character
from collections import Counter

def first_unique(s):
    counts = Counter(s)
    for ch in s:
        if counts[ch] == 1:
            return ch
    return None

print(first_unique("leetcode"))   # \"l\"

When NOT to use a hash map

  • When you need ORDERED iteration by key — use a tree-based map (Java's TreeMap, C++'s std::map).
  • When keys are small integers in a tight range — a plain array indexed by key is faster.
  • When you need to iterate in INSERTION order — Python dicts and JS Maps are fine for this; standard sets are not always.
💡 Tip
The two highest-leverage data structures in interviews are the hash map and the heap. Master both before learning anything fancier.