SQLMedium
What are SQL indexes and how do they actually work?
An index is a separate data structure (almost always a B-tree) that the database keeps in sync with a table, so it can find rows by indexed columns in O(log n) instead of scanning the whole table O(n).
Concrete example
CREATE TABLE users (id BIGINT, email TEXT, name TEXT, created_at TIMESTAMPTZ);
-- 10 million rows. No index.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';
-- Seq Scan on users — Actual time: 1850 ms
CREATE INDEX idx_users_email ON users(email);
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';
-- Index Scan using idx_users_email — Actual time: 0.04 ms
40,000× faster on a 10M-row table.
Composite (multi-column) indexes — order matters
CREATE INDEX idx_orders_cust_date ON orders(customer_id, created_at);
This index helps:
WHERE customer_id = ?WHERE customer_id = ? AND created_at > ?WHERE customer_id = ? ORDER BY created_at DESC
It does NOT help:
WHERE created_at > ?alone — the leading column isn't filtered
Think of a composite index as a phone book sorted by (last_name, first_name). It is great for "find Smith, John" but useless for "find anyone named John".
What indexes cost
- Disk space — every index is ~10-30% of the table size.
- Write speed — every INSERT/UPDATE on indexed columns also updates the index. 5 indexes ≈ 5× the write cost.
- VACUUM/maintenance overhead.
When to add an index
Add when:
- A column is frequently used in
WHERE,JOIN, orORDER BYon a large table - Query latency matters (user-facing)
Skip when:
- Table is tiny (< ~10k rows — full scan is faster anyway)
- Column has low cardinality (e.g.
genderwith 2-3 values — index won't help) - Workload is write-heavy and reads are rare
Partial indexes — Postgres trick
-- Only index the 1% of orders that are still 'pending'
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';
Tiny index, blazing-fast for queries filtering on status = 'pending'.
How to find missing indexes
-- Postgres: queries doing full table scans
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC LIMIT 20;
Or use the slow query log. Look for Seq Scan on big_table in EXPLAIN ANALYZE output.
Anti-patterns
- Index on every column — wrecks write performance.
- Function in WHERE blocks index —
WHERE LOWER(email) = ...doesn't useidx_users_email. Either index the expression:CREATE INDEX ... ON users(LOWER(email)), or store lowercase at write time. - Leading wildcard —
WHERE name LIKE '%alice%'cannot use a B-tree index. Use full-text search (tsvector) or trigram (pg_trgm).