SQLHard
How to optimize a slow SQL query — step-by-step debugging approach
There is one correct first step and most developers skip it: EXPLAIN ANALYZE. Read the plan before changing anything.
Step 1 — capture the actual plan
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
Look for:
- Seq Scan on big tables — usually a missing index
- Sort with high cost — could be removed with an index
- Nested Loop with high row count — wrong join type for the data
- Actual rows >> estimated rows — stats are stale, run
ANALYZE
Step 2 — narrow down WHY it is slow
| Symptom in plan | Likely cause |
|---|---|
| Seq Scan on million-row table + WHERE | Missing index on filtered column |
| Index Scan but slow | Wrong index, or index unused (function in WHERE) |
| Nested Loop returning millions | Should be hash join — increase work_mem |
| Sort + Disk: ... | work_mem too small; query spills to disk |
Large Rows Removed by Filter | Filter happening AFTER scan; push earlier |
Step 3 — fix in order of cost
A. Add an index (cheapest)
CREATE INDEX CONCURRENTLY idx_orders_created ON orders(created_at);
CONCURRENTLY doesn't lock the table — safe in production.
B. Restructure the query
-- Slow — function blocks index
SELECT * FROM orders WHERE DATE(created_at) = '2026-05-19';
-- Fast — range scan uses index
SELECT * FROM orders
WHERE created_at >= '2026-05-19'
AND created_at < '2026-05-20';
C. SELECT only the columns you need
-- ❌
SELECT * FROM big_table WHERE id = 1;
-- ✅ — covering index possible
SELECT id, name FROM big_table WHERE id = 1;
D. Avoid OR — use UNION ALL
-- Planner often can't index-scan both branches
SELECT * FROM users WHERE email = 'x' OR phone = 'y';
-- Often faster
SELECT * FROM users WHERE email = 'x'
UNION ALL
SELECT * FROM users WHERE phone = 'y';
E. LIMIT early in subqueries
-- Top 10 customers + their last 5 orders
WITH top_customers AS (
SELECT id FROM customers ORDER BY total_spend DESC LIMIT 10
)
SELECT o.* FROM orders o JOIN top_customers t ON t.id = o.customer_id;
Step 4 — verify
Re-run EXPLAIN ANALYZE. Compare the actual time, not just the cost estimate. The cost is the planner's guess; actual time is reality.
Real example
Query: SELECT * FROM orders WHERE customer_id = 5 ORDER BY created_at DESC LIMIT 20;
Before — 1.2s — Seq Scan on 50M-row orders table.
CREATE INDEX CONCURRENTLY idx_orders_cust_created
ON orders(customer_id, created_at DESC);
After — 0.4ms — Index Scan using idx_orders_cust_created. 3000× faster.
The order matters: a composite index on (customer_id, created_at DESC) lets Postgres jump straight to customer 5's most recent rows and return them in order, no sort needed.
Antipatterns that always slow you down
- N+1 queries from the application layer — one query for the list, then one query per row.
- Implicit type casts in WHERE —
WHERE id = '5'(string vs int) often disables index use. - OFFSET pagination on large tables — use cursor pagination with
WHERE id > :cursor LIMIT N. - No
LIMITon potentially-huge queries — fetches gigabytes the app never displays. - Stale statistics —
VACUUM ANALYZEon the table; the planner needs accurate row counts.
Tools
- Postgres:
pg_stat_statements,auto_explain, EXPLAIN ANALYZE - MySQL:
EXPLAIN, slow query log,pt-query-digest - Cloud-managed: AWS RDS Performance Insights, Cloud SQL Insights, Azure Query Performance Insight