SQLMedium
Subquery vs JOIN in SQL — performance and readability tradeoffs
Most query planners (Postgres, MySQL 8+, SQL Server) optimize them identically for simple cases. Pick based on readability. For complex correlated subqueries, JOINs are usually faster and more predictable.
Side-by-side — same result
-- Subquery (IN)
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 10000);
-- JOIN
SELECT DISTINCT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.total > 10000;
Postgres typically rewrites both to the same execution plan. Test with EXPLAIN.
Where JOIN clearly wins — multi-column lookup
-- Painful as subquery
SELECT id FROM orders WHERE customer_id IN (
SELECT id FROM customers WHERE city = 'mumbai' AND tier = 'gold'
);
-- Cleaner as JOIN
SELECT o.id FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.city = 'mumbai' AND c.tier = 'gold';
JOIN also lets you SELECT columns from both tables; subquery cannot.
Where subquery wins — anti-join (NOT EXISTS)
-- Customers with NO orders — cleaner as subquery
SELECT name FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- LEFT JOIN equivalent — works but less self-documenting
SELECT c.name FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;
When subquery is a trap — correlated subquery on a big table
-- ❌ Runs the inner query ONCE PER ROW of customers — N×M scans
SELECT name,
(SELECT SUM(total) FROM orders WHERE customer_id = customers.id) AS total
FROM customers;
Rewrite as JOIN + GROUP BY:
-- ✅ Single scan + hash join
SELECT c.name, COALESCE(SUM(o.total), 0) AS total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name;
IN vs EXISTS
-- IN — best when the subquery is small or static
WHERE customer_id IN (1, 5, 9, 12)
-- EXISTS — best when the subquery is large; stops at first match
WHERE EXISTS (SELECT 1 FROM big_table WHERE big_table.fk = main.id)
Rule I use in code review
- Use JOIN when you need columns from both tables.
- Use EXISTS / NOT EXISTS for "does a related row exist".
- Use IN with a literal list.
- Avoid correlated subqueries in the SELECT list — they're the silent killer of slow queries.