SQLEasy
SELECT vs SELECT DISTINCT in SQL — when to use which
SELECT DISTINCT removes duplicate rows from the result set. It is not free — the database has to sort or hash every row to find duplicates.
Example
-- All cities (with duplicates)
SELECT city FROM customers;
-- mumbai, delhi, mumbai, bangalore, delhi, mumbai
-- Unique cities
SELECT DISTINCT city FROM customers;
-- mumbai, delhi, bangalore
DISTINCT applies to the WHOLE row
SELECT DISTINCT city, country FROM customers;
This returns unique (city, country) pairs — NOT just unique cities. New devs often expect "distinct on one column" — for that, use GROUP BY.
Performance
-- Slow on a 10M-row table — must sort everything
SELECT DISTINCT customer_id FROM orders;
-- Faster when you can use an index
SELECT customer_id FROM orders GROUP BY customer_id;
Sometimes GROUP BY outperforms DISTINCT because the planner picks a hash-aggregate path. Always EXPLAIN ANALYZE.
When DISTINCT is the wrong fix
-- ❌ DISTINCT hiding a real bug
SELECT DISTINCT c.name FROM customers c
JOIN orders o ON o.customer_id = c.id;
If you needed DISTINCT to fix "too many rows", you have a join cardinality problem — fix the join, not the symptom.
When DISTINCT is right
- Building a dropdown of unique values
- Counting unique entities (
COUNT(DISTINCT user_id)) - Deduplicating a derived list
Production rule
Treat every DISTINCT in a code review as a question: "What's the root cause that needed deduplication?" Often the answer is a missing WHERE clause or a fan-out join.