SELECT vs SELECT DISTINCT in SQL — when to use which

SELECT DISTINCT removes duplicate rows from the result set. It is not free — the database has to sort or hash every row to find duplicates.

Example

-- All cities (with duplicates)
SELECT city FROM customers;
-- mumbai, delhi, mumbai, bangalore, delhi, mumbai

-- Unique cities
SELECT DISTINCT city FROM customers;
-- mumbai, delhi, bangalore

DISTINCT applies to the WHOLE row

SELECT DISTINCT city, country FROM customers;

This returns unique (city, country) pairs — NOT just unique cities. New devs often expect "distinct on one column" — for that, use GROUP BY.

Performance

-- Slow on a 10M-row table — must sort everything
SELECT DISTINCT customer_id FROM orders;

-- Faster when you can use an index
SELECT customer_id FROM orders GROUP BY customer_id;

Sometimes GROUP BY outperforms DISTINCT because the planner picks a hash-aggregate path. Always EXPLAIN ANALYZE.

When DISTINCT is the wrong fix

-- ❌ DISTINCT hiding a real bug
SELECT DISTINCT c.name FROM customers c
JOIN orders o ON o.customer_id = c.id;

If you needed DISTINCT to fix "too many rows", you have a join cardinality problem — fix the join, not the symptom.

When DISTINCT is right

Building a dropdown of unique values
Counting unique entities (COUNT(DISTINCT user_id))
Deduplicating a derived list

Production rule

Treat every DISTINCT in a code review as a question: "What's the root cause that needed deduplication?" Often the answer is a missing WHERE clause or a fan-out join.

SELECT DISTINCT removes duplicate rows from the result set. It is not free — the database has to sort or hash every row to find duplicates.

Example

-- All cities (with duplicates)
SELECT city FROM customers;
-- mumbai, delhi, mumbai, bangalore, delhi, mumbai

-- Unique cities
SELECT DISTINCT city FROM customers;
-- mumbai, delhi, bangalore

DISTINCT applies to the WHOLE row

SELECT DISTINCT city, country FROM customers;

This returns unique (city, country) pairs — NOT just unique cities. New devs often expect "distinct on one column" — for that, use GROUP BY.

Performance

-- Slow on a 10M-row table — must sort everything
SELECT DISTINCT customer_id FROM orders;

-- Faster when you can use an index
SELECT customer_id FROM orders GROUP BY customer_id;

Sometimes GROUP BY outperforms DISTINCT because the planner picks a hash-aggregate path. Always EXPLAIN ANALYZE.

When DISTINCT is the wrong fix

-- ❌ DISTINCT hiding a real bug
SELECT DISTINCT c.name FROM customers c
JOIN orders o ON o.customer_id = c.id;

If you needed DISTINCT to fix "too many rows", you have a join cardinality problem — fix the join, not the symptom.

When DISTINCT is right

Building a dropdown of unique values
Counting unique entities (COUNT(DISTINCT user_id))
Deduplicating a derived list

Production rule

Treat every DISTINCT in a code review as a question: "What's the root cause that needed deduplication?" Often the answer is a missing WHERE clause or a fan-out join.

SELECT vs SELECT DISTINCT in SQL — when to use which

Example

DISTINCT applies to the WHOLE row

Performance

When DISTINCT is the wrong fix

When DISTINCT is right

Production rule

SELECT vs SELECT DISTINCT in SQL — when to use which

Example

DISTINCT applies to the WHOLE row

Performance

When DISTINCT is the wrong fix

When DISTINCT is right

Production rule

WHERE vs HAVING in SQL — what is the actual difference?

Subquery vs JOIN in SQL — performance and readability tradeoffs

What are SQL indexes and how do they actually work?

WHERE vs HAVING in SQL — what is the actual difference?

Subquery vs JOIN in SQL — performance and readability tradeoffs

What are SQL indexes and how do they actually work?

Related questions

Related questions