How to use SELECT DISTINCT in PostgreSQL: Your Guide to Unique Data Retrieval
Hey, SQL enthusiasts! 🌟 Are you tired of sifting through duplicate data and want a cleaner, more concise view of your tables? Enter SELECT DISTINCT, a PostgreSQL clause that retrieves unique values from one or more columns. Let’s also introduce its sibling, SELECT DISTINCT ON, a PostgreSQL-specific extension that comes in handy when you want exactly one row per group.
What is SELECT DISTINCT?
SELECT DISTINCT filters out duplicate rows, where two rows count as duplicates if they match on every column listed in the SELECT clause. It’s perfect when you need a list of unique items from your table.
When Should You Use SELECT DISTINCT?
- Data Validation: When you need to check which distinct values a column (or combination of columns) actually contains, for example to confirm that a column you expect to be unique really is.
- Reporting: When creating reports that require unique attributes.
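For the data validation case, one rough way to check whether a column is unique is to compare the total row count with the number of distinct values. The sketch below assumes the tv_series table used in the examples further down and its name column; swap in whatever table and column you want to check.

```sql
-- If the two numbers differ, the name column contains duplicate values.
SELECT
    (SELECT COUNT(*) FROM tv_series) AS total_rows,
    (SELECT COUNT(*)
     FROM (SELECT DISTINCT name FROM tv_series) AS unique_names) AS distinct_names;
```

Note that SELECT DISTINCT treats all NULLs as one value, so a column with several NULLs will also show a difference here.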
Basic Syntax
The syntax for using SELECT DISTINCT is straightforward:
SELECT DISTINCT column1, column2 FROM table_name;
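When you list more than one column, it is the combination of values that has to be unique, not each column on its own. Here is a small self-contained sketch that uses an inline VALUES list (made-up rows, purely for illustration) so you can run it without any table:

```sql
-- Four input rows; the first two are exact duplicates of each other.
SELECT DISTINCT genre, release_year
FROM (VALUES
    ('Drama',   2019),
    ('Drama',   2019),
    ('Drama',   2020),
    ('Fantasy', 2019)
) AS t(genre, release_year)
ORDER BY genre, release_year;
-- Returns three rows: (Drama, 2019), (Drama, 2020), (Fantasy, 2019)
```

Drama still appears twice in the result because the two rows differ in release_year.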
Practical Examples
Example 1: Selecting Unique Genres from tv_series
After adding a few duplicate records to our tv_series table, let’s fetch the unique genres.
SELECT DISTINCT genre FROM tv_series;
This query returns each genre present in the tv_series table exactly once, effectively filtering out any duplicates. Without an ORDER BY clause, the order of the rows is not guaranteed.
genre |
---|
Science Fiction |
Crime |
Animation |
Fantasy |
Comedy |
Drama |
What is SELECT DISTINCT ON?
Now let’s talk about SELECT DISTINCT ON. This variant keeps only the first row from each group of rows that share the same values in the listed column or columns, and the ORDER BY clause decides which row counts as first.
When Should You Use SELECT DISTINCT ON?
Use it when you want to keep just one row from each group of duplicates, but you care which one gets kept based on another column.
Syntax:
SELECT DISTINCT ON (column1) column1, column2 FROM table_name ORDER BY column1, column2;
Example 2: Selecting One TV Series per Genre Based on Newest Release Year
SELECT DISTINCT ON (genre) genre, name, MAX(release_year) newest
FROM tv_series
GROUP BY genre, name, release_year
ORDER BY genre, release_year DESC;
In this example, we’re selecting one TV series per genre, and we want the one with the newest release_year.
genre | name | newest |
---|---|---|
Animation | BoJack Horseman | 2014 |
Comedy | Ted Lasso | 2020 |
Crime | Killing Eve | 2018 |
Drama | Chernobyl | 2019 |
Fantasy | The Witcher | 2019 |
Science Fiction | Love, Death & Robots | 2019 |
In the query above, did you notice how we used clauses we have learned so far?
- Aggregate Functions: MAX()
- Column Alias: newest
- Group By: For grouping results
- Order By: For ordering results
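One thing worth noticing about Example 2: release_year is itself part of the GROUP BY, so MAX(release_year) is simply release_year for every group. That means DISTINCT ON can do the job on its own. Here is a minimal sketch of the leaner version, assuming the same tv_series table with genre, name, and release_year columns:

```sql
-- One row per genre: the row with the newest release_year comes first
-- within each genre, and DISTINCT ON keeps only that first row.
SELECT DISTINCT ON (genre)
       genre,
       name,
       release_year AS newest
FROM tv_series
ORDER BY genre, release_year DESC, name;  -- name breaks ties between same-year series
```

The extra name column in the ORDER BY is an addition of this sketch rather than part of the original query. It only matters when two series in the same genre share the newest year, where it makes the choice predictable instead of arbitrary.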
Tips and Best Practices
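- Performance: Using SELECT DISTINCT can be resource-intensive, especially on large datasets, because PostgreSQL has to sort or hash the rows to find the duplicates. Use it only when you actually need deduplication.
- Ordering: SELECT DISTINCT ON needs an ORDER BY clause to give predictable results. Without it, the row kept from each group is arbitrary.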
Common Errors and Solutions
Error Message:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Cause:
The expressions in SELECT DISTINCT ON must match the leftmost expressions in the ORDER BY clause.
Solution:
Make sure your ORDER BY starts with the same columns that appear in your SELECT DISTINCT ON expression. Any additional sort columns go after them.
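To make this concrete, here is a sketch of a query that triggers the error and two ways around it, again assuming the tv_series table from the examples above. The failing query is left commented out so the block runs as a whole, and the subquery alias newest_per_genre is just an illustrative name.

```sql
-- Fails: ORDER BY starts with release_year instead of the DISTINCT ON column.
-- SELECT DISTINCT ON (genre) genre, name, release_year
-- FROM tv_series
-- ORDER BY release_year DESC;

-- Works: genre comes first in ORDER BY, the other sort columns follow.
SELECT DISTINCT ON (genre) genre, name, release_year
FROM tv_series
ORDER BY genre, release_year DESC;

-- If the final result should be sorted by year instead, wrap the
-- DISTINCT ON query in a subquery and order the outer query.
SELECT *
FROM (
    SELECT DISTINCT ON (genre) genre, name, release_year
    FROM tv_series
    ORDER BY genre, release_year DESC
) AS newest_per_genre
ORDER BY release_year DESC;
```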
The Takeaway
So there you have it, everything you need to know to get started with SELECT DISTINCT and SELECT DISTINCT ON in PostgreSQL. Whether you’re eliminating duplicates or strategically selecting specific rows, you’ve got the tools you need to get the job done right.
Until next time — happy querying! 🚀