PostgreSQL SELECT DISTINCT

How to use SELECT DISTINCT in PostgreSQL: Your Guide to Unique Data Retrieval

Hey, SQL enthusiasts! 🌟 Are you tired of sifting through duplicate data and want a cleaner, more concise view of your tables? Enter SELECT DISTINCT, a fantastic PostgreSQL command to help you retrieve unique values from one or multiple columns. Let’s also introduce you to its sibling, SELECT DISTINCT ON, which comes in handy for those specialized cases.

What is SELECT DISTINCT?

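SELECT DISTINCT removes duplicate rows from the result. Two rows count as duplicates when every column listed in the SELECT clause holds the same value, so with several columns it is the combination of values that has to be unique. It’s perfect when you need a list of unique items from your table.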

When Should You Use SELECT DISTINCT?

  1. Data Validation: When you need to check whether the values in a column, or a combination of columns, are actually unique.
  2. Reporting: When creating reports that require unique attributes.
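As a rough sketch of the data validation case: comparing the total row count with the number of distinct values shows whether a column contains duplicates. The example below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made so it runs anywhere; the SQL itself is valid in both), and the helper name count_rows_and_distinct and the sample values are made up for illustration.

```python
import sqlite3


def count_rows_and_distinct(names):
    """Return (total rows, number of distinct names) for a throwaway table.

    Hypothetical helper: builds an in-memory tv_series table with a single
    name column, then compares COUNT(*) against a SELECT DISTINCT subquery.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tv_series (name TEXT)")
    conn.executemany(
        "INSERT INTO tv_series (name) VALUES (?)", [(n,) for n in names]
    )
    total = conn.execute("SELECT COUNT(*) FROM tv_series").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT name FROM tv_series) AS t"
    ).fetchone()[0]
    conn.close()
    return total, distinct


if __name__ == "__main__":
    total, distinct = count_rows_and_distinct(
        ["Chernobyl", "Ted Lasso", "Chernobyl"]
    )
    # If the two numbers differ, the column contains duplicates.
    print(total, distinct, "duplicates found" if total != distinct else "all unique")
```

If the two counts match, every value in the column is unique; if not, the difference tells you how many rows are repeats.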

Basic Syntax

The syntax for using SELECT DISTINCT is straightforward:

SELECT DISTINCT column1, column2 FROM table_name;

Practical Examples

Example 1: Selecting Unique Genres from tv_series

After adding a few duplicate records to our tv_series table, let’s fetch the unique genres.

SELECT DISTINCT genre FROM tv_series;

This query will return all unique genres present in the tv_series table, effectively filtering out any duplicates.

genre
Science Fiction
Crime
Animation
Fantasy
Comedy
Drama
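If you want to experiment without a PostgreSQL instance, here is a minimal sketch that runs the same kind of query against an in-memory table. It assumes Python's built-in sqlite3 module as a stand-in (plain SELECT DISTINCT behaves the same way for this case), and the sample rows and the run_query helper are hypothetical, not the actual contents of tv_series.

```python
import sqlite3

# Hypothetical sample rows (genre, name, release_year), including one duplicate.
SAMPLE_ROWS = [
    ("Fantasy", "The Witcher", 2019),
    ("Fantasy", "The Witcher", 2019),  # duplicate record
    ("Fantasy", "Game of Thrones", 2011),
    ("Comedy", "Ted Lasso", 2020),
]


def run_query(sql):
    """Load SAMPLE_ROWS into a fresh in-memory tv_series table and run sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tv_series (genre TEXT, name TEXT, release_year INTEGER)"
    )
    conn.executemany("INSERT INTO tv_series VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # One column: each genre appears once, however many series share it.
    print(run_query("SELECT DISTINCT genre FROM tv_series ORDER BY genre"))
    # Two columns: the (genre, name) combination has to be unique.
    print(run_query(
        "SELECT DISTINCT genre, name FROM tv_series ORDER BY genre, name"
    ))
```

The second query is worth a look: Fantasy shows up twice there, because with two columns listed it is the genre and name pair that gets deduplicated, not the genre alone.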

What is SELECT DISTINCT ON?

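Now let’s talk about SELECT DISTINCT ON, a PostgreSQL-specific extension. It keeps only the first row from each group of rows that share the same value in the columns listed in parentheses, and the ORDER BY clause decides which row counts as the first one.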

When Should You Use SELECT DISTINCT ON?

Use it when you want to keep just one row from each group of duplicates, but you care which one gets kept based on another column.

Syntax:

SELECT DISTINCT ON (column1) column1, column2 FROM table_name ORDER BY column1, column2;

Example 2: Selecting One TV Series per Genre Based on Newest Release Year

SELECT DISTINCT ON (genre) genre, name, release_year AS newest
FROM tv_series
ORDER BY genre, release_year DESC;

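In this example, we’re selecting one TV series per genre, and we want the one with the newest release_year. The ORDER BY sorts the rows by genre and puts the newest release_year first within each genre; DISTINCT ON (genre) then keeps only the first row of each genre.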

genre           | name                 | newest
Animation       | BoJack Horseman      | 2014
Comedy          | Ted Lasso            | 2020
Crime           | Killing Eve          | 2018
Drama           | Chernobyl            | 2019
Fantasy         | The Witcher          | 2019
Science Fiction | Love, Death & Robots | 2019

In the query above, did you notice how we used clauses we have learned so far?
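To make the "first row per group" idea concrete, here is a rough emulation of DISTINCT ON (genre) with ORDER BY genre, release_year DESC, written in plain Python rather than SQL. It is only an illustration of the semantics, not how PostgreSQL implements it, and the sample rows and the distinct_on_genre function are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows as (genre, name, release_year) tuples.
ROWS = [
    ("Fantasy", "Game of Thrones", 2011),
    ("Fantasy", "The Witcher", 2019),
    ("Drama", "Chernobyl", 2019),
    ("Comedy", "Ted Lasso", 2020),
]


def distinct_on_genre(rows):
    """Emulate SELECT DISTINCT ON (genre) ... ORDER BY genre, release_year DESC."""
    # Step 1, the ORDER BY: sort by genre, newest release_year first.
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    # Step 2, the DISTINCT ON: keep only the first row of each genre group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


if __name__ == "__main__":
    for row in distinct_on_genre(ROWS):
        print(row)
```

Flip the sort to oldest first and Fantasy would return Game of Thrones instead of The Witcher, which is why the ORDER BY matters so much here.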

Tips and Best Practices

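  1. Performance: SELECT DISTINCT has to sort or hash the result to find duplicates, which can be resource-intensive on large datasets. List only the columns you actually need, and skip DISTINCT when the rows are already guaranteed to be unique.
  2. Ordering: Always pair SELECT DISTINCT ON with an ORDER BY clause. Without it, the row kept from each group is unpredictable.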

Common Errors and Solutions

Error Message:

ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Cause:

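The expressions in DISTINCT ON (...) must be the leftmost expressions in the ORDER BY clause. For example, DISTINCT ON (genre) combined with ORDER BY release_year DESC raises this error, because the ORDER BY does not start with genre.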

Solution:

Start your ORDER BY with the columns from your SELECT DISTINCT ON expression, then add the columns that decide which row to keep, for example ORDER BY genre, release_year DESC.

The Takeaway

So there you have it, everything you need to know to get started with SELECT DISTINCT and SELECT DISTINCT ON in PostgreSQL. Whether you’re eliminating duplicates or strategically selecting specific rows, you’ve got the tools you need to get the job done right.

Until next time — happy querying! 🚀
