DISTINCT

Video Transcription

Welcome back, everyone. So let’s continue diving into adding more and more elements into our single table queries. In this video, we’re going to cover the distinct qualifier. So distinct allows us to guarantee uniqueness and the things that actually come out and are result of our query. So by default, we select all things. So all of the values inside of each of the columns that we actually pick. Now, when we use distinct, each of the columns that we specify to be distinct, we verify that those records are unique. So if we have five color, five rows that have the color blue, but if we only want the distinct records, those records that have duplicates will be removed from our result. So like aggregates, all is the default of our select, if distinct, is not specified. So when we do select orders, select star from orders, or select whatever from whatever table source that you have, by default, we’re pulling all of those records out. But if we add distinct in there, we actually guarantee you that each record is unique. So when we add that distinct, each of the tuples or, or the resulting set, it has only unique values.

But let’s take a look at some examples. Because distinct allows us to do some more interesting things, especially when we deal with aggregates distinct by default, our query looks exactly like this, right? So this top query here is equivalent to saying select all. So both of these queries here actually come out with the exact same results. So if we look down here at the two tables, or the two results that come out, there are going to be 100% identical, because all is the default operation. But what happens when we actually change let’s take out one of these here. We see that so here we have a whole bunch of like each year, we have multiple years, right, but if I change this to be distinct. and execute that happens here. So now, actually, let’s go let me actually pull want to show what happens show all as well here, because our records there we go. And then I’m actually going to also add an order O dot customer ID. Let’s make that a capital O. Now let’s run this. There we go. So if we look at our results, here, you can actually see the resulting effect of the distinct keyword.

So in this first set is my distinct query, right. So just select distinct records that have for the columns year, order year, and the customer ID from our resulting set. So we have for 2013, it’s only going to show each customer once and only once. Although year comes out multiple times, but distinct does not apply to an individual column here, it applies to the entire row. And this particular instance, now if we look at the original query down here, towards the bottom, you can see that for 2013 customer, one had made multiple orders and so they show up multiple times. So distinct removes those duplicates, and in this case, right, the duplicates don’t actually add any new information to the results of my query. So distinct here is actually very beneficial for interpreting The results that we actually have before when I showed order by most of the things that I was actually ordering by were in the order by was part of the SELECT clause.

So if I run this though, you’ll see the order ID is no longer part of my SELECT clause. But I am allowed to sort by things, or by columns that aren’t actually projected in the final results, which is kind of a useful thing. But if I actually add the distinct keyword here ah, that no longer works, right? ORDER BY items must appear in the select list if SELECT DISTINCT to specify, right? This is because when we actually try to, when we order by, right, we can’t actually order by a column that doesn’t exist in the SELECT clause for the distinct because not all records are actually included. So our ordering would actually be invalid. Because we don’t actually have all records there, we only have a unique set of records. That order by is actually picking from. So if you want to order by something, right, you can order by anything, is assuming that it’s in the actual table. So you can order by any column in your table if you’re selecting all records, but if you select distinct, you can only order by columns that exist in the SELECT clause.