GROUP BY - HAVING

Video Transcription

Welcome back everyone. Now in this video we are going to be taking a look at group by and having. So before we took a look at simple SELECT FROM clause or from query, along with where so remember, we select columns from our table, and where is there to filter out the rows. So remember that we have the selection and projection operation. So the selection being the from and where clause. So the selection of what rows of data are we going to include in our results, and then the the projection or the select statement in our SQL query is which columns are we going to include in our on our query result. But now we can actually also group those rows together, right. So after we have whichever rows that we actually have in which columns that we have, we can actually group the results by certain conditions. So each of these though, by the way, right, these are all still all optional items. So if you ever look at the actual official documentation, everything in these square brackets here are optional statements that can be included in your SQL query.

But let’s take a look at group by and having it’s a lot easier to start picking this up, as we start to show some more examples. But groupbuy specifically, is going to define a group based off of a set of call or a set of columns or expressions, right. So based off of whatever the expression is, whether it be some comparison, or a specific column, or two or more, this is what we’re going to group our rows by. So it’s defining the defining the rules that we’re actually doing the grouping by. So this does allow aggregation. So if we, let’s say, group by order date, right, we can get the number of orders for a particular date, right, which is a very useful query to run. And there’s a lot of other different aggregations that we can show here in just a second. There are many options available to the group by element. So we can do a list of columns that we group by, we can group by sets, expressions, and there’s also we call queue roll up, those we’re not really going to cover in this class. But if you’re interested, I am more than happy to tack on a video or post some text up, that kind of explains them. But these are some of the standard operations that you’ll see associated with group by. But most of the time, we’re going to be working with aggregates, so max, min, average, and count.

So those are most of the common aggregates that will actually work with. Now aggregates themselves are almost always this particular syntax. So the function, so the aggregate function that we have on on on the right side there, and so let’s say count, and then in parentheses, you will have an some sort of expression that will tell you or tell the function what to count, right? Out of those rows. So all is the default. And then we can do, we can do distinct, so do we want to count duplicates, for example, right? Or no duplicates. duplicates are no duplicates, and then whatever the expression is, so and I’ll show an example of this here in a few minutes. And as I mentioned before, some examples of this are min max, average, sum and count. Although count is similar, but count allows for no expression. So for some, for example, you can’t sum star right? But count is kind of unique there where we can actually put star as a wildcard and say just count all the things right count all the rows, but for like some min max average, you want to know what specific thing that you are summing up or averaging or finding the maximum right you want to know the specific column out of the group that you’re actually going to apply that function to count is a little bit different count that aggregate function actually returns the number of records or rows that are in that group. But this can be utilized and with the over clause, when we are utilizing partitions. But I’m going to kind of skip over partitions for now. And we will save that topic for another time. But just kind of be aware that it is there in case you see this as you’re looking at this, or reading about this online, but we will talk about partitions in a later lecture.

In this case, no values are ignored by default. So if there is no value of is not included as part of the aggregate, let’s take a look at some examples of the group by Alright, so now I got got myself put up towards the top of the screen. And this simple query, here I am selecting all the customer IDs from the orders and grouping them by the customer ID. So when I group by the customer ID, I am essentially grouping all of the records that are associated for that specific customer. And so if I run this, you actually can end up figuring out how many customers we actually have, or how many unique customers that we actually have here. So this is all of our customers that have placed an order with us. And of course, I can try to add columns here. But this becomes a little bit tricky. So let’s do au dot order date here, I run that I get an error. So this is one of the weird things are not to say or weird things, but not initially intuitive things about the group by clause. So I cannot project an element or an attribute or column. When we have a group buy, and if that column is not inside that group, so I cannot project a column that is not part of the grouping. Because I’m grouping by a specific condition, write an expression right here, I’m grouping by just the customer ID. And so each record, I mean, let me take this off real quick. When I get my query results here, the rows that are fed to the select clause are what you see here on the screen. And so when we have like 1234, for the customer ID, there is no date associated with each of those customer IDs in this case, because those columns have been filtered out already by the group by now I could put the date back in here. Okay, now, let me do this way.

So you can add columns here. Like if I wanted to do an O dot Sales Person ID, and put this up here. And rerun this here. This actually works, because I have the me actually sources here, right. So this will group all of for so for the salesperson, it’ll group them together and all of the actual customer. So if, if customer 531 had 10 orders with this particular salesperson, all of those records will show as one row. But what about the date? Well, the date comes across a little bit easier when we actually do aggregate functions. So if, for example, I take off the salesperson here and put back the ODOT order date, and then instead tack on a let’s say aggregate function now, what happens here? Uh huh There we go. So let’s put that as first order here. But so what I’m doing here is I’m grouping by our customer ID, and then I am going up here in my selection and say, Hey, give me the smallest order date for this customer ID. So, for this group, give me the smallest order out of that group. So for each of the customers I get the date of their first order in this particular table. When I have a group by I cannot project a column if it is not part of the grouped by, but I can project a column if it is part of the group by or I can project it if it is an aggregate, if it is an aggregate, so min max, average sum count all of those sorts of things. So I could here, I can say count, and then star. And this will tell me how many? No orders? So how many orders? Has that customer actually made? So, for each customer, when was their first order? And how many orders did they actually make, right? That is what I’m actually associating here.

So, this is group by group by itself is a very powerful expression. And it really does help combine and aggregate database results in that’ll be a really common operation that you’ll see as we start moving through the class. But let’s talk about how we could actually filter those results. So I showed how you can group your results. So group those rows. And then what about filtering those because the where clause doesn’t actually filter the groups the where clause filters the individual rows, before we get to the group by so the having clause or having element is what we can use to actually filter records after they’ve been grouped. So basically, just like the where clause accepts any Boolean expression that you express, but aggregates, aggregates can actually be used here, right? The WHERE clause cannot use the aggregates because where the where clause is a single row by row operation, the having clause is a group operation. So you have a group of things that you can apply this filter to. So therefore, you can also use aggregates as part of the filtering process. So let’s take a look at an example here, I’m going to replace the query that I had before. Do or here, and sorry, for my bad syntax here, let’s fill this in with count. Okay, so I’m going to run this and we can see what happens.

And I’ll kind of work on explaining this here. So we have our counts here. So this is the number of orders again, as I say, order count. So we have order counts. And then we have our men date here. And I’m actually going to add these to a new line. So they’re a little bit easier to read. And so we have as a first date. And then let’s have that as order year, that’ll work. This is last date. And this is first There we go. Okay, so now I got names on all my columns here. We’ve got order count, first date, last date, order year. So what I am actually grouping by here is the actual order year. And again, I am using a shorter syntax here, converting the order into just the year and grouping by that. But again, though, you can, you can group by the actual raw dates. But here we can actually, we can’t group by a range of dates here, because we can’t do group give me a group that goes from one 120 16 to 1231 2016. Because there’s a lot of dates in between that range. And so you have to give one value that is represented representative of that particular group, we can group by more than one column, by the way, as I showed earlier, but here, we’re just grouping by year.

So for each year, when was the first order? When was the last order and then what was that particular year. And that was our group by, but lots of things that we can actually do with this right? The, with the group by alright, we can actually have, we can actually filter now out specific groups, right, so we can actually filter out specific groups. So let’s take off the semicolon here, and actually add in our having clause now. So select from where group by so from orders where the picking completed when is not no so it’s actually If the order has been completed, fulfilled, group those by year. And for each of those for each of those groups that we have, I am going to specify that I only want the years where we were very successful. Okay. So I want the years are that we successfully completed, let’s say 20,000 orders. And if we look at our results there, that should give us two rows.

So if we run this, oh, yeah, there we go are two rows. So this particular this particular filter, we would not be able to successfully do in the in the where clause, because the where clause is row by row, where the having is group by group. So in having clause we can actually apply a aggregate function to filter out groups that only have certain things there. So that’s a very useful feature there. But that pretty much concludes for the series, the example SQL queries that I’m going to show here in the following video, I’m actually gonna take a quick a short amount of time here to actually talk about the processing order. So as I’ve been talking, I’ve been jumping around in these actual SQL queries, talking about what each part actually does, but there’s actually a very specific order that these actually get executed in so we’ll take a look at that next