Chapter 3.2

Single Table Queries Part 2

  • HAVING
  • ORDER BY
  • DISTINCT
  • TOP
  • OFFSET…FETCH…
  • Logical Processing Order

Subsections of Single Table Queries Part 2

Single Table Queries Part 2

YouTube Video

Video Transcription

Welcome back everyone. In this video series, we’re going to continue our talk on single table queries. Now last time, we covered a lot about the SELECT clause, which can be paired with from to choose which tables you’re picking from, and the where clause, which can filter which rows are being picked. And then we also did a little bit with group by and the having clause which filters those groups. So we’ll review those a little bit today. And then we will cover order by distinct Top Offset fetch, and also the process or the order in which those are logically processed inside SQL. So let’s first start out with the having clause, this is what we ended up on in the previous video. So remember, the having clause is a post group filter, right? So meaning the having clause can be used with in place of the where clause which filters row by row. So we have SELECT FROM WHERE, so the from clause picks which tables, so the data source, and then the where clause will filter out any rows that you don’t want. And but where clauses only row by row, likewise, so having is grouped by group, so having an were cannot be used interchangeably. So having can’t be used to filter out rows, or individual rows, and the where clause can’t be used to filter out groups. So that’s why they are separate there.

So the big benefit there that we can have with having versus the where clause is that we can now actually use aggregates, so things like count, average sum, and all those sorts of things can’t be used inside of the where clause, because it is again, right single rows at a time, where the having clause has groups at a time that it can actually filter. And so we can actually use aggregates like counts, like we, like we did last time to filter out groups that don’t meet minimum specifications. And I’ll show another example of that here in just a moment. But I also wanted to introduce again, a nother clause that we didn’t get to last time is the order by now order by does allow us to as it sounds, order our results, which helps out quite a lot in providing some consistency. So by default, your queries aren’t necessarily sorted in any particular order. So for the most part, the results that you retrieve from your database are going to be retrieved in the order that those results were actually entered into your database. But not necessarily always the case, because sometimes that order can get flip flopped or shuffled. So if you are ever concerned about needing a specific order, the order by element or clause is going to be something that is a must to enable that consistency.

So the ORDER BY clause can do ascending and descending orders. So those are the two supported sorts, the ASC or D, S, C are the key words there can follow the expression. So order by ascending order by descending are the keywords that you’re looking there. But ASC or the ascending keyword is the default, the default order, so you can just do order by column without actually specifying which order you want. And by default, it’ll go ascending. So typically, if you ever want to sort anything other than that, you will need to use this, this ending, but the ascending keyword is not required. But nonetheless, let’s take a look at a couple of examples. So I did want to review slightly here, lists is one of the queries that we started to cover last time. So we have order year order count, first order date, and last order dates for all groups of orders by year. So we again remember the from clause is executed first, then the group by in this case and then the SELECT clause, which is again, right not the typical processing order compared to your programming languages.

But here, we’re just grouping all orders by year, which enables us to count the number of total orders for each year. And then we can also pick the min and max date there. But again, remember that we cannot have any columns in the SELECT clause that are not in the group by if we do have columns in the SELECT clause that are are not in the group by, then those must be presented as aggregates as we have here, with the count min, and Max, which are all aggregates, even though the order date, in its raw form are not as not part of the group by or even the star here, things are this is not included in the group by but I can actually do this aggregates per group and project those in the SELECT clause. Again, remember here with those, we have projection, this the set operation projection which is handled by the select that is which column so which vertical selection, the vertical parts of my set, and then the from and group by are going to deal with more of the selection operation, specifying which rows we’re going to have in our result. We also talked about the having clause last time, again, just as a quick review, remember, the having clause is going to filter your groups, right, the having cannot filter out row by row, but it can filter out group by group. And so the benefit there is that we can actually use aggregates here as a result, so we can execute this. There we go. So, now, we only have the order the order years that have more than 20,000 orders in that particular year.

So pretty useful operations here that we can actually start to do some more interesting things with the data that’s stored in our database. Let’s take a look at the order by now as well. So order by is new, we did not have that last time. So order by is going to come after your FROM clause, it will, you can also order but you can also order your groups by the way as well. So if you want to order your groups, the order by is going to come after group by and having clauses. But let’s for sake of simplicity, let’s take the grouping out and try to run this. So here, if I order by my order ID, you can see that I have 1234567, so on and so forth. Now I can show you what that looks like without it. And you can also see here, right that my order ID actually kind of ends up coming out the same. So I didn’t really change much there. This kind of highlights the differences here. But my order here is not guaranteed. Like for example, assuming that, you know my orders are entered in canonical ordering. So order one is first and order, you know XYZ is next. But if I actually end up going back, and let’s say deleting a couple or modifying a couple, let’s say this wasn’t supposed to be order for this is order 20. Well, the ordering then is still is now not guaranteed. So even though in this situation here, my order doesn’t actually change in my results. Adding the ORDER BY clause will actually guarantee the ordering of my results rather than betting on chance.

Now we can actually take out the A S C here, and it gives me the same ordering as I mentioned just a little bit ago. ASC is the default ordering. But we can actually we can order in descending order as well. So we can get the last first instead of the opposite way. We can also let’s say we want to also order by multiple things. So let’s switch this to order date. And let’s order in ascending order there. And then I want to do O dot two customer ID and run that. So this becomes a little bit more powerful right I think can order multiple columns, and I can actually switch this up as well. I can order in descending order and one and ascending order and the other. So it does not matter how, which one is which, necessarily, so you can order ascending or descending on multiple columns. Let’s go back real quick and talk about the processing order here. For these as well, we have seen most of the major parts of our select class, right, so the most common things that you’ll actually see a SELECT query, we’ve seen most of them now. So the standard processing order is not again, it’s not top to bottom, even though we’ll write our queries top to bottom, so select from where group by having order by, we are not going to the sequel is not going to actually be processed in that exact same order.

So from clause is first. So we pick our data source, the tables that we’re pulling from, then we can filter the rows out of those tables. So we can do a first pass of filtering, again, the where clauses single row by row, then we can group those rows together based off of some expression. So group by color, for example, or group by order year. And then we can filter the groups using the having clause which filters, which enables us to use aggregates in our filters. So counts, average sums, things like that, then we the selection, or the SELECT clause happens after that. So we can we pick the columns that we want. And again, this is the projection operation. So all of the things that we actually have from one through four here, there so far are the are the selection operation, as far as sets goes, and then we do the projection, so we pick which columns we want. And then we order, right, even though order by is listed after that, it is actually in this situation, the last thing that is executed, but we do not order before we actually pick the columns, because otherwise, again, if we if we think about this, as far as efficiency goes, there’s no sense in ordering more things than what we actually need to. So we pick the columns that we want. And then we can pick which columns we want to actually order.

But let’s take a look at a couple of examples of things that don’t quite work as far as our processing order is concerned. So again, these are going to be larger queries. What kind of query do we actually have going on here. So we are selecting the order year order month and order counts from the Sales Orders Table, where the year or the order date is between 2015 January, one, one, and January one one of 2017. So I’m giving giving me all of the orders between 2015 are that are in 2015 and 2016. Right? Excluding all orders in 2017, then we’re going to group by the year and month having a count more than 1000. And then we’re going to order by the year and then order by the month. So again, here, what we’re essentially doing is given me the total number of orders for each month, between 2015 and 2017. So if we actually run this here, there we go. So we can see here, here’s all of the orders for 2015. So we go from January all the way to December. And it looks like all of the months there and 2015 actually had a order count that was more than 1000. And we can scroll down here and look at 2016 2016 wasn’t such a great year we only had five months where we had more than 1000 orders.

But also notice here that it again it’s ordered by year and then ordered by month so when when we have multiple columns and our ORDER BY clause, it is sorted in order of from left to right so the year First here and then the month. But even though it is not executed SELECT FROM WHERE group by having order by the order that we actually put them here actually does matter. So if I cut out the from clause, for example, and put this first, you’ll see that it gives me incorrect syntax. Even though logically, the from clause is executed first, syntactically, it does not come first, syntactically, we put the SELECT clause first. So let’s undo that here. So order of which we actually write our query matters, we do not write it in logical processing order, we write it syntactically and this order, but when the when the query actually executes, logically, the from clause happens first, in this case, then the where clause, then the group by then the having, and then we jump all the way back up, we pick out our columns, and then we order them. Okay. So really, the highlight of this is that the position of each element is mandatory, right, the order that the elements are actually listed inside of the query is required by the syntax of SQL.

But the logical processing of each of those statements is different. So let’s try another example here. Do actually, let’s modify this example here real quick. What happens if I try to you can I use aliases here and the ORDER BY clause? Well, since the ORDER BY clause happen is logically executed after the Select I can write. So aliases are perfectly valid to be used. The aliases that are declared in like the SELECT clause, and the from clause can be used anywhere after that, after that statement has been logically executed. Right. So, for example, in the front, since the front clause is logically executed, first, I can use the alias for the tables in all statements that are executed after the from clause. Likewise, with the SELECT clause, on my order year order month order count, I can actually use those down here. And my order by because the order by happens after the select. So let’s do order year here. And you can see that it even comes up in IntelliSense there, and then let’s replace this with or run that. And we get the same results as same results out. So let us look at this last query here for the segment.

What’s wrong here? Well, if we read it top down, we’re selecting order ID order date, customer ID, the year of the order as ordered the order year, from the Sales Orders Table, where the order year is 2016. And then we order by the order year. So what’s wrong with this? Well, if we run it, we get an invalid column name order year. Well, as we showed in the previous example, the aliases only work in statements that are logically executed after the statement where the alias is actually defined. So here, my order year alias is defined in the SELECT clause, but my my WHERE clause is executed logically before the SELECT clause, right? So we go from where select, then order by, so we can use order year here, but we cannot use order by order year in the where clause. So instead of that, we have to actually use the year function here. And we can only use the columns that are provided for us through the from clause because that’s the only thing that the wearer is actually aware of. So we do your order date there, and then this query will execute. But that’s just a little bit of a Just a couple of examples of why this processing order matters right. So, syntactically, our SQL statements are programmed or listed in this particular order. So select from where group by have an order by, but logically they are processed from where group by having select and then order by

DISTINCT

YouTube Video

Video Transcription

Welcome back, everyone. So let’s continue diving into adding more and more elements into our single table queries. In this video, we’re going to cover the distinct qualifier. So distinct allows us to guarantee uniqueness and the things that actually come out and are result of our query. So by default, we select all things. So all of the values inside of each of the columns that we actually pick. Now, when we use distinct, each of the columns that we specify to be distinct, we verify that those records are unique. So if we have five color, five rows that have the color blue, but if we only want the distinct records, those records that have duplicates will be removed from our result. So like aggregates, all is the default of our select, if distinct, is not specified. So when we do select orders, select star from orders, or select whatever from whatever table source that you have, by default, we’re pulling all of those records out. But if we add distinct in there, we actually guarantee you that each record is unique. So when we add that distinct, each of the tuples or, or the resulting set, it has only unique values.

But let’s take a look at some examples. Because distinct allows us to do some more interesting things, especially when we deal with aggregates distinct by default, our query looks exactly like this, right? So this top query here is equivalent to saying select all. So both of these queries here actually come out with the exact same results. So if we look down here at the two tables, or the two results that come out, there are going to be 100% identical, because all is the default operation. But what happens when we actually change let’s take out one of these here. We see that so here we have a whole bunch of like each year, we have multiple years, right, but if I change this to be distinct. and execute that happens here. So now, actually, let’s go let me actually pull want to show what happens show all as well here, because our records there we go. And then I’m actually going to also add an order O dot customer ID. Let’s make that a capital O. Now let’s run this. There we go. So if we look at our results, here, you can actually see the resulting effect of the distinct keyword.

So in this first set is my distinct query, right. So just select distinct records that have for the columns year, order year, and the customer ID from our resulting set. So we have for 2013, it’s only going to show each customer once and only once. Although year comes out multiple times, but distinct does not apply to an individual column here, it applies to the entire row. And this particular instance, now if we look at the original query down here, towards the bottom, you can see that for 2013 customer, one had made multiple orders and so they show up multiple times. So distinct removes those duplicates, and in this case, right, the duplicates don’t actually add any new information to the results of my query. So distinct here is actually very beneficial for interpreting The results that we actually have before when I showed order by most of the things that I was actually ordering by were in the order by was part of the SELECT clause.

So if I run this though, you’ll see the order ID is no longer part of my SELECT clause. But I am allowed to sort by things, or by columns that aren’t actually projected in the final results, which is kind of a useful thing. But if I actually add the distinct keyword here ah, that no longer works, right? ORDER BY items must appear in the select list if SELECT DISTINCT to specify, right? This is because when we actually try to, when we order by, right, we can’t actually order by a column that doesn’t exist in the SELECT clause for the distinct because not all records are actually included. So our ordering would actually be invalid. Because we don’t actually have all records there, we only have a unique set of records. That order by is actually picking from. So if you want to order by something, right, you can order by anything, is assuming that it’s in the actual table. So you can order by any column in your table if you’re selecting all records, but if you select distinct, you can only order by columns that exist in the SELECT clause.

TOP Results

YouTube Video

Video Transcription

Welcome back everyone. In this video, we are going to take a look at how we pick certain specify the number of rows that we actually want to return. And this is without the actual where filter and things like that all the even though, we can actually include that. So the first one that we’re going to talk about here is top. Now, top is not standard in SQL, you will only find it in certain database management systems. For example, SQL Server actually has the top keyword or top filter, I will talk about the anti standard here in just a minute. But top filters are rows based on order. So you can say, let’s say I want the top five rows by my results. And so based off of the ordering of the results, it will take the first x or n number of rows from it. So if I did top five, it gives me the top five rows. So you say top and then there’s, you specify the number or using an expression that you want to pick. And then you can specify whether you want the top in rows, or top in percent rows, right.

So if I say top five, but I could also say give me the top 5%. So depending on your use case, that would be an extremely useful bit of information. And we can also do ties as well. By default, it’s, well, I’ll actually show show the results here, what tarp does by default, but we can do with and without ties. So if there is, let’s say, I want the top two rows. But the third row is the same as the second the second row, we we can exclude or include that row. So as I mentioned, right top is non standard, and, and the following video here, I will actually cover the standard way of doing this particular filtering in your results of the query. But first, let’s take a look at how we run top. Let’s apply our top filter. Now, as I mentioned, when we were discussing tarp, remember that top is dependent on the ordering. So execution here is going to go from group by select and then if there is an ORDER BY clause, it would do the order by and then the top filter. So top happens last in this case. So let’s execute this, see what we get.

There we go. Alright, so what I am doing here is from sales from the sales orders table, grouped by the year, so give me all of the orders by year, and then give me the top two years in terms of order count. So our selection has the order year or account, first date and last state. But it’s going to essentially give us the two years that gave us the most orders, which is an extremely useful bit of information. But it doesn’t necessarily do that. At least that’s my intent here with the top two. If I take off the top two, and execute this query here again, we actually notice that the top two, so the the years with the most sales is actually 2014 and 2015. But when I add the top two here, you see my my default ordering is 2013 first 2016, then 2014 and 2015. And so that is the order that the rows come out by default. And so that’s what top depends on.

But if we want to be explicit, which is a very important thing to be, when you’re dealing with SQL queries, we actually need to expressly tell SQL what we actually want to order by so here we can order by our new year. Order. Count, right? I want the top the to the most The the most productive years and torn a most productive two years in terms of sales. So if I run this, ah, there we go, ooh, there we go, what happened here? Well remember, our ordering is ascending by default. So that didn’t really change anything. But if I do that, there we go, that’s the result that we want. So we have order year 15, and 14. So the most sales is 2015. And then the second most sales is 2014. And then if of course, if we change this number here to be, let’s say three, and run that, now I get 2013, which had 19,000, sale and 19,000 orders. Now, let’s explore this a little bit more. So by default, let’s run this.

Okay. So this gives me the top 10 orders with ties. So, top or sorry, not top 10 orders, the top 10 customers based off of their order count. So give me the most loyal customers, right the customers, the top 10 customers that made the most orders, essentially what we’re what’s going on here, very similar to the previous query that we did. But the new thing that I added here is the width ties. And so if we scroll down here, this is kind of important, because if we look here, 10, the 10th customer had 140 orders, but we had three more customers that also had 140 orders. So without the with ties specification here, I run this. So you know, see number 10 is 598 585 80. Right? See that? My my 10th customer, there is no longer 598. So if I add this back in and run this again, 10 is now 598. And we have 11, which is 580. So this really highlights the fact that top by itself is non deterministic, right.

So when there are ties, you’re not going to be guaranteed to get a specific row, right, because there when when the last row has ties, the sequel, the database management system has no idea which one to choose. And so it’s just going to pick one and go with that. And so the width ties will give you all of the things that tied with the final row, which makes things a little bit more deterministic, right. So it’ll give you all the things you won’t get to differentiating results between runs of the query. We can also tack on a percent to this as well. So let me take off the ties and give me the top 10% of the customers so 10% turns out to be quite a few customers as a result. So if we switch over here, we got 67 customers so 10%, the the top 10% of our order base, we have 67 customers and all of our order counts there. So the top 10 The percent is a is a useful feature, I find that I use it a little bit less than the raw number. But nonetheless, it is a useful feature to add to associate with the top function or the top filter. But that’ll be it for at least covering top for now. And the next video, we’ll be taking a look at the ancy standard offset fetch

OFFSET-FETCH

YouTube Video

Video Transcription

Welcome back everyone. So in this video, we are going to take a look at the offset fetch filter. So the offset fetch filter is the anti standard, equivalent to talk. So before we cover top, but top is going to only be specific to a certain set of database management systems. A SQL Server, for example, supports the top command. But not all, SQL is created equal. But offset fetch is standard. So you’ll find it in the majority of database languages that implement SQL. So that’s a benefit there. But the there is a little bit of a difference here. So top itself doesn’t actually support an offset. Although as offset fetch sounds like it does support that so we can offset by in number of rows. So if we want to skip a certain set certain number of rows, we can do that with the offset. And then we the fetch part is more similar to the top command, where we fetch n number of rows from our SQL results, right? Again, though, this is a filter that is determined that is based off of the ordering of our results.

So this happens after the order by if the order by is present in your query. Otherwise, the offset fetch and top filter the rows on the order that they rows actually appear in your database. Now, the syntax here can be kind of wonky to read. But more or less, the extra syntax here is primarily just for readability. So you see that we have row rows first, next, these are purely just for readability. And just give you some flexibility in writing your SQL. So for example, if you have, if I want to offset by only a single row, it’s kind of weird to read offset one rows. So you can actually right offset one row and first versus next. First and next are completely interchangeable. So fetch first 100 versus fetch next 100 does the exact same thing kind of just depends on the user’s preference of which one to actually use. But let’s take a look at a few examples of offset fetch. And we can kind of compare that to what top would do. So in this example, here, we have order ID order date, and customer ID as our columns, all from the sales orders table, and then we’re ordering by the order ID and fetching offsetting offset by zero, so we’re not skipping any rows, fetch the next 1000 rows only.

And so the next 1000 rows gives us the the next 1000, or the first 1000 customers are the customer sorry, the first 1000 orders. So if I replaced, again, if I replaced next, with first, functionally, those are identical as far as the results go. So depending on which one makes more sense for you, and you can use first or use next, both can be used interchangeably. Now, let’s go ahead and let’s say I wanted to skip the first five orders, for some reason, right? I can actually do an offset there. So you now see that my query, my query results actually start producing rows starting at order six, because we skipped the first five. Now, this functionality is not something that we can achieve with the top filter, top is top is able to do this. So I could just do top 1000. But I cannot achieve this functionality with the top command which is this, which is why offset fetch in some ways, can be a little bit more superior of a command to use. Now fetch and its own is optional, right fetch by its own is optional. So I can actually take out our fetch. And by the way, the dash dash is the document or the way you can document your code and SQL. So dash dash, that that text after the dash dash is ignored by the SQL compiler. But if we run this now, notice, we get, we get same query out, we start at order of six.

But now instead of getting only 1000 orders, I’m actually now pulling all of the orders after after order five, so order six and on instead of order six, to 1006. So that is the offset and fetch. And we will be using offset and fetch and top in a variety of ways as we start getting into some more complicated queries later in the course. But let’s do a quick review here of what we have covered so far. So we covered most of what we would see an enormous possible on a select statement. So we have SELECT FROM WHERE group by having order by offset and fetch. Now, inside of the select, we have distinct and top top in particular, is going to be unique, somewhat unique to Microsoft SQL Server, it is not standard, but everything else you see here is as part of the SQL standard. Now, as just to kind of drive home the processing order again, alright, our unlike our Python code or Java code, we don’t execute our query from top to bottom right, we are executing it from a logical processing order. So even though we are required, with the SQL syntax, to go select from where group by having and so on, we cannot change up the ordering there.

But it can be a little bit hard to get used to writing queries in this way, because the logical processing order is the order in which the data is actually utilized. So that really kind of kind of impact your results a lot. Depending on which statement actually gets executed first, this will become even more apparent the need when we start covering how we join tables together. So from happens first, just as a review, then our WHERE clause, so we select the data first, so the data source, so which table or tables that we want, then we can filter the rows out of that table that we don’t want, we can optionally group those rows together by some condition, then we can actually filter the groups and remember, the where filter is row by row the having filter is group by group, and they cannot be used interchangeably. Then after our having clause, our select clause will be processed. And along with the SELECT clause, the distinct clarifier will actually be executed along with that, because remember, by default, all is all is the all is the behavior of this the default behavior.

So all rows will be included, unless we specify that we only want distinct rows. Then, after the sinks, we order so once we have projected the columns that we are actually going to see in our results, then we can order our results by certain columns. The benefit here in the order by is that we can actually utilize the column aliases that we we define inside of the SELECT clause, and then top or offset fetch. Now offset fetch is not a separate operation. Offset fetch happens alongside the order by so order and then offset fetch happen at the same time. It’s part of the same clause. But top is a completely logically separate clause that actually gets executed differently. After completely after what the order by the ordering still, if you have offset fetch, the ordering still happens first. But logically, it’s all in the same actual part of the execution. And then I’m not going to cover this too terribly much. But you can see all of the, but you can see all of the syntax that we’ve covered so far with our basic single table queries, and we’ll be utilizing this syntax throughout, really a lot of the course We are only going to introduce more and more to this especially where we have from groupbuy where we can join and select from multiple tables grouped by multiple things and actually introduce more complex expressions