Chapter 7

Window Functions

Introduction to Window Functions

Video Transcription

Welcome back everyone. In this video series, we’re going to be taking a look at window functions. With window functions, we apply a function or an expression over a subset of the rows from a query. That subset is referred to as the window, hence the term window functions: we feed that small window of rows to a function to calculate some value. Up to this point, we could accomplish some of this functionality using things like GROUP BY and subqueries, but window functions give us added flexibility and expressiveness that we can’t get otherwise.

Compared to GROUP BY, we can accomplish similar results, except that with GROUP BY we lose some of the detail. When we do the grouping, the details of the individual rows that belong to a group are lost, because the only values that are retained are the columns that are part of the grouping, plus any aggregates we add. All of the other individual row information is not included. With window functions, we can also calculate things across groups. With GROUP BY, the aggregates belong to one group only, so we can’t do an aggregate or a calculation between groups, whereas with window functions we can.

It’s a similar story with subqueries. The primary benefit of window functions over subqueries is that everything in the underlying query, in this case the outer query, is preserved, including all of its filters and rows. To achieve similar functionality with subqueries, we have to duplicate that query inside each subquery.

That means we have multiple pulls of the tables, multiple joins, multiple filters, and all those sorts of things to achieve the same result as a window function, where we only have to pull those things once. I’ll show an example of this in a few minutes. Nonetheless, we get a lot of benefit from window functions: they reduce the amount of SQL that we need for certain tasks, and overall they make our queries a little more efficient as a result.

Ordering is also a little weird with window functions, in the sense that the ordering within a window is not necessarily the ordering of the overall results of the query. Certain window functions allow you to order the rows within the window itself, so within that subset of rows. But ordering a window by a particular column does not mean the results of the query will be ordered by that same column; that is still controlled by the ORDER BY of the primary query. I should mention as well that only certain window functions allow ordering, but we’ll talk about those in a bit.

So inside of a SQL query, what does a window function contain? The syntax is typically your window function, then OVER, and then up to three parts. The OVER clause is the big keyword here that defines our window function, and it contains three different parts: partitioning, ordering, and framing. The partitioning is what defines our window.

So how do we group our rows into subsets? Are we going to partition by customer, for example, or by order ID? The partition is the condition that we use to group those rows. Ordering, of course, is the ordering of the rows within the window. Again, this is not the overall ordering of the results of my query, but the ordering within the window itself. And then within a window, we have frames. Let’s say we have a window that has 10 rows in it. We can frame that window so that it excludes a certain number of rows, meaning we specify the range of rows inside the window that the window function is actually applied to. For example, if we want to skip the first and last row of our window, we can do that with framing. I’ll show some examples of what this looks like in a few minutes as well.

As far as window functions go, overall we have three different categories that we’ll utilize: ranking window functions, analytic window functions, and aggregate window functions. Ranking window functions are what they sound like: ranking functions are more or less going to number our rows.

Let’s say we wanted to rank all of our salespeople by month. Say Bob, Jill, and Jane are salespeople, and we want to know who came out on top every month in order of sales. Maybe Bob was first in January, and then Jill, and maybe in February Jane was first, and then Bob. That’s what we can achieve with a ranking function. There are a couple of different types of ranks that we’ll showcase with examples in the following videos.

Analytic window functions are primarily going to be a form of offset. We’ve done OFFSET FETCH so far, which simulates kind of a top 10 sort of thing. But we can also do offsets inside of window functions, so we can lead or lag to get the next row or the previous row, which is very similar to what we did with OFFSET FETCH.

Aggregate window functions are very similar to what we’ve achieved so far with GROUP BY: sums, counts, that sort of thing. Those are the kinds of functions that we can apply using aggregate window functions. We’ll cover partitioning, ordering, and framing, and the three different types of window functions, in the following videos.
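
As a rough illustration of the GROUP BY comparison made in this video, here is a minimal sketch. It assumes SQLite driven from Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions) rather than the database used in the course, and the order_lines table and its values are made up for the example.

```python
import sqlite3

# Hypothetical order-line data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, item TEXT, line_total INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, "mug", 10), (1, "pen", 5), (2, "lamp", 40)],
)

# GROUP BY collapses each order into one row, so the item detail is lost.
grouped = conn.execute(
    """
    SELECT order_id, SUM(line_total) AS order_total
    FROM order_lines
    GROUP BY order_id
    ORDER BY order_id
    """
).fetchall()

# A window aggregate keeps every line and puts the order total beside it.
windowed = conn.execute(
    """
    SELECT order_id, item, line_total,
           SUM(line_total) OVER (PARTITION BY order_id) AS order_total
    FROM order_lines
    ORDER BY order_id, item
    """
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns one row per order, while the windowed query returns all three order lines, each carrying the total of the order it belongs to.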

Rank Window Functions

Video Transcription

Welcome back everyone. In this video, we’re going to be taking a look at rank window functions. This is the first of the three types of window functions that we’ll be covering. You can see the general syntax here: we have our function, then OVER, and then three different options that we can include as part of our OVER clause. PARTITION BY defines how we group our rows into windows, ORDER BY is the ordering of the rows within those windows, and then we have the ROWS clause, which is the framing portion. That’s how we define which rows inside of the window the window function is actually applied to. What we’re going to focus on in this video is the kind of functions that we can apply on the left-hand side of the OVER clause.

For ranking window functions, we have four primary functions that we’ll utilize: RANK, DENSE_RANK, ROW_NUMBER, and NTILE. RANK and DENSE_RANK operate mostly the same: they rank the rows within a window, typically starting out like 1, 2, 3, 4. RANK is a true ranking system, so if there is a tie, the numbering skips ahead. If we have a two-way tie for third place, for example, it’ll go 1, 2, 3, 3, skip 4, and then go to 5. I’ll show an example of that in a few minutes. DENSE_RANK is a little different in that it does not skip numbers. With DENSE_RANK, you’ll also see 3 show up twice for a two-way tie for third, but the next row gets 4, where RANK would skip that number.

ROW_NUMBER is just a sequential number inside of the window. A rank is not necessarily going to be a straight 1, 2, 3, 4, depending on whether we have ties and on what we’re ranking on. If we’re ranking on sales, the person with the highest sales is rank one and the person with the lowest sales has the last rank, whereas the row number isn’t necessarily correlated to that particular value. It’s just a straight sequential numbering.
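
The tie behavior described above can be sketched as follows. This is a minimal example, assuming SQLite through Python's sqlite3 module (version 3.25 or newer) instead of the course database; the order_totals table and its five rows are hypothetical, chosen so that two orders tie for third place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_totals (order_id INTEGER, order_total INTEGER)")
conn.executemany(
    "INSERT INTO order_totals VALUES (?, ?)",
    # Orders 3 and 4 tie for third place.
    [(1, 500), (2, 400), (3, 300), (4, 300), (5, 100)],
)

rows = conn.execute(
    """
    SELECT order_id,
           order_total,
           -- order_id is added as a tiebreaker so the numbering is repeatable
           ROW_NUMBER() OVER (ORDER BY order_total DESC, order_id) AS row_num,
           RANK()       OVER (ORDER BY order_total DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY order_total DESC) AS dense_rnk
    FROM order_totals
    ORDER BY order_total DESC, order_id
    """
).fetchall()

for row in rows:
    print(row)
```

In the output, ROW_NUMBER runs 1 through 5, RANK runs 1, 2, 3, 3, 5, and DENSE_RANK runs 1, 2, 3, 3, 4.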

NTILE works in a similar manner, but it assigns numbers based off of a subdivision into n groups. Let’s say we wanted to do a quartile ranking, maybe over a year of sales. We can analyze our sales by quartile using NTILE of four, which subdivides our window, or each of our windows, into four main chunks.

Along with our ranking functions, the OVER clause has slightly different functionality. Partitioning is supported. Ordering is actually required in this situation, because if you think about RANK, DENSE_RANK, ROW_NUMBER, and NTILE, all of these will have different results depending on the ordering of the rows within each window. Without that ordering, the result is nondeterministic, so we want those rows ordered in order to get a consistent result. Framing is allowed, but it is irrelevant in this situation.

Now I can show an example of how that works. First off, I want to take a brief moment to show the benefit of using a window function versus no window function. Let me connect real quick. So now we’re connected to our database. This query is just grouping orders and getting the order totals, which we’ve done before. But what if I also wanted to include the actual lines from the order, so all the order line information, so we can calculate a line total for each item that was ordered? That becomes a more difficult query. We can achieve it with a subquery, or more specifically, with multiple subqueries, so it becomes a little more chaotic.

So we can look within an order. Here is customer ID eight, and here’s an order that has multiple lines. We can calculate the line price, so how much a particular item cost times the number of items purchased, then the order total, and then the total for the customer. We can expand what we can do with just the base grouping by adding subqueries, but notice that with the subqueries I’m pulling from orders and order lines in multiple places. So the subquery solution for this particular problem is not very efficient overall, and we can achieve more efficient results using window functions.

Let’s go ahead and showcase our first example, which is a simple use of the ranking window functions. We have an order total CTE here, so let me go ahead and run that. It’s the exact same query that I had before, which groups by the order ID and gets the total amount for that particular order. Then, for the ranking, we pull the order ID and order total, and we have four different window functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE, which I’ve labeled quartile. This is primarily to show you how each of these ranking functions works. Let me expand that real quick.

I’m going to highlight a couple of examples. ROW_NUMBER is purely sequential: it starts at one and increases as we go down our results. RANK increases as we go as well, in the order of our rows, so by order total. It happens to be the same ordering as our resulting query, because I’m ordering by order total in the window and ordering by order total in the query itself, just as an example.

I’m not partitioning here yet, either, so keep that in mind. I just have the window function applied over all rows. Since I’m not partitioning, my window is the entire result of the query. Partitioning is not required; it’s an optional feature that we can add in. Let’s keep going. DENSE_RANK, you can see, is so far the same as RANK and ROW_NUMBER, and quartile is all one here.
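
To make the quartile column a little more concrete, here is a small sketch of NTILE(4) over an unpartitioned window. It assumes SQLite through Python's sqlite3 module (version 3.25 or newer), and the 96 generated rows are hypothetical stand-ins for the order totals in the video, not the actual course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_totals (order_id INTEGER, order_total INTEGER)")
# 96 made-up orders with distinct totals: 10, 20, ..., 960.
conn.executemany(
    "INSERT INTO order_totals VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 97)],
)

# Assign each row to one of four tiles, then count the rows in each tile.
tile_sizes = conn.execute(
    """
    SELECT quartile, COUNT(*) AS rows_in_quartile
    FROM (
        SELECT NTILE(4) OVER (ORDER BY order_total DESC) AS quartile
        FROM order_totals
    ) AS tiles
    GROUP BY quartile
    ORDER BY quartile
    """
).fetchall()

print(tile_sizes)
```

With 96 rows and four tiles, each tile receives 24 rows, so the tile number changes after every 24th row in window order.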

Now if we look over here in our messages, we can see that we have 96 rows. Since we’re doing NTILE of four, so quartiles, that means roughly every 24 rows our quartile will increase. If we scroll down, we can see the separation of our quartiles between row number 24 and row number 25, where we increase into quartile number two. So the first 24 rows are quartile one, and so on and so forth; it’ll be a similar case for our third and fourth quartiles.

But I want to skip down to rows 40 and 41 in particular, because here is where we have our first tie. Our order total here is $1,944. Our row number is still sequential, so regardless of the tie, the row number keeps increasing within our window. And remember, we’re not partitioning, so our window is the entire result set of the query. But for RANK and DENSE_RANK, you can see that both rows are listed as 40. When we have a tie, the rankings are going to be the same.

The important difference is what happens in the row after the tie, where we have a different order total. Notice the difference between RANK and DENSE_RANK now. With RANK, we skipped 41, because there was a tie: one row is rank 40, and the other would have been rank 41. Since it’s a tie, RANK skips 41 and goes straight to 42. DENSE_RANK, however, does not skip numbers. If there is a tie, it still does 40, 40, but then for the next order total we just increase to the next rank, which is 41, instead of skipping the number of ranks that were tied.

We can see another example down here for rows 47 through 49. These have the same order total, so the row numbers are 47, 48, 49, while RANK and DENSE_RANK each stay the same across all three rows. RANK is 47, 47, 47. And remember, we skipped a rank earlier, so the rank here is one higher than the DENSE_RANK, because DENSE_RANK did not skip a number.

But you see here that RANK now increases by three at the next order total, because we have a three-way tie. We skip ahead a couple of numbers when we move to our next rank, whereas with DENSE_RANK we just increase by one, because we don’t skip after ties. That’s the primary difference between RANK and DENSE_RANK. NTILE is relatively self-explanatory: we’re chunking our windows into n groups. We can do this by threes or by twos, with any n that is one or more, and the groups are numbered within the window.

One general thought that comes to mind here: what comes first in processing order with window functions? Does the grouping or the window function happen first? Window functions are primarily going to exist inside of our SELECT clause for most of our use cases. So the GROUP BY, if we consider our CTE up here, is going to execute first, before we get to our SELECT clause. That means the CTE in general is not necessary here. If we were utilizing this with things like subqueries, a CTE would make a little more sense, but for this scenario we can rewrite the same query without the CTE. Here is that exact same query, but we just do our join of orders and order lines here, our GROUP BY, and then we have our rankings up here. And remember, in this case I’m only using GROUP BY; my window function is being applied across all rows, instead of partitioning the rows into smaller subsets.

But we can use partitions. Let’s go back to an example with our CTE, because I’m going to change things up a little bit. We have our GROUP BY again, but now I’m going to group by the salesperson as well.
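
The processing-order point, that GROUP BY runs before the window functions in the SELECT clause, can be sketched like this. The example assumes SQLite through Python's sqlite3 module (version 3.25 or newer) rather than the course database, and the order_lines table, its columns, and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, quantity INTEGER, unit_price INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, 2, 10), (1, 1, 5), (2, 1, 40), (3, 3, 5)],
)

# No CTE: the rows are grouped first, and RANK() is then applied to the
# grouped rows, so the window can order by the aggregate itself.
ranked = conn.execute(
    """
    SELECT order_id,
           SUM(quantity * unit_price) AS order_total,
           RANK() OVER (ORDER BY SUM(quantity * unit_price) DESC) AS order_rank
    FROM order_lines
    GROUP BY order_id
    ORDER BY order_rank
    """
).fetchall()

print(ranked)
```

Because the window is evaluated over the already-grouped rows, each order appears once with its total and its rank among all orders.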

So we’re going to get the order date, the salesperson ID, and the sales total. If we run this real quick, you can see the results: salesperson ID, order date, and then the amount of sales. So for every order date, how much did each salesperson make that day? That’s what the CTE covers. Then in our query down here, we’re applying a window function to calculate the rank of every salesperson for that particular day. We can expand this a little bit.

Here is our first window, because we’re partitioning by the order date. Our original CTE is grouping, but then we can apply a partitioning over all of those groups, which makes things a lot more expressive: we can calculate more interesting bits of information than by using GROUP BY by itself. So we partition by order date, order by sales total, and then rank. RANK is applied over these windows, so for every window, it ranks the rows inside of it. Here’s our first window, our first order date, 1/5/2015. Our sales total is ordered in descending order, and our ranking is applied as such, so the person with the highest sales total is ranked one, all the way down to rank 10. This is different from row numbering, because ROW_NUMBER is a sequential numbering, so always 1, 2, 3, 4, 5, and so on.

Here, the rank resets within each window. Once we go down to our second window, we again have 10 salespeople who sold things on the sixth, our sales totals are ordered again, and then we have our salesperson ranks. That is how the partitioning works. If we think about the order of operations, our rows are partitioned first, then those partitions, those windows, are ordered by the specified columns, and then the ranking function is applied over each window. As for the third part of the OVER clause, if we had framing included here, the framing would reduce which rows within the window are included when applying the function, in this case RANK. If we do not include the framing clause, by default it includes all rows within the window. And remember, with ranking window functions, framing is irrelevant.

That concludes our examples on ranking window functions. We have our four ranking functions, RANK, DENSE_RANK, ROW_NUMBER, and NTILE, followed by OVER, where we can do partitioning and ordering. Ordering is required and partitioning is optional. If partitioning is not included, the ranking function is applied over all rows. Framing, in this case, is irrelevant to our ranking functions, because the ranks apply to the entire window regardless of the framing. In the next video, we’re going to talk about our other types of window functions.
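
The partitioned ranking walked through above can be sketched as follows. This is a minimal example assuming SQLite through Python's sqlite3 module (version 3.25 or newer); the daily_sales table stands in for the result of the CTE in the video, and its names and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales "
    "(order_date TEXT, salesperson_id INTEGER, sales_total INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        ("2015-01-05", 1, 300),
        ("2015-01-05", 2, 500),
        ("2015-01-05", 3, 100),
        ("2015-01-06", 1, 200),
        ("2015-01-06", 2, 50),
    ],
)

# One window per order date; inside each window the rows are ordered by
# sales total, and the rank restarts at 1 for every new date.
ranked = conn.execute(
    """
    SELECT order_date, salesperson_id, sales_total,
           RANK() OVER (
               PARTITION BY order_date
               ORDER BY sales_total DESC
           ) AS salesperson_rank
    FROM daily_sales
    ORDER BY order_date, salesperson_rank
    """
).fetchall()

for row in ranked:
    print(row)
```

Note that the ORDER BY inside OVER only orders rows within each window; the final ORDER BY on the query is what fixes the order of the printed results.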

Aggregate Window Functions

Video Transcription

Welcome back everyone. In this video, we’re going to be taking a look at aggregate window functions. Aggregate window functions have similar syntax to the previous window functions. You have the function applied OVER, and then we define the window that the function is applied over: the window itself using the PARTITION BY clause, the order of the rows within that window, and then which rows within that window the function is actually applied to.

We have a variety of aggregate window functions. The typical aggregate functions that we used in our previous SQL experience apply just the same, so all the experience we have writing GROUP BY statements with MAX, MIN, AVG, COUNT, SUM, and so on applies here. DISTINCT, though, is one qualifier that is not allowed as part of an aggregate window function, although there is a simple workaround that we’ll show.

So what is supported with the OVER clause for aggregate window functions? We can utilize all three: partitioning, ordering, and framing. We can have all of these, or none of them, which is a really powerful, expressive way of utilizing these functions. The use cases for aggregate window functions are probably more common than for the other types; I would rank aggregate as the most common, then rank, and then offset. Aggregate window functions are very useful for things like running totals, period totals such as monthly, quarterly, or year-to-date totals, and moving averages: calculations that require subdividing the results that we apply our functions to.
But let’s take a look at some examples of these in action. I have quite a large query up here. The CTE is the same one we have been using so far in our window function examples, our order total CTE. But I have an additional column that I didn’t have last time: I have salesperson ID along with the order ID, but I don’t have customer. If I go down to the query that’s utilizing the CTE, we have salesperson ID, order ID, and order total.

And then we have a couple of different window functions: total order count, salesperson order count, total sales, and salesperson sales. Notice that a few of my OVER clauses are empty, so I don’t define a window at all: no partitioning, no framing, and no ordering. When you have an empty OVER clause, the window includes all rows, and it doesn’t have any order either. For things like COUNT, SUM, and AVG over the whole window, order doesn’t matter for the end result of the calculation. This line here is equivalent to a normal COUNT(*), so SELECT COUNT(*) FROM the table. That’s essentially what we’re doing with the empty OVER.

But I can also apply a count to a specific window instead of all rows. If I want to define a window, remember, I use PARTITION BY. So here we have COUNT(*) OVER, with my window defined by the salesperson ID. This counts the number of rows per salesperson, which is the total number of orders that a salesperson has. If we look at salesperson two, they have two orders, so they have an order count of two. Then we have the total sales, so the sum of all orders over all rows, because my window is empty. And then we have the total of sales just for the salesperson. So we get total sales across all salespeople, and total sales for just this salesperson.
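
A minimal sketch of the empty OVER clause next to a partitioned one is shown here. It assumes SQLite through Python's sqlite3 module (version 3.25 or newer); the order_totals table stands in for the CTE in the video, and its three rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_totals "
    "(salesperson_id INTEGER, order_id INTEGER, order_total INTEGER)"
)
conn.executemany(
    "INSERT INTO order_totals VALUES (?, ?, ?)",
    [(2, 1, 3005), (2, 2, 302), (3, 3, 1000)],
)

rows = conn.execute(
    """
    SELECT salesperson_id, order_id, order_total,
           COUNT(*)         OVER ()                            AS total_order_count,
           COUNT(*)         OVER (PARTITION BY salesperson_id) AS salesperson_order_count,
           SUM(order_total) OVER ()                            AS total_sales,
           SUM(order_total) OVER (PARTITION BY salesperson_id) AS salesperson_sales
    FROM order_totals
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

The columns with an empty OVER () repeat the same value on every row, while the partitioned columns change from one salesperson to the next.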

If we look at our first salesperson, salesperson two, we can see that the total sales across everybody doesn’t change, but the salesperson totals do. We have 3,307 for salesperson two and 5,647 for salesperson three, and you can see that salesperson three has quite a few orders. Here is salesperson three: we have six rows, so the salesperson order count is six. The total number of orders across all of the salespeople on 1/1/2016 is 47. And again we have the total sales over all rows, which doesn’t change between windows, or between salespeople, while the salesperson sales does.

Now, you may be thinking, what’s the point, because a lot of these columns don’t change from row to row, so they don’t provide a lot of extra detail. But if you start to use these as part of other expressions, things become more expressive and more meaningful. This is the identical query that I had before, with the same CTE and same SELECT clause, but instead of having just the plain total sales, I’m converting that to percentages. That makes it more interesting to compare against the total sales. So here are the total sales over all orders, and then we can get the percent that this one particular order is of overall sales. We can also do that for the salesperson. This is easiest to see with salesperson two: the first order that they did that day was about 90% of their daily sales for that date, and the second order was a little over 9%. Converting to a percentage makes the results of the query more meaningful than reporting just the raw values.
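
Using a window aggregate inside a larger expression, as in the percentage columns described above, might look like the following sketch. It assumes SQLite through Python's sqlite3 module (version 3.25 or newer); the table and values are hypothetical, and ROUND is used only to keep the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_totals "
    "(salesperson_id INTEGER, order_id INTEGER, order_total INTEGER)"
)
conn.executemany(
    "INSERT INTO order_totals VALUES (?, ?, ?)",
    [(2, 1, 3005), (2, 2, 302), (3, 3, 1000)],
)

# 100.0 forces floating-point division; integer division would truncate.
rows = conn.execute(
    """
    SELECT order_id,
           ROUND(100.0 * order_total / SUM(order_total) OVER (), 1)
               AS pct_of_all_sales,
           ROUND(100.0 * order_total
                 / SUM(order_total) OVER (PARTITION BY salesperson_id), 1)
               AS pct_of_salesperson_sales
    FROM order_totals
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

For salesperson two in this made-up data, the first order is roughly 90.9% of their sales and the second roughly 9.1%.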

The point of this is that we can use window functions as part of other expressions and calculations, as shown here in our SELECT clause. We can do this same kind of thing with framing, so let’s take a look at an example of that. The start of this query is no different. Here we’re calculating the salesperson running total, the salesperson running total excluding the current row, and then the overall running total. The interesting thing is the framing. Notice that in the first one we have UNBOUNDED PRECEDING and CURRENT ROW. We don’t have UNBOUNDED again on the right-hand side, because UNBOUNDED FOLLOWING there would include all rows in my window after the current row. Instead, I include everything before the current row plus the current one. In the second one, we have UNBOUNDED PRECEDING and 1 PRECEDING, so all rows from the beginning of my window up to one row preceding the current row. Since the frame stops one row before the current row, it excludes the current row from the application of the function, so the sum excludes the current row’s order total. And then in my last running total, the frame again gives me all rows in the window up to and including the current row, but no rows after it. The important part is that these frames are spelled out explicitly rather than left to the default behavior, which without an ORDER BY would include all rows in the window. Here we include up to and including the current row, and here we include up to, but excluding, the current row. Let’s take a look at what this means for our values.

So let me run this again. It’s a little less meaningful to show on salesperson two, but we can go through it. We still have order total, and the first three columns are the same. Then we have the salesperson running total. The first order is 3,005 and the second order is 302, so the running total is 3,005 plus that order total, which gets us 3,307. That’s also why this value in the excluding column is NULL for the first row: there’s nothing before that point, so there’s nothing we can include, because the frame excludes the current row. In the other column, we include the current row in the running total.

But notice that this last running total, which is also UNBOUNDED PRECEDING to the current row, just continues to go. That’s one thing I didn’t note: what’s the window here? My window for the salesperson running total is the salesperson ID, and my window for the salesperson running total excluding the current row is also based off of the salesperson ID. But for the overall running total, I’m not partitioning anything, so the window includes all rows. This is the neat thing that we can bring out with window functions, because we can apply a window function across groupings, across windows. This is how it can be much more expressive than the traditional GROUP BY, because with GROUP BY we can’t easily do calculations between groups like this. So our running total works just as you would normally calculate it in something like Excel: it keeps going up as we go down and orders keep getting added on.

For the other two columns, you can see with salesperson three that the salesperson running total keeps going up and up until we hit salesperson six, and then it resets, because the salesperson running total is restricted to the window defined by the salesperson. It’s the same with the salesperson running total excluding the current row. This example is a little more expressive and meaningful, and showcases some of the more powerful things that you can achieve using window functions, particularly with aggregates, by playing around with how you define the windows and the framing. The ORDER BY ordering matters here as well.
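
The three running totals discussed above can be sketched as follows. This assumes SQLite through Python's sqlite3 module (version 3.25 or newer); the table, the column aliases, and the four rows are hypothetical, and ordering the windows by order_id is an assumption, since the transcript does not say which column the video orders by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_totals "
    "(salesperson_id INTEGER, order_id INTEGER, order_total INTEGER)"
)
conn.executemany(
    "INSERT INTO order_totals VALUES (?, ?, ?)",
    [(2, 1, 3005), (2, 2, 302), (3, 3, 1000), (3, 4, 500)],
)

rows = conn.execute(
    """
    SELECT salesperson_id, order_id, order_total,
           -- per-salesperson total up to and including the current row
           SUM(order_total) OVER (
               PARTITION BY salesperson_id ORDER BY order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS sp_running_total,
           -- per-salesperson total of the rows before the current row
           SUM(order_total) OVER (
               PARTITION BY salesperson_id ORDER BY order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS sp_running_total_excl_current,
           -- no PARTITION BY: one running total across all rows
           SUM(order_total) OVER (
               ORDER BY order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM order_totals
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

The first row of each salesperson shows None (SQL NULL) in the excluding column because its frame is empty, and the partitioned totals reset at salesperson three while the unpartitioned total keeps climbing.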

The ordering matters because, depending on which rows come first, the running totals are going to be different for those rows. That’s just something to keep in mind as you’re working with these. But let’s take a look at our last example, which uses this daily total CTE. It gives me the total amount of sales per day across our entire table. Including the order year and order month doesn’t initially make a whole lot of sense, because I already have the order date there. But it makes more sense as I go down to my main query, which calculates month-to-date sales and year-to-date sales alongside the total for that day. These first two columns I’m pulling verbatim from my CTE, and these two columns are my window functions.

So we have a SUM OVER and another SUM OVER. My first partition is something that we haven’t done before. So far, we’ve only defined our windows, our partitions, using one column, but we can also define our windows using two columns, which is kind of neat. So we partition by order year and order month, meaning within a year, partition by month, and then the other one is just partitioned by year. So this is a month-to-date total, and this is a year-to-date total. As we go down, my month to date and year to date will be the same for a while, until, let’s see, we scroll down just far enough. There we go. You can see where one window stops and the next window begins. My first window function, month-to-date sales, increases and increases as we go until we get to the end of the month. My first window stops here on 1/31/2013, and then you can see my month-to-date sales reset.

And so the total and the month-to-date sales for row 28 match, because that's where the window got reset into a new window. And our year-to-date continues, because the window that I've defined for my year-to-date sales goes all the way through 2013. So this highlights that I can have windows that overlap with each other: windows within windows. That is also something that's a lot more expressive with window functions than what we can achieve with GROUP BY. With GROUP BY, we can do groups within groups if we use subqueries and things like that, but that becomes very complicated, and it's really not very expressive. But here, when we allow windows to overlap with each other, we can do a lot of expressive things, particularly around running totals and aggregate functions.

That is going to conclude our examples for aggregate window functions. We showcased a bunch of different aggregate functions applied over a slew of different kinds of partitions, windows, and frames, and the lack thereof: we can apply these aggregate functions over all rows, or we can specify a particular window. We also saw framing in a more meaningful example, where we specified which rows the function is actually applied over: all rows within a window, rows up to the current row, excluding the current row, preceding rows, that sort of thing. Like we showed here, the common use cases for aggregate window functions primarily include things like running totals and moving averages. And we did period running totals, where we showed that we can overlap the windows that we apply our functions over. But this will conclude our video series on window functions.
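The month-to-date and year-to-date example can be sketched roughly as follows. The CTE name `daily_total`, the table `orders`, the column names, and the sample rows are assumptions made for illustration, and the date parts are extracted with SQLite's `strftime` rather than the SQL Server functions the video presumably uses.

```python
import sqlite3

# Hypothetical sample data: (order_date, order_total).
ORDERS = [
    ("2013-01-30", 10.0),
    ("2013-01-30", 5.0),
    ("2013-01-31", 20.0),
    ("2013-02-01", 7.0),
    ("2013-02-02", 3.0),
]

# The CTE totals sales per day and carries the year and month along
# so the outer query can partition on them.
QUERY = """
WITH daily_total AS (
    SELECT order_date,
           CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
           CAST(strftime('%m', order_date) AS INTEGER) AS order_month,
           SUM(order_total) AS day_total
    FROM orders
    GROUP BY order_date
)
SELECT order_date,
       day_total,
       SUM(day_total) OVER (
           PARTITION BY order_year, order_month
           ORDER BY order_date
       ) AS month_to_date,
       SUM(day_total) OVER (
           PARTITION BY order_year
           ORDER BY order_date
       ) AS year_to_date
FROM daily_total
ORDER BY order_date
"""


def month_and_year_to_date():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, order_total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in month_and_year_to_date():
        print(row)
```

With this sample data, the month-to-date column resets on the first day of February while the year-to-date column keeps accumulating, which is the overlapping-windows behavior described above.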

Offset Window Functions


Video Transcription

Welcome back, everyone. In this video, we're going to be taking a look at offset window functions. Our syntax is very similar to other window functions. We have our function that we are going to apply over the window that is defined here. And remember, we can define our window using a partition; without the partition, our window is going to encompass our entire query set. Then we can order the rows within our window, and we can also define which rows within that window the function is applied to.

We are primarily going to cover four different offset window functions here. These differ a little bit from OFFSET FETCH, which primarily focused on pulling the top five or top ten rows off of our query set, or starting five rows in instead of starting with the first row. We can do similar things with offset window functions, but in this case the offsets are applied per window instead of over the entire query set, although we can achieve identical results if our window is the entire query set.

We have four different functions here: LAG, LEAD, FIRST_VALUE, and LAST_VALUE. First value and last value are just as they sound: they obtain the first value inside of my window or the last value inside of my window. Then we also have LAG and LEAD: LAG looks before the current row, and LEAD looks after the current row. So from whatever row we're looking at, we can lag n rows behind or lead n rows ahead, and we can calculate things based off of that, which can be really beneficial for things like running totals. These window functions, just like the previous ones that we covered, are applied using the OVER clause. With offset window functions, we are able to partition, so we can define our window. Ordering is required, again similar to what we saw with rank. Framing is also supported, but only for FIRST_VALUE and LAST_VALUE, not for LAG and LEAD.

LAG and LEAD could actually reach outside of your frame, so we don't use framing with them. Without further ado, let's take a look at some examples, because I think that makes things a little clearer than just talking about them. Here is our first example, where we're going to be taking a look at LAG and LEAD. Remember, LAG looks before the current row and LEAD looks after the current row. Similar to what we had before, we have a CTE that calculates every order total, along with pulling out the customer ID and order ID. Down here, I have the query that applies my window functions, and I have quite a few of them: two lags and two leads. So we have lag one, lag two, lead one, and lead two. As for the parameters for LAG and LEAD, the first one is the column that you want to pull the value from, and the second is the number of rows away that you want to pull that value from. So here, I'm going to pull the order total from one row before the current one, and that's what this previous value is. Then here I'm going to pull the order total from two rows before the current row. LEAD works the same way, except I'm pulling the value from one row after and two rows after. The column that we use as part of LAG or LEAD can be any column that you could put in the SELECT clause. LAG and LEAD can be a little confusing at first because we end up with a lot of NULL values in the result.

So we're partitioning our query results by customer ID, and the window that we apply LEAD and LAG to is the same in every case here. We could have a different window for each window function, which is why this is very powerful and expressive. But in this case, to make things easier to read and understand, I have the same window and the same ordering defined for each one; the only thing that's different is which window function is being applied. Since we're partitioning by customer ID in all cases, these first three rows are my first window. Now, if I look inside here, we can see my order total and then the results of each of my window functions. Row one has a NULL value for the previous value column. That's because there is no previous row: row one is the first row in my window, so there is no previous row to pull the order total from, and the result is NULL. Same thing for previous two. But we do have a next value and a next two value. The next value, 3092, is this row here, and the next two value, 188, is right here. So that's where those two columns' values are being pulled from.

We can go down to the second row in this window. Our previous value now works because we have a row to refer to, so 2225 here is pulling the value from the previous row's order total. Previous two is NULL because, again, we don't have two rows before this one to pull from. Then we have next value, which is this row right here. And then for next two, notice that it doesn't pull a value from the row two rows down, because that goes outside of my window. So LEAD and LAG apply only to the rows within the window that they are being applied to.

So if you run out of rows, either in the lag or the lead, then the value that gets returned by the window function will be NULL. And last, for the last row in our window, previous value and previous two value can both pull a value now, because we have two rows before it. And our next and next two values are both NULL, because this is the last row in our window.
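The walkthrough above can be reproduced with a small sketch. The table `order_totals` stands in for the CTE from the video, and its name, the column aliases, and the sample rows are hypothetical (the first customer's three totals simply reuse the figures read out above). It runs on SQLite through Python's `sqlite3` module, where `LAG` and `LEAD` take the same column and offset arguments described in the transcript.

```python
import sqlite3

# Hypothetical sample data: (order_id, customer_id, order_total).
ORDER_TOTALS = [
    (1, 1, 2225.0),
    (2, 1, 3092.0),
    (3, 1, 188.0),
    (4, 2, 500.0),
    (5, 2, 750.0),
]

# Every function uses the same window: one partition per customer,
# ordered by order_id. Only the function and offset change.
QUERY = """
SELECT customer_id,
       order_id,
       order_total,
       LAG(order_total, 1)  OVER (PARTITION BY customer_id ORDER BY order_id) AS prev_value,
       LAG(order_total, 2)  OVER (PARTITION BY customer_id ORDER BY order_id) AS prev2_value,
       LEAD(order_total, 1) OVER (PARTITION BY customer_id ORDER BY order_id) AS next_value,
       LEAD(order_total, 2) OVER (PARTITION BY customer_id ORDER BY order_id) AS next2_value
FROM order_totals
ORDER BY customer_id, order_id
"""


def lag_and_lead():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_totals ("
        "order_id INTEGER, customer_id INTEGER, order_total REAL)"
    )
    conn.executemany("INSERT INTO order_totals VALUES (?, ?, ?)", ORDER_TOTALS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in lag_and_lead():
        print(row)
```

NULL comes back as `None` in Python. In the second row of the first customer, the two-rows-ahead column is `None` even though the table has more rows, because those rows belong to a different window.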

That's LAG and LEAD. I'm not going to go through every window here, but you get the general gist. It can be any number of rows that you lag or lead, and you can start to see the benefits of this in creating new columns that pull information from other rows. But let's look at another example, for FIRST_VALUE and LAST_VALUE. It's the same CTE that I had, order total, but now instead of LEAD and LAG, I have FIRST_VALUE and LAST_VALUE. So just two window functions, with the same partition. This is the window that I'm defining: my window is based off of the customer ID, I'm ordering by the order ID, and then my frame is a pretty long one here: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This is just a long-winded way of saying give me all rows within the window. Be aware that once a window has an ORDER BY, leaving the frame off does not always mean all rows: the default frame then stops at the current row, which makes no difference for FIRST_VALUE but does for LAST_VALUE. You can also use a number here instead of UNBOUNDED, like 1 PRECEDING or 1 FOLLOWING, or two or three and so on, which bounds the frame relative to the current row instead of running to the edge of the window. I will include some more documentation on the framing as part of the upload on Canvas. I find that I don't need to specify the frame very often, but there are situations where you do need to explicitly identify the frame, when you don't want all rows inside of the window. Let's go ahead and give this a quick run. You can see the same windows that I had before. I have my first window here for customer one, and you can see the first value is the first order total, and the last value is the order total of the last order for this customer, 188. Those values stay the same throughout the entire window.
First and last don't change at all within the window. We could modify the frame to exclude rows from this calculation as well. But just to show what happens without the explicit frame, let me comment out the frame on one of these and execute this again. You can see that the results are identical to what I had before, which is what you'd expect for FIRST_VALUE, since its answer sits at the start of the window either way. So I included this in here just so you can see what framing can look like, and I'll include all the different options that you can use when defining the frame in the notes. That will conclude our brief examples for LAG, LEAD, FIRST_VALUE, and LAST_VALUE. Offset window functions are, I think, a little bit simpler and easier to understand compared to ranking and aggregate functions, but they can still be very useful and expressive in writing our SQL queries.
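Here is a small sketch of FIRST_VALUE and LAST_VALUE with the explicit frame, plus a third column that leaves the frame off of LAST_VALUE for comparison. The table `order_totals`, the column aliases, and the sample rows are hypothetical, and the query runs on SQLite through Python's `sqlite3` module. In SQLite, and to my knowledge in SQL Server as well, a window that has an ORDER BY but no explicit frame only extends through the current row, so it is worth checking this behavior on your own system.

```python
import sqlite3

# Hypothetical sample data: (order_id, customer_id, order_total).
ORDER_TOTALS = [
    (1, 1, 2225.0),
    (2, 1, 3092.0),
    (3, 1, 188.0),
    (4, 2, 500.0),
    (5, 2, 750.0),
]

# first_total and last_total use the full-window frame from the example.
# last_total_no_frame omits the frame, so its window ends at the current row.
QUERY = """
SELECT customer_id,
       order_id,
       order_total,
       FIRST_VALUE(order_total) OVER (
           PARTITION BY customer_id
           ORDER BY order_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS first_total,
       LAST_VALUE(order_total) OVER (
           PARTITION BY customer_id
           ORDER BY order_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_total,
       LAST_VALUE(order_total) OVER (
           PARTITION BY customer_id
           ORDER BY order_id
       ) AS last_total_no_frame
FROM order_totals
ORDER BY customer_id, order_id
"""


def first_and_last():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_totals ("
        "order_id INTEGER, customer_id INTEGER, order_total REAL)"
    )
    conn.executemany("INSERT INTO order_totals VALUES (?, ?, ?)", ORDER_TOTALS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_and_last():
        print(row)
```

With the explicit frame, the first and last totals stay constant across each customer's window. Without it, the last-value column just follows the current row's order total, which is why the long frame clause is worth writing out for LAST_VALUE.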
