Correlated Subqueries

Video Transcription

Welcome back everyone. And in this video, we’re going to continue our discussion on sub queries. So previously, we learned that we could nest one query inside of another. And that led us to having outer queries and inner queries or sub queries, and the those results of the inner query then being passed back for processing and the outer query. And we also learned that with sub queries, we have two primary variations, self contained and correlated sub queries. With each of those having three potential types of returns single values, multiple values with one column or table valued, meaning that we have more than one column with zero more rows. Table valued sub queries are going to be reserved for another video. But what’s the difference between self contained and correlated sub queries? Well, remember that self contained sub queries are able to be executed in isolation, meaning that there’s no dependency between the outer query or the inner query with the outer query. Logically, self contained queries are executed only once, and the result is then passed back to the outer query for use. correlated sub queries, on the other hand, do have a dependency on the outer query. So the inner query here in case of a correlated sub query is going to have some reference to the outer query.

Usually, this is in form of an attribute or column reference to a table in the outer query. Logically, correlated sub queries are executed on a per row basis from the outer query. So for each of the rows inside of the outer query, the inner query gets executed. So correlated sub queries can be used for both scalar and multivalued queries. We learned last time though there are some quirks with self contained sub queries, and the differences between a single valued sub query or a single valued sub query and a multivalued sub query. With correlated queries. These are used in a variety of contexts. And they’re a little bit more difficult to rewrite when compared to a self contained sub query, for the most part anyways. So we can use this to usually do things like running aggregates versus the common use. And another benefit here is that it can be used in both the Select and where clauses. The common way to replace a correlated multivalued sub query is usually using the where exists clause, which uses two valued logic, unlike the in operator, remember, with Boolean values in SQL, we have that weirdness where we have true false and undefined. Well, the exists is yes or no, right? It does exist or doesn’t exist, exist operates on a slightly different context. It’s either it either exists, or it doesn’t. Unlike in where we have a possibility of, yes, it’s in this subset. No, it’s not in this subset, or I don’t know.

So, particularly in in situations where we’re dealing with no, but I think it’s a little bit more straightforward to showcase some examples. So let’s start taking a look at some correlated sub queries. So the query that I have up here on the screen was the last one that we covered in the previous video, which was a rewrite of a self contained sub query. So remember, we have the self contained sub query here, that pulls out the customer ID for for the orders were the order is on May 30 2015. And if I execute that, right there, we get all the customers who placed an order on that one particular day, and then we use the in clause to to incorporate that result back into our outer query. I could rewrite that though, using just an inner join, and then using the distinct qualifier, so we take out any duplicate customers. Now this is in contrast to a correlated sub query. Note with my with my sub query down here at the self contained, I have no reference to anything out in the outside in the outer query. My sub query as I did just a little bit ago, can be executed in itself by itself without any information from the outer query. But if I bring in a correlated sub query like this. So similar tasks that we tried to do before, similar to the first query we covered when we started talking about self contained sub queries. But here inside, I have the sub query where we have a select Max order ID, from sales orders where C O dot customer ID equals O dot customer ID.

So this is the trick, right? So this oh, well, where to table Oh, come from? Well, that’s all the way up here, and the outer query. So in our outer query, we have from sales dot orders, oh, where oh dot order ID equals the max order ID, where the customers actually match. So this is, you know, think of this as kind of like a join, right, we are joining the outer table, the outer query with the inner query using this predicate, essentially. So the order ID from the or the customer ID from the rows on the outer query are being fed into this inner query. And since this inner query here is executed, after our FROM clause, we actually have access to the tables that are available to us in the from clause in the outer query. So let’s go ahead and give this a run and see what happens. There we go. So logically, right, what happens is that my outer query starts first, of course, so from clause and our outer query, then our WHERE clause and the outer query executes. But the right hand side evaluation is done every single for every single row in this table here. Alright, so for every single row that is produced from the front clause in the outer query, the sub query, the correlated sub query in the where clause is executed for each time, correlated sub queries, in general, are more expensive to run, because of this fact, because they are ran against every single row, and in the Related sub and the related outer query that they’re associated with.

So do be careful when you’re executing this. But essentially, what we’re what we have here is, I am retrieving the last order for every customer. So what was the last order, so the biggest so order IDs are from smallest to the larger the order, the later the order was actually made. So this gives me the last order for every customer. Now let’s take a look at another example. But in this, but this time, we’re going to take a look at a correlated sub query that’s in our select clause instead of our WHERE clause. So this query is identical to the one that we previously executed, so we get the similar results. So let’s actually pull this one out. So just so we can see the differences here. So remember, our previous one, so customer ID, order ID, order date, salesperson, and this time, and the previous one, we actually attached the salesperson, and this one we are doing similar issue, similar, similar idea. But now we’re just doing customer ID, customer name. And last order ID similar, not quite identical. But up here we have the order date. So let’s take this back out, and go back to just this one here. So getting all the customers along with their order ID of their last order. But right now, we’re not filtering in the where clause, our last order is being pulled out of this correlated sub query. So select Max, order ID, from sales orders, where oh, customer ID, which is the inner query, equals see customer ID, which is the outer query. So I’m not actually using this to filter any rows out from the outer query, I’m just actually adding new information in that I don’t have any outer query.

So this is another common example that you may see using a correlated sub query. But let’s work on a new example here. So what if I wanted to show all customers with the number of orders placed in a particular year? And this sense we want customer ID customer name, and then we also want a new column called In order to account for that particular year, so I’m going to pick on 2015 here. So I’m going to go ahead and type this out. See? So we wanted Yeah, let me skip down here to the from clause real quick. So we have our table sales dot, customer, there’s Steve customer see. And then I want to see customer name. And then let’s also do C dot customer ID. And then we’re going to have a sub query here. And I’m going to put parentheses there for now. And I’m going to do as last. Sorry, let’s do 2015 Order count. And it’s probably going to yell at me for having a number there. So let’s put this in square brackets. There we go. Okay, so um, I also want to order by that as well. So make things a little bit easier to read order by, and then 2015. Order account, there we go. Okay, so how do I get off all of the customers with orders placed in 2015 and count the number of orders. So I only want customers that were in 25th, that placed an order in 2015. Now, of course, I could do a join here. But let’s do this with a sub query. So we have select, and now we’re trying to get the order count. So of course, we need to start with counts, right count star, I want to count all of the rows.

So let’s pull this down. So it’s a little bit easier to read. There we go. Okay, we want where do we get all of the orders. So that’s sales. Dot orders. There we go. Let’s go orders orders, oh, and then we have a where clause here that we need. So we want all of the orders that are between 2015, January, one, one and December 31. And we all but we first need to also we also need to consider linking the inner query to the outer query. So let’s go ahead and do the order date first, because that one is straightforward. ODOT order date is between and then we want only those in 2015. So 2015 Dash 101 Dash o one, and 25th teen three, one. Okay. So this gives me, let’s Well, let’s go ahead and give this a quick run. So what does this query give me as it is right here. So this gives me all of the how many orders that were between one 120 15, and 1231 2015. So all of the orders in 2015. So there’s a total of 23,329 orders in 2015. But we only want the count for that particular customer. Because if I ran this as it is that same value, this is a self contained sub query. And so that value is that sub query is executed once and is copied across all of the rows.

But what if we wanted to do just one individual customer? Well, we just add an and clause to our were here and link our inner query Oh, customer ID to the outer query see customer ID. If we run this now, of course, it is now correlated. So our correlated sub query has no knowledge. If I ran it in itself, right? We get an exception, right? Because it has, it depends on sales customer see, so this table here. And so since it has no knowledge of that, we have to run this run both of them together. But that’s the primary difference between self contained and correlated. But let’s give this a run. And ooh, now we got but we got some got some, some people who didn’t place any orders in 2015. And then we got some people that placed a few and then a bunch of people that place a lot of orders. So if we actually sort this descending, where you can see who had the most. So the top two were the tail spins and spins. And that’s just a quick walk through example of how we might complete that query. So again, this is all customers with the number of order or their with their numbers of orders placed in 2015. So I’ll have a few more examples of this for you to try on your own. But for now, that will stop for this video. And in our next video, we’ll talk about more correlated sub queries, but specifically multivalued correlated sub queries