Video Transcription
Welcome back everyone. And in this video, we’re going to be taking a look at some more complicated joins. So we’ve covered cross joins and Inner Joins so far. And remember, the inner join is just a cross join, but with a filter. But we haven’t really added a whole lot to these queries yet. So let’s take a look at a few different things that we can actually accomplish with them. In this video, we’re going to be talking a little bit about composite joints, where so far with our predicates that we included in our inner join us just enter join on one column. So our predicates are relatively simple. But we can actually have a little bit more complicated join there as well. So if we have table A and Table B, we can actually match those records on more than one column from each of the tables using Boolean operations. We can also join ourselves. So this sounds kind of weird at first, but there are a lot of good reasons where we would want actually want to join a table with itself in order to extract different kinds of relationships. And then we also have non equal joins, non equal joins are going to use an operator other than equality in inside of an inner join. So instead of matching, inner joining table a table B, where column one and Table A equals table, column one and Table B, maybe we have greater than or less than, or something like that. And then the last thing that we’re going to be working with today is going to be multi join queries. Now multi join queries are just as it sounds, right, we’re going to join more than two tables together. Now, we can actually mix and match joins as well. So we don’t just have to use all exclusively Inner Joins, or cross joins, or outer joins, which we’ll cover here later. But the big point here is that the join operations are executed from left to right.
So if we have from Table A, join table B, join table C, Table A gets joined with Table B first, and then the resulting table of that join then gets joined with table seat, and so on and so forth, we have more joins there. But we can also change that order up if we need to. So we can enforce order of operations, using parentheses just like what we do with math. So that makes things a lot, a lot nicer to actually work with. But without further ado, let’s go ahead and take a look at some examples of all of these types of join variations. Up here on my screen, I have a query that we’ve relatively seen before, right, we’re enter joining orders with the order lines, and then we’re getting the total sum of the sales are the order subtotal for each of the orders that were matched. And then we’re filtering out those orders, such that on those order on those orders and order dates, such that the price, the total subtotal is more than $25,000. So if we run that we see there’s 14 Records down here, all the different order IDs, order dates, and then we have the lines from the so how many items, how many line items were in this particular order, along with the total.
Now I can actually filter this even farther if we want to add another predicate, to our inner join. So for example, let’s give me only the orders where the order IDs match. And the items, each of the items that were actually picked, matched, or were picked on the same day that the order was completed, right. So this gives me only the orders such that all of the order if all the items from the order were fulfilled, the same day it was completed. So if we give this a run, sorry, there there yep, that’s much better. And so gives me a similar results, except now there’s only 12 rows instead of 14 because I filtered out a couple extra were the ordered all the entire order wasn’t fulfilled on the same day. But this is just an example of a composite join. Composite joins are really useful, especially when we have two tables that have a multi multi multiple columns that are actually joined on, so are related on. So if we have more than one foreign key, for example, between two tables, that can be a very useful composite join can be a very useful thing to leverage that type of relationship. But let’s then take a look at a self join. So here is a self join. A self join, as I mentioned before, is a join where we join on ourselves. So from sales customers, inner join on sales customers, where customer ID equals bill to customer ID. So this is going to give me all of the customers were who have only billed to themselves.
So for example, the person who the person who actually submitted the order, build it to themselves rather than going to a different company or a different customer. So this is really useful. This is a valid example to see what his self join, a useful self join may look like. Let’s take a look at another self join. Because again, self joins are kind of weird and awkward at first when you start getting on them. But this next one, we’re going to work with the people table as part of this database. Again, we’re still in the wide world importers database, but we have 1111 people inside the database as a whole. But let’s look for people who have shared email addresses, we want the names of the person and the one sharing the email. So whoever is actually sharing the email and the names of all those people who share, so we can join people on people. So from people pee, inner join on people SP on SP dot email address equals P dot email address, and then we order by the address. So let’s take a look at this here. So here we go. And you can actually see, since we’re ordering by the email address, and we joined on the email address, you can see all the different names associated with the same email address, which is pretty useful there. And so we have 1280. So almost almost 200 people who actually shared email addresses. Now, this, we can actually change this up a little bit more.
So very similar query, as I showed just a little bit ago. So we’re still doing a self join P, people p and r join people SP on email address. And P dot Person ID is less than SP dot Person ID. And then we’re grouping by email or ordering by email address here. So this is a composite join. Alright, composite being that I have more than one predicate that I’m joining on, it is a self join, because I’m joining on the on the table, the table A and Table B are the exact same table. And then I’m all I also have a non equal join because one of my predicates is a less than instead of an equal sign. So let’s give this a run and see what we get as a result. There we go. So Aha, so here are at five different rows. So essentially, what I get here is that the we’re kind of leveraging the person Id just a little bit. In general, the person that I’m sharing with is after that comes after the after myself, essentially, is what assuming that an email address is unique at first initially unique and then it can be shared with other people after the email address exists, which is why we’re using the less than for the person ID here. Because otherwise, we can actually end up matching to ourselves. So we can get duplicate there. But we have Abel, Abel spear spear, Leah and tarrazu being shared together. So, this is the person who the email address originated with and this is the person who they shared with and so vice versa right because we have you know, since this person existed in the database first because the person ID is smaller than that means this person shared with that person, which is the whole reason for this non equal join.
So the non the less than for the person ID those are self joins, composite joins. and non equal joints. So no new type of join, necessarily, it’s just a variation of Inner Joins there. But let’s take a look at a multi join query, very similar query that I had before, right, we have just our sub order subtotals, between orders on order lines, and grouping on the Order ID and order dates and having some greater than 25,000. So we just seen this query already before. But in general, when we’re working with multi joins, or joins in general, it’s important to focus on the join and the table structure at each time. So just as we mentioned in a previous video, you have to think about what the data looks like what a row looks like, after each logical processing phase. So what is the what does a row look like after the first join, then after the second join, what does a row look like after the from clause is finished, what does a row look like after the group by and so on, and so forth. So taking a pause to really break down and think about what rows and tables look like after each logical processing phase. And then that helps, in general, understand these more complex queries.
What if I wanted to add a, let’s say a customer name, to this. So for one, I would have to actually, of course, I either have to add this, I’d have to add, just add this to my group by if I wanted to add a customer name. So let’s, let’s go ahead and do that. Oh, so Well, I can’t do customer name here. Because remember, customer name is, well, we only have customer ID in the orders. So that means we actually have to add a another join in order to pull that in. Since we only have the actual order ID, we need more than what we actually have. So then let’s tack on here. Sales dot customers. That’s the table we want. Oops. And then we are going to do enter. Let’s go and take this down on the next slide, an inner join. And then that’s going to be inner joined into sales dot orders. And then we need a sales call as customers see. And then our predicate here is going to be on. And then we want see customer ID equals O dot customer ID. Aha. So there we go. So now we have, we’re joining all the customer information with the order first, right. So after that join, we have a table whose rows are all of the order information, and then all the customer information. And then that information then gets joined with the order lines. And so now we have all of that information paired with every single line from the order. And then let’s go ahead and change up.
Now let’s go and run this, we’ll run this real quick. Make sure everything is working. There we go. Oh, and we need to actually add the customer name here. So let’s put that here. CDOT, customer name and order date. Oh, yep. So remember, we can’t have something in our slept clause that is not in our group by unless it’s an aggregate. So we have to actually go down here and group by customer name. And there we go. So let’s give that a run. Ah, there we go. So still similar, still same number of rows as a result. But now we just have the customer name. And remember that this group by right, the grouping does not change. If I take out order date and customer name here, the groups are going to end up being the same because order ID is unique. So since order ID is unique to the individual order, all these all the information for each order gets grouped into the same row regardless. So that’s, that’s a benefit there. And we could change this up to group by just the customer name for example, as a result, so that’s that’s also something that we could actually look like look Let’s take a look at a slightly different query here. So this is going to join customers with orders and order lines, for all orders in 2015, grouping by customer ID, and we’re ordering by sales, so total sales there.
So let’s go ahead and give this a run. And now we’ve got 637 rows. So this is all customers who purchased something in 2015, and the total number of orders they had, and the total number of sales that they had. So just another example of how we can do multi joint query and extract more information just by joining in that customers table. But what if we only wanted the, maybe the top 25 customers or top 10 customers or something like that. So give me all the customers who bought the most things, and 2015 or something like that, right? So remember, we have top or offset and fetch those two commands to be very useful in that scenario. I did want to highlight again here with this distinct, we can add remember from our lecture on single table queries, and we talked about distinct, distinct can be used on a entire selection clause. So we can put distinct here right with the select. So all rows must be unique. But here, we can also add this inside of an aggregate function, where all of the things that are being included in aggregate must be unique. So we’re only counting unique order IDs, instead of counting duplicates. So because remember, if we don’t do that, then the order actually appears more than one time, because there could be more than one order line associated with each order.
So if I actually took this out, you can actually see the order count go up significantly. So this isn’t the order count. This is the number of order lines that the customer actually had. So that’s a number of items at the number of different items that the customer actually ordered. But let’s take a look at another example here. This is the same same exact query, except that you noticed that I’ve added in some parentheses, so sales customers, enter join sales orders, enter join order lines on customer ID. So really, the only change here is that I am first joining orders with order lines, and then I’m joining that with the Customers table. And in this particular example, the order does not matter does not actually affect the end result of the query. But as we get farther into the class, we’ll have some more complicated examples where the order absolutely does matter. And it does impact the end result of our query. That is even more important when we start talking about outer joins. Because those types of joins are dependent or the relationships are single direction not bi directional. And since inner joins are bi directional, the ordering in general does not matter as much. But when we start talking about outer joins, and later much later, when we start talking about things like sub queries, order will definitely start to matter a lot more. But that is going to be it for this video. So we’ve talked about self joins, composite joins, non equal joins and multi joint queries. So in the next video, we’ll be taking a look at outer joins our last join that we’ll be covering for this class.