Intersect

Video Transcription

Welcome back everyone. And in this video, we’re going to be taking a look at our next set operator intersect. So the the query syntax is very similar to what we’ve seen so far, we have an input query on the left and input query on the right the set operator in between, and an optional order by at the end. But do note that intersect does not have an intersect all option. So let’s take a look at what the intersect actually does. So the intersection between two sets results in the elements that are shared between the two sets. So the elements in B and the elements in A that are shared, that is what is left after the intersect actually fully happens. Now, as I mentioned, the intersect itself does not have an intersect all in T SQL, although we can do a little bit of a workaround here to get that to happen, if that is something that is needed. But let’s take a look at how this works with our Venn diagram. So intersect works very much like the end operator in Boolean algebra. So if we had a and b, we had the things that A and B shared. So for intersect A intersect B is going to result in a set such that the values and set that are in set A and in set B are returned. And we can actually also draw a Another similar example of we as we did with Union. So let’s get that going to have a set one, two, and three here, intersect a set that has three, four, and five, which will result in a set where that only has the numbers that are shared between sets A and B. So only three. All right, so we just did union here just a little bit ago. And let’s take a look at what this looks like with intersect. And I’ll be using the same exact, mostly the same exact query throughout all of my union examples just so it’s a little bit more consistent. So this is the exact same query that I ran with Union, except now we have intersect in between. And remember, the order of operations here is the input query one, followed by input query two, and then the intersect operation between the results of those two queries.

And then the ORDER BY THE ORDER BY is not part of the second query. So let’s go ahead and give this a run. Sorry, there we go. So we get our email addresses again, and you can see that we actually only have 26 rows now. So the intersection between customers who are the primary contact person, their emails that are the primary contact, and emails that are alternate contact, there are only 26 customers who have the same contact for both. So email addresses that are both the primary and alternate contacts. And again, we have done this already before using joints. But just kind of showing you an example of how intersect will work between the result of two queries. Now, as we mentioned earlier, the intersect operator does not have intersect all, but we can simulate the effect of intersect all so something like this here. Now, this example what Wide World databases are while Wide World importers database doesn’t actually work in full. So when you run this, you actually still get 26 rows. But it kind of highlights how this work round would work if we had data in our database that actually had the overlap. But primarily there is there is no the reason why there’s only 26 years because there isn’t an email that is used more than once for the primary and alternate, so it’s not there’s no duplicates here. So that is the primary reason there but we can actually look at the ending results here and showcase this so you remember the over clause, we are doing a window function right? Row number over.

And then here’s our window function write partition by email, and then order by. And then this is essentially going to select the, the first value as part of our window and then sort by that. So essentially, it’s sorting by our sorting by sorting our rows, sorting our numbers here, the row ID, the weird thing here with this, we can’t really order as we normally would, right. So that’s kind of why we have this, this, this order by here. So if we actually take this out, see here. And run this. Alright, we can actually run row number, so we’re actually going to order by here, so order by select zero, and give this a run. And what that select Order by select zero is going to let us do is actually order the window that we’re actually picking. So if we run just this individual query, this will actually be a little bit more informative. Here, we can actually already see some of the results here, if I scroll down just a little bit, rows six and seven, we have able, at example, comm here, and we have row ID one and two, you can kind of see how this query would actually end up helping us doing a intersect with duplicates, as a result. So if we wanted to, if we wanted if our second query here had that same email address, so if I actually, let’s go ahead and select this, run that and you see Abel doesn’t actually appear appear here, because Abel does isn’t actually used as an alternate contact email. So that’s why this intersect. Trick doesn’t actually work with this example, because there’s just no duplicates to join on or to include as part of the intersect all. But if we did have duplicates here, this trick with the row number would actually get as good as there, because it works here, you can actually see that we have emails that are used more than once there. But if we don’t include this window function, then the intersect ends up pulling that out, because it combines all the same email address into one. But that unique numbering between duplicate emails works pretty well.

And so if you find yourself in a situation where you need to do an intersect all this would be a trick to actually try, I can show the end result here, if we actually let me let me undo here. Let’s go back to this result. And I’m going to actually change the top and bottom to be the exact same query just to show what can happen with intersect all so here we have 623 rows with the intersect between these two queries. And so just one of these queries by itself, remember a 663 emails, so we have roughly 40 ish speaking that are duplicates. And so if we wanted to do that exact same thing, but do the intersect all, you know, I’m going to do the exact same thing as I did just a little bit ago. And I’m going to put the same query down here on the bottom. That way, when we run this, we actually get the duplicate emails. So now you see I have all 663 rows, and Abel gets actually put out here twice now. So the row the emails that are used twice as a primary contact. So just a little example of how we could get around the SQL limitation without the intersect all but again, by default, the intersect is just intersect distinct, and even in this case, this is still intersect distinct, I am just kind of exploiting the the intersect operator by making the duplicate emails not unique. So they get left in using the intersect operator. Of course, generally speaking, we wouldn’t actually want the row ID to be in the end result of our query here. So we could strip that out by using a CTE so we can actually use a set operator and inside of a look for a CTE as well. So, with intersect all CTE, our CTE there is accepted. Now that we just ran, and now we can run this and strip out the strip out the row ID not including that. We don’t want to strip that out inside here. Because without that row ID the intersect all wouldn’t work. So we want to leave that row ID for the actual set operator. And then if we don’t want it as returning that results to our users or whatever, we can strip that out by utilizing a CT and an additional query