Union

Video Transcription

Welcome back, everyone. In this video, we’re going to take a look at our first set operator, union. The syntax looks very similar to the generic syntax that I’ve shown so far. We have an input query on the left, then union with an optional ALL qualifier, then the input query on the right-hand side, and then an optional ORDER BY clause. Remember as well that the two input queries each execute in full, in their normal processing order, before the union starts to happen.

So let’s talk about what union actually contains. The union of two sets creates a set containing all members from A and B. So A union B takes everything from A, combines it with everything from B, and we end up with one set with everything from both.

So what does that mean for our actual operations in databases? Well, in databases it’s a little more confusing overall. The mathematical way of representing this is A ∪ B, where the ∪ is the actual set operator in relational algebra. But in databases, we actually have two union operators: union all and union distinct. Union distinct is the default, so if the ALL keyword is not specified, the distinct union is what gets done. The diagram for the two looks the same. When we union A with B, we get all of A and all of B, with the bits they share kept only once. Union all gets all of A and all of B, and the bits they share show up from both sides. That’s what my Venn diagram here represents, and you should be a little bit familiar with what it looks like. We showed this style of Venn diagram when we were talking about Boolean operators in CC 110. So the primary difference between the two is that distinct does something very similar to what a select distinct does, whereas union all can actually have duplicate values.
So if you have 1, 2, 3 in set A and 1, 2, 3 in set B, then with union all we have double 1, 2, 3s in the result, so we have 1, 2, 3, 1, 2, 3.
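The 1, 2, 3 example can be tried out with a small sketch. This is only an illustration: it assumes Python’s built-in sqlite3 module rather than the database used in the video, and the table names set_a and set_b are made up for the example. The rows are sorted in Python before printing because the union itself does not promise any order.

```python
import sqlite3

# In-memory SQLite database; set_a and set_b are hypothetical tables for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE set_a (n INTEGER);
    CREATE TABLE set_b (n INTEGER);
    INSERT INTO set_a VALUES (1), (2), (3);
    INSERT INTO set_b VALUES (1), (2), (3);
""")

# UNION ALL keeps every row from both inputs, duplicates included.
union_all = conn.execute(
    "SELECT n FROM set_a UNION ALL SELECT n FROM set_b"
).fetchall()

# Plain UNION is the distinct form: duplicate rows are removed.
union_distinct = conn.execute(
    "SELECT n FROM set_a UNION SELECT n FROM set_b"
).fetchall()

# Sort in Python, since set operators do not guarantee an ordering.
all_values = sorted(row[0] for row in union_all)
distinct_values = sorted(row[0] for row in union_distinct)

print(all_values)       # [1, 1, 2, 2, 3, 3]
print(distinct_values)  # [1, 2, 3]
```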

In our final result, if we take that 1, 2, 3, 1, 2, 3 from A and B with union distinct, then the duplicates actually get removed, and you’re left with only one of each.

So let’s say here we have 1, 2, 3 and 3, 4, 5. With union all, this is going to result in a set with all of 1, 2, 3, 3, 4, 5. And again, the ordering is not guaranteed, because sets are unordered, so the order of the results can differ depending on the original ordering of the sets or on the operation that’s actually done. So let’s do a little squiggly line here, and then over here let’s do the same exact set operation with the default union: 1, 2, 3 union 3, 4, 5. This is going to result in a set that is 1, 2, 3, 4, 5.

This is the primary difference between our distinct union and our all union. Union all will contain duplicates if the same value is in both set A and set B. But in our union distinct, if the same value exists in set A and in set B, those duplicate values are removed and we’re left with only the unique values. Not all of the set operators that we’ll cover today have both the distinct and all options. Most of them will actually have one or the other. Union is a unique situation where we actually have both union all and union distinct.

But let’s take a look at a few examples of SQL running the union set operator. Alright, so let’s take a look at this example. Here we have just a simple query where we are selecting the email address from our customer table joined with the people table, and trying to figure out what the emails are for the people who are primary contact persons and what the emails are for the alternate contact persons.

And so if I run each of these separately (let me connect to the database here), we get all of the email addresses as a result. We have 663 email addresses, because there are 663 customers. Then for the people who serve as alternate contacts, we can check that as well, and we have 402 rows of alternate contacts. So that’s just a little over 1,000 email addresses overall.

So let’s take a look at the result of actually doing a union here. This is input query one, and this is input query two, and then we end on an ORDER BY. That’s the kind of weird part here, because it looks like the ORDER BY belongs to this second query, but it actually belongs to the result of our union. The ORDER BY comes after the union finishes. So this query executes, this query executes, then the union operator is applied to the results of those two queries, and then the result from our union operator is ordered. So: input query one, input query two, union, and then ORDER BY.

Let’s go ahead and give this a run. There we go. We have a little under 1,000 rows, 984 rows, as a result. That’s all of the email addresses of the primary contacts and the alternate contacts. But notice that it is a little less than 1,000. We should have over 1,000 email addresses.
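The shape of that query, with the ORDER BY applying to the result of the union rather than to the second input query, might look something like the sketch below. Everything in it is an assumption made for illustration: it runs on Python’s sqlite3 module instead of the course database, and the tables people and customers, their columns, and the three sample rows are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature schema; not the database from the video.
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        primary_contact_id INTEGER REFERENCES people(person_id),
        alternate_contact_id INTEGER REFERENCES people(person_id)
    );
    INSERT INTO people VALUES
        (1, 'dana@example.com'),
        (2, 'alex@example.com'),
        (3, 'casey@example.com');
    INSERT INTO customers VALUES
        (10, 1, 2),     -- primary dana, alternate alex
        (11, 2, 3),     -- primary alex, alternate casey
        (12, 3, NULL);  -- primary casey, no alternate contact
""")

query = """
    -- Input query one: emails of primary contacts.
    SELECT p.email AS email
    FROM customers AS c
    JOIN people AS p ON p.person_id = c.primary_contact_id
    UNION
    -- Input query two: emails of alternate contacts.
    SELECT p.email AS email
    FROM customers AS c
    JOIN people AS p ON p.person_id = c.alternate_contact_id
    -- This ORDER BY sorts the combined result, not just query two.
    ORDER BY email
"""

emails = [row[0] for row in conn.execute(query)]
print(emails)  # ['alex@example.com', 'casey@example.com', 'dana@example.com']
```

Even though dana only appears in the first input query, the ordered result starts with alex and ends with dana, which shows the sort happening after the two result sets have been combined.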

Well, the default behavior here is union distinct. We don’t actually write distinct here, and in fact, if you try to write distinct here, it’s going to give you a syntax error, so we don’t want to put that there. But that is the default behavior: union distinct is implied if the ALL keyword is not specified. If we do union all, we get the duplicates. So we have 984 rows with union distinct, and if we run this again with union all, we get 1,065. That means we have 80-ish emails that are duplicates, which means there are about 80 people who serve as both the primary contact person and the alternate contact person, at least going by the email addresses.

So that is the essence of the union operator. I actually find union pretty useful if you just want to combine the results of two different queries. It’s quite an expressive way of doing so. This particular query here, we’ve done this kind of operation before, and we can actually rewrite it using joins, and we could also rewrite it using subqueries and CTEs and all sorts of other things. This is just one other way of achieving the same result, and it’s not necessarily more or less efficient. Sometimes it can be more efficient to do a join, because in this case I’m actually having to pull all of the customers twice, and so the union between these two is a little less efficient because of that. But in some scenarios, you’ll find it very difficult, or sometimes actually impossible, to rewrite a union operation as a join. The set operators would be the preferred way to go in those scenarios.

But just remember here as well, we have the difference between union distinct, which is the default behavior, and union all, if you actually specify the ALL keyword next to the union. That will do it for our union set operator.
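The row-count comparison between union and union all can be sketched the same way. As before, this is an illustration under assumptions: it uses Python’s sqlite3 module, and the people and customers tables, their columns, the sample rows, and the helper count_rows are all hypothetical. Subtracting the two counts gives the number of duplicate rows that the distinct union removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical miniature schema; not the database from the video.
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        primary_contact_id INTEGER,
        alternate_contact_id INTEGER
    );
    INSERT INTO people VALUES
        (1, 'dana@example.com'),
        (2, 'alex@example.com'),
        (3, 'casey@example.com');
    INSERT INTO customers VALUES (10, 1, 2), (11, 2, 3), (12, 3, NULL);
""")


def count_rows(operator):
    """Count the rows produced when the two input queries are combined
    with the given set operator ('UNION' or 'UNION ALL')."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT p.email AS email
            FROM customers AS c
            JOIN people AS p ON p.person_id = c.primary_contact_id
            {operator}
            SELECT p.email AS email
            FROM customers AS c
            JOIN people AS p ON p.person_id = c.alternate_contact_id
        )
    """
    return conn.execute(sql).fetchone()[0]


all_count = count_rows("UNION ALL")   # 3 primary rows + 2 alternate rows
distinct_count = count_rows("UNION")  # duplicates removed

print(all_count)                   # 5
print(distinct_count)              # 3
print(all_count - distinct_count)  # 2 duplicate rows were dropped
```

One caveat on reading that difference: the distinct union also removes repeats that occur within a single input query, so the number only matches the count of people who are both primary and alternate contacts when neither input has duplicates of its own.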
Next, we’ll start talking about intersect and except.