Super Keys

Video Transcription

Welcome back everyone. In this video, we’re gonna continue our discussions on how we might design a good database. So last time we talked about third normal form Boyce Codd Normal Form, and how those normal forms can be defined using functional dependencies, and keys. And in particular, we’ve spent a lot of time talking about functional dependencies and what that entails. And we also defined how you might calculate the closure sets, and how those closure sets could help us find all functional dependencies, along with some other techniques for finding those as well. But in this video, we’re going to focus on the second part of what makes up our normal forms, particularly with keys. So we’ve talked about primary keys, and what that entails with our database. So if you remember, right, primary keys are a column or set of columns that help uniquely identify a row inside of a relation or table. And so there are I’m going to take that just a little bit further and talk about keys and super keys in particular. So what is a super key? So a super key is a set of attributes a one through a n, such that your, I can spell this out here says such that for any other attribute B, we have a one through a n implies B. And remember from our previous videos, the underline the arrow is the implies symbol, to avoid duplicates, a, it’s a very important notion in a database to refer to or have a key of some kind. And that key being a primary key, or in other words, a super key that helps identify all other columns, that in turn, helps uniquely identify any given row inside that relation. So I have another super key that I want to define here.

A minimal super key is a key that is a set of attributes, which is a super key, and has no sub set of attributes, that is also a super key, right? So meaning a key or a super key that has no super key or set of attributes inside of it, that is also a super key. So if we have attributes, you know, a one, a two, a three, a four implies B, so for a super key. But if you know a one and a two is also a super key, then that’s a one, a two, a three, a four is not a minimal super key. So we want typically, when we talk about primary keys, with database design, a primary key is going to be a minimal superkey. Or at least a good primary key will be a minimal superkey. Because we don’t, you know, if we don’t need the additional attributes to make it a primary key a key that uniquely identifies a row, then it’s it’s kind of a waste, right? Both in terms of design, which complicates your queries. And to in terms of efficiency, and, and storage. We’ll talk about later in this class about indexes, and how that impacts you know, your your, the speed of your query, and those keys are going to be indexed. And so if those keys are are larger, it’s going to increase the amount of time that it takes to actually index your tables. But indexing will be a talk for a later lecture.

But for now, let’s talk about let’s talk more about our super keys. So previously, we talked about how we could compute our closure sets, so x plus our closure sets for all attributes in our relation. But primary reason why I’m bringing this back here is that if the closure set of x is all attributes, then x is a super key, right? So if the closure set of you know student ID is all attributes meaning that this that you can imply all other columns with that attribute, then that column is a super key, or set of columns is a super key. And remember, we want only the minimal super keys in the end. So we will spend time to calculate all super keys, and then we want to reduce those super keys down to the minimal possible number of attributes that still maintains a super key property. So let’s take a look at an example of what this may look like. So here we have an example, relation called enrollment that has students address course room and time columns. And for the sake of example, let’s say we have these functional dependencies here. So we have student implies address room and time implies course. And then we have student course implies room and time. So with those functional dependencies, how would we calculate what keys we have here? Well, and and when I say keys, I mean minimal super keys, minimal superkey. So how do we calculate our super keys? Well, we may also have, we may have more than one key here as well, right?

That’s totally viable and reasonable as part of the table, there may be more than one key, or more than one possible key, we will want to only when we actually define our tables in Microsoft SQL Server, or whatever it may be, we want to make sure that the we only define will only have one primary key. So in this sense, we want to start out by trying to figure out what set of columns implies all others? Oh, let’s take, let’s start breaking this down. By by student here. So student implies address. So and what can we get from address? Well? Nothing really. So initially, we’re just going to have students as a as part of the key. But students alone isn’t enough, right? Because we can only imply address from students. How about adding, and address or address implies nothing? So address doesn’t want to be address shouldn’t be part of the key. So let’s say so we have student not address how about course here. So if we have course, and students, right, so student course, implies room and time, right. And then so let’s that adds to our key. And then room and time implies course. And so we get essentially everything that we need from from student and course. Right. But since since we have and I’m going to, let’s go here, I’ll go to my pen here. Since students room and time, right be highlighted attributes here are going to be our minimal superkey. But the attributes that I have underlined in red, there are going to be a another super key because with student stute implies address, so we got that. And then room and time implies course. And so all the other dependencies are all the other attributes are trivial, right, trivial functional dependencies. And so we get two super keys out of this relation, yeah, student room in time, and then student course.

And so really, typically, what we’ll want here is a set of attributes that are a minimal superkey that we want to use as our primary key, and in general, will typically want to go with the key that has the fewest number of attributes. And this is, in particular, it makes your queries a little bit easier to write because you have less columns to worry about to uniquely identify a row and improves your indexing because you have to index on fewer columns. What and so we’ll talk about that in a future course. But here you can see a little bit of an example that I use to show how you can go from your functional dependencies to calculating our keys. And remember, we want for our keys in our table, we want a minimal superkey not just a super key. So if we had if we had a super key. So for example, if I had course as part of my key here on the left. So if I had if I added course over here to student room and time, that would not be a minimal superkey because A student and course is also a super key so it is not minimal so we want to reduce that to make sure that our keys are minimal