Database Subclasses

Video Transcription

Welcome back everyone, In this video, we’re gonna be taking a look at subclasses. Now subclasses seem kind of an odd thing for databases. subclasses are an object oriented programming idea, right? So we have a parent class serves as our base class. And we have child classes that inherit, you know, attributes and behaviors from the parent. But we actually could represent similar ideas inside databases, typically, we’re going to have three different kinds of approaches here, the object oriented approach, which is really one, one direct mapping to what you would expect in like Java, or Python, where you have a table for each class or type. So you have a, a table for, let’s say, an animal, a table for a dog, and so on and so forth. You have things like nullable columns. So you can represent the class structure by just having columns that you can know out for certain scenarios. So, you know, you can have, you know, animal as your base class, and you have dog and cat as your sub classes. But a dog does not have all of the same properties as a cat. And so if the, if the animal that we’re representing is a dog, then the things that represent a cat are doled out, and vice versa. So that’s, that’s an approach. And then we have the ER style, which is probably the more complicated representation. And this is the true abstraction setup here where we have a base class or superclass that represents the base information. And then you have child classes that are connected to that table through relationship of some type. But let’s first take a look at the object oriented approach. So generally speaking, when we do this approach, all the common attributes are going to be in all types. So if you have an attribute that’s shared between classes in your data structures, then that common attribute is going to be a column in all of those tables in your database design. In the object oriented approach, we have no keys, or sorry, we have no foreign key. So there is no direct relationship between each of the tables. So we have no connection between the objects, that connection is represented using that that connection is used or created using those common attributes. So the common attributes are those connections between each of the subtypes. And then tuples are only inserted into the applicable type, meaning that there is no base type.

So we would have sale for dogs, cats, and things like that, we wouldn’t actually have an animal class or an animal table, we would just have dogs and cats, and those dogs and cats that have all the same attributes as an animal does. But only the unique attributes for dog and the unique attributes for cat. Sometimes you will need a general type. We can’t always get away get away completely from the base type. But most of the time, we can get away with that. But let’s show an example here. So here is a short example showing an employee part time employee and salary employee. So part time and salary employee are to two general types of employees. And here, we I ended up just adding a general employee, just in case with these, right, here’s our object oriented approach. And you can also see, let me actually pull out the highlighter here, the common fields, right, so employee ID, we have name, we have email, as well. The things that are unique between these are going to be the other other other fields, right pay rate, salary, vacation time, those are all unique, right? To each of these different types of employees. So we have salary and vacation time for salary employees, and then part time employees just have a pay rate their hourly, they don’t have a salary or vacation time. So this is again, a really interesting problem. Four at face value. This is a relatively quick and easy way to represent our employees. Right. One of the cons Of course, is that we have duplicate information. But that duplicate information is actually spread across each of these tables. So how do you prevent an employee from being in all three? Well, typically, you would have things like a check constraint again, right? You could use it when and when something is inserted, you can use a check constraint, right? And that check constraint can make sure that if an employee is salary, if they are not also part time. How do you prevent, make? Or how do you make sure that employee ID stays unique among all three tables? Well, generally speaking for employee IDs, employee ID, in all three cases can be done using a sequence object. So we have that one sequence object that keeps count, right?

That one single object that generates the employee IDs, for each for each of these that makes that unique across all three other general question here as well, what if What about email? Right? What if you added an email attribute that all employees that all all employee types should have, but must be unique across? All? Right, so what if I had an email here, but that email then could not show up here and cannot show up there. I don’t have a foreign key that represents that. And so again, this becomes a lot of heavy lifting with constraints, like check constraints, sequence objects, all those sorts of things to actually represent this. So I do see this object oriented approach to sub classes, and some scenarios in practice, but far less often, because you have all these additional constraints that are needed. Plus, you also have duplicate information, you have the potential for duplicate information stored across different tables. So what are the other better ways that we might be able to represent this? Well, not necessarily better ways, but different ways. So let’s take a look at a another representation of subclasses. So another representation, we can actually use just a single table approach. So within that single table, anything that is shared between our objects, so let’s bring back the dog cat example, right? dog and cats are both animals. And so though, the common attributes that all animals share, those would be non knowable. So those are required attributes as part of our table. But the things that make our dog a dog and, and cats a cat, those would be knowable, because obviously a dog would not have cat attributes, and a cat would not have dog attributes. So those would actually be knowable as part of our representation.

But let’s take a look at a general example here, using our employee. Now, I’ve collapsed all three of our all three of our tables. So we had the employee part time and salary employees all three different tables with the object oriented approach. In this case, we are doing one single table with all three, and the the common attributes, employee ID, name and email are now non nullable. Non nullable, and then the three nullable columns salary, pay rate and vacation time, were the things that made salary employees salary employees, and part time employees part time employees within you are UML diagrams, by the way, know, the Italian. If a column name is italicized, then that means that column is nullable. That is a standard UML representation for databases. But let’s take a look at this table a little bit more, right? How would I actually enforce that? If someone is part time that they don’t have a salary and don’t have vacation time? How would I enforce the correct columns or no or not? No, depending on which type of employee that we’re actually doing? Well, we could do a we could introduce a type with a check constraint. So let’s take a look at what that would look like. We could do this without a type, but it makes it a little bit easier to do Do a type here. So I would actually do a type inside here. So add a type, add type in as a new column. And so then if employee type is full time, then the salary must not be no vacation time must not be No, but pay rate must be no, because a salary employee does not have a pay rate because they have a salary. And then if the employee is part time, then they cannot have a salary or vacation time, but they must have a pay rate. And so this is the general approach that we’re going to take, if we want to represent a subclass type relationship. So employee types with salary part time, and we could have other types of employees as well with if we wanted to use only a single, single table, this works out for the most parts. And I typically like this approach a little bit better than the three tables approach because I don’t have duplicate data being introduced. And it’s much easier to keep track of things that way. But again, you have all of these Nolde columns. So data that is not actually fulfilled. So we don’t need salary and vacation time for a part time employee.

So that’s extra overhead added into this, as well. Typically, this will was this will work very well for small examples or small datasets, but start to fall flops, performance wise, for very large datasets. Because, again, we’re taking up extra space, we have extra columns that aren’t actually being used. And so it’s extra overhead that we don’t need, those check constraints can get expensive after being ran over, over and over over over the course of a time. So typically, a better approach to representing subclasses is the ER style. So typically, what what what we’ll have here is a single super type. So we will have an employee. Right? This is going to contain the primary key, right? And the common attributes, right, just so we’ll have the employee ID, name, email, all the things that all employees have there. And it will contain a record that represents all all employees will be here, right? All employees will be here, and then we’ll have each subtype, those subtypes will also contain the key, but the specific attributes that make that type it right. And so we have part time and salary over here. And so we actually have this representation. Right. And so now we’re actually starting to represent the true style of abstraction here, right. So, if we’re talking about you know, object oriented programming here, here, this is a superclass and these are sub classes. So, part time employees and salary employees are inheriting from the base class, the base class or superclass employee, but let’s take a look at this representation and a full example. So, here is a look at a much larger example with the employees all stretched out here. So I have employee salary employee and part time employee, employee here. This is this is my base class or superclass. This is a subclass. subclass. And notice that we have we have two keys now are so we have a primary key that’s employee ID.

And then we have employee type ID, which is a foreign key to employee type. And it’s also unique with the employee ID. So the same employee, right So Bob can only be one type of employee and Jill can only be one type of employee. And so we actually store the type of along with the base class, right, we store the subtype along with the base class. And that lets us know, right, that lets us enforce these checks down here, that lets us enforce these checks down here. And so we want to make sure for example, that we want to enforce mutual exclusivity, right. So if a an employee is a salaried employee, we want to make sure that they are not a part time employee as well. So they can be one or the other, but not both. And so to enforce that exclusivity, we need to keep track of the employee type and the employee table. So we can enforce that check constraint down here. So an employee cannot belong in the part time employee if their employee type is for example, not to. Okay, and then the foreign key. Alright, the foreign key employee ID actually will cover the, the information that is shared up here, right. So not only is the employee ID the primary key for salary and part time, but it’s also a foreign key back to the original base class. So this representation here, this ER diagram representation is my typical way of representing subclasses as part of databases, it makes, generally speaking, the most logical sense, but things like nullable columns, and the others are still beneficial to our representation here. And sometimes they mix and match we kind of combine some of the pros of one solution and, and the pros of another to make a better solution overall for whatever data we’re trying to represent here. But this representation here is going to be the closest you’re going to get as far as the traditional representation of a base class and subclasses in object oriented programming in terms of databases,