Chapter 10

Design Patterns and Practices

General Database Design Practices

Video Transcription

Welcome back everyone. In this video, we’re going to be continuing our discussion on database design. And so to start out, we’re going to really focus on some general best practices when we talk about designing our tables apart from general database theory. So, first off table names should be nouns. And this sounds kind of odd at first, but when we actually start talking about the logical representation of a lot of our data, right, the we typically don’t always talk about our data in just nouns. So for example, we’ve talked about this, you know, albums being produced by certain artists before and naming that table. In conceptual model, right, and a conceptual model, an album is produced and produced, is a fairly well, good descriptor for an album. But as far as our database is concerned, when we talking about storing that information, produced album is a better representation. So it gives a lot more information and exactly what it is, right produces the action, not the thing. And so we want to name our data table our tables, after now, and that represents the data that’s being stored inside of it. Other general recommendations, same thing kind of goes for columns and our tables as well. So columns should be nounce, just like our tables. There are some general exceptions here, though, particularly when we have two valued columns. So this generally is referring to a true false yes, no type answer. This would be things like, if we have a user table, right, typically will have some indicator whether or not this user is active or inactive. Just in case like if a user is deleted, or decides to stop subscribing to our service, whatever it may be. And usually, we don’t really want to delete that information. And so we tag it with a column that is active or is removed, or something of that nature. And when we have this these types of columns, typically we describe it as the positive side, so is active instead of is inactive, or is removed versus is not removed.

Generally speaking, this is just a way that we typically think about these types of questions, we typically don’t think about a true false question on the negative, we typically refer to it as a positive side. And then, of course, it can be false. Other general recommendations that we have here, along with column names, they should not be redundant. So we want to try not to repeat the name of the table in the name of the column, unless it is a key. Okay, so we’re talking about things like if we have a car table, well, we would name the price of that car, the data that represents the price that car just price rather than car price, it may initially feel like you should name it car price is more information. As a whole, you think about when we referenced these columns inside of our queries will typically access those columns through the table itself. So we already have the information of what that price is associated with. So car dot price. So car car price is redundant. So try to keep the name of the table out of the name unless it is a key that is very useful for when we reference things from outside of the table, particularly when we do things like joints. So sometimes we can use different names for different references of the same thing. So in the second table, if we have like a foreign key or something like that, we can rename that foreign key to be more associated with what it represents in the in the foreign table. But usually speaking, it’s better practice to keep the same name of the key in the foreign table as it was in the in the original table. But overall, just be consistent in your naming scheme. We have things like Person ID versus Person ID, so all caps ID versus lowercase. Generally speaking, it doesn’t really matter either way here as long as you are concerned. Since throughout your database design as far as which capitalization you use, and similar thing goes with the naming of your keys as well, as long as you are consistent with how you reference them. That is what really matters there. And as far as readability goes, but keeping those names as particularly along with foreign keys, that can help improve that readability as well. And you’ll see some of these examples as I show some better design patterns later on. And you can kind of make note of how I named the keys inside of each of those tables as well, you can kind of see the best practices being put in action.

But what other kind of recommendations do we have here? So think twice before using delete, think twice before using Delete. When we create our tables, we can allow or disallow deletion of that data, particularly when we run things like updates, deletes, which will be a topic in another video. But typically speaking, in the real world, Once data is created, it’s never truly ever deleted. Any Once data is created, it’s almost never truly deleted. This is because it’s good record keeping as well, you know, we if a user decides to stop subscribing to our service, we don’t want to delete that user and all of that person’s history, because what if they decide to resubscribe at a later date, we want to be able to pull all of that information back to, you know, enable that process and make that transition a lot easier for that user. And this also helps, again, with record keeping, and data analysis and reporting and things like that. Users can also make mistakes. And so if you use your accident, when we delete something, then hopefully we can actually recover that information as well. And if we do, so, if we do this well enough, we can restore all of that information. So we will also have another video fully on this process as well. This can typically be resolved using things like history tables. So tables that are designed just to keep track of historical records when a record is deleted or updated. But you can also do this with just a column indicating whether or not a a record is active or inactive or is removed or is is not removed. That’s a the low hanging fruit. And there are some pros and cons between each solution. But as I mentioned, we will have a full video on that topic and another time, other things that we want to consider. And regarding two keys, typically you want to use something called a surrogate key versus a natural key. Natural keys are what we see in real life that are naturally unique, like a social security number or an email address, right? So if we have, let’s say, a user table here, we have email, oops.

So email is a typically a good way to indicate uniqueness, right? Because our emails are all unique. So if a user name, for example, is just their email, we can guarantee that that user is going to be unique, and we don’t have to worry about any anything conflicting with that when another user comes along. However, the problem here is that when we run things like updates, if we run our queries, deletions, insertions, all those all the different operations that we actually use, involving this user, working with a natural key is actually more expensive than doing something like a surrogate key. And so usually, what we do in this situation is we will have a surrogate key. So let’s say in this case, user ID. And again, since this is a key, I’m repeating the name of the table inside of my column name. So we’ll make our user the user ID our primary key here, and then our email address is just going to be a unique key now. Or just to kind of enforce the uniqueness property there since it’s no longer a primary key, but this way, right this way, if our email changes for whatever reason, our performance is not actually impacted because all primary keys here, all primary keys are actually indexed. And so that indexing actually improves the speed of our queries. And when our primary key changes, that index has to be recomputed. And so that’s one of the reasons why the performance can get hit, if we are using a natural key as our primary key. But in general, this user ID, this surrogate key will never actual change never actually change. But again, we’ll talk more about the performance implications when we talk about indexes, along with the impact of having primary keys as well. But that will conclude all my general real world practice, best practices recommendations here. Of course, there are more out there. These are just some really easy ones to remember and general rules of thumb that can really make your life’s a lot easier when you’re working with your database design.

Database Relationship Types

YouTube Video

Video Transcription

Welcome back everyone. In this video, we’re going to be taking a deeper dive into relationship types between our tables. And we’ve talked about some of these in passing before when we talked about UML diagrams. So we’ve seen these as part of the homework, and throughout our various other videos that we’ve had in this class so far. But let’s take a deeper dive into what these represents. So the first one that I’m going to cover here is one too many. And so the one of the common examples that you’ll see with this is a parent child type of relationship. So this is something where one record can have many connections on the right hand side, so one on the left many on the right, this can be something in the form of department, and employee. Or it could be something like our customer, and order, as we’ve seen so far in our database. And so the one to many representation here between these two entities would look something like this. So we would have, for our departments, we have, we’re one we have a department that can have one or more employees, an employee belongs to one and only one department. So the double dash here. So this, this here is going to represent one and only one. And then this here is one or many, then for our customer and order we have, we have one customer who has, who can have many orders. But a customer may not have an order yet, right. And we’ve seen this as part of our queries with the wide world importers database, where some customers or some salespeople don’t have any orders or or sales and a particular year, or even on a on any particular day, right. So that representation looks something like this, and a formal UML diagram. So we have our one and only one, zero or many, one and only one, one or many. But some of this here is logical constructs only, meaning that I can draw my relation here in my UML diagram. But some of this cannot be related cannot be physically represented as part of our database, and SQL Server Postgres, whatever database you’re actually working with, now, particularly this representation here, between department and employee.

So I have one department that can have one or more employees. But how there is nothing here that can, when I’m actually writing my database, physical database, there’s nothing there that I can do to prevent a department existing without having an employee. So some of this representation cannot be physically represented as part of our database. However, we can enforce some of this using things like check constraints, and other forms other methods, and constraining data inside of each of our columns. But things like one to one or more, we can’t necessarily always enforce that representation. Now, zero or more is more representative of what actually happens in our physical database design. But it is important to draw your UML diagram correctly towards what the data actually is logically, because that helps programmers and other people who are interpreting and using that database, it helps them understand how to write the code that supports that relationship behind the scenes. So things like this will not necessarily be able to be restricted physically on the database side but can be restricted on the application side. Otherwise, relationships that we talked about so far are one to one. So an example of this is products and inventory. And so the relationship here, between product and inventory here is going to be I’m just gonna draw a line here. We have one product to zero or one inventory. We could also have one to one on either side, so the remember, this is one and only one. And then this symbol here is zero, actually type right out zero here, this is going to be zero, or one.

So drawing this out here, right again, basically, these can only be implemented with zero or one similar issue with our zero or many, right, we cannot actually, logic, we can’t actually physically enforce one and only one or one or many inside of our database construct that these are only logical representations. So physically, these can’t actually be represented. So here again, I’ll read this from left to right, we have one product is associated with zero or or one inventories. And then an inventory is associated with one and only one product, right. So an inventory logically cannot be present without a product, but a product can be can be present without an inventory. And then down here, a products can, one products can be can belong to only one and only one inventory. And inventory can be represented by one and only one product. So the difference between these two representations here is that my product on this line here can’t exist without a corresponding inventory. Up here, it can, my product can be alone, my product can exist without the inventory there. And again, some of these representations are logical only physically, we can actually enforce all of these relationships. And I’ll try to point these out like I did here, with the one and only one. But some of these logical representations can be enforced through check constraints. And we’ll get to that here in a little bit. But the last typical relation that we could have is many to many, so many to many, is typically going to be implemented using a linker or bridge table associated with it. So many too many is going to look something like this. So we have something like that, and you can have a one or zero there. So we can have one or many or zero or many.

Now, as I mentioned, right thing, these are typically represented using what we refer to as a linker table. So this right here is a linker, or bridge table. The primary reason why we actually have the linker or bridge tables here is because many too many itself can’t be physically represented in our database, right? Physically, this cannot be implemented. And so in order to physically implement this relationship, we actually have to enforce something like this, where we have a intermediate table or aka link or or bridge table that represents that many to many relationship. So if I read this from left to right here, a product can be in zero or more product locations. And a location can also be in more than one product location. And so what this enforces here is that we can have many products at many locations. Right? But this table is the representation of that relationship, right? This table is that representation in our relationship. So anytime, best practice here, anytime you have a database that has or a diagram that has this many to many relationship here. We are typically going to redraw that using a link or table. So if you ever do see that and in practice, just the many to many, just understand that underneath the hood and the actual raw implementation. There’s going to be be some form of link or bridge between those two tables. But that will conclude all of our standard relationships between our tables, and the following videos we’ll talk about some different variants among these types of relationships.

Relationship Variants

YouTube Video

Video Transcription

Welcome back everyone in this feeder, we’re going to be talking about some variants of our standard relationships that we have inside of our database. So the standard relationships, one, or only 1010, or many, many, many of those sorts of things. So, first one that we’re going to talk about here is how we represent multiple relationships between two tables. So generally speaking, right, we can have a table that has more than one relationship with another. So we have a center table, something like like this. So we have an end, let me switch back to here. So we have an employee. And then an employee may have a role. And then we have departments that the employee belongs to. And we have this connector between the three. And so we have this kind of relationship here, where there is more, on the on the left hand side, we have one or only one. But here in the middle, we have one or more. So we have this linker table, right, if we have, we have an employee may have more than one role in the department, right? And so to represent that multiple information, because it because we an employee can have more than one role, and more than one role is associated with a particular department. So how do we represent that information. So we put all of that down into its own table. So we have our multi way relationship here, right, we have our employee table. And just now remember, we are using surrogate keys, I could in fact, use email as a primary key here, because it is a unique field. But right it is a natural key. So we don’t want to use a natural key as our primary key here. So we have employee ID.

And remember, all of our keys have the name of the table associated with it. But all the other columns do not. So we don’t have employee first name, we just have first name. And primary reason for that is when we go down here to do joins, things like that. The keys are important to have the name as part of it because it helps writing those queries a little bit easier to read and easier to do overall. But then we go down here, and we have our departments as well. So departments department ID, and and name. names should probably be a unique key here. So we don’t have any department who has the same name. And then we also have roll. So role ID, role name and role description. And then we can have an employee that is associated with more than one role, which is also says can be associated with more than one department. And so this is a really unique way or interesting way to represent this relationship. If an employee has, let’s say, one role to accomplish and department X, and a different role to accomplish and department a, we can actually represent that information here. So again, this center table is not actually a physical representation of the data. But this here again is a linker table, linker or aka bridge table. So very similar to how we represent a many to many relationship between two tables. We can also represent a many to many relationship between three tables by using that link or table as well. And we can connect more than one table together in one table by using more than one foreign key here. So you’ll see that I have SK one f k two f k three.

That’s because each of these foreign keys references a different key in a different table. So roll ID represents roll ID employee ID to employee ID and department ID to department ID here. And again, remember I’m using some best practices here. My foreign keys are named after the primary key In the originating table, this makes it significantly easier to track back to the or the origin points, or the origin table for that key. This not only helps us in our database design, but when you actually start programming this in part of an application, it also makes writing your application code significantly easier as well. So let’s continue on to our next relationship variant. And this case, we’re talking about multiple roles. So this is somewhat like I did before, where I have multiple foreign keys for multiple from, or sorry, the the table I showed here, I have multiple foreign keys down here from more than one table. But in this situation, I’m going to have multiple foreign keys to the same table. So for example, we have the relationship here, between con a customer and contact information. So what we actually achieve here is two foreign keys foreign key one foreign key two, but primary contact, secondary contact both of these. So both of these, reference, the contact ID, both of those reference the contact ID. So a primary key can be the foreign key in another table in multiple references. So we are not restricted to using a primary key as a foreign key once and only once. We can use it multiple times and doesn’t have to be used multiple times in different tables, it can be used multiple times and only a single table as well. So this works out fairly well. One of the caveat here is how do you prevent both from being the same? Right? How do you prevent the primary contact from also being the secondary contact? Well, generally speaking, this can be done using a check constraint. And I’ll showcase these, I’ll showcase these in, in some feature examples. We’ve shown we’ve shown check constraints before as part of our lecture series. But we haven’t really got a chance to really implement our own yet. But if if it’s just simply doing something like this, where we’re checking to make sure that the secondary contact is never the primary contact, and vice versa, then we can achieve that with the check constraint.

So what’s next? Self referencing entities, this is a kind of an awkward one, we can have an entity like employee reference itself. So we can have an employee that references itself. This is kind of a weird one, right? So for example, how do we have a how would we represent someone’s manager as part of an organization? Well, a manager is nothing more than just another employee, right? And so we don’t want to necessary we don’t necessarily need another table to to store, just the managers, because the manager is just an employee as well. So what we can actually represent here is that a manager, right, the manager ID is nothing more than a foreign key to employee ID. So not only can we use a key as a foreign key multiple times in the same table, or an external table, or multiple external tables, but we can also use a primary key as a foreign key within itself, which is kind of weird to think about, but it works out fairly well. So essentially, you think of this as a hierarchy chart, right? We have multiple employees as part of an organization, and so on and so forth. Right? And so, the person at the top, let’s say, this is the this is the CEO, right? They, we would actually have we would actually make the Manager ID here nullable. And so the person at the top would not actually have a manager, but everyone else well or you could also make it to where the if if if an employee does not have a manager, then the manager is themself, so you could actually have that as well. So you would have a self referencing loop there. But this organization chart as employee organization chart is really what we represent with just the single table. So a self referencing table is actually a very expressive way to represent this type of information. This similar structure can also be represented in your file system. So if you open up File Explorer in Windows, you have folders and folders within folders, and so on and so forth. How do you actually store the location of a folder? Well, a folder is nothing more than a folder that can be inside of another one, and so on and so forth. So you could represent the parent information with this same self referencing structure. And I’ll show an example of how we might store files here in a little bit. But these are some of the primary variants of relationship types between tables that we may actually represent. And in some following videos, we’ll talk about some more advanced relationships that we can represent between our tables.

Database Subclasses

YouTube Video

Video Transcription

Welcome back everyone, In this video, we’re gonna be taking a look at subclasses. Now subclasses seem kind of an odd thing for databases. subclasses are an object oriented programming idea, right? So we have a parent class serves as our base class. And we have child classes that inherit, you know, attributes and behaviors from the parent. But we actually could represent similar ideas inside databases, typically, we’re going to have three different kinds of approaches here, the object oriented approach, which is really one, one direct mapping to what you would expect in like Java, or Python, where you have a table for each class or type. So you have a, a table for, let’s say, an animal, a table for a dog, and so on and so forth. You have things like nullable columns. So you can represent the class structure by just having columns that you can know out for certain scenarios. So, you know, you can have, you know, animal as your base class, and you have dog and cat as your sub classes. But a dog does not have all of the same properties as a cat. And so if the, if the animal that we’re representing is a dog, then the things that represent a cat are doled out, and vice versa. So that’s, that’s an approach. And then we have the ER style, which is probably the more complicated representation. And this is the true abstraction setup here where we have a base class or superclass that represents the base information. And then you have child classes that are connected to that table through relationship of some type. But let’s first take a look at the object oriented approach. So generally speaking, when we do this approach, all the common attributes are going to be in all types. So if you have an attribute that’s shared between classes in your data structures, then that common attribute is going to be a column in all of those tables in your database design. In the object oriented approach, we have no keys, or sorry, we have no foreign key. So there is no direct relationship between each of the tables. So we have no connection between the objects, that connection is represented using that that connection is used or created using those common attributes. So the common attributes are those connections between each of the subtypes. And then tuples are only inserted into the applicable type, meaning that there is no base type.

So we would have sale for dogs, cats, and things like that, we wouldn’t actually have an animal class or an animal table, we would just have dogs and cats, and those dogs and cats that have all the same attributes as an animal does. But only the unique attributes for dog and the unique attributes for cat. Sometimes you will need a general type. We can’t always get away get away completely from the base type. But most of the time, we can get away with that. But let’s show an example here. So here is a short example showing an employee part time employee and salary employee. So part time and salary employee are to two general types of employees. And here, we I ended up just adding a general employee, just in case with these, right, here’s our object oriented approach. And you can also see, let me actually pull out the highlighter here, the common fields, right, so employee ID, we have name, we have email, as well. The things that are unique between these are going to be the other other other fields, right pay rate, salary, vacation time, those are all unique, right? To each of these different types of employees. So we have salary and vacation time for salary employees, and then part time employees just have a pay rate their hourly, they don’t have a salary or vacation time. So this is again, a really interesting problem. Four at face value. This is a relatively quick and easy way to represent our employees. Right. One of the cons Of course, is that we have duplicate information. But that duplicate information is actually spread across each of these tables. So how do you prevent an employee from being in all three? Well, typically, you would have things like a check constraint again, right? You could use it when and when something is inserted, you can use a check constraint, right? And that check constraint can make sure that if an employee is salary, if they are not also part time. How do you prevent, make? Or how do you make sure that employee ID stays unique among all three tables? Well, generally speaking for employee IDs, employee ID, in all three cases can be done using a sequence object. So we have that one sequence object that keeps count, right?

That one single object that generates the employee IDs, for each for each of these that makes that unique across all three other general question here as well, what if What about email? Right? What if you added an email attribute that all employees that all all employee types should have, but must be unique across? All? Right, so what if I had an email here, but that email then could not show up here and cannot show up there. I don’t have a foreign key that represents that. And so again, this becomes a lot of heavy lifting with constraints, like check constraints, sequence objects, all those sorts of things to actually represent this. So I do see this object oriented approach to sub classes, and some scenarios in practice, but far less often, because you have all these additional constraints that are needed. Plus, you also have duplicate information, you have the potential for duplicate information stored across different tables. So what are the other better ways that we might be able to represent this? Well, not necessarily better ways, but different ways. So let’s take a look at a another representation of subclasses. So another representation, we can actually use just a single table approach. So within that single table, anything that is shared between our objects, so let’s bring back the dog cat example, right? dog and cats are both animals. And so though, the common attributes that all animals share, those would be non knowable. So those are required attributes as part of our table. But the things that make our dog a dog and, and cats a cat, those would be knowable, because obviously a dog would not have cat attributes, and a cat would not have dog attributes. So those would actually be knowable as part of our representation.

But let’s take a look at a general example here, using our employee. Now, I’ve collapsed all three of our all three of our tables. So we had the employee part time and salary employees all three different tables with the object oriented approach. In this case, we are doing one single table with all three, and the the common attributes, employee ID, name and email are now non nullable. Non nullable, and then the three nullable columns salary, pay rate and vacation time, were the things that made salary employees salary employees, and part time employees part time employees within you are UML diagrams, by the way, know, the Italian. If a column name is italicized, then that means that column is nullable. That is a standard UML representation for databases. But let’s take a look at this table a little bit more, right? How would I actually enforce that? If someone is part time that they don’t have a salary and don’t have vacation time? How would I enforce the correct columns or no or not? No, depending on which type of employee that we’re actually doing? Well, we could do a we could introduce a type with a check constraint. So let’s take a look at what that would look like. We could do this without a type, but it makes it a little bit easier to do Do a type here. So I would actually do a type inside here. So add a type, add type in as a new column. And so then if employee type is full time, then the salary must not be no vacation time must not be No, but pay rate must be no, because a salary employee does not have a pay rate because they have a salary. And then if the employee is part time, then they cannot have a salary or vacation time, but they must have a pay rate. And so this is the general approach that we’re going to take, if we want to represent a subclass type relationship. So employee types with salary part time, and we could have other types of employees as well with if we wanted to use only a single, single table, this works out for the most parts. And I typically like this approach a little bit better than the three tables approach because I don’t have duplicate data being introduced. And it’s much easier to keep track of things that way. But again, you have all of these Nolde columns. So data that is not actually fulfilled. So we don’t need salary and vacation time for a part time employee.

So that’s extra overhead added into this, as well. Typically, this will was this will work very well for small examples or small datasets, but start to fall flops, performance wise, for very large datasets. Because, again, we’re taking up extra space, we have extra columns that aren’t actually being used. And so it’s extra overhead that we don’t need, those check constraints can get expensive after being ran over, over and over over over the course of a time. So typically, a better approach to representing subclasses is the ER style. So typically, what what what we’ll have here is a single super type. So we will have an employee. Right? This is going to contain the primary key, right? And the common attributes, right, just so we’ll have the employee ID, name, email, all the things that all employees have there. And it will contain a record that represents all all employees will be here, right? All employees will be here, and then we’ll have each subtype, those subtypes will also contain the key, but the specific attributes that make that type it right. And so we have part time and salary over here. And so we actually have this representation. Right. And so now we’re actually starting to represent the true style of abstraction here, right. So, if we’re talking about you know, object oriented programming here, here, this is a superclass and these are sub classes. So, part time employees and salary employees are inheriting from the base class, the base class or superclass employee, but let’s take a look at this representation and a full example. So, here is a look at a much larger example with the employees all stretched out here. So I have employee salary employee and part time employee, employee here. This is this is my base class or superclass. This is a subclass. subclass. And notice that we have we have two keys now are so we have a primary key that’s employee ID.

And then we have employee type ID, which is a foreign key to employee type. And it’s also unique with the employee ID. So the same employee, right So Bob can only be one type of employee and Jill can only be one type of employee. And so we actually store the type of along with the base class, right, we store the subtype along with the base class. And that lets us know, right, that lets us enforce these checks down here, that lets us enforce these checks down here. And so we want to make sure for example, that we want to enforce mutual exclusivity, right. So if a an employee is a salaried employee, we want to make sure that they are not a part time employee as well. So they can be one or the other, but not both. And so to enforce that exclusivity, we need to keep track of the employee type and the employee table. So we can enforce that check constraint down here. So an employee cannot belong in the part time employee if their employee type is for example, not to. Okay, and then the foreign key. Alright, the foreign key employee ID actually will cover the, the information that is shared up here, right. So not only is the employee ID the primary key for salary and part time, but it’s also a foreign key back to the original base class. So this representation here, this ER diagram representation is my typical way of representing subclasses as part of databases, it makes, generally speaking, the most logical sense, but things like nullable columns, and the others are still beneficial to our representation here. And sometimes they mix and match we kind of combine some of the pros of one solution and, and the pros of another to make a better solution overall for whatever data we’re trying to represent here. But this representation here is going to be the closest you’re going to get as far as the traditional representation of a base class and subclasses in object oriented programming in terms of databases,

Union Types

YouTube Video

Video Transcription

Welcome back everyone. In this video, we’re going to continue talking about various types of entities, and subclasses, including union types. This is also somewhat of a subclass idea. Generally speaking with Union types, sometimes relationships are mutually exclusive, like we showed just a little bit ago, that can involve all sorts of things, right? Folders, with files, users groups, all of these things are mutually exclusive, right? A folder can be owned by a user or a group. But how would you prevent a folder from being owned by both types? Right? How would you prevent a folder from being owned by a user and a group, but it’s usually only only owned by one, right? It’s either owned by a particular user or owned by a group, but not both. So this is like a subclass. This is like a subclass. And this is a really big, a big representation here. So I’m actually going to switch screens here. And we are going to take a look at this inside of Lucid Chart. So this is a user folder and group. So in its base form, it’s going to be very difficult to actually represent the mutual exclusivity here. So a user can have zero or more folders, and a group can have zero or more folders. And here again, here, I’m combining some of my techniques here, right, here’s a self referencing table. So a folder belongs to another folder. And so that parent ID is indeed another folder. So here’s our self referencing fact. And then we have a name for it and group ID and user ID.

So how do we represent the mutual exclusivity here. So a better way of actually representing this because there’s, it’s, there’s really no way of representing or preventing a user and a group from owning a folder here. So down here, I have a much larger representation of this along with some check constraints. So now along with the user, and group, I have an owner. So I separated out the the information into an owner. And that owner is kind of like a subclass, right? We’re, we’re, our user and group are kind of like sub classes to the base class owner, right, just like what we did with the ER style diagram here, you can kind of actually see general representations that we saw with employees, right. So think of this as owner being employee, part time employee, salary, employee and employee type. So I have the same general structure here. But now I’m just combining a bunch of other things along with it, because now I have the multi way relationship here between user folder, group and owner. And then I also have a self referencing relationship here with the folder ID with parent ID. But now with this kind of structure, I can enforce the owner type, I can enforce owner type. So I can make sure that, you know, only F if it’s owned by user that only exists here on the user table, if it’s owned by a group that only exists in the group table. But then down, but then I can now enforce exclusivity, because the owner is indicated by only one single field, right? It’s indicated by one single field before, right.

Before I had group ID and user ID as part of the folder, right? What should our properties right? But the ownership now, instead of having both included US foreign keys here, only the owner ID is included, only the owner ID is included. And so we enforce mutual exclusivity by allowing only one owner type ID here for what the same owner so owner type and Owner ID are unique together. And so only one owner can exist for a folder now, through this process. Only one owner can now exist through this process. So I’m going to include as part and Canvas I’m gonna include these images as well as part of This lecture. So please do take some time to kind of look through this. And please ask if you have questions. This is more of one of our more complicated UML diagrams that we have done so far in terms of relationships. So, this here is kind of a mixture of er, we have multi way relationship and self referencing. So, those are the types of relationships here that I’ve represented. And so, and again, remember this, this one here on the bottom is the general better approached represent the idea that we were trying to do with this with with this. So, this up here, I can’t actually enforce everything that I need to properly as far as the relationship goes. But down here, I can actually enforce that relationship in the database itself, without having to rely on application logic.

Weak Entities

YouTube Video

Video Transcription

Welcome back everyone, In this video, we’re gonna be talking about weak entity types. So weak entities are entities where their keys come from other classes. So this example comes from things like order lines, or tracks for an album. So even with a wide world importers database, we’ve seen a lot of examples where the keys, for example, or her lines come from another table. And so typically, we tend to the sets are often used in one to many relationships, where, where we have a minimum of one rather than zero, or many, so one, or many versus zero, or many. Logically, we can only we can enforce that situation, right, we can’t actually enforce that we have exactly one of the other. So for example, we can’t enforce that an album has one track, we can’t physically enforce that fat. While Chicly we can represent that here. But physically we cannot. So this is referred to as a weak entity set. Because our key over here depends on a key from another table. So typically, what we’ll have here, we have Track ID. And then our we have a foreign key album ID unique key for album ID, that goes over here. And the unique not unique key is also combined with the track number as well, because we can’t have duplicate track numbers for album. But this type of relationship here is weak at best, because like I mentioned, logically, we can represent this here easily enough with an album can have one or more tracks.

But an album, an album can’t have zero tracks, but has to have one or more, but physically within our database we cannot represent. And we can’t force the album to have at least one track. Now, this also brings up a little bit more of discussion into applications, because I can enforce this relationship and with application logic, but I cannot enforce it on the database side. So within the database database itself, so Microsoft SQL Server, Postgres, MySQL, I can’t actually force this relationship to exist. But I can force this so I can force the one or more for a during with the application. So if an album is only entered through an application, so let’s say a front end user interface, for example, I can actually force the user to also enter at least one track before the album is saved. So I can force that, but that relationship cannot be forced on the database side. So that is one of the weaknesses of this type of relationship here. So the one to one or many, the one or many side of this relationship is the weak entity, right? Because this depends on the album existing first. And the fact that we can’t, we can only represent this logically or through an application and not force it or we cannot force it on the database side.

The Multiple Path Problem

YouTube Video

Video Transcription

Welcome back everyone. And in this video, we’ll be talking about the multi path problem. So we’ve already talked about multi way relationships where we have multiple relationships between one or more tables. But the multiple path problem can come out of this type of relationship. So when there is more than one path between a single entity to another, we get an issue, right? This provides multiple different ways to do our joins, when we write our SQL queries, and when we have multiple ways of performing our joins when writing our queries, we can actually get different results depending on which direction we actually started joining first. So this provides an issue, right, because if we if we aren’t aware of this issue ahead of time, this provides an interesting problem that our queries are not going to provide consistent results, depending on which direction we start to join first. So there are a couple ways we can fix this issue. The first one is going to be taking out the foreign key reference entirely. And this makes it so that the person who is writing the query only has one and only one option to joining the tables together, right. So if there aren’t multiple ways of joining the same information together, then there’s not going to ever be the inconsistent issue with results. And so the problem here, though, attributes from the reference table would have to be duplicated in order to get that relationship across. The second option, second solution here is going to be duplicating the key and then use composite foreign keys instead, by using composite foreign keys, then no matter which direction we actually join, the results will always be the same, the result is always going to be the same. And so in this, I’m going to show a couple different examples of this, it’s all going to be based off of the same, the same set of tables, but I’ll show it in UML first, and the issue where where it arises. And then I’ll show an example in SQL as well. So here are two examples, right. So the first one on the left, right, this side, this is the multi path problem over here. And then it is fixed. Oops. On the right hand side, I fixed the multi path problem. And so let’s take a look at some of the distinct differences between the two. But first, I’m going to highlight why a multi path problem exists.

So here there is more than one way to get from an invoice back to the order, right? So I can get to from the invoice I can get the order here, right foreign key, but I can also get it through the invoice line through here, but I can also get through the order line through here. And so there’s different paths that I can take to get back to our my original order I can go this way I can go this way. So both directions, so I can go I can go that way. I can go that way, right. So two directions that I can actually perform my joints, I can join my invoice with invoice line, then order line and then back to order, or I can join invoiced order both of those are going to result or could potentially result in different data sets as a result, right. And so if I go over here now with my my fixed solution here, multiple paths still exist. But what now and what the same exact result, this is due to the composite keys, this is due to composite keys. And so in particular, this line here, okay, and this line here, and this line here, now, invoice ID is not unique over here anymore. So invoice ID is just the the foreign key over back to the invoice. And my order ID right is now a foreign key. Right? Foreign Key reference Seeing both write foreign key referencing both this order and that order. And so what this actually enforces is that the invoice, invoice line and order line are all synced up to be the exact same order, right, they all link back to the same order. Now over here, you notice that this order, the invoice is synced with that order, the order line is synced that order, right? But in theory, right, this order this order here, and this order here could be different, because there is no synchronization of the Order ID between each of the tables.

And so that’s why this, this side, on the left, this is a problem because the order ID is not synced up to the same order. But over here on the right hand side, since I’m using a composite key with the Order ID. That becomes right, that becomes synced up. And so when I joined, no matter which direction I joined, if I joined invoice, invoice line order line, or invoice order or invoice line order line, or vice versa, whichever direction actually joined, the result of my query will be the same, right, the result of my query will be the same. So let’s take a look at what this might look like an SQL open up a new query here. So I’m actually going to do a couple of couple of things here. First, I’m going to set up my tables just like I just like what with how, initially, I’m going to set it up like the multi path problem here. So my table is going to represent this issue right here. And so I have my order table, order line, invoice invoice line. And then I just have some basic data to actually insert here just to represent the problem. And so notice, I don’t have any composite foreign keys here, I have no composite foreign keys, I have a unique key order ID or line number unique key here. But that’s pretty much it, I have no composite foreign key, I do have my foreign keys linked here, but no composite foreign key. Okay. And so let’s let us actually showcase the query that will highlight this problem. So first, let’s go ahead and connect to my database. All right, and then I’m going to use CC 520 here. So again, all of my tables that I all of my tables that I had shown as part of the UML diagram on the left hand side, so this one over here. We’ll be and then some associated data with each of those tables. And this query. I am pulling out all of the information. So invoice line, invoice ID, order ID, product quantity, quantity, invoice or invoice quantity from invoice joined on invoice line, joined on order line joined on order and then down. And then let us execute. Okay, so I’ll run this once.

So invoice 345, invoice ID 2121, order ID 2020 and 10. Okay, and package a cool beans bean roaster packet of Cool beans, quantity, and so on and so forth. Now if I go down here Alright, so now that we have our base data, our base data inserted into our tables with the multi path problem existing in there. Let’s check out these two sequel SQL queries. So both of these queries are going to pull invoice invoice information along with the order information and the product quantity the so the amount ordered And the invoice quantity. And at face value, they both look to be about the same. But the line that I want to highlight here is this last one. So with this line, right invoice ID, invoice line ID five, I get order ID 10 For package of Cool beans here. But down here, I get order ID 20 are the exact same invoice ID and invoice line Id still the same products order quantity and invoice quantity but a different order ID. So this, this provides an issue. And really the only change that I actually made here is here. So I have entered joined demo dot order on order ID O dot order ID equals O L dot order ID. And then here is O dot order ID equals AI dot order ID. And so the IDs are not syncing up properly between each of the entities with our join. The entities are not, they’re not a sinking together with the correct order ID, because I now joined on a different predicate. So the order of information is dere. Same joints just ended up with some slightly different information as a result. So how would we have fixed this issue? Oh, I’m going to make the here a new page here. And well, you see see 520. So same exact tables, same tables.

But now instead, I am actually representing this fixed, fixed table over here. So the primary changes are gone that I’m going to represent are now this composite key, this composite foreign key here, along with the unique key order ID and invoice, invoice line and order line. So let’s take a look at that. So here is a few new things. Here’s the new unique constraint. So the order ID and the order LINE ID must be unique together. Here are the new unique constraints. Order ID and invoice ID must be unique together. And now down here, here is a composite foreign key. So order ID and order LINE ID must reference order ID and order LINE ID in the order line table from the invoice line table. So invoice line to order line. And order ID is synced now between invoice line and invoice. So not only is inside my invoice line table, not only am I syncing my orders, but I’m also syncing the order lines as well. Right. So let’s take a look at this as part of our database. Run this again here real quick. Let me drop. If we look at this issue, right, if we look at this issue, oh sorry, I can’t actually zoom in here. So if you squint really hard or zoom in really sitting really close in your screen says insert statement conflicted with the foreign key constraint in voiceline the conflict occurred and database CC five pony table demo order line.

So let’s go down and look at our order line. And so scroll down here. So we’re in our invoice. We have a composite foreign key order ID and order line, order ID, order LINE ID. And then we also have our composite foreign key here with our order ID and order ID invoice ID and the invoice. So if we go back over here, we have our sorry, this should also sorry, I do have an error here. Okay, one so this should also be Sorry, not or so I have a my foreign key here fk one fk one. So these two together, these two together. So that’s my reference. That’s my reference here. And so what happened is that when I inserted when I tried to actually insert into my order line table here, I violated oops, sorry, I violated my foreign key, I violated my foreign key constraint in my invoice line. Because if I scroll down here, we got these three, these three insert statements happened because these three here, three rows, six rows, four rows, here’s my four rows. But this failed, right? Primarily because of this line here. This should actually be 201 instead of 101. Right. But that violated the key constraint because of this, right? This has to. So this should be to a one, not one to one. So this is a typo, right, this is a typo from me entering the data, or whoever is entering the data, whatever it may be. This is a user error. This and this could be either user error, or programmatically whatever it may be. But now with our composite constraints with our composite foreign keys and unique keys, this is prevented now through the error that’s generated from the database. So this doesn’t have to be prevented from on the application side, it can be prevented through the database design itself. So nonetheless, right, this is more a more complicated, sorry, this in particular is more of a complicated problem that we are probably used to as far as our design. But issues like this do come up. The multi path problem is something that is doesn’t happen often. But when you have this complicated relationship here, between a bunch of tables referencing each other, now you can get into issues like this. So just be aware of this is one possible way that we can use to correct and fix this problem and our database design

Design Patterns and Practices

Subsections of Design Patterns and Practices

General Database Design Practices

Video Transcription

Database Relationship Types

Video Transcription

Relationship Variants

Video Transcription

Database Subclasses

Video Transcription

Union Types

Video Transcription

Weak Entities

Video Transcription

The Multiple Path Problem

Video Transcription