Database Design Example

Video Transcription

Welcome back everyone. And in this video, we’re going to be taking a look at an example database design. So we’ve shown some basic database, UML diagrams, but we really haven’t taken the time to draw our own yet. So in this example, we’re going to take a look at designing a really basic database for offering college courses on a campus like K State, a course is going to be defined by its number delivered by a department in a college and offered certain semesters at the same time on various days of the week. So this is going to be the basis that we’re going to start off of start from for drawing our database design. And here, I don’t really give a lot of details here, right, we’re just basically defining a college class, right, like cc 520. Right, CC is essentially the department, which is offered in the Arts and Sciences College. And so 520 is the number and typically right, the CC the department and number is going to be a unique combination within a given college. And so there’s a lot of other core details here that we’re missing as part of this. But let’s take a look at how we might start drawing a database that will store college courses. So what we have here is called Lucid Chart, it is a diagramming tool that can be found online, you can download it as well, there is a free version that I’m using here, it does limit you to the number of documents that you can actually create. But for the most part, you can get around that by adding multiple pages like I have down here to the same document, you can achieve the same kind of functionality through a lot of other programs. One other one that I really like is called draw.io, which is a completely free tool that you can use online to draw things like flowcharts. And in this case, UML diagrams for drawing or designing databases.

So here is a UML diagram that I talked about, for our table that we talked about here, or the course that we talked about here, a course identified by a number department college and then offered some semesters, at the same time at various days of the week. So I don’t quite have all of the detail here yet that we talked about. But I do have some of it, right, so we have a course table. So every every entity that we see over here, and you see there’s a couple of different options here that we can actually use, most of the ones that I’ll be using today are going to be just this entity model here, which allows me to identify the columns that are listed here, and my table. So table name, which is course and may actually be better me to make this a shaded header, it doesn’t quite look as clear. So I’ll leave it unshaded. But we have the table name here. And then all of the columns associated with that table or that that that is in that table. And on the left of each of those column names, I can specify whether or not something is a key, like a primary key foreign key, or maybe that has a unique constraint associated with it. So UK is used to denote a unique constraint. And we can specify a variety of other things, other constraints here as well, if we’d like to, if you wanted to get in more detail, we could do this triple the triple field, where are the triple column table where we have the keys or constraints on the left, the column attribute names here in the middle, and then the column types on the right hand side. And today’s example I’m not going to go into assigning types here. But I will talk about the general difference between some of this. So what I’m focusing on for this example is our logical design of our database. Meaning that what we actually put into into action and implementation like in Microsoft SQL Server will actually be a little bit different to what the actual database design actually looks like.

Because the logical sometimes or Some things, we’re not able to concretely model as part of a logical design, versus what we what we actually implement in the actual database. And I’ll try to highlight some of those differences as we go along here. But for now, we’re not going to really try to define the physical, the physical implementation of our database. So I’m not going to bother with the column types for now. So let’s go ahead and take this out. But we have a single course here, a single table. And obviously, we’re not, we don’t have all of the information that we need yet, like days of the week time, this course is going to start all those sorts of things. And even things like semester, right is very vague of what is actually stored there, like, are we going to store fall 2020, fall 2021, spring 2022, whatever it may be, that’s a lot of little different pieces of information that we have to cram all into one column. And we want to avoid those types of things as well. We don’t want a column that we are forced to store more than one bit of information, even if it is related, it is much more powerful to store the those pieces of information in separate columns, or even in separate tables. So we, for one can have more expressive queries. And two, we can actually have better data integrity, right, we don’t have to have as much redundant data being stored as part. So let’s kind of explore what we can try to start adding to this, to make this a little bit better.

So I’m going to use my little copy up here just so I can change things here. But or to make this go a little bit faster. So let’s first start out by taking out some of the dupe the things that would be duplicated, right, because if a let’s say, you know, in general, right, a department is going to offer more than one course, typically. And if a department offers more than one course, things like the college name, and department name, are going to be duplicated across all of those different instances or all those different rows. And again, we want to avoid duplicated data. I’m not going to touch semester for now, let’s go ahead and focus on college and department. So let’s make a department, a department table. So I’m going to have department here, and I’m going to go ahead and do most of my most of my primary keys, for a lot of this data are going to be surrogate keys, meaning that at some unique identifier, typically it’s going to be some auto incrementing number, that as we increase as we insert new data into our table, these surrogate keys are a little bit more beneficial and easier to work with. We could enforce that, you know, a department name be unique, which we’ll probably do anyways here. But having the primary key based off of that can be tricky at times. So in general, it’s better practice or you’ll see this in practice out in the real world where we have this auto ID field that serves as our primary key. And even in other database implementations like MongoDB, which is a no SQL solution, their primary their default primary key is an auto incrementing ID field that you don’t even have to add as part of the the database design is just exists as part of the database implementation. So let’s go ahead and add a couple more fields here. So we’ll need to do let’s add Name, college and let’s say like phone and email or something like that. So here we have the part met name, let’s do college. And then phone and email. And then we don’t have I’m not going to have a key here quite yet. We’ll expand that in a little bit. But I am going to enforce that the department name be unique so we can’t have to department to department of computer science does have to have some unique, some unique name as part of it. So now what we what can we do with this setup?

Well, it I can actually show a relationship between these two tables. And now instead of actually storing department here, as part of my course, I would actually store department ID. And that department ID now becomes a foreign key. And with that, we would, we can still enforce that this be a unique key as well along with number. So we don’t have the same course number twice, for a particular department. If that is a restriction that you want to add, I’m not going to add as many of the like unique keys restrictions here, I’m going to save that as an additional exercise for you to try on your own. But just something to consider. So now we actually have, we don’t have as much duplicate information, along with the department now. So we don’t have duplicate information for the department. But we’re still duplicating information for the college because you can see here, I have college, for the course. And then I also have college for the department. And so it really doesn’t make sense to keep college as part of our course. Because since it belongs, since it is associated with the department, then we can take that out of the association with the course because we don’t need that we can get that through the department. So we can remove that row. Let’s this back over here. And by the way, as I’m drawing these relationships here, we talked about before, where the quote with the crows foot notation in our homework, so the crows foot notation, with the three lines. This is this means many. This one here that I’m putting on that is one or many. And then if we had a relationship that has the three the three ticks, but with an open slot, let me find it, here we are, this one here that is zero, or many, that means a department must have can have zero or more courses, or one or more courses and this situation here. And some of like, as I mentioned, some of these relations here cannot actually be strictly enforced as part of the physical implementation of your database.

So if you initialize your, if you initialize everything, and you say that while the department must have a course, well, some of these things are not actually you’re not actually able to physically enforce as part of the database design in Microsoft SQL Server or Postgres, or MySQL, or whatever database management system you’re actually using. So this is primarily a This is primarily a logical, logical implementation of our database as a whole. But we have our line being drawn here to show that a department will have at least one or more courses. And a course can only be assigned to a can be offered by one and only one department. We’re not going to open up the can of worms here that you know, maybe a courses co taught, and it’s offered by you know, two departments, we’re not going to go down that road for this design. But that could be potentially something that we might have to account for down the road. But let’s continue on expanding this a little bit. Because we also want to consider our college along with this. Because with our college that we have here, again, we have the same issue that we had with our course, we have a we have the opportunity here if a college has more than one department that we’re storing redundant data. So let’s go ahead and make ourselves a college table. And then within this college table, we’re going to have a college ID. And let’s do a couple extra rows here and I’m actually just going to do a name for our college I’m not going to bother quite yet with make this unique and make this our primary key. And then we’re going to draw a relationship from college to department And a college will have, maybe, let’s go ahead and enforce one, at least one department or more. And then a department must be associated with exactly one college.

Also keep in mind as I’m drawing these relationships, the the spots where I actually connect it to on the UML diagram here, whether I connect it down here or up here that has no relation, you won’t actually see the linkage drawn between the actual fields. It’s just the linkage between this table to this table or this entity to that entity. And now then, we actually need to make this here a foreign key and change this to college ID. All right, so now we are on the road to having less duplicate data as a result of writing this out, but our design here is pretty flawed over all, primarily because we still can’t tackle this thing, this aspect right here, a course is offered certain semesters at the same time on various days of the week. And so all what we’re actually able to are all what we’re achieving with that, right now is just this semester column right here, as part of our course. And having the semester all all about information packed into one column is quite flawed, we’re not going to be able to query where our queries aren’t going to be able to be very expressive. It lacks consistency with data integrity, and all those sorts of things. So let’s work on splitting that out from our course table. So I’m going to shift our entities down here a bit. And I’m going to make, let’s see here, probably needs a few extra tables that I’m going to put up. So let’s put one over there. Put one up here, and then I’ll put a couple down here. Alright, so let’s go ahead and see what we can do to pull out semester. So first off, let’s make us semester, a semester table. Because things like fall, spring, summer, all those sorts of things we don’t want to list out and in one column, fall 2020, or fall 2021, or spring 2022, or whatever it may be, because again, we’re duplicating the idea of have the term right fall, spring, summer, or intersession, or things like that. So we don’t want to duplicate that information. So we want to attach an ID to that.

So we could even say this to be term. And let’s make this primary key term, ID. Let’s a nother row here. And this is going to be a unique field. Term name. Over here, let’s see here. What do we need to actually start to pull out from our courses? Well, let’s rename our semester to B term. This is going to be term ID. This would be foreign key. And then we would have something like year here. But even with this, if we drew this as part right, we’d end up duplicating information. As we offer, let’s say cc 520. We offer it this spring, but then we offer it again, the following year, and so on and so forth. And some you know, some courses, of course are offered every semester. And so we end up again, duplicating information about the course write the name, description, all those sorts of things that we don’t really need. That doesn’t really change from semester to semester necessarily, but we we so we don’t want to actually duplicate that part. So let’s kind of keep on abstracting this a little bit farther. To be, let’s say, course schedule, so we can keep track of when something is actually offered. So let’s include here. Course Schedule ID. I’m doing some of this on the fly here. So of course, there’s probably some better ways of doing this by did kind of want to illustrate how we might iterate over our database design as we go. So we have our primary key. And we need at least a couple of fields underneath here. So let’s have a foreign key to be term ID. So let’s put that up there. Let’s shift this up a little bit. So we have some space. And let’s take out let’s take out this row. And then we’ll go ahead and draw this relationship, let’s flip it. So we’ll have a course here with zero or many here. And then let’s say we have course, start date, course and date, but it doesn’t. Well, it’s just not make as much sense here to call this a course schedule.

But let’s take this back out and rename this term schedule instead. Because mostly like fall, the fall 2021 semester starts at a specific dates, and so on. So let’s rename this to term dates. Alright, so we have like, the fall term, or spring term, or so on and so forth. And then we have the date associated with this here. And then with that, maybe we also have Above this we have we could we could get, actually, no, we don’t want, we no longer have to store a year because it’s kind of implied from our terms start date. So let’s go ahead and remove year there. And with that, then we can also remove year from our course. So a course is offered on a schedule let’s go ahead and model this. So like our will assume for now that the course the course, the course names and stuff like that don’t actually change. So let’s then go over here and make our course schedule to do here let’s shift some of this over. Alright, so we have a course, we have our term. And now let’s go ahead and try to extract a little bit more information here. As a result, so we have to actually have some way to combine our term with our course because the course is offered during a specific term. And so we want to be able to model that. So if we we could, in theory, make a relationship directly with the term schedule. We could link a course directly inside here, but that means we You’d also duplicate the start date and end date for a huge number of courses. So we don’t want to do that. And then we’d also have a many to many relationship there, which is also kind of hard to model physically inside of a database. So let’s go ahead and keep on trying to work through this.

Let’s make a schedule ID, this is going to be our primary key here. And then we’ll need to have quite a few row or quite a few columns as part of this Sony course ID will need the term schedule ID, the will also need to know the days of the week, and then also the time that the course starts. Or the location, the location. So let’s then do this number. We’ll do course ID. And then we’ll do term schedule ID. And then I’ll leave this one blank here, this will be the week. Something we haven’t quite fit, we haven’t quite got that table made yet. Love a foreign key here, a foreign key there. And then for now, I’m just going to keep this abstract and say location, delete that. But as I mentioned, here, we need some way to denote when this course is offered. So during that actual terms, so let’s go ahead and do a week schedule. And the week schedule ID. And then let’s add a few rows here. Let’s do days of the week, and then start time stop time. So days, the course the days, and then the start time, stop time. And then let’s go ahead and make these three fields unique together. So we don’t have a duplicate week schedule, right. So like Monday, Wednesday, Friday at 1pm to 2pm, or something like that. So let’s then get this key over here to our car schedule. There we go. And so let’s go ahead and also draw our relationships here. So we have a course, that may be associated to a that’s going to be associated with a course schedule. One schedule item is going to be associated with one course. Because a course can’t be scheduled more than once and a particular term. Or at least in this situation here. That there and then we have terms scheduled to course schedule.

So we’ll once a term schedule may be used and multiple course schedules. So we have, but of course schedule a single course schedule is associated with only one offering there. And you we might have a term that has no courses yet, so we’ll do zero or many there. And then we need to associate our week schedule with our course schedule, and that will be a similar relationship. We may have a time that does not have any courses. But a course schedule. A course that has been scheduled has exactly one one offering during that particular term. But zoom out just a little bit so we can get a bigger picture of what we have going on here. So this is everything that we’ve we’ve done so far. And of course, there’s absolutely some more work that we can do here. For example, we don’t have any enrollment information as part of our courses, of course now. So we have our college department, the course, and the time that the course is actually scheduled. But we have no information on being or we have no way for a student to actually enroll in said course, either. So I am going to leave that portion of this particular database a exercise for you to do on your own. Likewise, things that I haven’t included here, there are a lot of various things in my design here that may or may not work. Or, like, for example, I haven’t put unique keys on things that probably should be unique. I haven’t added all of the other additional tables that we might also need to store this type of information. But you kind of got to start to see the the picture that we did, right, we started off with a very large course table that had everything in it.

And things particularly things like the departments, the semester, all of that, and we broke that out into their own tables. So now we have our own department table, our own college table, that way that information is not duplicated, we now also have fewer ways that the course information is duplicated, because we separated the course from the scheduling. That way, we don’t have to repeat the course name and description and things like that. So this is really the start of trying to design a good database, right trying to reduce redundancy, trying to increase consistency among data. And overall make the your SQL that you can write much more expressive. If everything is crammed into one table, or one column, things like that, it becomes more difficult to write actual SQL queries. But that will include my example, that we talked about here. And the follow on videos will actually spend a good amount of time talking about the underlying theory behind what makes a good database design. We’re not going to have a huge focus as far as a lot of practice on the database theory, but I’ll be covering it in a more general sense. So hopefully that you catch the overarching ideas that is important to designing a good database