Chapter 3.1

Single Table Queries Part 1

  • Schemas & Objects
  • Minimum Table Requirements
  • DDL vs. DML
  • Booleans
  • SELECT Statement
    • SELECT
    • FROM
    • WHERE
    • GROUP BY
    • HAVING

Subsections of Single Table Queries Part 1

Single Table Queries Part 1

YouTube Video

Video Transcription

Hello everyone, In this video series, we’re going to be taking a look at some simple queries. Now, in this, in this video series, we’ll cover quite a few things, everything you need to know about a full simple query, including how to how to select columns, pick tables out the table that you want to actually pull data from, and filtering those results from those tables. Now, in a lot of programming languages, like Python, for example, everything is an object. And very similar idea here with databases, and particularly with SQL Server, almost every single thing inside of it is considered an object. So each of your tables there, we’ll be talking about views, procedures, and there’s even some functions, and all sorts of other things that we’ll be covering in this course.

But everything is treated as an object. But those objects are contained in what we call a schema. So in if you’re talking about programming languages, like for example, C sharp, your schemas considered your namespace, so everything lives inside of this, you can kind of almost treat this as like a folder, right. And inside, on your computer, you probably have your courses organized or your your information and documents for each of your courses, each of those courses underneath a single folder. And inside of those, you have individual documents and things like that. And so very similar idea of what we have with databases, but instead of calling it a folder, we’re going to refer to that as a schema. Now, a schema itself cannot have other schemas inside of it. So if we’re talking about folders, it’s a folder that can’t have any sub folders. But it is going to contain all of the objects associated with the database. So all of your tables, any stored procedures, and queries that we have saved out, and everything that is associated with it.

So how do we actually refer to the schema, very similar to how we refer to classes and objects in your programming language. So if you’re talking about Python, it’s the package name, dot and then the item or member inside of that particular Python package like a class or a function, and similar idea of what you see in Java. So in this case, we have and will, in the examples that I’ll show here and a little bit, we have a sale schema. And in that sales schema, we have a series of tables that we’re going to work with. So here I am selecting everything, so select star, and we’ll talk about this query here in a minute, but select everything from the sales dot orders table. So sales being the schema and orders being the actual table name. So what do we need to actually have a table. So we, in our early videos, we created a very simple table, we talked about what a table actually contains. So we have attributes which are columns, and then we have rows as well.

So each row representing a actual Single, single record inside of that table. But a table itself is going to have a table must belong to a schema, you can’t have just this orphan table out there, that doesn’t belong to anything. So a table must belong to a schema, which is essentially going to break down to being your database, right? A table must also have at least one column, right? So we can’t have a, we can have a table with no records in it. So no rows, but we cannot have a table that has no columns, because otherwise we have nothing to actually define the data that’s actually being stored there. So as far as the column goes, each column must have a unique name. And that only has to be unique within the actual table itself. So if I have Table A, which has a phone number or email for example, Table A can have email and Table B can also have email. But within each table, we cannot have two columns that are the same name, because otherwise we cannot uniquely identify a particular attribute for any record. So must be unique name. We must also define a data type.

So this is particularly with SQL. Each attribute must have a defined data type. So no change whatsoever. If you’re coming at this from the Java, the Java point of view, but if you’re coming from Python, unfortunately, we do have to actually define the data types here for for each of our columns, and we must also define whether it is null or not null. And the null ability at the null ability modifier here is going to indicate to SQL Server or your database whether or not this column is optional. Hey, so no allows records to be inserted into this table without that column present. So if I have a record, and the phone number is optional, for this particular table, I can insert a record about let’s say, a person. And that person doesn’t have to have a phone number in order to be inserted into this table. Not Null is going to make that column required for all records that exist inside of that table. Now all of this is actually specifically for SQL. We will talk about way later and into the course, I will talk about something called no SQL, which has a little bit more relaxed requirements as far as what the tables are defined, and the types and things like that, which is a little bit more related to what you would expect from kind of like the Pythonic and the Python environments. But these are the minimum requirements that we need in order to actually have a table be a table. So what about a query, right, we’ve talked about and executed a few queries before.

So now we’re really going to kind of dive into what a query is, and kind of define all the individual parts. So SQL itself is a declarative language, meaning that we are going to define what we want not how to get it, which is kind of a backwards thing of what we actually are used to, right. So the the data already that the SQL Server itself, right, the seek the server management system, their job or its job is only to or it SQL Servers job or the or the databases job is going to be responsible for knowing how to retrieve the data, right? The data is stored on the computer somewhere, all your SQL knows as you are connected to, to that particular database, and the database is going to handle retrieving the actual data, all of the SQL is going to actually define or the query is going to define is what you actually want out of that. So what data do you want? Not how do you actually retrieve that data, which is a little bit different compared to how you are working with Python or Java, right? If you are, for example, wanting to read in a text file, and then write out contents to it, you actually have to tell the computer where that file is, you have to actually physically tell you have to tell the language, how to open that file, how to read that file, and then how to write to that file. Which is completely different in most database languages especially. And what we’re working with here is that we are not having to tell the database, how to write it, where and where to actually store it and everything like that. The database itself knows how to actually handle all those operations, which makes our lives as database database engineers to make our lives significantly easier.

SQL itself is a set based language, meaning that for things like C sharp, Java, Python, it’s not procedural or really like any other language that you’ve actually worked with. Really, order itself is not always super important, although we will talk about order on how things work with the actual SQL language, because the SQL statements are consumed in a specific order, but you can actually have them in whatever order you’d like. So very rarely does order actually matter versus a actual program written in Java, C sharp, Python, whatever language really, order absolutely matters, right? It’s top down. Or if you’re looking to a function, it works on line one, line two line three, and so on. But SQL is quite a little bit different than that. And when we’re working with any kind of data and our database, particularly with SQL, everything is going to be dealing with sets, right, a set of data, meaning things are unique. And we have, we can have duplicates in that. But we’ll be diving into a little bit more about what that set is going to kind of mean here in here in a moment. So common problems. So as we get started with working with SQL and SQL Server, there are some common pitfalls that some students or or, or people who are new to writing SQL fall into.

So one is that you disregard one of these properties, right? The fact that sequel is set oriented, and declarative, so we are trying to reverse you’re, you’re not completely reversed on us as far as how we’re actually writing SQL code. But it’s not line by line by line, right. And this becomes significantly more important, as we add more and more to our queries, as our queries get more and more complex. If you’re thinking about it in a as a procedural language, from you know, top down, then it’s not necessarily going to work out, the logic won’t actually end up executing as you expect it to, and you’ll end up with a lot of different results or results that you don’t expect. So Okay. Oops, sorry, one second to those who are recording. Okay, so, in this class will be I will be referring to or you’ll see these acronyms, quite often. DDL and DML. Okay. So DML, which is what we’ll be working with, for the majority of this course, is referred as stands for data manipulation language. Okay. So with DML, this is all of the query statements that you’re going to use to retrieve data or modified data. So inserting data into your tables, updating data, or updating records that already exist, deleting them, merging them, and just flat out retrieving them like select. On the other hand, we also have data definition language, or DDL. And this deals with primarily creating data. And not just necessarily creating data, but creating databases and creating tables, views and stored procedures and things like that also fall into this category. But we won’t get to those particular parts until later in the course. This first section of the course will be focusing primarily on just the SQL for database manipulation, or data manipulation.


YouTube Video

Video Transcription

Alright, so now that we have reviewed some of the basics, let’s go back to writing some actual queries. So we’ve already ran some basic select queries before, but we didn’t really break them down into into what each part actually contains. So we’ll start out first by looking at a select, but without the from clause. So the from clause is actually optional. But the Select is required. So this is the SQL syntax here. And I will show more things like this as we add more statements and elements to our SQL query. But this is about as basic as you can go. And then the square brackets there, that denotes an optional element. So the from clause is optional. But let’s without further ado, take a look at select select is the only thing required in a SELECT clause. So we don’t actually have to have a table to pull from, we can actually project data just without any initial data source, though usually, and the vast majority of queries you’ll actually use, we will have a From clause associated with the select. So you select these columns from this table, or from this data source, essentially was what that boils out to, can also rename the columns that we actually select. If our database design is done correctly, or done well, so to speak, we shouldn’t have to add a lot of aliasing here. But column alias aliases are really helpful when we are pulling data from multiple tables. And then also when we’re actually showcasing query results back to the actual user. Because a lot of times the database column names aren’t necessarily super user friendly. As far as in the user, it may be fine for a database engineer designer, but not so much for the end end result. I’ll showcase some aliasing here in just a second.

But the SELECT clause is going to be the projection operation within aren’t you know, if we want to use database terms here, now are sorry, sets. So with sets, we have projection and selection, and for some reason, when the SQL language was being designed, they chose the select keyword to set to for the projection operation, and the from clause to be part of the selection operation. And so it’s it’s kind of backwards as far as the actual SQL statements go. What it really boils down to for the Select or projection operation is that we are picking out which columns or attributes of our data that we actually want to have come out and the end results. So if we have five columns in a table or a set, we are going to project or select a, either all of those columns or a subset of those columns. So maybe column one, two, and five, and we’ll skip the other two. That’s what projection is going to do. And here in a few here in a moment, as well, I’ll show you the from clause, which is the selection operation, which deals with rows instead of columns. Without further ado, let’s take a look at a couple of examples of our select. Now, as I was mentioning earlier, we do not have to have an actual table to select from. So I can go in here and say select seven and run it and I actually get a result. So I end up with one row one column. I don’t have a column name yet. But there it is. There’s my data that from my for my query result.

Now a lot of times you won’t actually see queries as simple, but sometimes they’re actually pretty useful. But nonetheless, I can actually go back in here and this is where aliasing becomes very useful. And so we will say as, and then we can, let’s say number here. Cool. Now, as I run that now, I actually get a readable column name out, so I get a number. There. Also notice here, when I look at number, you’ll notice that it is is highlighted in blue. So things that are kind of reserved words just kind of like as you’re typing and your favorite programming environment, it highlights key words as part of the language. So if you have that situation here, or you end up wanting to have a space and your column name, then you will need to do something like this. So we want to actually denote the actual name either using square brackets, which is going to be the way that SQL Server usually will prefer it, you’ll see that and other database languages or other SQL flavors, you will see this as a double quotes. So either one works. For me, I do not mind either way, which way you go. Most of my examples, you will see me using the square brackets, just because it’s more SQL Server II ish. But the double quotes are also perfectly acceptable and perfectly acceptable syntax.

Now we can do all sorts of things here as well, like I can add more columns, so I can do. Right, and I actually get a text column back out, I can even put a date. So state time offset, this function will actually pull out the current time on my local computer or the SQL Server instance, the time from that server server. Now see, if I do a space here, it doesn’t really work out so well. But if I actually wrap that in either quotes, or the square bracket to denote the actual name, I can actually have a space now on my column name, this is really as simple as a, an expression as you can get as far as being a complete and full query. Okay. Now, a couple other random thoughts here, the capitalization of the actual SELECT statement or or clause elements, doesn’t really matter, right, it is not case sensitive as far as select. So I could go in here and do all all lowercase if I want it to, so I could go select. And that will still execute. It is common syntax for people to use all caps for any SQL elements, and queries, because that helps denote it from the rest of the content. So it helps to note away from column names, values, conditions, all sorts of other things that are not reserved words in a SQL query. So that is the primary reason why we use all caps for any SQL statements. That just helps us pick out the keywords a little bit easier, especially if we don’t have an IDE that is doing all of the coloring for us.

Okay, so let’s keep on moving forward here on to the from clause. So from is going to actually denote what table or tables your query is actually pulling data from. So in other words, it tells the query where all the data is coming from, you’re going to use from a lot of different contexts. And tables can be defined relatively loosely. For for now, our simple queries are going to pull from one single table inside of our, our schema or schema database. But there is a lot more that can kind of fall underneath the from clause. As we’ll see later in the class. We can use aliases as well for the from clause. So we can say from table XYZ as x, you know, whatever name we actually want to put for the table. So the aliases work the same way as a SELECT clause. And the names though that are available to the select clause, are are extracted from or inherited from the result of the from clause. So that’s how select happens right? So we actually even though we list select first in the SQL, the from clause actually has to execute first because the Select has to know what columns are available to it to actually purchase So we’ll take the the columns from the from clause, so all the columns from the table or tables that are in the from clause get passed to the select clause.

For projecting, let’s take a look at an example of this and action here. So we had this piece of SQL way earlier in these video series, so select star from sales orders. And you can notice here I get 1000s of rows in this sales orders table is from the worldwide importers database. If you don’t quite have that selected, please see the the setup video that’s out there, or feel free to reach out. And I can help you get that worldwide importers database set up for you. Because sometimes it can be a little bit tricky. But this database was provided by Microsoft, as an example database. So we’ll be using a lot of this database a lot in the lecture notes. So but as you can see here, lots of different sales information, we’ve got orders, we’ve got customer, the customer that made that order, who sold to that customer, and then all sorts of other things here as well order date and a variety of other things. So if I go down here, there’s actually quite a lot of columns in this table. So select star can be kind of kind of annoying, as far as as far as that goes, he very rarely need all of the columns from a given table. So as far as efficiency goes, it is far more efficient to select the specific table or the the specific columns from the specific table that you’re looking for.

Right. So for example, I maybe I only want something, say orders, maybe I only want the order ID and then the date that that order was made. And then the customer that made it, do customer ID. Now if I run this again, aha, my my results are far more clean, right, there’s a lot less information there that I need to actually consume. So column names default to whatever the tables column names actually are in the database. If it’s a direct reference, you can actually put the column name without the table that’s associated with it. That is perfectly fine, perfectly valid syntax. However, when you start to do multi table queries, it becomes less clear which column comes from which table. So it is more more common and better practice to always specify the table that that particular that particular column or attribute actually came from. So or dot. And there we go. We can also do the same thing here, right if with an alias, so if I did, as, and then if I really wanted to be shorthand, I can say, oh, but notice now that my syntax no longer works. Because the Orders table does not is not does not exist. It does exist in the database. But it does not exist as far as an option or an available source of information. Because it’s no longer included in my FROM clause. The table that’s included included in my FROM clause, as far as the Select knows, is just Oh. So let’s change that to oh, oh, and we’ll also there we go.

So now, I run that we’re back in business. So things become a little bit easier to actually do there with aliases, especially if you have like, a really long, like long schema name or a long table name. Sometimes the aliases are really nice. Just add a little shortcut there. And that’s relatively common practice. And as long as you don’t have a ton of tables, single character aliases are fine. But if you have a really big complicated query, this is you use good practice and naming your variables in your code. You want to use good practice in naming things like aliases inside of your SQL for our order date here, we can actually change this up a little bit. We can repeat, we can project a column more than once. And what do I mean by that? Well, what if I add here? The order year? Right. So there is this handy dandy function called year. And again, we’ll use I’ll show more more functionality with the date time format stuff here later, but I can say year here, and if I run that, I get the order year so just the year out of the date that’s full date is still there. Although I should probably add a alias here. Just to be a little bit easier to easier to run. That is just the basic select from here and the next series of videos. The next video, we’ll look at expanding our simple query


YouTube Video

Video Transcription

So now that we have actually worked a little bit with select and from So specifying which tables we want to pull data from, and which columns from those from those tables we want. Now we can actually filter which rows that we want. So the from clause is again, right the selection operation, right. So it is all of the rows, all of the rows of data that we want to pull as part of our query. Now, select does filter a little bit, but it filters only vertically. So it filters which columns actually end up showing up. But to filter which rows show up, we need the where clause, the where clause, as I mentioned, just provides the basic filtering per row. And now it will accept any predicate or Boolean expression as part of it. So all of the Boolean expressions that you’ve learned so far in just your basic programming, will actually apply here, for the most part. So we are partially supporting the selection operations.

So from gives me all of the rows from a particular table or tables. And if I ever want, if I want to reduce the number of rows in that set, I use the where clause. So I’m filtering out which rows actually end up showing. So let’s take a look at a few examples here, a little bit easier to actually see this in action, rather than listening here to me talk about them. So let’s add a new cell down here. So as we saw before, from our, our larger query, so if I run that again, right, we have large 1000s and 1000s of rows there. So how do we actually reduce that to be only a specific set. So if we take this same exact query that we had, right, and then I’m going to add to this a where clause. So select from where, and then I’m going to say year, O dot, order date, and then set that is that equal to 2016, since we are only doing orders of 2016. So now, if you run that and look at the order date, so nice, my the number of rows that actually have here are significantly fewer. And my order date is only 2016. orders that have a year of 2016. So the Boolean operations are very similar to what you would expect in Python or Java. Of course, now I’m not using the double equals for equality, I’m using the single equals which can be confusing. In this context, when used in the where clause, it is not an assignment operator, it is the equality operator for the UI for a Boolean expression.

Now, years are kind of tricky. What I have here is a specific order date, but I actually have to convert the way I’m actually writing this code, I’m converting each time each date, so year, month, day, to a single year. So I take the full year, convert it into a car full date, and then convert it to just a year. But I can do this exact same thing down here by just doing a range on the date instead of having to convert it. And typically that is going to be the preferred way of doing so because it is a little bit more efficient. And with databases, unlike our our code. We want our code to be efficient. But it is more important for SQL queries to be efficient as possible. Because we’re dealing with 1000s upon 1000s of records, the majority of time, think about, you know, writing queries for something like Amazon, right has millions upon millions of things of records there. And so if we have an inefficiency in one of our queries, that adds up to a significant amount of extra processing time over some period on our servers, and of course, a worse experience for the our our end user. So we do want to be as efficient as we possibly can. So let’s go in here and say order date. And then we are going to do, we’re going to use like a greater than or equal to here. So and I’m going to put the date here as a stream. And so this is an easy way to do this, this doesn’t have to be an official date time datatype as long as the string matches what we’re actually looking at, so we’re doing one one of 2016, so January 1 2016. And then we are going to put an AND operator here, so and ODOT, order date, order date, and you’ll find the IntelliSense with SQL is hit and miss that time.

So whether you’re in your Azure Data Studio, or SQL Management Studio, or whatever your IDE you’re using IntelliSense can be hit and miss, which is what’s happening there. But anyway, so let’s, let’s put, we want our, our date to be less than 2017. So we’ll put the first of January of 2017. Now I could I could put the end of 2016. And do make sure make this less than or equal to, that also would have worked. And then we’ll close that off with our clause. See here, that’s where my mistake was I had an extra semicolon, semicolon, by the way, as I’m showing you here, denotes the end of a SQL statement. Okay, run that. And there we go, I get all again, all of the orders that were made in 2016. But this is actually a little bit more efficient than the query that I showed previously, where I’m converting the date to a year and then comparing it to the number here, I just compare the date directly without actually modifying its format. And I’ll be showing a variety of these little things as we’re working through our examples here and through assignments. The code I’m the sequel that I’m using here is a very basic WHERE clause write this using a date, but your where clause is essentially used on any column, that that is being available or projected from our slot, right. So whatever columns are available, I can actually pull them out there. So I’ll actually show some of the orders or some of the sets there, but it’s not specific just to the select clause right? The from operates first, then the where and then the select.

So the rows are filtered before they make it to the select. So the selection operation happens with the from and where clause is done before projection. So selection first, then projection. So select actually happens after the from and where clause SQL statements. But here I just used simple equality check. And greater than less than, but there are a lot of different Boolean operators that we can utilize inside of our WHERE clause, and a variety of other places in our SQL statements. booleans though are the only are only supported as expressions. So there is no actual boolean data type. So you know, in Java, we have Boolean. And even in Python, we have a false and true type associated with the language, but SQL really doesn’t. They just use it as expressions. And that’s the vast majority of database management systems. So SQL Server, MySQL, and a variety of others will have very similar similar goes. So where can we use these Boolean expressions. So we’ve already seen them being used in WHERE clause in my examples. But we also have if statements and loops inside of inside of our SQL statements, and we can also have like case a case function, which will I’ll showcase here in a later video.

So the case function is very similar to the switch statement in Java. Although of course, Python does not have a switch statement, but more or less just a shorthand series of ifs. But we’ll get to that here. And not too long, but all the operators that are are all the primary operators that are supported for Boolean are mostly standard except as I mentioned, as you see here, and in my previous example, the equality operator is not enough. equals, it’s just the single equals, we have greater than less than or equal to naught is done a little bit different. So the standard way of doing not is less than greater than. But there are others that are supported like the exclamation points. Okay. So not equal to, not less than not greater than those are supported. But they are not part of the SQL standard. So your mileage may vary, depending on which database language, you’re actually using all SQL, but each, each company implements it in a slightly different flavor.

Most of your languages that we work with write a boolean value, all right, even if we talk about just general logic is true or false, right? There’s no in between. But with databases, we actually introduce a third value called unknown. Unknown, that’s kind of a weird situation, right? Because what happens, if a value is no, most languages know is going to come back as false or false see, because no being the absence of value, the absence of value cannot be true, because there’s nothing there, which is a lot of the same case in a lot of languages. But with SQL, as you’ll see here, we are going to pull a query like this one sec, let’s clear. Clear that there and run this. So again, write slug star, I’m using Select star, just as a quick example. Try not to get into the habit of using Select star for solutions to things, it is very useful tool to just kind of explore results. But at the end of the day, you’ll want to reduce that and actually specify your columns. But if I run this here, you’ll see that nothing gets returned. Right? Nothing gets returned. Because nothing is no All right, the order date does not actually know. But at the end of the day, right? A lot of the times here, this is going to be still evaluate to true or true, false or on known. Right? So even if it is unknown, right? It’s not going to actually show up.

So if I showcase this here, with this query here, the where clause is filtering the rows by order date, order dates that are not No, right, that are not equal to no. And running these, I still get zero rows, right? Because date, order date is actually a non nullable column. So the order date must exist. But a better way of actually showing this, because there’s also there’s seven like 73,000, some odd 100 rows and the orders. Table. But let’s switch this to a a column. That is no. Okay. So if I flip this back, right. Not no. Oh, dot internal comments, not No, I get nothing as a result. But if I flip this, right, say equal NULL, also, I the results are nothing right, and nothing is actually coming out. But if I flip this to say, is no. I actually get quite a lot of records out. So if I scroll over here, and ternal comments, right? So here you can see internal comments. All of these are actually not no sales orders that have no internal comments. But you notice that the equal sign and the not equals operator, both of those don’t actually work for naught because the Boolean comparison here, all right, a value so no, not equal, no is actually unknown, because we have unknown there so it’s not actually true. So don’t add so those items never actually get returned as a result in your query. So if you are ever working with no or a which is usually the case for for things that are non null or nullable columns. And if you’re trying to check for null, the is operator is usually the preferred way to do the Boolean comparison.

So, this will return true if the internal column comm internal comment is no right. So, if we backtrack this, I can say is I can also say is not no. Right? This is more more so related to Python than it is and how things are compared to Java, right? So we say is, is no, or is none in Python. And instead, and we also have an actual better pit person here. So, we actually have, we can actually showcase this one here, since this is columns are here. So we have a whole bunch of normal columns and we can show those that are not no there. So, these orders have already been picked, the items have already been picked up. So, that’s just another example of how we can utilize Really N expressions. Like said Boolean expressions most commonly are going to be found in your where clause, but as we saw back here on our slide, we can find them in our where clauses, control statements like ifs and loops, as well as our as a case function. You may also see them in a variety of other ways in stored procedures as well. But that will conclude this part. In the next section, we will talk about grouping or group by


YouTube Video

Video Transcription

Welcome back everyone. Now in this video we are going to be taking a look at group by and having. So before we took a look at simple SELECT FROM clause or from query, along with where so remember, we select columns from our table, and where is there to filter out the rows. So remember that we have the selection and projection operation. So the selection being the from and where clause. So the selection of what rows of data are we going to include in our results, and then the the projection or the select statement in our SQL query is which columns are we going to include in our on our query result. But now we can actually also group those rows together, right. So after we have whichever rows that we actually have in which columns that we have, we can actually group the results by certain conditions. So each of these though, by the way, right, these are all still all optional items. So if you ever look at the actual official documentation, everything in these square brackets here are optional statements that can be included in your SQL query.

But let’s take a look at group by and having it’s a lot easier to start picking this up, as we start to show some more examples. But groupbuy specifically, is going to define a group based off of a set of call or a set of columns or expressions, right. So based off of whatever the expression is, whether it be some comparison, or a specific column, or two or more, this is what we’re going to group our rows by. So it’s defining the defining the rules that we’re actually doing the grouping by. So this does allow aggregation. So if we, let’s say, group by order date, right, we can get the number of orders for a particular date, right, which is a very useful query to run. And there’s a lot of other different aggregations that we can show here in just a second. There are many options available to the group by element. So we can do a list of columns that we group by, we can group by sets, expressions, and there’s also we call queue roll up, those we’re not really going to cover in this class. But if you’re interested, I am more than happy to tack on a video or post some text up, that kind of explains them. But these are some of the standard operations that you’ll see associated with group by. But most of the time, we’re going to be working with aggregates, so max, min, average, and count.

So those are most of the common aggregates that will actually work with. Now aggregates themselves are almost always this particular syntax. So the function, so the aggregate function that we have on on on the right side there, and so let’s say count, and then in parentheses, you will have an some sort of expression that will tell you or tell the function what to count, right? Out of those rows. So all is the default. And then we can do, we can do distinct, so do we want to count duplicates, for example, right? Or no duplicates. duplicates are no duplicates, and then whatever the expression is, so and I’ll show an example of this here in a few minutes. And as I mentioned before, some examples of this are min max, average, sum and count. Although count is similar, but count allows for no expression. So for some, for example, you can’t sum star right? But count is kind of unique there where we can actually put star as a wildcard and say just count all the things right count all the rows, but for like some min max average, you want to know what specific thing that you are summing up or averaging or finding the maximum right you want to know the specific column out of the group that you’re actually going to apply that function to count is a little bit different count that aggregate function actually returns the number of records or rows that are in that group. But this can be utilized and with the over clause, when we are utilizing partitions. But I’m going to kind of skip over partitions for now. And we will save that topic for another time. But just kind of be aware that it is there in case you see this as you’re looking at this, or reading about this online, but we will talk about partitions in a later lecture.

In this case, no values are ignored by default. So if there is no value of is not included as part of the aggregate, let’s take a look at some examples of the group by Alright, so now I got got myself put up towards the top of the screen. And this simple query, here I am selecting all the customer IDs from the orders and grouping them by the customer ID. So when I group by the customer ID, I am essentially grouping all of the records that are associated for that specific customer. And so if I run this, you actually can end up figuring out how many customers we actually have, or how many unique customers that we actually have here. So this is all of our customers that have placed an order with us. And of course, I can try to add columns here. But this becomes a little bit tricky. So let’s do au dot order date here, I run that I get an error. So this is one of the weird things are not to say or weird things, but not initially intuitive things about the group by clause. So I cannot project an element or an attribute or column. When we have a group buy, and if that column is not inside that group, so I cannot project a column that is not part of the grouping. Because I’m grouping by a specific condition, write an expression right here, I’m grouping by just the customer ID. And so each record, I mean, let me take this off real quick. When I get my query results here, the rows that are fed to the select clause are what you see here on the screen. And so when we have like 1234, for the customer ID, there is no date associated with each of those customer IDs in this case, because those columns have been filtered out already by the group by now I could put the date back in here. Okay, now, let me do this way.

So you can add columns here. Like if I wanted to do an O dot Sales Person ID, and put this up here. And rerun this here. This actually works, because I have the me actually sources here, right. So this will group all of for so for the salesperson, it’ll group them together and all of the actual customer. So if, if customer 531 had 10 orders with this particular salesperson, all of those records will show as one row. But what about the date? Well, the date comes across a little bit easier when we actually do aggregate functions. So if, for example, I take off the salesperson here and put back the ODOT order date, and then instead tack on a let’s say aggregate function now, what happens here? Uh huh There we go. So let’s put that as first order here. But so what I’m doing here is I’m grouping by our customer ID, and then I am going up here in my selection and say, Hey, give me the smallest order date for this customer ID. So, for this group, give me the smallest order out of that group. So for each of the customers I get the date of their first order in this particular table. When I have a group by I cannot project a column if it is not part of the grouped by, but I can project a column if it is part of the group by or I can project it if it is an aggregate, if it is an aggregate, so min max, average sum count all of those sorts of things. So I could here, I can say count, and then star. And this will tell me how many? No orders? So how many orders? Has that customer actually made? So, for each customer, when was their first order? And how many orders did they actually make, right? That is what I’m actually associating here.

So, this is group by group by itself is a very powerful expression. And it really does help combine and aggregate database results in that’ll be a really common operation that you’ll see as we start moving through the class. But let’s talk about how we could actually filter those results. So I showed how you can group your results. So group those rows. And then what about filtering those because the where clause doesn’t actually filter the groups the where clause filters the individual rows, before we get to the group by so the having clause or having element is what we can use to actually filter records after they’ve been grouped. So basically, just like the where clause accepts any Boolean expression that you express, but aggregates, aggregates can actually be used here, right? The WHERE clause cannot use the aggregates because where the where clause is a single row by row operation, the having clause is a group operation. So you have a group of things that you can apply this filter to. So therefore, you can also use aggregates as part of the filtering process. So let’s take a look at an example here, I’m going to replace the query that I had before. Do or here, and sorry, for my bad syntax here, let’s fill this in with count. Okay, so I’m going to run this and we can see what happens.

And I’ll kind of work on explaining this here. So we have our counts here. So this is the number of orders again, as I say, order count. So we have order counts. And then we have our men date here. And I’m actually going to add these to a new line. So they’re a little bit easier to read. And so we have as a first date. And then let’s have that as order year, that’ll work. This is last date. And this is first There we go. Okay, so now I got names on all my columns here. We’ve got order count, first date, last date, order year. So what I am actually grouping by here is the actual order year. And again, I am using a shorter syntax here, converting the order into just the year and grouping by that. But again, though, you can, you can group by the actual raw dates. But here we can actually, we can’t group by a range of dates here, because we can’t do group give me a group that goes from one 120 16 to 1231 2016. Because there’s a lot of dates in between that range. And so you have to give one value that is represented representative of that particular group, we can group by more than one column, by the way, as I showed earlier, but here, we’re just grouping by year.

So for each year, when was the first order? When was the last order and then what was that particular year. And that was our group by, but lots of things that we can actually do with this right? The, with the group by alright, we can actually have, we can actually filter now out specific groups, right, so we can actually filter out specific groups. So let’s take off the semicolon here, and actually add in our having clause now. So select from where group by so from orders where the picking completed when is not no so it’s actually If the order has been completed, fulfilled, group those by year. And for each of those for each of those groups that we have, I am going to specify that I only want the years where we were very successful. Okay. So I want the years are that we successfully completed, let’s say 20,000 orders. And if we look at our results there, that should give us two rows.

So if we run this, oh, yeah, there we go are two rows. So this particular this particular filter, we would not be able to successfully do in the in the where clause, because the where clause is row by row, where the having is group by group. So in having clause we can actually apply a aggregate function to filter out groups that only have certain things there. So that’s a very useful feature there. But that pretty much concludes for the series, the example SQL queries that I’m going to show here in the following video, I’m actually gonna take a quick a short amount of time here to actually talk about the processing order. So as I’ve been talking, I’ve been jumping around in these actual SQL queries, talking about what each part actually does, but there’s actually a very specific order that these actually get executed in so we’ll take a look at that next

Processing Order

YouTube Video

Video Transcription

Welcome back, everyone. So this will be our last little video for our single table queries part one series. But in this video, we’ll talk about processing order of our query. Now, before I talked a little bit about how our SQL language is not a procedural language, so we’re not going step by step, line by line from line one to line x, in order that they actually happen. So there’s a different processing order that actually happens than what’s actually shown on the screen. So we talked a lot about the major elements of select, and we’ll talk more about the SELECT query. Later in this class, we’ll add some more elements to it. But the core bits here are SELECT FROM WHERE group by and having are the ones that we’ve talked about so far. Now, the order is not the order that they are shown here on the screen, it does not go from SELECT FROM WHERE group by having, we actually go to the from clause first.

So we need to know the source of our data before we can do before we can do anything, right. And we know nothing until we know where the source is from. Now, I can do the slug claws before without any of the things underneath it, right. So if we just have select by itself, of course, the slug will actually be will be done first. But if we if we have select and other things, select is not the first thing, right? So from this data source, so from XYZ table, where, right, so we filter out the rows from that table, right? So from Table A, we want only the rows that match this particular condition. And then we can then group those rows together by a certain condition, right. So group by color, right? Having, let’s say a count of 10, right. And then we actually select right, we actually do the actual projection. So remember, all of all of this here from where group by and having are all part of the selection operation. And then the select is the projection, right? Select is the which columns are vertically, which things are we actually going to show the forum were grouped by having are all horizontal, so which rows are we going to show, so we pick the rows that we want first, and then we pick the columns from those rows that we want.

But this is really going to be something that I will really work on and repeat quite often in these videos is the processing order because it is not intuitive. When you first start reading SQL, that it is not the order that you read it in that it operates in, it’s a different processing order, it executes in a different order than what is actually shown there on the screen. And just as a review, here, this is just a friendly slide help remind you of all of the different syntax that we actually covered here for our simple single table query. So we have the SELECT clause, and then from were grouped by having this is the typical typical ordering that you’ll actually see them written in and of course that’ll be talked about just before that actually operates executes from where group by having and then select. Now, any one of these down here are optional, right I can have a From clause and not a where clause, but a group by I can have a group by without having, I cannot have a having clause without group by though, having the having clause must be paired with the grouping, because the having is an aggregate filter, not a row by row filter. That is one limitation of the syntax there. But that really concludes the first part of our simple table queries. This is the first part of a three part series that we’ll be talking about, for doing single table queries. I will see you in the next video.