Transaction Log

Video Transcription

This topic I’m going to be mostly skimming through, it’s important to be aware of, but not, I’m not really going to be testing on it. It’s more of a, what happens underneath the hood sort of thing. So we talked about transactions last time, right? Transactions are a essentially a receipt for any, any SQL that we actually execute on our database, including selects, updates, inserts, and deletes. These are really important, because that allows us to enforce acid, which remember animosity. So either all or nothing. So we can’t have if we do for updating 100 rows, acid implies that we can’t update 50 of them and let the other 50 Just go out into cyberspace, right? So consistency, which is really important. So think of a database as as just a bunch of states, right? When we do an update, we transition from one state of the database where those updates don’t exist to a state of the database that those updates exist.

And we don’t want any in between states, right? We want one single transition transition between those. So if we insert a row, we will go from not having that row to having that row, nothing in between. And so that’s where we get our consistency of data. And with our transactions, we showed how, if we don’t enforce acid, we can see those inconsistencies and happen in between if a read or something happens while a write is in process. Likewise, we talked about isolation. So we have different isolation levels that have different levels of properties. So that prevent dirty reads phantom rows and things like that. But if we execute our SQL and pn and full isolation, then we have no issues whatsoever. But of course, that does come with a cost of less concurrency. And so you do have to play the balance between those. And then we also talked about durability. So if we do make a change to the database, can we be certain that that change stays, right? This is what happens when we commit a transaction. Because in a during a transaction, we can actually execute our update or insert or select or whatever.

But if it’s not committed, the not transaction, the SQL inside of that transaction hasn’t officially happened yet. So that is what we get with durability. Okay. So acid in general is extremely important. When we’re talking about SQL, relational databases, no SQL, like Mongo have different ACID properties. So if we get time to talk about no SQL, I’ll bring that topic back up. But in the big thing that we look at here, consistency and isolation in terms of acid are accomplished. Oh, yes, question. It depends on which NoSQL database you’re talking about. Different no SQL database engines have different properties, that that, that forgo some asset, some acid property for a foreign other. Most of the time, no SQL will abandon certain properties or lessen them less than the restrictions on them in favor of high concurrency or high availability. That’s typically what no SQL goes for us. Super high concurrency and very high, high availability.

And and relational databases at times can limit as we saw with transactions different, depending on the isolation level, will limit the concurrency and limit the availability of the data if a transaction is running, right, depending on depending on the setup, so yeah, no sequel will favor concurrency and availability over over most things. No sequel is valued for its fast, insanely fast read. Not so much for REITs. Right. So no sequel is mainly for reading data instead of writing data. But again, yeah, we’ll talk more about that later. So an acid consistency isolation, primarily dues used for locks. So we’ve talked about shared locks, exclusive locks, and a little bit about row versioning. So especially things like snaps sharding and the extreme case, right, we can snapshot something. And we have one version of the database that we run on run one transaction on, versus a read, which has a different snapshot that it can run its transaction on. And the atomicity and durability are complex using what we call transaction log. Right?

So when we talked about last time, I didn’t really talk about durability or animosity, we did a little bit of durability with the commits. But we primarily looked at consistency and isolation in my examples, okay, so the transaction log is just a log, alright, a string of log records that record all data modifications associated with a identifying number. So log sequence number, that is a sequentially increasing log number. So 12345, etc, etc, really simple. Each record is going to contain the ID of the transaction. So remember it right as we go through your we’re recording transactions that happen on the database. And then each record will also contain a back pointer to the previous record. Right. So we did roll backs, right transaction, rollback, how do you know where to rollback to? Right? That’s thanks to the transaction log. This is what gives us that durability, right whether something is committed or not committed. And so when it when a record is committed, then we don’t we don’t rollback anymore, so that that part of the transaction log can be released from the log itself, because we don’t need it anymore. But if we roll back, then we can use that backwards pointer to reverse whatever actually happened in that period. So that is the benefit here of our transaction log.

So many purposes, mainly for acid. That’s the really big purpose here, why why we need a transaction log. But with it not, we can also as we as we showed last time, we can recover certain things, as part of our trends are individual transactions that we execute. So we can choose to rollback, or we can choose to commit. But we can also roll forward and rollback for transactions if we have anything that was incomplete. So for example, if you were in the middle of a transaction on a database, and the database connection, failed, or the database just failed completely, maybe the power went out and you didn’t have a battery backup. Well, that transaction, that incomplete transaction should in theory exist in the transaction log regardless. And so when the database actually starts up, if there’s an incomplete transaction in the transaction log, it will automatically roll that incomplete transaction back, right, because it is not consistent. And so in that sense, we roll that back to ensure data durability as part of our ACID properties. So that’s what we roll back. And we can even roll back an entire database that not just an individual transaction, but we can roll back the database itself if we have a complete transaction log, because remember that transaction log is going to record every operation that happens. And so if we have a complete record of those transactions, we can either roll forward or roll backwards, depending on which direction we need to go. Okay.

So SQL Server itself does this for replication. So we can say we have, if we want to ensure our data, high availability of our data, let’s say we have a backup, we could we could have a base, a lot of times what happens we have a ghost data, like a ghost database. So we have Database A, which is the one that is front facing to the world. But then we have database B which is the exact same database, it’s just a copy of it. And so when a transaction happens on database, A, that transaction can be ghosted on to database B, and so that database B is a relatively close to close up to date, version of the production level. So if the production level database gets corrupted or fails in any way, then the fallback database can be activated and everything. There’s no interruptions as a result. Thank you. That’s one way we can use this. And so that encourages high availability of our data, right? So if we can replicate, we can ensure that there are no interruptions for our database, which is really important if you’re running something like an online service, or even something internal, on a on a desktop app. So what is recorded? Well, initially, for we’re dealing with transaction, so the beginning and end of each transaction, of course, and then every data modification associated in between those transactions. And so whether or not you actually explicitly begin a transaction, the these might be recorded underneath the hood anyways, and all of this all these trends, all this transaction log, you don’t have to do anything for it.

This is managed and done by SQL Server itself. As this is nothing that you really have control over or need to worry about as a database designer. This is just happened. This all just happens underneath the hood. Every basically, so when you when you write data, update data, cetera, et cetera, right, you’re allocating new data Speight, you’re allocating new memory. So every page every data allocation, that is also recorded as part of the transaction log. Likewise, if you create a table or an index, we’ll talk about indexes next time. But if you’re creating a table or dropping a table that is also recorded, because we want to be able to roll that back, if so, if that happens, and doesn’t get completed fully, that could really mess up your data, right. And so we want to be able to reverse that, if we have to. So, this poor how much time I got. So I may or may not, I might not have enough time to actually draw a log out, I plan on drawing one, but doesn’t look like I’m going to have enough time to talk about it in theory. So the log itself is just a wrapper around file.

So think of a circular array. So you have an array that just constantly wraps on itself. As, as you grow, if you run out of space, it just keeps on growing. And then if you and, and what the transaction log, if you if you get to if everything has been committed, right then and nothing is, is available or nothing, no other transactions are active. So we have that means we have a dead part of the dead part. But we have a part of the log file that we don’t need anymore, because there’s no way we can, there’s no need for us rollback at that point, because there’s nothing active, and everything has been committed or rolled back by already. And so when that happens, those records can be released. And then the file can then run back into that spot and the data on the file. And so in that sense, everything is partitioned into virtual files. And so you can kind of imagine this log file being segmented into chunks.

And each of those chunks can be active or inactive at any time. So inactive meaning that even if there is data inside of that, inside of that virtual log file, it means that anything that is in there isn’t needed anymore. And so that can be either overwritten or deleted, after being that it has an active transaction as part of it. And so that’s what we each of those stores that indicator, whether it is active or inactive, we also keep track of the minimum lsn. Right. So this is the transaction number, right? So we are the the specific log sequence number. So we want to keep track of the minimum number that we have to roll back to if something happens. So if let’s say we have a bunch of live transactions were in the transaction log do we have to roll back to to guarantee acid right and so that is the number that the log file keeps track of. And that way if does something does happen then a nuclear option so to speak, then we can roll back to that minimum lsn minimum log sequence number in order to ensure that our database ACID properties are hold or held. Sorry So right ahead logging.

So this is done. So basically, we are logging our actions before they’re actually happening on the disk itself. So if an update, if a transaction happens, and an update happens in that transaction, the record of that update is stored into the log file before the update is actually physically written to your hard drive. Because right, the hard drive is going to be a large source of potential errors, right, because as a physical device, it’s the only primary physical device on your computer. And so it is the subject of many errors, and hard drives are more likely to fail. And so we want to be sure to have record of that actual update before it actually happens physically, in case something happens there. Because right if, if something does physically happen with the hard drive, then that doesn’t actually, that has the potential of never making it to your log file. And if it doesn’t make it to your log file, it doesn’t happen. And there were four, we can’t assure acid. And that’s what we’re trying to achieve here. But we also do have these ideas of checkpoints.

So this will essentially happen every once in a while. So after so many X number of transactions or whatever happen, and you have no active transactions in the system, it will do a checkpoint to basically commit everything, flush anything that is in memory, too, and write it to disk. So anything that is being stored in memory in RAM, for your log file gets actually physically written to the hard drive. So nothing is actually lost. So checkpoints are very useful. And then, of course, every once in a while, the log gets truncated. And so inactive virtual log files get flagged for use, right. So if your log file is getting too, this helps prevent your log file from getting too big. Because if you don’t want this infinitely growing file in your system, because data that you wrote two years ago, is irrelevant to data that you have now. And so there’s that space where that transaction information is stored can be reused for later.

Typically, typically, if there is no active transactions, and that virtual log file would and it is not currently. So if a lot if a virtual log file is full, and it has no active transactions as part of it, that need that could potentially needed to be rolled back to. And that’s indicated by this minimum log sequence number. So if the if there’s no if the sequence number and that virtual log file is lower than the minimum lsn, then that that log file, that virtual log file can be flagged as inactive. Because there’s there is what this Minella said that we would never be rolling back into anything in that virtual log file, therefore, it can be flagged for reuse. And then it can be truncated and save save memory. If I had time, I would totally sit here I would sit here and draw draw out a log file. But we are short on time today. So I’m going to skip over this part.

But as you can imagine, so these zeros up here at the top indicates virtual log files. And so you can kind of see here is I have this segmented into four virtual log files. So 1234. And then down here at the bottom is the log sequence file or the log sequence number. So this is 012345 cetera, okay. And so in each one of these, you would actually see things like like one B, which is the begin transaction one, and C as complete transaction or commit transaction, and so on and so forth. So, if you want to see some examples of these, I can do a quick Google search and see some see some examples of this. Again, this is more just for your information, not necessarily expecting you to really truly remember For all the details of this, but just know that the the really big point here from today is that the log file, the log transaction file is important to know that it exists because that’s what ensures atomicity and durability for acid.

Okay? So when the log gets truncated, so what happens typically, when the log gets backed up, so or after a checkpoint, that’s when virtual so that’s when virtual log files are flagged for being reused. That’s typically when that happens. It also happens during database backups and database recoveries, right. So when backups happen, that transaction log is heavily relied on. And so we can see truncation happened during that phase as well. But the reason why we keep that log file nice and neat and clean is so we can do recover certain things as part of your database, things like rollback, yeah, checkpoints are done automatically by SQL Server. And off the top of my head, I can’t remember exactly every time when they when it determines when a checkpoint is needed.

But yeah, the SQL server handles the checkpoints. So during recovery, we have a couple things here, redo and undo. So redo is rolling forward committed changes. So if something gets interrupted, we have the transaction log from the future, right. So if a transaction happens, but we need to reapply that transaction, we can do so we can also roll back that transaction as well. And then we can also rollback or redo or undo certain logical operations, right, like individual updates, individual deletes and things like that, right. So transaction wise, we can roll back and roll forward. Same thing we can do, we can do the exact same thing to for the logical operations because we have to be able to do logical operations order to actually undo or redo a transaction so it makes sense