Chapter 7

Encoding

ASCII Code ASCII Code

Subsections of Encoding

Introduction

YouTube Video

Resources

Video Script

In this module, we will discuss how we can store data in our computers using various types of encoding. Recall, the computers only operate on binary values, which we’ll talk about a lot today in this lecture. And so how do we take things such as images and text and graphics and videos, and make all of those things accessible to our computer? Before we continue, let’s take a look at a few of the things we’ve covered in this class. So far, we’ve covered the work of Leibniz and his creation of the Leibniz wheel, which led to mechanical computers. We’ve covered George Boole and his approach to logic that we call Boolean logic which we use today in our modern computers. We’ve talked about Charles Babbage, the father of the modern computer and his invention of mechanical computation devices. And we’ve also talked about Claude Shannon, whose master’s thesis of using electrical circuits to simulate logic using Boolean logic is foundational to the creation of the electronic computers that we use today.

But there’s one more person that we need to talk about to set the stage for today, and that is George Stibitz. George Stibitz was an inventor and one of the things he was working on was the creation of a true electronic calculator. And so in 1937, he sat down and he created what he called the model K, which is named after his kitchen table. And it was a device that was capable of performing addition using two binary numbers. And this is very important because at this time, a lot of the mechanical computers of the day, we’re still using decimal or base 10 numbers like we use today. George Stibitz was really interested in performing that same mathematics using binary numbers, because he saw the value of an electronic signal with one and zero on and off being represented in binary.

So a little bit later, he was able to create his complex numerical calculator in 1940. And it was very unique because it could perform all of the calculations on very complex numbers using electronics, and it could also be operated remotely. And so during one of his demonstrations, he actually was in New York City and was demonstrating the device which was located at Dartmouth College in New Hampshire. And so he was talking to it via a phone line. And so not only is this really the first example of any sort of electronic machine doing large scale calculations like this, but this is also an example of one of the first remotely access computer systems. So let’s take a look at a video of the complex numerical calculator and see how it worked.

AT&T Archives: Invention of the First Electric Computer

YouTube Video

Binary Numbers

YouTube Video

Resources

Video Script

The first thing we’re going to cover in this module is the concept of binary numbers. Binary numbers are really the core of everything that a computer does. And so we need a way that we can convert the base 10 numbers that we know today such as 24 42 86, into binary numbers that only use ones and zeros.

So let’s start with simple natural numbers, the counting numbers, the whole numbers that are greater than zero. So let’s take a look at this example. In here we have a binary number 00101010. And above that binary number we’ve put in the value of each place. So in a decimal number, each of these places would be powers of 10. So the first place would be one followed by 10 100 1,000, and so on. And so we know that the number 1234 would be 1, 234.

In this number, however, we see that we only have ones and zeros, but they use the powers of two instead of the powers of 10 in each place. So let’s look at how we can calculate the actual decimal value of this binary number. To calculate the value of a binary number, all we have to do is look for the places that have one. And then we know that this is one times 32. We have a zero here, so there’s no value there. We have one here. So this is one times eight. We have a zero here, we have a one here, so this is one times two. Notice that this is the same thing that we do when we’re using decimal numbers. Remember, the example I just gave 1234 would be one times 1000 plus two times 100 plus three times 10 plus four. So this value is 32 plus eight plus two, which is 42.

At this point, if you are a fan of the work of Douglas Adams, you will find a definite pattern in the rest of this lecture, we’re going to look at many different ways that we can encode the same number 42, in all sorts of different ways in our computer system. So as we saw with this example, it’s really easy to take a binary number and convert it into its decimal value. All we have to do is multiply the ones by their place values in the binary number and then add the resulting number together to get the binary number.

What if we want to go the other way? How would we do that? For example, let’s say we want to encode the number 86 as a binary number. To do this, we would have to again know our places are powers of two. So first, look at 128. Since 86, is less than 128, we know we can’t have any values there. So we’ll give that a zero. The next place would be 64. Since 86, is greater than 64. We know that we have at least one value of 64 in here, and now we’ll have to do at Minus 64 and we will get 22. So now we go to our next power of two, which is 32, 32 is greater than 22, which means we’ll give this a zero, then we will have 1616 is less than 22. So we get a one here, and then we’ll subtract 16 and we will get six. So our next power is eight, eight is greater than six, we’ll have a zero, then we’ll have four, four is less than six. So we’ll have a one here, and then we subtract four and we get two, then we’ll have a two that is the same value. So we’ll put a one and then two minus two is zero. So our ones place will have a zero. And so that tells us the binary number 01010110 is equal to the decimal number 86.

Two's Complement

YouTube Video

Resources

Video Script

In the previous video, we looked at binary numbers that are natural numbers, which are whole numbers greater than zero. There are a couple other binary data types that we’re going to look at in this module. The first one would be a signed integer, and assigned integer allows us to have negative values by changing the sign at the front of the number. So just like with decimal numbers, where we put a minus sign in the front to differentiate between positive numbers and negative numbers, we can do something similar with binary numbers by changing a sign bit at the front of the number to determine if it’s positive or negative. But there are a couple of caveats to that. And let’s take a look at that in this video.

So to understand negative numbers in binary, let’s start with our positive number again. On the last slide, we saw that the number 101010 in binary is the decimal number 42. In a signed binary number, instead of using this first digit as 128, we’ll use it as our assignments. And we will say by convention that if this bit is zero, it’s a positive number. And if this bit is one, it’s a negative number. So of course, we want an easy way to switch between our positive and our negative numbers.

So one of the first ways you can think about negative numbers in binary is what’s called the ones compliments. And so to calculate the ones compliment, we simply invert all of the bits. And so we would say that this number is negative 42, in ones compliments. So as we saw in that example, ones compliment is a very easy way that we can calculate the negative number of a binary number, we just invert all of the bits and declare the first bit to be the sign bit where zero is positive and one is negative.

However, there is a problem with the ones complement method of finding negative numbers in binary, which is why we don’t actually use it in practice. Let’s take a look at what that problem is and see if we can spot it. One of the most important things about binary numbers and any negative numbers, in fact is we should be able to add a positive value and a negative value together and get a logical result.

So let’s try some ones complement addition and see what happens. So we’ve already seen the positive value of 42. Before. And previously In this video, we calculated the negative value of 42 using one’s complement. So with these two values, if we add them together, we should get a value equal to zero. And remember, in binary, a value equal to zero is all zeros. So let’s try this addition and see what happens. Since a binary number is just like any other number, addition should work just like we expect. So here we would do zero plus one, and we get one, we do one plus zero, we get one. And very quickly, we’ll realize that since we inverted these values, each place is going to have a different value. So when we do positive 42 plus negative 42. As one’s complement, we get this number that is all ones. So when we add the positive value of 42, and the negative value of 42 using one’s complement, we end up with this number that is all ones.

So what is this number? Well, we know that this number is a negative number, because the first bit the sign bit is one. And so to find its positive value, we would invert all the bits. So we would invert all of these ones to zeros. And so we know that this number is zero in decimal. So if this is zero, then this number must be negative zero.

But wait a minute, is there such a thing as negative zero? That really doesn’t make any sense? And that is where the problem with one’s compliment lies. If we take the negative value and its positive value, and we add them together using one complement, we end up with this weird number called negative zero, which doesn’t exist. So one’s compliment while it seems to make sense on the surface, mathematically, it really doesn’t work out. And so here’s the result of this example. As we saw, we end up with a number that is negative zero.

So we need to come up with a better way for us to calculate a negative number in binary. The answer to that is a process we call two’s complement. Two’s complement is very similar to one’s complement with one extra step. In two’s complement, we will invert all of the bits, and then we will add one to the value of the number. So let’s try that on our example number of 42.

To calculate the two’s complement of 42, the first thing we will do is we will invert all of the bits and then we will add the value one to that resulting number. So we have one plus one in binary is a zero and then we will carry the one, then we’ll have one plus zero here, we will get one, and then the rest of this will just carry down. So the value negative 42 in two’s complement looks like this 11010110. There is a shortcut way to do two’s complement, where you start at the end of the number and you find the first one. Everything after that stays the same, and everything in front of that gets invert. So, again we see one zero stays the same, and and everything in front of that one gets inverted. That’s a quick shortcut to do two’s complement. So remember, to perform two’s complement, we first invert all of the bits, and then we add one and we will get this value for negative 42.

So now that we’ve calculated the value of negative 42 using two’s complement. Let’s do that same example before we add these two values together and check that our result makes sense. So once again, we want to do our addition example and make sure that this works. So now if we start over at this column, we will have zero plus zero is zero, we will have one plus one will become zero, we will carry a one, then we will have one plus one is zero, carry the one, zero, carry the one, zero, you can kind of see a pattern here where we have zero, carry the one zero, carry the one, zero, and then we will have this one that overflows. And right now, we’re not going to worry about that most math processors just ignore this one. But there is technically a one that does overflow when we do this, but now you’ll notice that we get a number that is all zeros, which in binary is clearly the value zero. So two complement addition works just like we expect it to.

If we take 42 plus negative 42, we will get the value positive zero. And once again, here’s the example on this slide completely worked out. So it works. That’s pretty cool. There are, of course, a few other binary values that are important to understand. We know that binary values can be both signed and unsigned. And so what’s really interesting is if we start counting up from zero, the sign values and the unsigned values will be the same all the way up to 127. Then when the first bit becomes one, the sine value and the unsigned value will diverge. At that point, the unsigned value will be 128, but the sine value will be negative 128. And so as we continue to count up in binary, the unsigned value will continue to get larger, whereas the sine value will start to get smaller all the way back down to negative one. This is why we get some really interesting things in the range of these values.

And that particular problem is known as integer overflow. If you’re working with a signed integer, but you’re always counting up, eventually you’ll reach a point where it will overflow and go from the highest possible value to the lowest possible value. And this is really hilariously explained in this XKCD comic, where if you count sheep and you get to 32,767, you will overflow and get to negative 32,768. And you’ll have to start all over again with your sheep going the wrong way over the fence. Because of this eight bit binary numbers have an interesting range and unsigned eight bit binary number can go from the value zero all the way to the value of two to the eighth minus one which is 255. Assigned value, however, can go from negative two to the seventh, which is negative 128 to positive to the seventh minus one, which is positive 127. So So in general, a binary number within bits, if it’s unsigned can go all the way to two to the n minus one, where if it’s signed, it’s negative two to the n minus one to positive two to the n minus one minus one.

Floating Point

YouTube Video

Resources

Video Script

So far in this module, we’ve only dealt with whole numbers such as positive and negative whole numbers or integers. But what about numbers that have decimal points in them? How would we deal with those? In mathematics, of course, we call these rational numbers because they can be expressed as a ratio or a fraction. And so in mathematics, one thing that we’ve used for rational numbers would be scientific notation. You may have seen this a few times in your science class, you’ll have numbers like 1.0 * 10^5.

In binary, we use a similar system that we call floating point. And the whole idea behind floating point like scientific notation is that the decimal point in the number can float around. And specifically, we can do something really cool where we can express a decimal number as two whole numbers, a mantissa, which is the value and then exponent, which is a power that is used to adjust the location of the decimal point. So this slide right here actually shows an example of what that looks like for scientific notation. And we do something very similar for binary numbers.

Binary numbers use the system of floating point, which is based on the IEEE754 standard. And in this slide, I’m going to show you the 16 bit or half size example, which uses an exponent bias of 15, which we will understand in a little bit. Also, it’s really important to understand that the leading one of the mantissa is implied. And again, we’ll talk about that in a minute. In most actual computer systems, instead of a 16 bit number, you would have a 32, 64 or 128 bit floating point number, but that gets a little bit hard to do on a simple screen. So we’ll use 16 bits, but the theory is very similar.

So let’s see what it takes to convert this floating point number into its equivalent decimal for so this example we have a 16 bit floating point number, and you notice it consists of three parts. The first bit is the assignment, and since the assignment is zero, we know that this is going To be a positive number, the next five bits are the exponent. This exponent is 10100. This exponent is equivalent to a decimal value, where we have one, two, this is four, so we have four 8, 16. So you have 16 plus four is 20. Now the important thing to remember is the exponent has a bias of 15. So when we’re going from this value into its decimal value, we really have to take this binary value 20 minus 15, and we get an exponent of five. This allows us to actually store both positive and negative exponents using a positive binary number. For example, if we want to store the exponent, negative five, we would have negative five here and we would add 15 to it and we would get 10 And so then we will encode the binary value 10, which is 01010 as our exponent. But again, we’re not doing that in this example. So we’re going to erase that. And we know our exponent is five.

The next thing we need is our mantissa. And so in the explanation, we saw that there is an implied one here at the front of the mantissa. So this mantissa is really the binary value 1.0101. Now there are two ways to think about this mantissa Of course, we can calculate it directly. And just like with decimal values, where items to the other side of the decimal point are divide our negative powers of that value, so this is two to the zero, so this would be two to the negative one to the negative two to the negative three and so on. So we can actually calculate this value As one plus one fourth plus one 16th. So we can actually calculate this value, one plus one fourth plus one 16th, which is approximately 1.3125. So we have the value 1.3125 times to the fifth. And we know that two to the fifth is 32. So if we take 1.3125 times 32, we will actually get exactly the value 42. So that is the actual decimal value of this floating point number.

That may seem a little complicated, but there is actually a much easier way to do this. Recall that we started with the binary number 1.0101. And we know our exponent is five. So before we do any conversion, all we have to do is move this decimal place five places and so we end ended up with the binary value 101010 with the decimal place here at the end. And of course, we know that the binary value 101010 is equal to 42. So in calculating this value, a lot of people find it much easier to move the decimal place first, and then calculate the binary value, instead of calculating the binary value and then multiplying it times two to the fifth or two to the whatever the power is. Either way works, I have found the second way, much, much simpler.

So as we saw with this example, we can calculate the value of the mantissa. And we can calculate the value of the exponent to be 1.3125, and five, so the overall value is 1.3125 times to the fifth, which is 42. Or we can take the binary value times two to the fifth and simply slide the decimal place over and we’ll find the binary value 42.

So let’s do another example. This time converting a decimal number all the way to a floating point binary number to see what that process looks like. And in this case, let’s do the value 86. We’ve already converted 86 to binary before, which was 1010110 with the decimal point right here, so we need to do two things. The first thing we need to do is move the decimal place all the way forward, so we need to move it 1,2,3,4,5,6 places, so our exponent is going to be six. So to find our actual exponent value, remember we have to add the bias which is 15. And we will get 2121 in binary is going to be 16 plus four, plus one. So we get the binary value 10101. Then to construct our actual floating point number we start with our sign bits. So we have our sign bit right here, this is going to be zero, then we’ll have our exponent which is going to be 10101. And then we will have our mantissa. And remember with the mantissa, we take this value, but we remove this one off of the front, and so the mantissa will be 010110. And then we will fill the rest of it out with zeros until we get to 16. So we have There we go. So the decimal value 86, we can easily convert to a floating point binary number by moving the decimal six places using that six to calculate our exponent of 10101 and then calculating the rest of the mantissa by taking the one off of the value and using the other bits in the mantissa.

Floating point numbers can have a very, very wide range of values. For example, the 16 bit binary floating point numbers that we looked at today have a range from negative 65,000 to positive 65,000 in whole numbers, but it can actually show values as small as 5.9 * 10^-8 And it also has ways of showing positive infinity and negative infinity by setting the exponent to all ones and setting the mantissa to all zeros. Unfortunately, because it is a rational number, it is inexact. For example, if we want to show the value one third, we would end up with this binary floating point of 0101010101 in the mantissa, which is really just 0.33325, which is not exactly one third, just like one third is a repeating decimal in decimal values. One third is also an infinite repeating binary floating point number as well. So it can’t be exactly shown, but it’s not really Either 0.3325 is actually pretty close to what we want. So what are these numbers look like in a real world computer in most modern operating systems and integer is a standard size of 32 bits, although most normal processors today actually support 64 bit integers by default, but a lot of programming languages are still built around the idea of 32 bit numbers. Likewise, we can have long integers which are 64 bits. And then for floating point we have half size, the single size or float which is 32 bits and the double size floating point which is 64 bits. And here we show that of these 32 bits, eight of them are used for the exponent and 23 are used for the mantissa. Likewise for a double size 11 bits are used for the exponent and 52 bits are used for the mantissa

Other Data Types

YouTube Video

Resources

Video Script

So now that we understand how to encode numbers into binary, let’s look at some other data types and see how those work. The nice thing is in the computer, everything is really just a binary number. It’s all ones and zeros. So we really just have to find a way to take other types of data and convert them into numbers. And then we can store those numbers in our computer and use them in our computer programs.

So for example, to store text in a computer program, we can use an encoding that converts each character of the text to a number and then store that number. The code that we use today is ASCII, or the American Standard Code for Information Interchange. And on this table, we see the first 127 characters of the ASCII code. In modern computers we use a more advanced code called Unicode that allows us to show many more characters, but it’s all based off of ASCII and in most of our computer programs will just work with simple ASCII text in most cases. So for example, The letter K is the decimal value of 75. On this table, we can see that here in this third column, we can also see all of the numbers and symbols. And we also have this whole first column of various different control characters. And these were really important in older computers where the control characters would tell the system things to do. For example, we have special characters for, for shift in shift out for end of text or end of transmission for cancel, substitute. And there’s even a particular symbol number seven that will play a bell or a sound. And it’s actually fun, you can still do that today. In most modern systems, you can send a character seven and it will ding on a terminal.

So to store text in ASCII, we would simply store a whole string of binary digits such as this, then to actually calculate what this is, we would break this binary digits up into eight bits. So we have 1,2,3,4,5,6,7,8. We would draw a line right here. We would draw a line right here. And so on every eight characters, we can draw a line. And so then we take each of those blocks of eight characters, and we convert them into their decimal value. So for example, 01100110, we can convert that to a decimal value, which is 102. And then on this previous slide, we can look that value up. So the value 102 is the lowercase character F. So back on the slide, we know that the value 102 is equal to character F. Likewise, we can continue to do this and find out what each character value is for all of these binary numbers. But on the slide, we’ve already done it for you. And actually, it’s really interesting. You can see that this slide uses ASCII text to encode the value 42 in words using those characters from ASCII.

So what about images, a lot of our computer Programs today make use of images. And of course, with the internet and video games and all the technology that we use images are a really important thing to be able to store on our computer. And it turns out there are actually two different types of ways that we can store images on our computer. The first way is a vector image, which uses mathematics to actually describe the shape and the lines and the colors within the image itself. Or we can create what’s called a bitmap or a raster image, which actually stores individual pixel pixels within the image itself. So what’s the vector image look like? It could look something like this. Most computers today support the vector image format, SVG, or Scalable Vector graphics. And a scalable vector graphics image is simply a list of mathematical equations. They’re used to draw the lines and the shapes and fill in all the colors of the image. And you’ll see these SVG graphics used a lot in logos and marketing materials, things that need to be printed very large or very small. For example, here at K-State, there is a scalable Vector Graphics version of the K state power cat as well as a lot of the K-State logos. So they can be printed as small as on a business card or as large as on the size of the stadium without looking pixelated. And that’s the big power of vector graphics is they can be shrunk or expanded as much as you want. And all of the graphics will seem perfectly smooth, because they’re mathematically defined. However, creating a scalable vector graphics such as that takes a lot of work. There is some very special tools. And it’s not like you can just go out and take a picture of something and easily convert it to a vector graphic, you really have to draw it from scratch or spend a lot of time recreating it to get that vector graphic.

The other way that we can store graphics in our computer is through a bitmap. And so a bitmap is simply a list of pixels, and each pixel is assigned a color. So this is a bitmap of an old sprite from a video game, just to give us a really quick blown up example of what a bitmap might look like. So how would we store an image like this in our computer? Well, it comes down to the concept colors. From color theory and art, we know that all the colors in the world are made up of three colors red, green, and blue. And so we can mix and match different intensities of those colors to produce any color that we need in the palette. And this is the key behind paint mixing. If you’ve ever mixed paints, you know that you can get any color by mixing two different paints together at various levels, or maybe all three paints to get the particular color you want. For example, K-State purple is a mix of a lot of red, a lot of blue and not very much green. So that bitmap if we actually render it out as this is hexadecimal values, if we render it as hexadecimal values, we would get something that looks like this. And so these values, each pair of two digits represents a particular color, we have red, green, and blue, and in this case, they should be inverted. So we have blue, green, and red. So if we overlay that data on top of the bitmap itself, you can see that each particular Color is represented by its own value. And so these values in hexadecimal are just binary numbers that have been simplified a little bit so they’re easier to read. And so for example, the squares that are dark red are 18009 B. So it means one a, there’s very little blue 00 means there’s no green, and nine beams, there’s a lot of red in that color. Likewise, we have colors of yellow, and orange, and then we have black as well. So of course, the store this bitmap does take quite a lot of value. Each one of these numbers requires 32 bits of data to actually store them.

But in old computer systems, especially early video game systems, we didn’t have nearly enough memory to do this. And so one of the tricks you can use with bitmaps is you can replace those colors with very simple numbers, and then provide a lookup table that says What color is what. And so here we’ve replaced these colors with four different numbers 000110 and one one and then somewhere else, we can to store a color table that says 00 is this color one, one is this color, etc. And in fact, some of the early video games use this very technique. If you go back and play the earliest Super Mario video game, you’ll notice that the clouds and the bushes and some of the enemies have different color palettes applied to them. And what they’ve actually done is they’ve taken the same bitmap image and just change the color key that goes with it to convert them from clouds that are white to bushes that are green and converted different colors of enemies from Red koopas to yellow koopas all of the different colors through this little trick.

The last topic we’ll talk about in encoding is the idea of compression. One of the big things in computer science is taking large amounts of data and storing them in smaller spaces. Because as it turns out, storing data can be very expensive. And with the rise of the Internet, we found that transmitting that data can also be very expensive. So we need to look for ways that we can compress data and store it in a smaller number of bits than we would normally So let’s look at an example of something that has some repetitive data in it. For example, how much wood could a woodchuck chuck if a woodchuck could chuck wood? This example of a tongue twister has a lot of repeated data. So what if we took some of those words and replace them with shorter things such as numbers. So if the word wood was replaced with one and could was replaced with two and Chuck was replaced with three, then we’d end up with a sentence that looks like how much one two a 1 3 3. If a 13 2 3 1, it is quite a bit shorter. We’ve noticed that by storing that key of those words, we can replace those longer words with shorter values, and we take up much less space. This is the concept behind a lot of computer compression algorithms. You find repeated chunks of data that show up multiple times in the code. And then you replace those repeated chunks with smaller representations and maybe have some sort of a lookup table that says how to expand those representations. And that’s really it. And so everything from zip compression to things like JPEG for images, are based on some of these similar ideas.

Unfortunately, image compression can become really tricky. This is a really great case study, and we’ll link to this later on in this module. But Xerox copiers used an image compression algorithm that would look through the image and it would try and find parts of the image that it could replace with other parts. So here we have this first image. This is the original image of a printout of an accounting documents. Then here, we have a photocopy of that document made on a non Xerox copier back in the day. And then we have a photocopy of that document made on a Xerox photocopier that had this particular error. Can you spot the problem? So what Xerox was doing is it was looking at the image and it was trying to find things that were very similar. And then it would store only one copy of that image and replace the rest of it with identifiers that say, Oh, this chunk of the image should be this and what they noticed is this Right here, and this eight right here looked very similar. And so as some photocopiers It was no problem. But with Xerox photocopiers, you see that, oh, that six accidentally got replaced with this eight. And so in fact, here, there were a couple of sixes, they got replaced with eights. Although, interestingly enough, this one didn’t. And that shows some of the imperfection of these image processing algorithms that were being used at the time. And so this is a really interesting case study in how image compression can go awry. And it can cause really strange things like your accounting documents to have incorrect values, even though it’s a photocopy and you would think this doesn’t happen. So if you’re interested in this, I encourage you to read more. The case study is really fascinating about how they went through and figured out this issue and what was going on.

Pattern on the Stone Reading

Read Pattern on the Stone, Chapter 6.