Other Data Types

Resources

Slides

Video Script

So now that we understand how to encode numbers into binary, let’s look at some other data types and see how those work. The nice thing is in the computer, everything is really just a binary number. It’s all ones and zeros. So we really just have to find a way to take other types of data and convert them into numbers. And then we can store those numbers in our computer and use them in our computer programs.

So for example, to store text in a computer program, we can use an encoding that converts each character of the text to a number and then store that number. The code that we use today is ASCII, or the American Standard Code for Information Interchange. And on this table, we see the first 127 characters of the ASCII code. In modern computers we use a more advanced code called Unicode that allows us to show many more characters, but it’s all based off of ASCII and in most of our computer programs will just work with simple ASCII text in most cases. So for example, The letter K is the decimal value of 75. On this table, we can see that here in this third column, we can also see all of the numbers and symbols. And we also have this whole first column of various different control characters. And these were really important in older computers where the control characters would tell the system things to do. For example, we have special characters for, for shift in shift out for end of text or end of transmission for cancel, substitute. And there’s even a particular symbol number seven that will play a bell or a sound. And it’s actually fun, you can still do that today. In most modern systems, you can send a character seven and it will ding on a terminal.

So to store text in ASCII, we would simply store a whole string of binary digits such as this, then to actually calculate what this is, we would break this binary digits up into eight bits. So we have 1,2,3,4,5,6,7,8. We would draw a line right here. We would draw a line right here. And so on every eight characters, we can draw a line. And so then we take each of those blocks of eight characters, and we convert them into their decimal value. So for example, 01100110, we can convert that to a decimal value, which is 102. And then on this previous slide, we can look that value up. So the value 102 is the lowercase character F. So back on the slide, we know that the value 102 is equal to character F. Likewise, we can continue to do this and find out what each character value is for all of these binary numbers. But on the slide, we’ve already done it for you. And actually, it’s really interesting. You can see that this slide uses ASCII text to encode the value 42 in words using those characters from ASCII.

So what about images, a lot of our computer Programs today make use of images. And of course, with the internet and video games and all the technology that we use images are a really important thing to be able to store on our computer. And it turns out there are actually two different types of ways that we can store images on our computer. The first way is a vector image, which uses mathematics to actually describe the shape and the lines and the colors within the image itself. Or we can create what’s called a bitmap or a raster image, which actually stores individual pixel pixels within the image itself. So what’s the vector image look like? It could look something like this. Most computers today support the vector image format, SVG, or Scalable Vector graphics. And a scalable vector graphics image is simply a list of mathematical equations. They’re used to draw the lines and the shapes and fill in all the colors of the image. And you’ll see these SVG graphics used a lot in logos and marketing materials, things that need to be printed very large or very small. For example, here at K-State, there is a scalable Vector Graphics version of the K state power cat as well as a lot of the K-State logos. So they can be printed as small as on a business card or as large as on the size of the stadium without looking pixelated. And that’s the big power of vector graphics is they can be shrunk or expanded as much as you want. And all of the graphics will seem perfectly smooth, because they’re mathematically defined. However, creating a scalable vector graphics such as that takes a lot of work. There is some very special tools. And it’s not like you can just go out and take a picture of something and easily convert it to a vector graphic, you really have to draw it from scratch or spend a lot of time recreating it to get that vector graphic.

The other way that we can store graphics in our computer is through a bitmap. And so a bitmap is simply a list of pixels, and each pixel is assigned a color. So this is a bitmap of an old sprite from a video game, just to give us a really quick blown up example of what a bitmap might look like. So how would we store an image like this in our computer? Well, it comes down to the concept colors. From color theory and art, we know that all the colors in the world are made up of three colors red, green, and blue. And so we can mix and match different intensities of those colors to produce any color that we need in the palette. And this is the key behind paint mixing. If you’ve ever mixed paints, you know that you can get any color by mixing two different paints together at various levels, or maybe all three paints to get the particular color you want. For example, K-State purple is a mix of a lot of red, a lot of blue and not very much green. So that bitmap if we actually render it out as this is hexadecimal values, if we render it as hexadecimal values, we would get something that looks like this. And so these values, each pair of two digits represents a particular color, we have red, green, and blue, and in this case, they should be inverted. So we have blue, green, and red. So if we overlay that data on top of the bitmap itself, you can see that each particular Color is represented by its own value. And so these values in hexadecimal are just binary numbers that have been simplified a little bit so they’re easier to read. And so for example, the squares that are dark red are 18009 B. So it means one a, there’s very little blue 00 means there’s no green, and nine beams, there’s a lot of red in that color. Likewise, we have colors of yellow, and orange, and then we have black as well. So of course, the store this bitmap does take quite a lot of value. Each one of these numbers requires 32 bits of data to actually store them.

But in old computer systems, especially early video game systems, we didn’t have nearly enough memory to do this. And so one of the tricks you can use with bitmaps is you can replace those colors with very simple numbers, and then provide a lookup table that says What color is what. And so here we’ve replaced these colors with four different numbers 000110 and one one and then somewhere else, we can to store a color table that says 00 is this color one, one is this color, etc. And in fact, some of the early video games use this very technique. If you go back and play the earliest Super Mario video game, you’ll notice that the clouds and the bushes and some of the enemies have different color palettes applied to them. And what they’ve actually done is they’ve taken the same bitmap image and just change the color key that goes with it to convert them from clouds that are white to bushes that are green and converted different colors of enemies from Red koopas to yellow koopas all of the different colors through this little trick.

The last topic we’ll talk about in encoding is the idea of compression. One of the big things in computer science is taking large amounts of data and storing them in smaller spaces. Because as it turns out, storing data can be very expensive. And with the rise of the Internet, we found that transmitting that data can also be very expensive. So we need to look for ways that we can compress data and store it in a smaller number of bits than we would normally So let’s look at an example of something that has some repetitive data in it. For example, how much wood could a woodchuck chuck if a woodchuck could chuck wood? This example of a tongue twister has a lot of repeated data. So what if we took some of those words and replace them with shorter things such as numbers. So if the word wood was replaced with one and could was replaced with two and Chuck was replaced with three, then we’d end up with a sentence that looks like how much one two a 1 3 3. If a 13 2 3 1, it is quite a bit shorter. We’ve noticed that by storing that key of those words, we can replace those longer words with shorter values, and we take up much less space. This is the concept behind a lot of computer compression algorithms. You find repeated chunks of data that show up multiple times in the code. And then you replace those repeated chunks with smaller representations and maybe have some sort of a lookup table that says how to expand those representations. And that’s really it. And so everything from zip compression to things like JPEG for images, are based on some of these similar ideas.

Unfortunately, image compression can become really tricky. This is a really great case study, and we’ll link to this later on in this module. But Xerox copiers used an image compression algorithm that would look through the image and it would try and find parts of the image that it could replace with other parts. So here we have this first image. This is the original image of a printout of an accounting documents. Then here, we have a photocopy of that document made on a non Xerox copier back in the day. And then we have a photocopy of that document made on a Xerox photocopier that had this particular error. Can you spot the problem? So what Xerox was doing is it was looking at the image and it was trying to find things that were very similar. And then it would store only one copy of that image and replace the rest of it with identifiers that say, Oh, this chunk of the image should be this and what they noticed is this Right here, and this eight right here looked very similar. And so as some photocopiers It was no problem. But with Xerox photocopiers, you see that, oh, that six accidentally got replaced with this eight. And so in fact, here, there were a couple of sixes, they got replaced with eights. Although, interestingly enough, this one didn’t. And that shows some of the imperfection of these image processing algorithms that were being used at the time. And so this is a really interesting case study in how image compression can go awry. And it can cause really strange things like your accounting documents to have incorrect values, even though it’s a photocopy and you would think this doesn’t happen. So if you’re interested in this, I encourage you to read more. The case study is really fascinating about how they went through and figured out this issue and what was going on.