Chapter 12

How the Internet Works

Encapsulation Encapsulation

Subsections of How the Internet Works

Introduction

YouTube Video

Resources

Video Script

In this module, we’re going to dive a little bit deeper into the technologies that allow the internet and the World Wide Web to work today. As we discussed in a previous video, early computer networks relied on circuit switching, where each computer had to have a direct connection to each other computer it needed to talk to. This was very cumbersome and very prone to error. And so we needed to come up with a better system for doing this.

So a circuit switch network works a lot like a telephone network. If you want to call somebody in New York and you’re in Los Angeles, there has to be solid wire connection between New York and Los Angeles from phone to phone to really enable that connection to work. But of course, with computers there is one major difference between how two computers communicate and how two humans communicate. Think about a phone call. When humans are on the phone, there’s almost no downtime, either one person is speaking or the other person is speaking. With a computer network, however, the communication is much more interrupted computer will send a command. And it will have to wait for some time before it receives any data back. And likewise, the sending and receiving can be much, much faster. And so with packet switched networks, instead of needing a solid connection all the time, can we just take those little bursts of data and somehow transmit them as needed, leaving the other spaces open for other bits of data to come in as well.

That’s really what led to the idea of packet switch networks. And this is one of the big reasons that computers and phones really need different network architectures. And it was something we realized in the 1960s. So the work of building packet switching actually lies in the work of Paul Baran. Paul Baran was a researcher at the RAND Corporation in 1959. And he worked on building computer networks or any any type of network really, that would be reliable in the case of a nuclear attack. So for example, if we want to send a message across the United States and there’s a nuclear attack somewhere in the middle, could we make sure that that message could get around the attack and still be received everywhere that it needed to be? Or more precisely, if networks went down? How could you actually get a communication across a network that was broken? So let’s take a look at a video looking at the work of Paul Moran and his creation of packet switched networking and why it was so important.

Protocols

YouTube Video

Resources

Video Script

The topmost layer of the seven layer OSI network model is the application layer. In this video, we’ll take a look at some application layer protocols and discuss what they do. These protocols are usually used to send data across two different computers using different programs. For example, there are protocols such as pop, SMTP, and eye map for sending and receiving email. We have HTTP and FTP for sending and receiving files and webpages using the World Wide Web. We have DNS and DHCP for helping us configure and set up our networks. And we can use protocols such as SSH, RDP, or telnet to remotely connect to computer systems all across the globe. So let’s take a look at a few of these application layer protocols and see how they work.

Probably the most important of the application layer protocols is DNS or the Domain Name System. You can think of DNS like the phonebook for the internet. Earlier in this module, we talked about how each computer on the is assigned a unique IP address, which is kind of like a phone number. However, just like phone numbers, IP addresses can be very confusing and hard to remember, it’s just a random string of numbers that we have trouble associating to anything. So instead, DNS allows us to remember domain names, and then it will associate them with IP addresses that actually are the computers we want to talk to. For example, when we open up a web browser and type in www.ksu.edu, DNS actually looks up the value of ksu.edu in the DNS system and returns an IP address like 129.130.8.41. And it says here, you need to talk to this computer to get the ksu.edu. The same thing happens for Google.com, ibm.com. Any website that you put into your web browser, we use DNS to figure out what the IP addresses of the computer system you want to talk to. The domain namespace itself is actually a hierarchical system. At the very top level, you have the root domain name servers. And then below that you have the top level domain servers for domains such as.org.com, dotnet.edu. Each of those have their own domain name servers. Then below that you can have servers for each individual domain names such as ksu.edu, ibm.com, espn.com, they all can maintain their own servers. And they can even delegate to further sub zones. For example, the computer science department runs its own DNS server at cs.ksu.edu. So we can have our own systems.

The process for looking up a domain name using DNS is interesting because it’s recursive. In computer science. We’re very familiar with recursive algorithms. So let’s say we want to look up the IP address for wikipedia.org. We would start by going to a DNS for cursor, which is usually a piece of software on our computer, it would go out and try and contact the root name server and say, Where is www.wikipedia.org and the root name server would say, I don’t know but I know where the .org name server is. Why don’t you try there? So then our recursive would go to the.org name server and say, Hey, where is wikipedia.org? And the.org name server says, Well, I know where Wikipedia is, why don’t you go try there. So then we’ll go talk to the wikipedia.org name server and ask it Where is www.wikipedia.org. And that org name server will say, Aha, I am the authoritative name server for wikipedia.org. And I can tell you that it is at this IP address. So then your DNS cursor will tell your computer Oh, you want to go to wikipedia.org. It’s this IP address. And usually this process happens within just a fraction of a second after you press the Enter key to load a webpage.

The next protocol we’ll discuss is the Dynamic Host Configuration Protocol or DHCP. DHCP is what allows a computer to connect to an unknown computer network and get an IP address and get connected without any other configuration needed. Before DHCP was really commonly used bringing your computer to a new office, usually required quite a lot of technical setup. And it made it very hard to have portable computers available like we do today. The Dynamic Host Configuration Protocol consists of four very simple steps that happen whenever a new computer connects to the network. Let’s go through those steps really quick and see what they do. In the first step, the client will send a DHCP discover message, which is basically a shout message that goes to every single system on the network. And it says, Hi, I’m new, I need an IP address. Then, on the network, there’s a DHCP server that is listening for those shout messages. And it will send back its own shot message of a DHCP offer that says Hi, I’m a DHCP server. Here’s an IP address and some configuration information you can use. Then the client once it gets that information, we’ll send a short message that says hey, that’s great. I’d like to use that information. And finally the DHCP server will send back one final shout that acknowledges Okay, good. That’s your information. Good luck. And then it will record that that IP address has been used by that particular computer. Meanwhile, the computer will use that information to configure its network settings so that it can talk to the network. So whenever you come to campus and you bring your computer and you see your wireless icon flashing for a few seconds before it goes solid, the DHCP protocol is actually what’s going on. It is connecting your computer to K-State’s wireless network requesting an IP address and getting connected so that you can use the internet on your mobile device.

Another important protocol at the application layer is the hypertext transfer protocol or HTTP. HTTP was developed by Tim Berners-Lee to actually send webpages using the World Wide Web. And it was developed in the late 1980s. And it’s really a revolutionary part of the internet we use today. Here’s what the HTTP protocol actually looks like. It is very interestingly, a text based protocol. So at the top of this image in red, we have the request where we’ve connected to wikipedia.org and we’ve requested to get the main page. So then the web server we’ll respond. And in blue, it responds with the response headers that tell us about the information we’re about to receive, such as the size and different things about it. And then finally, in green, you see the body of the response, which is actually the HTML that makes up the webpage that gets sent. And so if you’re really good, you can use tools such as telnet to browse the web. By typing in these HTML, HTTP requests and getting the response directly from the web server. It’s a really fun activity to try if you’re interested.

To make things a little bit simpler HTTP requires several different commands. For example, there is the get command that used to request a webpage, the post command that used to send data back to a web page, the head command, which gets information about the page without actually loading the page itself. And then there are other commands that are less often used, such as put and delete to change pages and delete pages off a web server.

To further simplify things, HTTP also defines a set of status codes that the server can respond to With telling more information about the response, the sending the most common response hopefully is 200, which simply means Okay, I received it and I’m sending back data. You can also get responses like 301 moved permanently if a website has been moved, you can get 434 bitten if you try and log into a website that you don’t have access to. And you can get some dreaded errors such as the 404 not found error if that web page can’t be found on the server. Of course, there are other errors, such as the 500 errors, the internal server errors, that means the server itself is having a problem and can’t respond right now. There’s also the 503 error, which means that the service is unavailable, but it might come back soon. As you can see, there are a lot of different application layer protocols that are used on the internet today, and they all serve a variety of different purposes. I hope this has been a really interesting look into the technology behind how the internet works. In the next few modules will actually explore what it takes to build web Websites ourselves.

Seven Layer OSI Model

YouTube Video

Resources

Video Script

As we saw in the video, the work of Paul Bran led to the creation of packet switch networks, a packet switch network might look something like this diagram, we have three computers. But we actually have six network devices that are creating a complicated network infrastructure that connects these three devices. So looking at this diagram, can you see some advantages to this network setup. For example, if one node goes down, can we still find a way to get a message to all the other nodes, it looks like we can in a lot of cases. So that makes this network very fault tolerance. In addition, if we want to add more computers to the network, we just have to plug them in at one of these open spots and they’ll be able to communicate as well. So packet switch network is really important. One of the key concepts in packet switch networking is routing. And routing is figuring out a way to create a path in the network to get from point A to point B using the shortest path which means Be the fewest number of hops, it really depends. And so with networking, if any one particular system goes down, our routing can actually figure out a way to get around that problem without any issues.

When we talk about modern computer networking, we can think of it in terms of the seven layer OSI network model, sometimes referred to as the network stack. Each of these layers performs a particular task in turn involved in getting data sent from one computer application to another application on a different computer. The top three layers application presentation and session are usually handled in the software itself. So we’re not going to talk about those too much in this video, but we’ll take a look at the bottom four layers and how they are actually used to get data from one computer to another computer using the modern internets. The big concept here is encapsulation. Each of those layers adds a little bit of data around the data that we want to send, allowing it to get where it needs to go. So we start with data from the application. Then we go down to the transport layer and we add a little bit of data to it. Then we go down to the internet layer, we add some more data. And finally we get to the link layer and add even more data.

This may seem really confusing, but let’s use a little metaphor to try and understand it better. For example, think about a letter you’re sending through the postal system. We have the letter, we might put it in an envelope, and that envelope might have a name on it, such as mom and dad, then we can take that nice envelope and put it in another envelope. And that’s the actual mailing envelope that has the address information on the back and the stamp and the return address. And then we put that letter in the mailbox. And when that letter gets picked up by the Postal Service, it might go into a box full of letters going to the same destination making up the frame. Then when the letter gets to the destination post office, it’s taken out of that box, they read the address, it gets to the location where it needs to go. They open the outs envelope, they look at the inside envelope, they see who it’s intended for. And then that person can open that envelope and get their data back out. And so encapsulation is really the important concept around this layered model of networking.

And we’ll take a look at each layer and what it does in particular, just to see a little bit more about how it works. The very bottom layer of the network model is the physical layer. And the physical layer handles the actual sending of individual bits and bytes across the network wires. He uses technologies such as hundred 10, or 1000, base T or gigabit And the important thing about the physical layer is it doesn’t have any knowledge of the packet whatsoever, just the next hop the next destination that it needs to be sent to on the network itself.

The next layer up is the data link layer. The DataLink layer or the Ethernet layer deals with different frames of packets and it deals with things such as congestion. One of the most important things that happens at the DataLink layer is Establishing the rules of the road for each packets, it assures that a packet has a location and an address. And so the data link layer really handles the way the data is routed across individual Ethernet cables between different networking devices. You can think of it kind of like working with a zip code.

Next, we go up to the network layer. And the network layer is the Internet Protocol layer. And this is actually what defines the end points of the packet where the packet starts at one computer and the destination of the packet at the other end. And so in the network layer, we have this packet structure that has a lot of different information in it. But the most important thing that it has is the IP address of the computer that sends it and the IP address of the destination computer that should receive it. You can think of an IP address like a phone number. It’s a unique number that identifies a single computer on the internet itself. And it tells us where it’s coming from and where it’s going to. This is the packet structure for IP version four but it is Quickly being replaced by IP version six.

Let’s take a look at the difference between IP version four and IP version six. IP version four uses addresses that have 32 bits of data. And so if you think about it two to the 32nd is about 4.2 billion. So really big number. And for a long time, we thought that that was going to be enough IP addresses on the internet. Of course, the internet has become much, much larger of the last few years. And so it’s very easy to imagine that there well more than 4.2 billion devices in the world. I mean, there are over 7 billion people in the world. So we don’t even have enough IP addresses for each person to have just one device. And if you’re anything like me, you probably have multiple devices that you’re using. So we’re looking at moving toward IP version six, which uses 128 bit addresses. It’s four times as many bits but we get to the point where we can have 340 undecillion IP addresses. Another way to think of it is Each atom in the basically each grain of sand on the earth can be assigned its entire ipv4 address space of 4 billion addresses and we’d still have plenty of them leftover. So I’m hoping that ipv6 is probably enough, at least for now. So of course, the network packet at the ipv6 layer looks very similar with just a larger source address and a destination address. And so hopefully this works for a long time to come.

This graphic taken from an XKCD comic shows the IP address space as it looked like a few years ago, in 2006, when this comment was written, all of the greenspaces were still available as open IP address space that could be taken on the internet. But of course, it’s all now gone. And so we’re running out of address spaces on the ipv4 internets. One interesting thing to note is originally on the internet, the 256. First octets of IP addresses were originally assigned to individual companies. For example, IBM was the assigned any IP address starting in nine. The K-State IP address range if you’re interested, is 129.130. And so we have one 256 of one square of this particular diagram was assigned to K-State originally.

The fourth layer is the transport layer. And the transport layer is where the transmission control protocol or TCP resides. The biggest thing that TCP does is it adds ports to the IP address. And so on a computer we can have many different programs that are all connecting via the internet at the same time, and we need a way to tell which data goes with which program. That’s what the transport layer does for us by assigning ports to programs on a computer. As we receive packets or as we send packets, we can mark them with a certain port that is associated with a certain program that should be able to interpret that data. Computers today use a set of ports, the US 65,535 unique ports. And some of them you might be familiar with. For example, the internet. The World Wide Web uses Port 80 for TCP or for HTTP connections.

So there are two different things that operate at the transport layer. There’s TCP the Transmission Control Protocol. And the other protocol at the transport layer is UDP the User Datagram Protocol. And you can see here the UDP packet structure has a lot less information. This is because unlike TCP, UDP has no technology to guarantee transmission of data. As we’ve discussed earlier, TCP relies on verification and retransmission of missing data so that TCP can guarantee every single packet is delivered if possible, whereas UDP does nothing like that. It just sends the packets and hopes they received, but it doesn’t have any way of controlling that. TCP of course is very useful for when we need to make sure every single bit of data gets there, but UDP also has its uses. For example, think of streaming data or store streaming video, or streaming video game data. If you don’t need every particular packet, instead of trying to retransmit those packets that have been missed, UDP will just skip them entirely and keep moving on.

This, of course leads to the two worst jokes in the history of computer science. Do you want to hear a TCP joke? I could tell it to you. But then I’d have to keep repeating it until I was sure you got it. Do you want to hear a UDP joke? I could tell it to you. But I’d really never be sure if you got it or not. See what I mean. It’s really two different technologies trying to do two different things. But they all have their very important uses in the internet today.

So to summarize, TCP is a connection orientated, reliable protocol that uses acknowledgments to verify the data has been sent and received properly. UDP on the other hand, is a connectionless protocol that is not really reliable, because it doesn’t use that system of acknowledgments to make sure the data has been received properly. These two texts analogies are two of the core technologies that operate at the transport layer on the internet in that seven layer OSI model that allows different computer programs to talk to each other using the internet.

The World Wide Web: Crash Course Computer Science #30

YouTube Video