Deployment
Putting your work out there
Now that we’ve explored building web applications and writing web servers, let’s look in more detail at how to make them available on the web, a process known as deployment.
Some key terms to learn in this chapter are:

* IP address (IPv4 and IPv6)
* Port
* Domain name
* Domain Name System (DNS)
* Security certificate
* Virtual machine
* Container
As we discussed previously, when we visit a web page in our browser, the browser makes an HTTP or HTTPS request to the web server for the resource. For this request to be sent to the right server, we must first determine the server’s address. This address takes the form of an Internet Protocol (IP) address. There are currently two ways these IP addresses are specified: IPv4 and IPv6. An IPv4 address consists of 32 bits split into four 8-bit chunks, and is typically expressed as four integers between 0 and 255 separated by periods. For example, the IP address for Kansas State University is:
129.130.200.56
Consider that an IPv4 address consists of 32 bits. This means we can represent 2^{32} unique values, or 4,294,967,296 different addresses. You can see how many websites are currently online at https://www.internetlivestats.com/watch/websites/; as of this writing it was nearly 2 billion. Combine that with the fact that every computer connecting to the internet, not just those hosting websites, must have an IP address, and you can see that we are running out of possible addresses.
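To make this concrete, here is a short sketch (not part of the original text) showing that the dotted notation is just the four 8-bit chunks of a 32-bit value:

```csharp
using System;
using System.Net;

class IPv4Demo
{
    static void Main()
    {
        // the four integers in dotted notation are the four bytes of the address
        IPAddress address = IPAddress.Parse("129.130.200.56");
        byte[] octets = address.GetAddressBytes();
        Console.WriteLine(string.Join(".", octets));  // 129.130.200.56

        // 32 bits can represent 2^32 distinct values
        Console.WriteLine((1L << 32).ToString("N0")); // 4,294,967,296
    }
}
```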
This is the issue that IPv6 was created to counter. IPv6 addresses are 128 bits, so we can represent 2^{128} different values, or 340,282,366,920,938,463,463,374,607,431,768,211,456 possible addresses (quite a few more than IPv4)! An IPv6 address is typically expressed as eight groups of four hexadecimal digits, each group representing 16 bits; one or more consecutive groups of zeros can be collapsed into a double colon (::). For example, Google’s IPv6 address is:
2607:f8b0:4000:813::200e
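As a quick check (again a sketch, not from the original text), we can parse this address and confirm that the :: shorthand expands to the full 128 bits:

```csharp
using System;
using System.Net;

class IPv6Demo
{
    static void Main()
    {
        // the "::" shorthand expands to as many all-zero groups as needed
        IPAddress address = IPAddress.Parse("2607:f8b0:4000:813::200e");
        Console.WriteLine(address.GetAddressBytes().Length * 8); // 128 (bits)
    }
}
```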
IPv6 adoption requires changes to the internet infrastructure as well as all connected machines, and is typically done on a volunteer basis; thus, it is a gradual process. The United States is at roughly a 47% adoption level; you can see current adoption statistics for all countries at Akamai’s IPv6 Adoption Tool.
In addition to the address, requests are sent to a port, a specific communication endpoint on the computer. Different applications “listen” at specific ports on your computer, which allows your computer to have multiple applications communicating across the network or internet at the same time. You might think of your computer as an apartment complex: the IP address is its street address, and the ports are individual apartment numbers. Different applications then live in different apartments. Many of these ports are unused (the apartments are vacant), as most computers are only running a few network-facing applications at any given time.
Many applications and communication services have a “default” port they are expected to use, which makes it easy to connect to a remote computer. For example, a web server uses port 80 by default for HTTP requests, and port 443 for HTTPS requests. If your server is listening at these ports, there is no need to specify the port number in HTTP/HTTPS requests targeting your IP address. You can also run a program on a non-standard port. For example, with our Razor Page applications, we have typically been running our debug version on port 44344 (with IIS Express) or 5001 (when running as a console application). If you know the IP address of your computer, you can visit your running debug server by using the address http://[ip address]:[port] from another computer on your network.
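If you want to control which port your application listens on, here is a minimal sketch (assuming an ASP.NET Core project; the port number and response text are arbitrary choices for illustration):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Hosting;

public class Program
{
    public static void Main(string[] args)
    {
        Host.CreateDefaultBuilder(args)
            .ConfigureWebHostDefaults(webBuilder =>
            {
                // listen for HTTP requests on port 5001 instead of the default port 80
                webBuilder.UseUrls("http://0.0.0.0:5001");
                webBuilder.Configure(app =>
                {
                    app.Run(async context =>
                        await context.Response.WriteAsync("Hello from port 5001!"));
                });
            })
            .Build()
            .Run();
    }
}
```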
Some of the most common services and their associated default ports are:
| Port | Protocol | Use |
|------|----------|-----|
| 22 | Secure Shell (SSH) | Communicating between computers via a command line interface |
| 25 | Simple Mail Transfer Protocol (SMTP) | A common email routing protocol |
| 53 | Domain Name System (DNS) Service | IP address lookup by domain name |
| 80 | Hyper-Text Transfer Protocol (HTTP) | Servicing WWW requests |
| 110 | Post Office Protocol (POP) | Retrieving email |
| 143 | Internet Message Access Protocol (IMAP) | Managing email |
| 194 | Internet Relay Chat (IRC) | Chat messages |
| 443 | Secure Hyper-Text Transfer Protocol (HTTPS) | Servicing secure WWW requests |
Of course, you don’t normally type 129.130.200.56 into your browser to visit K-State’s website. Instead, you type k-state.edu or ksu.edu, which are domain names owned by the university. A domain name is simply a human-readable name that maps to an IP address. This mapping is accomplished by the Domain Name System (DNS), a lookup service composed of a hierarchical network of servers, with the root servers maintained by the Internet Corporation for Assigned Names and Numbers (ICANN).
When you enter a domain name into your browser, it requests the corresponding IP address from its local DNS server (typically maintained by your ISP). If that server does not know the address, it can make a request against the broader DNS system. If a matching IP address exists, it is returned to the local DNS server and, through that server, to your browser. Your browser then makes the request using the supplied IP address. Your browser, as well as your local DNS server, caches these mappings to minimize the number of requests made against the DNS system.
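We can perform this same lookup programmatically; this sketch (not from the original text) uses the system’s DNS resolver to find the address(es) mapped to a domain name:

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

class DnsLookupDemo
{
    static async Task Main()
    {
        // ask the DNS system for the IP address(es) mapped to a domain name
        IPAddress[] addresses = await Dns.GetHostAddressesAsync("ksu.edu");
        foreach (IPAddress address in addresses)
        {
            Console.WriteLine($"{address} ({address.AddressFamily})");
        }
    }
}
```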
To get a domain name, you must purchase it from a domain name reseller. These resellers work with ICANN to establish ownership of specific domain names: each domain name can have only one owner, and maps to one or more IP addresses. Once you’ve purchased your domain name (which is actually leased on an annual basis), you work with the reseller to point the domain name at your preferred IP address(es). This entire process is typically handled through a web application.
The localhost domain name is a special loopback domain. It always points at the IP address 127.0.0.1 (or ::1 in IPv6), which is always the local machine (i.e. the computer you are using).
We have talked several times about HTTP and HTTPS, without really discussing what is different about these two approaches, other than that HTTPS is “secure”. Essentially, HTTPS uses the same protocol as HTTP, but requests and responses are encrypted rather than being sent as plain text. This encryption is handled at a level below HTTP, in the communication layer (currently this uses TLS - Transport Layer Security). The encryption is done through symmetric cryptography using a shared secret. You may remember studying this approach in CIS 115. Remember this Computerphile video demonstrating a Diffie-Hellman key exchange by mixing paint colors?
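The same idea can be demonstrated with small numbers. This toy sketch (not real TLS, and using deliberately tiny values) shows both parties arriving at the same shared secret while only ever exchanging public values:

```csharp
using System;
using System.Numerics;

class DiffieHellmanDemo
{
    static void Main()
    {
        BigInteger p = 23; // a public prime (real exchanges use enormous primes)
        BigInteger g = 5;  // a public generator

        BigInteger alicePrivate = 6;  // Alice's secret, never transmitted
        BigInteger bobPrivate = 15;   // Bob's secret, never transmitted

        // each party sends only g^secret mod p across the network
        BigInteger aliceSends = BigInteger.ModPow(g, alicePrivate, p); // 8
        BigInteger bobSends = BigInteger.ModPow(g, bobPrivate, p);     // 19

        // both compute the same shared secret from what they received
        BigInteger aliceShared = BigInteger.ModPow(bobSends, alicePrivate, p);
        BigInteger bobShared = BigInteger.ModPow(aliceSends, bobPrivate, p);

        Console.WriteLine($"{aliceShared} == {bobShared}"); // 2 == 2
    }
}
```

An eavesdropper sees only p, g, 8, and 19; recovering the secret exponents from those values is the hard problem that makes the exchange secure.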
By using this encryption scheme, we make it impossible (or at least very difficult) for third parties intercepting our HTTP requests and responses to determine exactly what they contain. Hence, credit card information, passwords, and other personal information are protected, as are search terms and similar data. However, this is only half of the process of HTTPS. The second half involves establishing that the web server you are making requests to is the one you want, and not an impersonator. This requires an authentication process, to ensure you are communicating with the correct server.
This authentication aspect of TLS is managed through security certificates. These are built around public/private key encryption and the X.509 certificate standard. A certificate provides proof that the server presenting it is the legitimate server for the domain in question. Think of it as a driver’s license or your student ID card: it lists identifying information, and you carry it with you to prove you are who you say you are. And much like a driver’s license or a student ID card, it is issued by an authoritative source - one of several “trusted” certificate authorities, or an authority whose own certificate is signed by one of these authorities.
Anyone can issue a security certificate, but only one with a chain of signed certificates that goes back to a root trusted certificate authority will be considered “trusted” by your browser. This is why you may have had issues running your web applications using HTTPS - when launching the project in debug mode, it uses a self-signed certificate (i.e. your application creates its own certificate), which the browser reports as untrustworthy. Depending on your browser, you may be able to allow this “untrusted” site to be served, or it may be disallowed completely. Visual Studio and ASP.NET projects typically offer to install a “dev certificate” (for example, via the `dotnet dev-certs https --trust` command) that allows your localhost communications to be treated as trusted.
Traditionally, security certificates are issued by a trusted authority for an annual fee, often accompanied by insurance that pays for legal issues if the certificate is ever violated. However, the Let’s Encrypt Security Group, launched in April 2016, offers free security certificates with the goal of making HTTPS ubiquitous on the web. This easy availability means there is no reason not to host your websites using HTTPS instead of vulnerable HTTP. You can visit the Let’s Encrypt website at letsencrypt.org.
To host your website, you will need:

* a computer running your web server
* a static Internet Protocol (IP) address
* a domain name that maps to your IP address
* a security certificate for using HTTPS
While you have been running your website in debug mode on your development computer, you probably won’t use it to host your actual website on the Internet. First, your machine would need to be running the web server application constantly. Any time your computer was turned off, or the web server was not running, your website would be inaccessible.
Also, in most residential setups, you probably won’t be able to access the running program across the Internet anyway. This is especially true if you have multiple computers connected through a router. In that case, only your router has a unique IP address, and all communications are routed through it using that address. The router also assigns “internal” addresses to the computers networked to it, and handles distributing request results to the computers that made them (kind of like a mailroom for a very large institution). To make your website available, you would probably need to set up port forwarding or a similar technique with your router to have it forward requests to the computer running your web server. You would probably also need to modify your firewall settings (firewalls prevent connections against ports that you don’t mean to have open to the Internet).
A third challenge is that you most likely do not have a static IP address, i.e. one that will always point to your computer. Most Internet Service Providers instead assign a dynamic IP address to your router or modem when you establish a connection, drawn from a pool of IP addresses they maintain for this purpose. This address is yours until the next time you connect (i.e. after a power loss or rebooting your router), at which point the ISP assigns you a different IP address from the pool. This simplifies the configuration of their network, and allows them to share IP addresses amongst infrequently-connecting users. Further, most residential ISP plans specifically forbid hosting web applications using their connection - instead you must sign up for a commercial plan that includes a static IP address.
Instead, you will probably host your websites on a remote machine running in a server farm (similar to the CIS Linux servers that host the personal web pages you created in CIS 115). There are several benefits to using a machine in a server farm: they are typically very good machines optimized for hosting web applications; the server farm has ideal operating conditions (good air conditioning and a backup power system) and better uptime; and on-staff IT professionals keep the equipment in good repair. Of course, you will have to pay fees to have access to a machine. These service providers typically offer several options:
* A user account on a shared server with a dedicated directory (i.e. `public_html`) where you can place files to be served statically. This is the approach our department uses for your access to the CS Linux server.
* A dedicated machine in the server farm that you rent and administer yourself.
* A virtual machine running on the provider’s hardware.
* A containerized cloud service.

We’ll take a deeper look at virtual machines next.
A virtual machine is a simulated computing device. Let’s start with a simple example. You can play classic NES and Super NES games using the Switch’s online service. How is this done? Does the Switch contain the same hardware as the original NES or SNES? No. It uses an NVIDIA Tegra processor, which utilizes the 64-bit ARM architecture. The NES used an 8-bit CPU, and the SNES used a 16-bit CPU. These are very different architectures. Of course, you may note the service I referenced is an online service. So do they hook up thousands of NES and SNES units to a web server processing input and output? Also no, but also yes.
What they actually do is emulate that hardware with software. Thus, they have a program that pretends to be the NES hardware, and then load into that program the actual NES game (the software) you wish to play. This emulator is part of a web server that streams the resulting game images to the Switch, and the Switch likewise streams the player’s input back to the server. This is essentially the same approach used by any video game console emulator (sans the use of a web server).
In the same way, we can use a virtual machine to emulate another kind of computer within our own computer. For example, the VMWare platform you have access to as a CS student allows you to install VMWare on your computer. Once installed, you can create virtual Linux or Windows PCs that run within the program. Essentially, VMWare pretends to be a separate computer and shares your computer’s real resources through interfaces that mimic the hardware it emulates. Another approach to running a Linux VM within a Windows environment is the Windows Subsystem for Linux (WSL). This is essentially a virtual machine that runs a Linux kernel (of your choice) within Windows. It offers better performance than VMWare, as the VM plugs directly into Windows OS procedures.
Most Internet hosting services offer virtual machines that you can rent. Typically you specify the amount of RAM and storage, as well as the number of CPUs you wish to utilize; these represent a portion of the real machine’s resources that are assigned to your specific VM. You are then granted root access to the VM, and can install whatever software you need - for example, Linux, Dotnet Core, and a relational database to run your web application. Once you have your application set up, you launch it as a constantly running service, and it begins to listen for HTTP and HTTPS requests on ports 80 and 443. The service usually provides you with a static IP address, which you can then hook up to your domain name.
The computing device being emulated does not have to correspond to hardware at all. In CIS 200 you worked extensively with a VM that has no physical counterpart - the Java Virtual Machine (JVM). The JVM does not correspond to any real-world hardware, but provides a similar computing environment that can process Java bytecode (which is similar to the assembly code run by hardware-based processors). A different JVM exists for every hardware platform that Java can be run on, mapping the virtual system procedures to the specific platform’s hardware and operating system.
You may be asking yourself “is .NET also a virtual machine?” In one sense, yes, as it works from a similar concept as Java, but the reality is a bit more complex. The DOTNET languages are compiled into Intermediate Language, a format similar to Java’s bytecode. And the DOTNET runtime has specific builds for different operating systems and hardware platforms, much like the JVM. But rather than the DOTNET runtime running the IL in a virtual environment, when you execute a DOTNET program, the IL is further compiled by the DOTNET runtime into assembly for the specific platform using a Just-in-Time compiler. So ultimately, DOTNET code is run on the actual machine, not a virtual machine.
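You can see evidence of this platform-specific compilation at runtime; this short sketch (not from the original text) reports the OS and processor architecture the runtime JIT-compiled the program for, so the same IL prints different results on different machines:

```csharp
using System;
using System.Runtime.InteropServices;

class PlatformInfo
{
    static void Main()
    {
        // the same IL reports different values depending on where it is JIT-compiled
        Console.WriteLine(RuntimeInformation.FrameworkDescription); // e.g. ".NET 6.0.x"
        Console.WriteLine(RuntimeInformation.OSDescription);        // e.g. "Microsoft Windows..."
        Console.WriteLine(RuntimeInformation.ProcessArchitecture);  // e.g. X64
    }
}
```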
In recent years, containerized applications have become increasingly popular, especially for web-based deployments. This is a variant on the virtual machine idea, where the application and the execution environment are separated into two parts: the container (an image file containing the binary code and resources of an application) and an execution environment (a virtual machine, often emulating a Linux or Windows machine, but possibly something more specialized). The containerized application is then run within this execution environment. The chief benefits of containerized approaches are portability and scalability. As the execution environment can be installed on a variety of platforms, much like the JVM, the container can be run on different hardware - a development PC, a development Mac, a production server, or something more esoteric like a cloud server.
As the emulated execution environment is the same across each of these platforms, the same containerized application should run identically on each. Moreover, to create additional instances of a containerized application, you just launch another copy of the container - a process that takes milliseconds. Compare this to adding another traditional server or VM: you have to initialize the VM; install the necessary supporting environment, libraries, and other dependencies; and then install your application and launch it. This process can take minutes to hours, and often requires an IT professional to work through the steps. Cloud services like Amazon Web Services, Microsoft’s Azure, and Google Cloud all utilize container-based approaches to allow rapid upscaling and downscaling of web servers to meet the actual demand for a web application. These platforms typically handle load balancing (directing traffic evenly amongst the containers running instances of your web server) automatically, making the process of hosting large-volume web traffic much easier. And for small, limited-demand web applications, many of these cloud services offer a free tier with a limited amount of bandwidth per month. The speed at which a containerized application can be spun up means that these free-tier applications can be shut down most of the time, and only run when requests are coming in.
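As an illustration, a DOTNET web application is often containerized with a short Dockerfile like the sketch below (the project name MyWebApp is a placeholder, and the image tags assume Microsoft’s publicly published base images):

```dockerfile
# build stage: compile the application using the .NET SDK image
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

# runtime stage: copy only the compiled output into the smaller runtime image
FROM mcr.microsoft.com/dotnet/aspnet:6.0
WORKDIR /app
COPY --from=build /app .
# the web server inside the container listens for HTTP requests on port 80
EXPOSE 80
ENTRYPOINT ["dotnet", "MyWebApp.dll"]
```

Building this image once produces a container that can be launched, duplicated, or shut down in milliseconds on any host running the container runtime.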
In this chapter we discussed what is necessary for hosting a web application: a computer running your web server, a static Internet Protocol address, a domain name that maps to your IP address, and a security certificate for using HTTPS. We explored each of these concepts in some detail. We saw how using HTTPS allows encrypted end-to-end communication between the web client and server, as well as providing assurance that the server we are communicating with is the one we intended to make requests from. We also discussed the differences between self-signed certificates and those signed by a certificate authority, and introduced the nonprofit letsencrypt.org, which provides free certificates to help secure the web.
We also described several options for the computing environment running your web server: a dedicated machine in your home or office, a dedicated machine in a server farm, a virtual machine, or a containerized cloud service. We discussed the benefits and drawbacks of each approach. As you prepare to host your own applications, you’ll want to consider these, along with the costs of the various services, to decide how you want to deploy your production application.