Containers
All the fun of virtual machines without all the extra baggage!
Welcome to Module 5A - the first new module added to this course in several years. In this module, we’re going to explore one of the biggest trends in system administration - containerization. We’ll see how containers have quickly overtaken traditional virtual machines for running workloads in the cloud, and how tools such as Docker make using containers quick and easy. We’ll also learn how we can use containers in our own development workflows, making it much easier to build tools that will work as seamlessly in the cloud as they do on a local system. Finally, we’ll take a look at Kubernetes, which has quickly become the most talked about tool for orchestrating containers in the cloud.
Before we begin, I want to give a huge shout out to the TechWorld with Nana channel on YouTube, which is linked at the top of this page. Her comprehensive video courses on Docker and Kubernetes are the most well organized resources for those tools I’ve come across, and much of the structure of this module is inspired by her work. So, I must give credit where it is due, and I encourage you to check out her resources as well.
We’ll start by taking a look at the architecture of containers in general, and then we’ll dive into using Docker on our own systems. Let’s get started!
Up to this point, we’ve been working with traditional virtual machines in our labs. To use a traditional virtual machine, we first install a hypervisor that allows us to run multiple VMs on a single system. That hypervisor may be installed directly on the hardware, as it is in most enterprise settings, or we may install the hypervisor on top of an existing operating system, like we’ve done with VMWare on our own computers. Then, inside of the virtual machine, we install another entire operating system, as well as any of the software we need to run our applications. If we want to have multiple VMs that all run the same operating system, we’ll have to install a full copy of that operating system in each VM, in addition to the operating system that may be installed outside of our hypervisor. Can you spot a possible problem here?
In traditional VM architectures, we may end up wasting a lot of our time and resources duplicating the functions of an operating system within each of our virtual machines. This can lead to very bloated virtual machines, making them difficult to move between systems. In addition, we have to manage an entire operating system, complete with security updates, configuration, and more, all within each virtual machine. While there are many tools to help us streamline and automate all of this work, in many ways it can seem redundant.
This is where containers come in. Instead of a hypervisor, we install a container engine directly on our host operating system. The container engine creates an interface between the containers and the host operating system, allowing them to share many of the same features, so we don’t need to install a full operating system inside of a container. Instead, we just include the few parts we need, such as the basic libraries and command-line tools. Then, we can add our application directly on top of that, and we’re good to go. Containers are often many times smaller than a comparable virtual machine, making them much easier to move between systems, or we can just choose to run multiple instances of the same container to provide additional redundancy. In addition, as we’ll see later, containers are built from read-only images, so we don’t have to worry about any unwanted changes being made to the container itself - a quick restart and we’re back to where we started! With all of those advantages, hopefully you can see why containers have quickly become one of the dominant technologies for virtualization in industry today.
In this course, we’re going to take a deep dive into using Docker to create and manage our containers. Docker is by far the most commonly used container platform, and learning how to work with Docker is quickly becoming a must-have skill for any system administrator, as well as most software developers. So, learning how to use Docker is a great way to build your resume, even if you aren’t planning to work in system administration!
Docker itself consists of three major parts. First, we have the Docker client, which is either the Docker command-line tool or a graphical interface such as Docker Desktop. We use the client to interface with the rest of the system to create and manage our containers. Behind the scenes, we have the Docker engine, also known as the Docker daemon, which handles actually running our containers as well as managing things such as images, networks, and storage volumes. We’ll spend the next several parts of this module exploring the Docker client and engine. Finally, Docker images themselves are stored on a registry such as Docker Hub, allowing us to quickly find and use images for a wide variety of uses. Many code repositories such as GitHub and GitLab also can act as registries for container images, and you can even host your own!
Next, let’s look at the structure of a typical container. A container is a running instance of an image, so we can think of a container like an object in object-oriented programming, which is instantiated from an image, acting like a class in this example. An image consists of several layers, which are used to build the file system of the image. Each time we make a change to the contents of an image, such as installing a piece of software or a library, we generate a new layer of the image. This allows us to share identical layers between many images! For example, if we have multiple images that are built using the same Ubuntu base image, we only have to store one copy of the layers of the Ubuntu base image, and then each individual image just includes the layers that were added on top of the base image! This makes it easy to store a large number of similar images without taking up much storage space, and we can even take advantage of this structure to cache layers as we build our own images!
When we create a container from an image, we add a small, temporary read/write layer on top of the image layers. This is what allows us to instantiate the container and have it act like a running system. As soon as we stop the container, we discard the read/write layer, which means that any data added or changed in the container is lost, unless we make arrangements to have it stored elsewhere. So, this helps us maintain a level of confidence that any container started from an image will always be the same, and we can restart a container anytime to revert back to that state.
So, as we saw earlier, we can easily instantiate multiple copies of the same image into separate containers. Each of those containers will have their own read/write layer, but they’ll share the same base image. This makes it extremely easy to deploy multiple instances of the same container on a single system, allowing us to quickly build in redundancy and additional capacity.
We can also use this to build really interesting infrastructures! Consider developer tools such as GitHub Codespaces or educational tools like Codio - behind the scenes, those platforms are simply instantiating containers based on a shared image that contains all of the development tools and information that the user needs, and once the user is done with their work, the container can be either paused to save computing resources, or it can be destroyed if no longer needed by the user.
So, in summary, why should we consider using containers in our infrastructure instead of the virtual machines we’ve been learning about so far? Well, for starters, we know that images for containers are typically much smaller than traditional virtual machines, and the use of layers and de-duplication allows us to further reduce the storage needed by images that share a common ancestor. In addition, since the image itself is somewhat decoupled from the underlying operating system, it becomes very easy to decouple our applications from the operating system it runs on. We’ve also discussed how containers and images are very portable and scalable, so it is very easy to start and run a container just about anywhere it is needed. Finally, as we’ll see later in this module, there are many tools available to help us orchestrate our containers across many systems, such as Kubernetes.
One topic we really haven’t talked about yet is sandboxing. This is another great feature of containers that really can’t be overlooked. Consider a traditional web-hosting scenario, where a single server has a webserver such as Apache installed, as well as a database, email server, and more. Since all of those applications are installed on the same operating system, a vulnerability in any one of them can allow an attacker access to the entire system! Instead, if each of those applications is running in a separate container, we can easily isolate them and control how they communicate with each other, and therefore prevent a vulnerability in one application from affecting the others! Of course, this depends on us properly configuring our container engine itself, and as we’ll see later in this lab, it is very easy to unintentionally introduce security concerns unless we truly understand what we are doing.
As we’ve already heard in this video, containers are quickly becoming one of the most used tools in system administration today. According to Docker, as of April 2021, there have been over 300 billion individual image downloads from Docker Hub, with 10% of those coming just in the last quarter! There are also over 8.3 million individual image repositories hosted on Docker Hub, meaning that nearly any self-hosted piece of software can probably be found on Docker Hub in some form.
Datadog also publishes their list of the top technologies running in Docker among their clients. Looking here, there aren’t too many surprises - we see that the Nginx webserver, Redis key-value database, and Postgres relational database are the top three images, with many other enterprise tools and data storage platforms close behind.
Hopefully this gives you a good overview of the architecture of containers and some of the technology behind the scenes. In the next few parts of this lab, we’re going to dive directly into using Docker on our own systems.
Now that we’ve covered the general architecture of containers, let’s dive a bit deeper into Docker itself. Docker is by far the most commonly used tool for creating and managing containers on a small scale, and learning how to use it is a super useful skill for just about anyone in a system administration or software development field. So, let’s take a look!
First, let’s go over some general terminology related to containers. Recall that an image is a read-only template that is used to build a container. The image itself consists of several layers, and may be built on top of another existing image.
When we launch an image, it is instantiated into a container. A container is a running instance of an image, and it includes the read/write layer as well as any connections to resources such as networks, volumes, and other tools.
Finally, a volume in Docker is a persistent storage location for data outside of a container. We’ll explore how to create volumes and store data from a container later in this lab.
Docker itself consists of several different components. The Docker engine, also known as the Docker daemon, is the tool that is installed on the system that will host the containers. It manages creating containers from images and connecting them to various resources on the system.
The Docker client is the command-line tool that we use to interface with the Docker engine. On systems with a GUI, we can also install Docker Desktop, which acts as a client for interfacing with Docker, but it also includes the Docker engine. In many cases, installing Docker Desktop is the best way to get started with Docker on our own systems, but when working in the cloud we’ll generally install the traditional Docker engine and client.
Along with the Docker client, we have another tool known as Docker Compose. Docker Compose allows us to build configuration files that define one or more containers that should work together, and we can also use Docker Compose to start and stop multiple containers quickly and easily. I generally prefer working with Docker Compose instead of the Docker client, but it is really just personal preference. There are also 3rd party tools that can interface with Docker, such as Portainer, that can do much of the same work.
Finally, we also need to know about registries for container images. Docker Hub is by far the most well known of these, but it is also important to know that both GitHub and GitLab can act as a registry for Docker images. Some companies, such as Microsoft, also choose to host their own registries, so many of the .NET Docker images must be downloaded directly from Microsoft’s own registry instead of Docker Hub. For this course, we’ll just use Docker Hub and the official images hosted there.
So, to get started with Docker, there are several commands that we’ll end up using. We’ll see these put into practice over the next few parts of this lab, so don’t worry about memorizing them right now, but I want to briefly review them so you’ll know what they are for when you see them used in the later examples.
First, to get an image from an image registry, we use the docker pull command. This will download the image and store it locally on your system. To review the images available on your system, you can use the docker images command.
Once we have an image, we can instantiate it into a container using the docker run command. Thankfully, docker run is also smart enough to download an image if we don’t have it locally, so in many cases it is sufficient to simply use docker run to both get an image and start a container.
Once a container is created, we can use docker stop and docker start to stop and start an existing container. This is helpful if we want to effectively “pause” a running container without losing any of its configuration.
To see the containers that are currently running, we can use docker ps. We can also see all containers, including those that are stopped, using docker ps -a.
Another very useful command is docker exec, which allows us to run a command within a running container. This is very useful if we need to perform some configuration within the container itself, but we can also use it to open an interactive shell directly within a container.
Finally, most applications running within Docker containers are configured to print logs directly to the terminal, and we can view that output using the docker logs command. This is very helpful when a container isn’t working like we expect, so we can explore the output produced by the application running in the container to help us debug the problem.
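To make those commands concrete, here is a rough sketch of what a first session might look like, assuming we try the official hello-world and nginx images; the container name web is just an example.

    # Download an image from Docker Hub and list the images stored locally
    docker pull hello-world
    docker images

    # Instantiate containers (docker run will pull an image if it isn't local)
    docker run hello-world
    docker run -d --name web nginx

    # See running containers, or all containers including stopped ones
    docker ps
    docker ps -a

    # Stop and restart an existing container
    docker stop web
    docker start web

    # Run a command (here, an interactive shell) inside a running container
    docker exec -it web /bin/bash

    # View the log output produced by the application in the container
    docker logs web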
Before we can start a container, we must find an image that we’d like to run. To do that, one of the first things we can do is just head to Docker Hub itself and start searching for an image. However, as we’ve already heard, there are other registries available, so if we can’t find what we are looking for on Docker Hub, we may want to check sites such as GitHub or the application’s website to learn about what images are available.
Once we’ve found an image we’d like to run, we need to consider a couple of other factors. First, images on Docker Hub and other registries are marked with various tags. The tags are used to identify the particular version or architecture of the image. So, let’s take a look at the official image for Nginx to see what tags are available. (Navigate to Nginx).
Here, we see many different versions of Nginx available, including images that are configured to include the perl module, and images based on the Alpine Linux project. If we scroll down the page a bit, we can see a discussion of the various image variants and how they differ.
Most Docker images are based on one of the mainline Linux distributions, usually either Debian or Ubuntu. However, one of the image variants offered by Nginx, as well as many other applications, is based on the Alpine Linux project. Alpine is a very small Linux distribution that uses very compact libraries and doesn’t include a lot of tools, making it ideal as a basis for very small Docker images. However, this can make it more difficult to work with Alpine-based images, since there aren’t many tools included that can help with debugging. Likewise, building an image based on Alpine requires much more knowledge of the system and what libraries need to be included.
Basically, if you aren’t sure, it is generally best to go with an image based on a mainline Linux distribution you are familiar with. However, if you’d like to make your images as small as possible, you can choose to use an Alpine-based image instead.
Finally, if we scroll back to the top of the page, you’ll see that the Nginx image is tagged as a “Docker Official Image” meaning that it is one of the curated repositories that has been reviewed by Docker and will generally be well maintained and updated. You can click on that tag to learn more, and even learn how to search Docker Hub for other official images. When in doubt, it is best to look for one of these official images anytime you want to use a particular application.
For this example, we’re going to use another Docker Official Image called Hello World. This image is a great example of a minimal container, making it an easy place to get started with Docker. So, now that we’ve found the image we want to use, let’s go back to the terminal and learn how to create and run some containers.
Before you begin, you’ll need to install the Docker platform on your system. At the top of this page are links directly to the Docker instructions for installing Docker on various systems. If you are working on your local system or in a virtual machine, you will probably want to install Docker Desktop, since that handles everything for you. For installations in the cloud, you’ll need to install the Docker engine and client directly, and you may also want to add the Docker Compose plugin as well.
[Demo Here]
There we go! That’s a quick overview of using Docker to create and run containers. However, this small example leaves many questions unanswered! For example, how can we access our Nginx server outside of the Docker container itself? What if we want to host our own files in Nginx instead of the default webpage? Also, how can we store data from our containers so that it isn’t lost each time the container restarts? And finally, is there a way to simplify these Docker commands so they are easy to replicate multiple times?
Over the next few parts of this lab, we’ll work on addressing each of these questions.
Up to this point, we’ve only looked at how we can start and run containers one at a time on a single platform. While that is great for development and testing, we should know by now that manually doing anything in an enterprise situation is not a great idea. So, let’s explore some of the methods and tools we can use to better orchestrate our containers and automate the creation and deployment of them.
Container orchestration is the overarching term I’ll use for this concept, though it maybe isn’t the most descriptive term. When we are orchestrating containers, we are really talking about ways to manage and deploy containers at scale, sometimes on a single node, and sometimes across multiple nodes. For now, we’ll only focus on a single node, and we’ll address working with multiple nodes later in this module.
Orchestration also includes things such as managing the routing of network data and other resources between the various nodes, and possibly automating the ability to restart a node if it fails.
In short, we want to take the same approach we took in Lab 2 related to building individual systems and apply that same idea to working with containers.
Here’s a quick diagram showing what a full-fledged container orchestration setup might look like. At the top, we have a configuration file defining the architecture we want to deploy - in this case, a Docker Compose file, which is what we’ll cover in this lesson. That file is then used to control multiple Docker engines running across multiple individual computing nodes, or servers on the cloud. This allows us to run containers across multiple nodes and handle situations such as load balancing and automated recovery in case of an error.
Underneath the Docker engine, we may have multiple logical “clusters” of nodes or containers, which represent the different environments that our nodes are operating within. For example, we may have a test cluster and a production cluster, or our application may be separated across various physical or logical locations.
Kubernetes follows a similar architecture, but we’ll look at that more in depth later in this module.
So, for this example, let’s look at another Docker tool, Docker Compose. Docker Compose is used to create various isolated Docker environments on a single node, usually representing various application stacks that we need to configure. Using Docker Compose, we can easily record and preserve the data required to build our container infrastructure, allowing us to adopt the “infrastructure as code” principle. Once we’ve created a Docker Compose file, we can easily make changes to the infrastructure described in the file and apply those changes to the existing setup.
Finally, using Docker Compose makes it easy to move an infrastructure between various individual systems. A common use-case is to include a Docker Compose file along with a repository for a piece of software being developed. The Docker Compose file defines the infrastructure of other services that are required by the application, such as a database or message queue.
The Docker Compose tool uses files named docker-compose.yml that are written using the YAML format. Inside of that file, we should see a services heading that defines the various services, or containers, that must be created. This particular file lists two services, nginx and mysql. Those second-level service names can be anything we want them to be, but I’ve found it is easiest to match them closely to the name of the Docker image used in the service. As a minimal example, each service here contains the name of a Docker image to be used, as well as a friendly name to assign to the container. So, this Docker Compose file will create two containers, one running Nginx and another running MySQL. It’s pretty straightforward, but as we’ll soon see, these files can quickly become much more complex.
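As a rough sketch, a minimal docker-compose.yml like the one described here might look something like this - the service names, container names, and image tags are just example values:

    services:
      nginx:
        image: nginx
        container_name: my-nginx
      mysql:
        image: mysql
        container_name: my-mysql
        # a real MySQL container would also need some environment
        # variables set, which we'll cover later in this lab
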
Once we’ve created a Docker Compose file, we can use the docker compose command to apply that configuration. Previously, docker-compose was a separate Docker client that was installed individually, but with the release of version 2.0 it was integrated within the main Docker client as a plugin. If you installed Docker Desktop, you should already have docker compose available as well, but if not you may have to install the Docker Compose plugin. The instructions for this are available in the links at the top of this page.
[demo here]
One major feature of the Docker Compose file format that many users may not be aware of is the ability to extend a service definition from another file. This allows us to create individual files for each service, and then compose them together in a single docker-compose.yml file. We can also use this same technique to share some settings and configuration between many different services within the same file.
In this example, we see our main docker-compose.yml file that contains a service definition for a webserver. That definition extends the basic nginx service that is contained in another file named nginx.yml in the same directory. Using this method, we can create multiple different web servers that all inherit the same basic nginx configuration, but each one can be customized a bit further as needed. This greatly reduces the amount of duplicated code between services, and can also help to greatly reduce the size of the main docker-compose.yml file for very complex setups.
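Here is a minimal sketch of how that setup might look, assuming a shared nginx.yml sitting next to the main docker-compose.yml - the file names and port mapping are illustrative:

    # nginx.yml - the shared base service definition
    services:
      nginx:
        image: nginx

    # docker-compose.yml - a webserver that extends the base definition
    services:
      webserver:
        extends:
          file: nginx.yml
          service: nginx
        ports:
          - "8080:80"
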
There we go! That’s a quick crash course in using Docker Compose to build a defined configuration for a number of Docker containers. For the rest of this lab, we’ll mainly work in Docker Compose, but we’ll show some basic Docker commands and how they compare to the same setup in Docker Compose.
So far we’ve learned how to pull Docker images from a repository and instantiate them as containers, and also how to use Docker Compose to achieve the same result, but we really haven’t been able to do much with those containers. That’s because we haven’t given our containers any resources to work with. So, let’s do that now.
There are many different resources that can be assigned to containers. We can connect a network port from our host system to a network port on a container, allowing external access into the container from the network. We can also attach containers to various internal networks, allowing specific containers to talk with each other directly. In addition, we can set specific environment variables within a container, which are primarily used by the applications running within the container as configuration parameters, specifying things such as usernames, passwords, and other data needed by the container. Finally, we can attach a storage volume to a container, allowing us to store and persist data even after a container is stopped.
Let’s start with the most common container resource - the networked port. This diagram shows a high-level overview of how this works. Inside of Docker, we might have two containers, one running a database server and the other hosting a web server. Since we want external users to be able to talk to the web server, we can map a network port from our host system, represented by the outer box, to a port on the web container. In this case, we are connecting the external port 8000 with the internal port 5000 on the web container. So, any incoming network traffic to our host on port 8000 will be forwarded to port 5000 on the web container.
To do this in Docker, we can use the -p flag on the docker run command to map a port from the host system to a container. The syntax of this command puts the external port first, then the port on the container. So, in this command, we are connecting port 8080 on our system to port 80 inside the Nginx container.
Below, we see the same system defined in Docker Compose as well. It adds a ports: entry below the service definition, and uses the same syntax as the docker run command. Of course, in both cases we can map multiple ports inside of the container by supplying additional -p entries in docker run or by adding additional elements below the ports: entry in Docker Compose.
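For reference, here is roughly what both forms look like, using the example port numbers from above (the container name is just an example):

    # Using the Docker client: host port 8080 maps to container port 80
    docker run -d --name nginx -p 8080:80 nginx

    # The same mapping in a docker-compose.yml file
    services:
      nginx:
        image: nginx
        ports:
          - "8080:80"
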
So, let’s give this a try and then show that we can access the webserver from outside the container.
[demo here]
Once we’ve mapped a port on a container, we can use the docker ps command to see the various port mappings present on a container. This is a great way to quickly confirm that we mapped the port correctly. I still sometimes get my ports reversed, so this is a quick and easy way to confirm it is set up properly when I’m debugging problems in a container.
Next, let’s go from individual ports to entire networks. We can expand the previous example by moving the db container to a separate network within Docker, isolating it from other containers and possibly even the outside world by making the new network internal to Docker.
To do this using the Docker client, we can start by creating our container using the docker run command. This will connect the container to the default network in Docker - any container that is started without defining a network will be automatically connected to the default network.
However, we can create a new network using the docker network command as shown here. This will create a new bridge network named redis_network, and it also specifies that it is an internal network, meaning that it won’t be connected to the outside world at all.
Once we’ve created the network, we can attach any existing containers to that network using the docker network connect command.
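A sketch of those commands might look like this, assuming a Redis container named redis1 (the container and network names are just examples):

    # Start a container on the default network
    docker run -d --name redis1 redis

    # Create a new internal bridge network named redis_network
    docker network create --internal redis_network

    # Attach the existing container to the new network
    docker network connect redis_network redis1
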
We can also use the --network flag in the docker run command to automatically connect a container to a defined network when we create it. However, it is worth noting that this approach has two caveats: the container will only be connected to the network we name, not the default network, and we can only specify a single network in the docker run command. If we want the container to be attached to multiple networks, we have to attach the rest manually using the docker network connect command.
This Docker Compose file shows the same basic setup as the previous set of Docker commands. We start by creating an nginx service that is connected to both the default and redis networks, and it has a port mapping to the outside world. Then, the redis service is only connected to the redis network. Finally, at the bottom of the file, we see a new top-level entry, networks, that lists the networks used by this configuration. Notice that we don’t have to include the default network here, since it is created for us automatically, though we can list it here if we want to configure it beyond the default settings.
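A sketch of that Docker Compose file might look like this - the images, ports, and network names follow the description above, but the exact details may differ from the slide:

    services:
      nginx:
        image: nginx
        ports:
          - "8080:80"
        networks:
          - default
          - redis
      redis:
        image: redis
        networks:
          - redis

    networks:
      redis:
        internal: true
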
So, let’s go ahead and apply this Docker Compose file to see it in action.
[demo here]
Once we’ve got our containers running, we can use the docker network inspect command to inspect the various networks on our system. If we scroll through the output, we can find the containers connected to each network, as well as other information about how the network is configured. We can also test and make sure the containers can talk with each other, though we’ll leave that to a later example.
Another resource we can give to our Docker containers are environment variables. Many applications that are developed to run in Docker containers use environment variables for configuration. A great example of this is the MySQL Docker image - when we create a container from that image, we can provide a root password for the database as an environment variable. We can also use it to configure a second account and database, making it quick and easy to automate some of the basic steps required when setting up a database for an application. This diagram shows some of the places where environment variables are used in the process of building and instantiating a Docker container.
So, let’s see what that looks like in practice. Here is a set of docker commands that can be used to create two containers - one containing a MySQL database server, and another containing PHPMyAdmin, a program for managing MySQL databases. In both of the docker run commands, we are including an environment variable using the -e flag. For MySQL, we are setting a root password. Then, for the PHPMyAdmin container, we are configuring the hostname where it can find the MySQL server. Notice that the hostname we are giving it is the name of the MySQL container we created earlier, mysql1. This is one of the coolest aspects of networking in Docker! Docker actually handles an internal form of DNS for us, so each Docker container on a network is accessible by simply using the container name as the full hostname on the Docker network. This makes it super simple to connect Docker containers together in a network!
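As a sketch, those docker run commands might look like the following; the password is a placeholder, and I’ve added a user-defined network here since Docker’s name-based DNS works on user-defined networks rather than the default bridge:

    # A user-defined network so the containers can reach each other by name
    docker network create mysql_net

    # MySQL server with a root password set via an environment variable
    docker run -d --name mysql1 --network mysql_net \
      -e MYSQL_ROOT_PASSWORD=examplepassword mysql

    # PHPMyAdmin, pointed at the MySQL container by its container name
    docker run -d --name phpmyadmin1 --network mysql_net \
      -e PMA_HOST=mysql1 -p 8080:80 phpmyadmin
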
This slide shows the same basic setup in a Docker Compose file. Just like we specify ports and networks in the YML format, we can add another entry for environment that lists all of the environment variables. So, as before, we can create a docker-compose.yml file containing this content, and then use the docker compose up -d command to bring up this configuration in Docker. Let’s do that now to see how it works.
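A sketch of the equivalent docker-compose.yml might look something like this (the password is a placeholder and the port mapping is illustrative):

    services:
      mysql:
        image: mysql
        container_name: mysql1
        environment:
          MYSQL_ROOT_PASSWORD: examplepassword
      phpmyadmin:
        image: phpmyadmin
        ports:
          - "8080:80"
        environment:
          PMA_HOST: mysql1
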
[demo here]
We can also explore our running Docker containers to find the various environment variables that are available to each container. We can use the docker inspect command to inspect the configuration of each individual container. Of course, this presents a unique security vulnerability - notice here that we can easily see the root password for our MySQL server in our Docker environment. This is not good! If anyone has access to the Docker engine, they can potentially find out tons of sensitive information about our containers! So, we should think very carefully about what information we put in the environment variables, and who has access to the Docker engine itself. Thankfully, for orchestrating containers across the network, tools such as Kubernetes include a way to secure this information, so it is less of a concern there.
Now that we have a running MySQL server, along with PHPMyAdmin, let’s take a minute to set up a database and store some information in it.
[demo here]
Now that we’ve stored some information in this database, let’s stop and restart the container and see what happens.
[demo here]
Uh oh! Our information disappeared! This is because the transient read/write layer of a container is discarded when it is stopped, so any changes to the filesystem are not kept. While this is a great feature in many ways, it can also cause issues if we actually want to save that data!
Docker includes two different ways we can store data outside of a container. The first is called a bind mount, which simply connects a directory on our host’s filesystem to a folder path in the container. Then, any data in the container that is written to that folder is actually stored in our host’s filesystem outside of the container, and it will be available even after the container is stopped.
The other method we can use is a Docker volume. A Docker volume is a special storage location that is created and managed by Docker itself. We can think of it like a virtual hard disk that we would use with VMWare - it is stored on the host filesystem, but it isn’t directly accessible outside of Docker itself.
There are many pros and cons for using both bind mounts and volumes in Docker - it really depends on how you intend to use the data. See the Docker documentation for a good discussion about each type of storage and the use-cases where it makes the most sense.
To use a bind mount in Docker, we must first make sure the folder exists in our host filesystem. For this, I’m just going to use the mkdir command to make the directory. We can then add a -v parameter to our docker run command. The first part of the volume entry is the path to the directory on the host system, and then following a colon we see the path within the container’s filesystem where it will be mounted. This is a very similar syntax to the port mappings we learned about earlier. At the bottom of this slide, we see the same configuration in Docker Compose - we simply add a volumes entry and list the bind mount there.
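Here is a rough sketch of both forms, assuming we want MySQL to store its data in a mysql-data directory beside our compose file (the paths and names are illustrative):

    # Create the directory on the host, then bind mount it into the container
    mkdir mysql-data
    docker run -d --name mysql1 -e MYSQL_ROOT_PASSWORD=examplepassword \
      -v "$(pwd)/mysql-data:/var/lib/mysql" mysql

    # The same bind mount in Docker Compose
    services:
      mysql:
        image: mysql
        environment:
          MYSQL_ROOT_PASSWORD: examplepassword
        volumes:
          - ./mysql-data:/var/lib/mysql
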
To use a volume in Docker, the process is very similar. First, we can use the docker volume create command to make a volume in Docker. Then, in our docker run command, we use the name of the volume as the first part of the -v parameter, followed by the path inside of the container where it should be mounted.
In Docker Compose, the process is very similar. However, in this case, we don’t have to manually use the docker volume create command, as Docker Compose will handle creating the volume for us.
One thing to be aware of is where the volume is mounted within the container. Many Docker images, such as MySQL, include notes in the documentation about where data is stored in the container and which directories should be mounted as volumes to persist the data. So, in this case, the /var/lib/mysql directory in the container is the location given in the MySQL Docker image documentation for persistent storage, so that’s why we are mounting our volume in that location.
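As a sketch, using a named volume instead might look like this - the volume name mysql_data is just an example:

    # Create a Docker-managed volume and mount it at MySQL's data directory
    docker volume create mysql_data
    docker run -d --name mysql1 -e MYSQL_ROOT_PASSWORD=examplepassword \
      -v mysql_data:/var/lib/mysql mysql

    # In Docker Compose, the named volume is created automatically
    services:
      mysql:
        image: mysql
        environment:
          MYSQL_ROOT_PASSWORD: examplepassword
        volumes:
          - mysql_data:/var/lib/mysql

    volumes:
      mysql_data:
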
So, let’s update our MySQL container to include a volume for storing data. Once we do that, we’ll show that it will properly store and persist data.
[demo here]
Finally, just like any other resource, we can use the docker inspect command to see the volumes and bind mounts available on a container.
There we go! That’s a pretty in-depth overview of the various resources that we can add to a Docker container. We’ve learned how to map ports from the host system to ports within a container, connect containers together via various Docker networks, provide data to Docker containers in environment variables, and finally we can now persist data using bind mounts and Docker volumes. At this point, we should have enough information to really start using Docker effectively in our infrastructures. For the rest of this module, we’ll dive into various ways we can use Docker and more advanced configurations.
So far, we’ve explored how to download and use pre-built Docker images in our infrastructure. In many cases, that covers the vast majority of use cases, but what if we want to build our own Docker images? In this lesson, we’ll explore the process for creating our own Docker images from scratch. The process is pretty simple - we start with a special file called a Dockerfile that describes the image we want to create. In that file, we’ll identify a base image to build on, and then include information to configure that image, add our software and configure it, and then set some last details before building the new image. So, let’s see how that process works in practice!
Normally, we’d probably have our own application that we want to include in the Docker container. However, since this is a course on system administration, we’re going to just use a sample program provided by Docker itself as the basis for our image. So, you can follow along with this entire tutorial by going to the Docker Get Started documentation that is at the URL shown on this slide. We’ll basically be following most of the steps in that tutorial directly.
Our first step is to get a copy of the application code itself. So, on one of our systems that has Docker installed and configured, we can start by cloning the correct repository, and then navigating to the app directory inside of that repository. At this point, we may want to open that entire directory in a text editor to make it easy for us to make changes to the various files in the application. I’m going to open it in Visual Studio Code, since that is what I have set up on this system.
Now that we have our application, we can create a Dockerfile that describes the Docker image we’d like to build for this application. This slide shows the Dockerfile from the tutorial. Let’s go through it line by line to see what it does.
First, we see a FROM line, which tells us what base image we want to build the image from. In this case, we have selected the node:12-alpine image. Based on the context, we can assume that this is a Docker image containing Node.js version 12, and that it was originally built using the Alpine Linux project, so the image is a very minimal image that may not have many additional libraries installed. This also tells us that we’ll need to use commands relevant to the Alpine Linux project to install packages, so, for example, we’ll need to use apk add instead of apt install.
The next line is a RUN command, which specifies commands that should be run inside of the image. In this case, we are installing a couple of libraries using the apk add command. Since this command will change the filesystem of the image, it will end up generating a new layer on top of the node:12-alpine image.
Next, we have a WORKDIR entry, which simply sets the working directory that the next commands will be run within. So, in this case, we’ll be storing our application’s code in the /app directory within the Docker image’s file system.
After that, we see a COPY command. This one is a bit confusing, since the copy command in a Dockerfile operates in two different contexts. The first argument, ., is interpreted within the host system, and references the current directory where the Dockerfile is stored. So, it will in effect copy all of the contents from that directory into the Docker image. The second argument, also a ., is relative to the Docker image itself. So, these files will be placed in the current directory in the Docker image, which is /app based on the WORKDIR entry above. As you might guess, this also creates a new layer in our created Docker image.
Once we’ve loaded our application files into the image, we see another RUN command. This time, we are using the yarn package manager to install any libraries required for our Node.js application to function. If you are familiar with more recent Node.js applications, this is roughly equivalent to the npm install command for the NPM package manager. This creates a third new layer in our resulting image.
Finally, we see two entries that set some details about the final image itself. The first entry, CMD, will set the command that is executed when a container is instantiated using this Docker image. Since we are building a Node.js application, we want to use the node command to run our server, which is stored in the src/index.js file. Notice that this command is also relative to the WORKDIR that was given above!
Finally, we have an EXPOSE entry. This simply tells Docker that port 3000 is the port that should be made available from a container instantiated using this image. However, by default this won’t actually create a port mapping to that port - we still have to do that manually! However, there are some tools that can use this information to automatically route data to the Docker container itself, and this EXPOSE entry helps them find the correct port to connect to.
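Putting those pieces together, the Dockerfile described above looks roughly like this; the exact packages installed by the apk add line may differ from the current version of the tutorial:

    FROM node:12-alpine
    RUN apk add --no-cache python2 g++ make
    WORKDIR /app
    COPY . .
    RUN yarn install --production
    CMD ["node", "src/index.js"]
    EXPOSE 3000
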
There we go! That’s a basic Dockerfile - it may seem a bit complex, but when we break it down line by line, it is actually very simple and easy to understand.
So, once we’ve created a Dockerfile for our project, we can use the docker build command to actually build our Docker image. In that command, we can use the -t argument to “tag” the image with a simple to understand name so we can easily reference it later. We also need to specify the location of the Dockerfile, which is why we include the . at the end of the command.
Once our Docker image is built, we can use a simple docker run command to instantiate a container based on that image. We’ll also map port 3000 so we can access it from our local machine. If everything works, we can go to a web browser and visit http://localhost:3000 to see our application running!
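Those two steps might look like this, assuming we tag the image as getting-started (the tag name is just an example):

    # Build the image from the Dockerfile in the current directory
    docker build -t getting-started .

    # Start a container from it and map port 3000 to the host
    docker run -d -p 3000:3000 getting-started
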
Now that we’ve built and run our Docker image, let’s make a quick update to the code and see how easy it is to rebuild it and run it again. First, we’ll make a quick edit in the app.js file to change a line of text. Then, we’ll use docker build to rebuild the Docker image. Take note of how long it takes to rebuild that image - we’ll try to make that faster a bit later!
Once it is built, we are ready to use it to instantiate a container. However, before we can run it, we must stop the running container that was built from the previous image using docker stop, and then we can use docker run to instantiate a new container.
At this point, we have a working Docker image that includes our application, but it is just stored on our own computer. If we want to share it with others, we have to upload it to a registry such as Docker Hub. If the project is hosted on GitHub or GitLab, we can use the various continuous integration and delivery features of those tools to automatically build our image and then host it on their registries as well. You can refer to the documentation for various registries if you are interested in hosting your image there. You can also look at some example CI/CD pipelines for both GitHub and GitLab by following the links at the top of this page.
Another task we may want to do after building our image is scanning it for vulnerabilities. Docker includes a quick scanning tool, docker scan, that will look at the image and try to find various software packages that are vulnerable. It is even able to look at things such as the package.json file for Node.js applications, or the requirements.txt file for various Python programs, among others.
As we can see on this slide, the current version of the Getting Started application actually includes a number of vulnerabilities introduced by an older version of a couple of Node.js libraries, so we may want to work on fixing those before we publish our image.
Likewise, when we download an image from a registry, we may want to scan it first to see what vulnerabilities may be present in the image before we run it. Of course, this isn’t able to catch all possible vulnerabilities, but it is a good first step we can use when reviewing Docker images.
Finally, let’s take a minute to look at the structure of our Dockerfile and see if we can do something to improve it a bit. Previously, our Dockerfile basically created three new layers on top of the base image - one to install packages in the operating system itself, one to add our application files, and a third to install packages for the Node.js environment that our application uses. Thankfully, Docker itself is very good about how it manages layers on an image, and if it is able to determine that a layer can be reused, it will pull that layer from the cache and use it instead of rebuilding it. So, as long as we start with the same base image and install the same software each time, Docker will be able to use that layer from the cache. So, our first layer can easily be loaded from the cache!
However, each time we change any of the code in our application, we will end up having to rebuild the last two layers, and that can greatly slow down the process of building a new image. Thankfully, by changing our Dockerfile just a bit, as shown on this slide, we can take advantage of the caching capability of Docker to make subsequent builds much easier. In this case, we start by just copying the package.json and yarn.lock files to the image, and then run yarn install to install the libraries needed for our application. This will populate the node_modules folder, and it creates a new layer for our image. That layer can be cached, and as long as we don’t make any changes to either the package.json or yarn.lock files in our application, Docker will be able to reuse that cached layer!
From there, we can copy the rest of our application files into the image and then we are good to go. However, since we’ve already installed our libraries, we don’t want to overwrite the node_modules folder. So, we can create a special file named .dockerignore to tell Docker to ignore various files when creating our image. It works the same way as a .gitignore file. So, we can just tell Docker to ignore the node_modules folder when it copies files into our image!
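A sketch of the reordered Dockerfile and the accompanying .dockerignore file might look like this; again, the apk add packages may differ from the tutorial’s current version:

    # Dockerfile - dependencies are installed before the application code is copied
    FROM node:12-alpine
    RUN apk add --no-cache python2 g++ make
    WORKDIR /app
    COPY package.json yarn.lock ./
    RUN yarn install --production
    COPY . .
    CMD ["node", "src/index.js"]
    EXPOSE 3000

    # .dockerignore - keep the local node_modules folder out of the image
    node_modules
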
Try it yourself! Build an image with this new Dockerfile, then make a change somewhere in the application itself and rebuild the image. You should now see that Docker is able to reuse the first two layers from the cache, and the build process will be much faster. So, by thinking a bit about the structure of our application and how we write our Dockerfile, we can make the build process much more efficient.
Docker has some language-specific guides in their documentation that give Dockerfile examples for many common languages, so you can easily start with one of their guides whenever you want to build your own images. The Docker extension for Visual Studio Code also has some great features for building a Dockerfile based on an existing project.
That should give you everything you need to start building your own Docker images!
Docker is obviously a very useful tool for system administration, as we’ve seen throughout this module. In this lesson, however, let’s take a minute to talk about how we can use Docker in the context of software development. Depending on the application we are developing and how it will eventually be packaged and deployed, Docker can be used at nearly every stage of the development process.
First, for applications that may be packaged inside of a Docker image and deployed within a container, we may wish to create a Dockerfile and include that as part of our source code. This makes it easy for us to use Docker to run unit tests in a CI/CD pipeline, and we can even directly deploy the finalized image to a registry once it passes all of our tests.
This slide shows a sample Dockerfile for a Python application written using the Flask web framework. It is very similar to the Node.js example we saw earlier in this module. By including this file in our code, anyone can build an image that includes our application quickly and easily.
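Since the slide itself isn’t reproduced here, this is only a hedged sketch of what such a Flask Dockerfile commonly looks like - the Python version, file names, and port are assumptions, not necessarily what the slide shows:

    FROM python:3.10-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    EXPOSE 5000
    CMD ["flask", "run", "--host=0.0.0.0"]
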
Another great use of Docker in software development is providing a docker-compose.yml file in our source code that can be used to quickly build and deploy an environment that contains all of the services that our application depends on. For example, the Docker Compose file shown in this slide can be included along with any web application that uses MySQL as a database. This will quickly create both a MySQL server as well as a PHPMyAdmin instance that can make managing the MySQL server quick and easy. So, instead of having to install and manage our own database server in a development environment, we can simply launch a couple of containers in Docker.
In addition, this file can be used within CI/CD pipelines to define a set of services required for unit testing - yet another way we can use the power of Docker in our development processes.
A more advanced way to use Docker when developing software is to actually perform all of the development work directly within a Docker container. For example, if we are using a host system that has one version of a software installed, but we need to develop for another version, we can set up a Docker container that includes the software versions we need, and then do all of our development and testing directly within the container. In this way, we are effectively treating it just like a virtual machine, and we can easily configure multiple containers for various environments quickly and easily.
The Visual Studio Code remote extension makes it easy to attach to a Docker container and run all of the typical development tools directly within a container. For more information about how that works, see the links in the resources section at the top of this page.
Finally, as I’ve alluded to many times already, we can make use of Docker in our continuous integration and continuous delivery, or CI/CD, pipelines. Both GitHub and GitLab provide ways to automate the building and testing of our software, and one of the many tasks that can be performed is automatically creating and uploading a Docker image to a registry. This slide shows an example of what this looks like using GitHub Actions. All that is required is a valid Dockerfile stored in the repository, and this action will do the rest.
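As one common approach (not necessarily the exact workflow shown on the slide), a GitHub Actions workflow that builds an image and pushes it to the GitHub Container Registry might look roughly like this:

    name: Build and push Docker image
    on:
      push:
        branches: [ main ]
    jobs:
      build:
        runs-on: ubuntu-latest
        permissions:
          contents: read
          packages: write
        steps:
          - uses: actions/checkout@v3
          - uses: docker/login-action@v2
            with:
              registry: ghcr.io
              username: ${{ github.actor }}
              password: ${{ secrets.GITHUB_TOKEN }}
          - uses: docker/build-push-action@v4
            with:
              push: true
              tags: ghcr.io/${{ github.repository }}:latest
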
Similarly, GitLab allows us to create runners that can automate various processes in our repositories. So, this slide shows the same basic idea in a GitLab pipeline. In fact, this pipeline will be run within a Docker container itself, so we are effectively running Docker in Docker to create and push our new Docker image to a registry.
At the top of this page are links to a couple of sample repositories that contain CI/CD pipelines configured in both GitHub and GitLab. In each repository, the pipeline will create a Docker image and publish it to the registry attached to the repository, so we can easily pull that image into a local Docker installation and run it to create a container.
This is just a small sample of how we can use Docker as a software developer. By doing so, we can easily work with a variety of different services, automate testing and packaging of our software, and make it easy for anyone else to use and deploy our applications within a container. So, I highly encourage you to consider integrating Docker into your next development project.
As we’ve seen, deploying services using Docker can make things very easy. Unfortunately, one major limitation of Docker is that the only way to connect a running container to the outside world is by mapping a port on the host system to a port on the container itself. What if we want to run multiple containers that all host similar services, such as a website? Only one container can be mapped to port 80 - any other containers would either have to use a different port or be running on another host. Thankfully, one way we can get around this is by using a special piece of software known as a reverse proxy. In a nutshell, a reverse proxy will take incoming requests from the internet, analyze them to determine which container the request is directed to, and then forward that request to the correct container. It will also make sure the response from the container is directed to the correct external host. We call this a reverse proxy since it is acting as a connection into our Docker network, instead of a traditional proxy that helps containers reach the external internet.
As it turns out, the Nginx web server was originally built to be a high-performance reverse proxy, and that is still one of the more common uses of Nginx even today. In addition, there are many other Docker images that can help us set up and manage these reverse proxy connections. In this lesson, we’ll review three different ways to create a reverse proxy in our Docker infrastructures.
The first option is to use Nginx directly. For this, we’re going to set up a Docker Compose file that contains 3 different containers. First, we have a standard Nginx container that we are naming proxy, and we are connecting port 8080 on our host system to port 80 on this container for testing. In practice, we’d generally attach this container to port 80 on our host system as well. Notice that this is the only container connected to the default Docker network, so it is the only one that is meant to access the internet itself. Finally, we’re using a bind mount to a directory that will contain a template configuration file for Nginx - we’ll look at that a bit later.
The other two containers are simple whoami containers. These containers will respond to simple web requests with information about the container, helping us make sure we are reaching the correct container through our reverse proxy. These two containers are both connected to the internal network, which is also accessible from the proxy container as well.
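A sketch of that Docker Compose file might look like this; the whoami image, network names, and template directory are assumptions based on the description above:

    services:
      proxy:
        image: nginx
        ports:
          - "8080:80"
        networks:
          - default
          - internal
        volumes:
          - ./templates:/etc/nginx/templates
      whoami1:
        image: containous/whoami
        networks:
          - internal
      whoami2:
        image: containous/whoami
        networks:
          - internal

    networks:
      internal:
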
To configure the Nginx server as a reverse proxy, we need to provide a template for configuration. So, inside of the folder mounted as a volume in the container, we need to create a file named default.conf.template that will be used by Nginx to generate its configuration file. The contents of that file are shown here on the slide.
In effect, we are creating two different servers, one that listens for connections sent to hostname one.local, and the other will look for connections to two.local. Depending on which hostname is specified in the connection, it will forward those connections to either container whoami1 or whoami2. We also include a bunch of various headers and configuration settings for a reverse proxy - these can be found in the Nginx documentation or by reading some of the tutorials linked at the top of this page.
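A minimal sketch of what that default.conf.template might contain is shown below; the proxy headers here are only a representative subset of what a full reverse proxy configuration would normally include:

    server {
        listen 80;
        server_name one.local;

        location / {
            proxy_pass http://whoami1:80;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }

    server {
        listen 80;
        server_name two.local;

        location / {
            proxy_pass http://whoami2:80;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
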
So, now that we’ve set up our environment, let’s get it running using docker compose up -d. After all of the containers are started, we can use docker ps to see that they are all running correctly.
Now, let’s test a connection to each container. We’ll use the simple curl utility for this, and include the -H parameter to specify a hostname for the connection. So, by sending a connection to localhost on port 8080 with the hostname one.local specified, we should get a response from container whoami1. We can confirm that this is correct by checking the response against the output of docker ps. We can do the same to test host two.local as well. It looks like it is working!
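The test commands might look something like this, assuming the proxy is published on port 8080 of the local machine:

    # Request routed to whoami1 via the one.local virtual host
    curl -H "Host: one.local" http://localhost:8080

    # Request routed to whoami2 via the two.local virtual host
    curl -H "Host: two.local" http://localhost:8080
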
So, with this setup in place, we can replace the two whoami containers with any container that has a web interface. It could be a static website, or an application such as PHPMyAdmin that we’d like to include as part of our infrastructure. By setting up a reverse proxy, we can access many resources on the same host and port, simply by setting different hostnames. So, all we’d need to do is register the various hostnames in our DNS configuration, just like we did when working with virtual hosts in Apache as part of Lab 5, and point them all at the same IP address where our Docker infrastructure is hosted. That’s all it takes!
Another option for setting up a reverse proxy in Nginx is to use the nginx-proxy Docker image. This image includes some special software that can analyze our Docker configuration and automatically set up the various configurations needed for a reverse proxy in Nginx. This Docker Compose file shows the basic way this setup works.
First, we are creating a new Docker container named proxy based on the nginx-proxy image. We’re still mapping port 80 inside of the container to port 8080 on our host system, and connecting it to the same networks as before. However, this time, instead of mounting a volume that stores our Nginx configuration templates, we are creating a bind mount directly to the docker.sock socket. This socket is how the Docker client communicates with the Docker engine, so we are effectively giving this container access to the underlying Docker engine it is running on. This introduces a security concern, which we will address later in this lesson.
The only other change is the environment section for the two whoami containers. Notice that each container now contains a VIRTUAL_HOST and VIRTUAL_PORT entry, denoting the hostname that should route to the container and the port within the container where connections should be sent.
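Again, as a rough sketch - the nginxproxy/nginx-proxy image name, the socket mount point, and the whoami image are assumptions - the Compose file might look like this:

# docker-compose.yml - a minimal sketch of the nginx-proxy setup
services:
  proxy:
    image: nginxproxy/nginx-proxy                    # assumed image for nginx-proxy
    ports:
      - "8080:80"
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock:ro     # direct access to the Docker engine
    networks:
      - default
      - internal

  whoami1:
    image: traefik/whoami                            # assumed whoami image
    environment:
      - VIRTUAL_HOST=one.local                       # hostname this container should answer to
      - VIRTUAL_PORT=80                              # port inside the container
    networks:
      - internal

  whoami2:
    image: traefik/whoami
    environment:
      - VIRTUAL_HOST=two.local
      - VIRTUAL_PORT=80
    networks:
      - internal

networks:
  internal:
    internal: true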
Now, when we bring up this infrastructure using docker compose up -d, we should be able to do the same test as before to show that the connections for one.local and two.local are being directed to the correct containers. As we can see here, it looks like it worked! But, how does nginx-proxy know how to configure itself?
Basically, the nginx-proxy software will inspect the configuration of the various containers available in the Docker engine, using the bind-mounted connection directly to the docker.sock socket. By doing so, it will see containers that have a VIRTUAL_HOST environment variable, and optionally a VIRTUAL_PORT variable, and it will configure a reverse proxy to that container using the hostname and port specified. So, if we add a third container to our Docker infrastructure, as long as it has a VIRTUAL_HOST environment variable set, nginx-proxy will automatically see it as soon as it starts, and it will set up a new reverse proxy to that container. This is very handy, as it allows us to start and stop containers as needed, and a reverse proxy will always be available without any additional configuration needed!
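For example, under the assumptions of the sketch above, adding a service like this to the Compose file and re-running docker compose up -d would be enough for nginx-proxy to start routing three.local to it:

  # added under the existing services: section of the Compose file
  whoami3:
    image: traefik/whoami                 # assumed whoami image
    environment:
      - VIRTUAL_HOST=three.local          # nginx-proxy picks this up automatically
    networks:
      - internal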
Unfortunately, as I mentioned earlier, there is a major security concern introduced when we allow a Docker container to have access to the docker.sock socket on the host system. As we already saw, this allows the software running in the Docker container to access the underlying Docker engine, which is great since it can basically perform all of the configuration we need automatically, and it can even detect when new containers are started.
However, there is nothing at all preventing the software in that container, or a malicious actor who has gained access inside of the container, from performing ANY other actions on the Docker engine. So, anyone with access inside of that container could start new containers, stop running containers, change the configuration of an existing container, and so much more.
In addition, in most installations the Docker engine runs as the root user of the host system! This means that anyone who has access to the Docker engine, armed with an unpatched flaw in Docker, could easily gain root access to the entire host system! In short, if you have root access in a container that has access to the Docker socket, you can effectively end up with root access on the host system.
Obviously this is a very major security concern, and one that should definitely be taken into account when using Docker containers that need access to the underlying Docker engine. Thankfully, there are a few mitigation methods we can follow. For instance, we can protect access to the Docker socket with a TLS certificate, which prevents unauthorized access by any Docker clients that don’t have the certificate. We can also run access to the Docker socket through a proxy that acts like a firewall, analyzing requests from Docker clients and determining whether they are allowed. Finally, in many cases we may simply choose not to expose any container with access to the Docker engine directly to the internet, though in this case that would basically nullify the usefulness of Nginx Proxy.
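As one example of the firewall-style approach, there are purpose-built socket proxy images. The sketch below assumes the tecnativa/docker-socket-proxy image, which is not part of this lesson’s setup, so treat the names and options as illustrative only.

# a minimal sketch of putting the Docker socket behind a filtering proxy
services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy
    environment:
      - CONTAINERS=1                                 # only allow read access to container info
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - socket

networks:
  socket:
    internal: true                                   # only containers we attach can reach it

# A reverse proxy container attached to the socket network would then be pointed
# at tcp://socket-proxy:2375 instead of mounting docker.sock directly.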
So, it ends up being a bit of a tradeoff. If we want to use tools such as nginx-proxy in our environments, we’ll have to analyze the risk that is presented by having the Docker engine available to that container, and either take additional precautions or add additional monitoring to the system.
A third option for setting up a reverse proxy would be to use a Docker image designed specifically for that purpose. There are many of them available, but the most popular at this point is the Traefik Proxy server. Traefik Proxy is very similar to Nginx Proxy, since it also will examine the underlying Docker engine to configure reverse proxies. However, it includes many additional features, such as the ability to proxy any TCP request, introduce middleware, and automatically handle requesting TLS certificates from a provider like Let’s Encrypt.
The Docker Compose file shown here is a minimal setup for the Traefik Proxy server. In the proxy container, we see that we are customizing the startup command by providing two options, --api.insecure=true and --providers.docker. The first option allows us to access the Traefik Proxy dashboard, and the second tells Traefik to look at Docker for information about the proxies to be created. We’re also mapping two ports to the Traefik proxy - first we are connecting port 8080 on the host to port 80 on the proxy, which will handle HTTP traffic. In addition, we are connecting port 8081 on the host to port 8080 inside of the container, which is the port where we can access the Traefik Proxy dashboard. We’ll take a look at that a bit later.
Then, we set up the two whoami containers just like before. However, this time, instead of setting environment variables, we add a couple of labels to each container. These labels tell Traefik Proxy what hostname and port should be used for the reverse proxy, just like the environment variables we saw earlier in our configuration for Nginx Proxy.
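Here is a rough sketch of that Compose file; the traefik image tag, the whoami image, and the exact label names are assumptions based on the Traefik v2 Docker provider conventions.

# docker-compose.yml - a minimal sketch of the Traefik Proxy setup
services:
  proxy:
    image: traefik:v2.10                  # assumed tag; any recent v2 release should work
    command:
      - --api.insecure=true               # expose the dashboard (unsecured, for testing)
      - --providers.docker                # read routing config from the Docker engine
    ports:
      - "8080:80"                         # HTTP traffic
      - "8081:8080"                       # Traefik dashboard
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro   # needed by the Docker provider
    networks:
      - default
      - internal

  whoami1:
    image: traefik/whoami                 # assumed whoami image
    labels:
      - "traefik.http.routers.whoami1.rule=Host(`one.local`)"
      - "traefik.http.services.whoami1.loadbalancer.server.port=80"
    networks:
      - internal

  whoami2:
    image: traefik/whoami
    labels:
      - "traefik.http.routers.whoami2.rule=Host(`two.local`)"
      - "traefik.http.services.whoami2.loadbalancer.server.port=80"
    networks:
      - internal

networks:
  internal:
    internal: true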
So, just like before, we can use docker compose up -d to start these containers. Once they are up and running, we can once again use docker ps to see the running containers, and we can use curl to verify that the proxy is working and connecting us to the correct containers.
To explore the configuration of our Traefik Proxy, we can also load the dashboard. To do this, we’ll simply open a web browser and visit http://localhost:8081, the host port we mapped to the dashboard earlier. This slide shows a screenshot of what it may look like for our current configuration.
Just like with Nginx Proxy, Traefik will automatically add proxy routes for any new containers that have the correct labels, and we can see them update here on this dashboard in real time. Of course, right now this dashboard is completely unsecured and totally open to anyone who has access to our system, so in practice we’ll probably want to secure this using some form of authentication. The Traefik Proxy documentation includes information for how to accomplish that.
So, in summary, setting up a reverse proxy is a great way to allow users from the outside internet to have access to multiple servers running on the same Docker host. In effect, we are able to easily duplicate the features of virtual hosts in traditional web servers, but instead of loading files from a different directory, we are directing users to a different Docker container inside of our infrastructure.
We looked at three reverse proxies - Nginx itself, Nginx Proxy, and Traefik Proxy. Of course, there are many others out there that we can choose from - these are just three that are pretty well known and used often today.
Finally, it is worth noting that a reverse proxy such as this is generally only needed for Docker containers hosted on a single host. If we choose to use orchestration platforms such as Kubernetes, much of this is handled by the platform itself. So, in the next lesson, we’ll explore Kubernetes a bit and see how it differs from what we’ve seen so far in Docker.
Instead of creating and recording my own video overview of Kubernetes, I decided it would be best to just include an existing video that does an excellent job of covering that topic. My video would have been very similar, but probably with fewer graphics and a bit more condensed.
I encourage you to watch the video above through at least the “Kubernetes Architecture” portion (around 35 minutes) to get a good feel for how Kubernetes builds upon what we’ve done in Docker and Docker Compose. If you want to get some hands-on experience with Kubernetes, you can continue in the video to the tutorial using Minikube.
If you know of a better introduction to Kubernetes video that you think would fit here, feel free to let me know. I may consider including it in a future semester.
You aren’t required to use Kubernetes for any of the labs in this course, but if you are comfortable with it you are welcome to give it a try. That said, if you need help debugging it beyond my skill, I may ask you to switch back to Docker and Docker Compose instead of pushing ahead with Kubernetes. –Russ