Note

TODO As of 2020, both KSIS and HRIS have moved to the cloud, leaving very few K-State resources hosted directly on campus. This video will be updated in future semesters to reflect that change. -Russ

Resources

Video Transcript

To begin this module, let’s take a few minutes to discuss its major topic, the cloud. At this point, some of you may be wondering what exactly I’m talking about when I refer to the cloud. My hope is that, by the end of this video, you’ll have a much clearer mental picture of exactly what the cloud is and how it is related to all of the things we’ve been working with in this class so far.

Without giving too much of the answer away, I always like to refer to this comic strip from XKCD when defining the cloud. Take a minute to read it and ponder what it says before continuing with this video.

To start, one way you can think of cloud computing is the applications and resources that you are able to access in the cloud. It could be an email address, social media website, finance tracking application, or even a larger scale computing and storage resource. All of those could be part of the cloud. To access them, we use the devices available to us. Our computers, laptops, tablets, and phones, and even our TVs, game consoles, and media players, all access and use the cloud.

One of the core concepts underlying the cloud is the centralization of resources. This is not a new idea by any stretch of the imagination. In fact, many comparisons have been made between the growth of large electric power plants and the growth of the cloud. Here, you can see both of those worlds colliding, as this picture shows Amazon’s founder, Jeff Bezos, standing in front of an old electric generator from the 1890s at a beer brewery in Belgium.

In the early days of electricity, if a company wanted to use electricity in their factory, they usually had to purchase a power generator such as this one and produce their own. Companies really liked this approach, as they could be completely in control of their costs and production, and didn’t have to worry about failures and problems that they couldn’t resolve themselves.

Generating electricity is also a very inefficient process, but it can be made much more efficient by scaling up. A single company may not use all of the electricity they generate all the time, and any issues with the generator could cause the whole operation to slow down until it is resolved. But, if many companies could band together and share their electrical systems, the process can be much more efficient and the power supply more secure than each company could provide on their own.

It was during this time that several companies were formed to do just that: provide a large scale power generation and distribution network. Many companies were initially hesitant to give up their generators, fearing that they would be at the mercy of someone else for the power to run their operations, and that such a large scale system wasn’t sustainable in the long run. However, their fears proved to be false, and within a few decades nearly all of them had switched to the growing power grid.

For K-State, most of our electricity comes from Jeffrey Energy Center near Wamego, KS. It is capable of generating over 2 gigawatts of power, enough to power over one million homes. For most of us, we don’t even take the time to think about where our electricity comes from, we just take it for granted that it will be there when we need it, and that there is more than enough of it to go around.

The cloud was developed for many of those same reasons. This story is detailed in depth in the book “The Big Switch” by Nicholas Carr. I encourage you to check it out if you are interested in the history of the cloud.

The concept of a cloud got its start all the way back in the 1950s with the introduction of time sharing on mainframe computers. This allowed many users to share the same computer, with each running her or his programs when the others were busy writing or debugging. Later on, many networking diagrams started to use a cloud symbol to denote unknown networks, leading to the name “the cloud” referring to any unknown resource on the internet.

In the 1990s, large businesses started connecting their branch offices with virtual private networks, or VPNs. In this way, a branch office in New York could access shared files and resources from the main office in California. Over time, the physical location of the resources became less and less important - if it could be in California, why couldn’t it just be anywhere? By this point, people were ready to adopt the cloud into their lives.

The explosion in popularity came with the rise of Web 2.0 after the dot-com bubble burst, but also in 2006 with the introduction of Amazon’s Elastic Compute Cloud, or EC2. Originally, companies who wanted to have a website or process large amounts of data would purchase large computing systems and place them directly in their own offices. They would also have to hire staff to manage and maintain those systems, as well as deal with issues such as power and cooling. Over time, they also started to realize that they may not be making efficient use of those systems, as many of them would remain idle during slower business periods. At the same time, larger companies with idle computing resources realized they could sell those resources to others in order to recoup some of their costs, and possibly even make a nice profit from it. Other companies realized that they could make money by providing some of those resources as a service to others, such as web hosting and email.

Amazon was one such company. Since they were primarily in the retail business at that time, they had to build a system capable of handling the demands of the busy retail seasons. However, most of the time those resources were idle, and they simply cost the company money to maintain them. So, Amazon decided to sell access to those computing resources to other companies for a small fee, letting them use the servers that Amazon didn’t need for most of the year. They stumbled into a gold mine. This graph shows the total bandwidth used by Amazon’s global websites, and then the bandwidth for Amazon Web Services, part of their cloud computing offerings. Less than a year after it launched, Amazon was using more bandwidth selling cloud resources to others than their own website used, typically one of the top 10 websites in the world. Some people have referred to this graph as the “hockey stick” graph, showing the steep incline and growth of the cloud. Other companies quickly followed suit, and the cloud as we know it today was born.

For many cloud systems today, we use an architecture known as a “service-oriented” architecture. In essence, a cloud company could build their system out of a variety of services, either hosted by themselves or another cloud provider. For example, a social media site might have their own web application code, but the website itself is hosted by Amazon Web Services, and the data is stored in Google’s Firebase. In addition, their authentication would be handled by many different third parties through OAuth, and even their code and testing are handled via GitHub, Travis and Jenkins. By combining all of those services together, a company can thrive without ever owning even a single piece of hardware. Many start-ups today are doing just this in order to quickly grow and keep up with new technologies.

According to the National Institute of Standards and Technology, or NIST, there are a few primary characteristics that must be met in order for a computer system to be considered a “cloud” system. First, it must have on-demand, self-service for customers. That means that any customer can request resources as needed on their own, without any direct interaction with the hosting company. Secondly, that resource must be available broadly over the internet, not just within a smaller subnetwork. In addition, the system should take advantage of resource pooling, meaning that several users can be assigned to the same system, allowing for better scaling of those resources. On top of that, systems in the cloud should allow for rapid elasticity, allowing users to scale up and down as needed, sometimes at a moment’s notice. Finally, since users may be scaling up and down often, the service should be sold on a measured basis, meaning that users only pay for the resources they use.
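To make the last two characteristics a little more concrete, here is a minimal Python sketch comparing a system sized for its busiest hour against one that scales with demand and is billed on a measured basis. The hourly demand figures, the price, and the function names are all made up for illustration; they are not from the video or from any real provider's pricing.

```python
# Hypothetical number of servers needed during each hour of one day.
# These values are invented for illustration only.
hourly_demand = [2, 2, 2, 2, 2, 3, 5, 8, 10, 10, 9, 8,
                 8, 9, 10, 12, 20, 12, 8, 6, 4, 3, 2, 2]

# Assumed price, in cents, for running one server for one hour.
PRICE_CENTS_PER_SERVER_HOUR = 10


def fixed_capacity_cost(demand, price_cents):
    """Cost when enough servers for the busiest hour run around the clock."""
    return max(demand) * len(demand) * price_cents


def measured_cost(demand, price_cents):
    """Cost when capacity follows demand and only used server-hours are billed."""
    return sum(demand) * price_cents


if __name__ == "__main__":
    fixed = fixed_capacity_cost(hourly_demand, PRICE_CENTS_PER_SERVER_HOUR)
    measured = measured_cost(hourly_demand, PRICE_CENTS_PER_SERVER_HOUR)
    print(f"Sized for the peak: {fixed / 100:.2f} dollars per day")
    print(f"Pay for what you use: {measured / 100:.2f} dollars per day")
```

The gap between the two numbers comes entirely from the idle capacity in the quiet hours, which is the same inefficiency described earlier for companies running their own systems.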

Beyond that, there are a few other reasons that cloud computing is a very important resource for system administrators today. First, cloud computing resources are location independent. They could be located anywhere in the world, and as long as they are connected to the internet, they can be accessed from anywhere. Along with resource pooling, most cloud providers practice multi-tenancy, meaning that multiple users can share the same server. Since most users won’t use all of a server’s computing resources all the time, it is much cheaper to assign several different users to the same physical system in order to reduce costs. Also, in many ways a cloud system can be more reliable than a self-hosted one. For cloud systems, it is much simpler for a larger organization to shift resources around to avoid hardware failures, whereas a smaller organization may not have the budget to maintain a full backup system. We’ve already discussed the scalability and elasticity, but for many organizations who are hoping to grow quickly, this is a very important factor in their decision to host resources in the cloud. Finally, no discussion of the cloud would be complete without talking about security. In some ways, the cloud could be more secure, as cloud providers typically have entire teams dedicated to security, and they are able to easily keep up with the latest threats and software updates. However, by putting a system on the cloud, it could become a much easier target for hackers as well.

One of the best examples demonstrating the power of the cloud is the story of Animoto. Animoto is a website that specializes in creating video montages from photos and other sources. In 2008, their service exploded in popularity on Facebook, going from 25,000 users to 250,000 users in three days. Thankfully, they had already configured their system to use the automatic scaling features of AWS, so they were able to handle the increased load. At the peak, they were configuring 40 new cloud instances for their render farm each minute, which was receiving more than 450 render requests per minute. Each render operation would take around 10 minutes to complete. Compare that to a traditional enterprise: there is no way a traditional company would be able to purchase and configure 40 machines a minute, and even if it was possible, the costs would be astronomical. For Animoto, the costs scaled pretty much linearly with their user base, so they were able to continue providing their service without worrying about the costs. Pretty cool, right?
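As a rough back-of-the-envelope check on those numbers, the short Python sketch below estimates how many renders would be in progress at once at the peak. It assumes the peak rate held steady long enough for Little's law to apply, and the `renders_per_instance` parameter is a hypothetical knob, since the video does not say how many renders each instance handled.

```python
import math


def concurrent_renders(requests_per_minute, minutes_per_render):
    """Little's law: average jobs in progress = arrival rate * time per job."""
    return requests_per_minute * minutes_per_render


def instances_needed(requests_per_minute, minutes_per_render,
                     renders_per_instance=1):
    """Instances required to keep up, given an assumed per-instance capacity."""
    in_progress = concurrent_renders(requests_per_minute, minutes_per_render)
    return math.ceil(in_progress / renders_per_instance)


if __name__ == "__main__":
    # Figures from the Animoto story: 450 requests per minute, 10 minutes each.
    print(concurrent_renders(450, 10))
    # Assumed capacity of 4 simultaneous renders per instance (hypothetical).
    print(instances_needed(450, 10, renders_per_instance=4))
```

Under that steady-state assumption, thousands of renders would be running simultaneously, which helps explain why adding instances by the minute was necessary.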

When dealing with the cloud, there are many different service models. Some of the most common are Software as a Service, Infrastructure as a Service, and Platform as a Service. Many times you’ll see these as initialisms such as SaaS, IaaS, and PaaS, respectively. This diagram does an excellent job of showing the difference between these models. First, if you decide to host everything yourself, you have full control and management of the system. It is the most work, but provides the most flexibility. Moving to Infrastructure as a Service, you use computing resources provided by a cloud provider, but you are still responsible for configuring and deploying your software on those services. AWS and DigitalOcean are great examples of Infrastructure as a Service. Platform as a Service is a bit further into the cloud, where you just provide the application and data, but the rest of the underlying system is managed by the cloud provider. Platforms such as Salesforce and Heroku are great examples here. Finally, Software as a Service is the situation where the software itself is managed by the cloud provider. Facebook, Gmail, and Mint are all great examples of Software as a Service - you are simply using their software in the cloud.
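One way to picture the split of responsibilities described above is as a simple lookup table. The Python sketch below is only an illustration: the five layer names are an assumed simplification, not the exact layers in the diagram, and real providers draw these lines in slightly different places.

```python
# Assumed, simplified stack of layers, ordered from the bottom up.
LAYERS = ["hardware", "operating system", "runtime", "application", "data"]

# For each model, the lowest layer the customer still manages.
# Everything below that layer is handled by the cloud provider.
# None means the customer manages none of the layers and just uses the software.
CUSTOMER_MANAGES_FROM = {
    "self-hosted": "hardware",
    "IaaS": "operating system",
    "PaaS": "application",
    "SaaS": None,
}


def customer_managed(model):
    """Return the layers the customer is responsible for under a model."""
    start = CUSTOMER_MANAGES_FROM[model]
    if start is None:
        return []
    return LAYERS[LAYERS.index(start):]


def provider_managed(model):
    """Return the layers the cloud provider is responsible for under a model."""
    mine = customer_managed(model)
    return [layer for layer in LAYERS if layer not in mine]


if __name__ == "__main__":
    for model in CUSTOMER_MANAGES_FROM:
        print(f"{model}: you manage {customer_managed(model)}")
```

Reading down the output, the customer's list shrinks with each step further into the cloud, which is the trade of flexibility for convenience that the service models represent.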

In the past, there was a great analogy comparing these models to the different ways you could acquire pizza. You could make it at home, do take and bake, order delivery, or just go out and eat pizza at your favorite pizza restaurant. However, a few commentators recently pointed out a flaw in this model. Can you find it? Look at the Infrastructure as a Service model, for take and bake pizza. In this instance, you are getting pizza from a vendor, but then providing all of the hardware yourself. Isn’t that just the opposite of how the cloud is supposed to work?

They propose this different model for looking at it. Instead of getting the pizza and taking it home, what if you could make your own pizza, then use their oven to bake it? In this way, you maintain the ultimate level of customizability that many organizations want when they move to the cloud, without having the hassle of dealing with maintaining the hardware and paying the utility bills associated with it. That’s really what the cloud is about - you get to make the decisions about the parts you care about most, such as the toppings, but you don’t have to deal with the technical details of setting up the ovens and baking the pizza itself.

Along with the different service models for the cloud, there are a few different deployment models as well. For example, you might have an instance of a cloud resource that is private to an organization. While this may not meet the NIST definition for a cloud resource, most users in that organization really won’t know the difference. A great example of this is Beocat here at K-State. It is available as a free resource to anyone who wants to use it, but in general they don’t have to deal with setting up or maintaining the hardware. There are, of course, public clouds as well, such as AWS, DigitalOcean, and many of the examples we’ve discussed so far. Finally, there are also hybrid clouds, where some resources are on-site and others are in the cloud, and they are seamlessly connected. K-State itself is a really great example of this. Some online resources are stored on campus, such as KSIS and HRIS, while others are part of the public cloud such as Canvas and Webmail. All together, they make K-State’s online resources into a hybrid cloud that students and faculty can use.

Of course, moving an organization to the cloud isn’t simple, and there are many things to be concerned about when looking at the cloud as a possible part of your organization’s IT infrastructure. For example, cloud providers may have insecure interfaces and APIs, allowing malicious users to access or modify your cloud resources without your knowledge. You could also have data loss or leakage if your cloud systems are configured incorrectly. In addition, if the service provider experiences a major hardware failure, you might be unable to recover for several days while they resolve the problem. For example, several years ago K-State’s email system experienced just such a hardware failure, and it was nearly a week before full email access was restored across campus. Unfortunately, this happened about two weeks before finals, so it was a very stressful time for everyone involved.

There are also many security concerns. One of those is the possibility of a side-channel attack. In essence, this involves a malicious person who is using the same cloud provider as you. Once they set up their cloud system, they configure it in such a way as to scan nearby hardware and network connections, trying to get access to sensitive data from inside the cloud provider’s network itself. The recent Spectre and Meltdown CPU vulnerabilities are two great examples of side-channel attacks.

Also, if you are storing data on a cloud provider’s hardware, you may have to deal with legal data ownership issues as well. For example, if the data from a United States-based customer is stored on a server in the European Union, which data privacy laws apply? What if the data is encrypted before it is sent to the EU? What if the company is based in Japan? Beyond that, if there is a problem, you won’t have physical control of the servers or the data. If your cloud hosting provider is sold or goes out of business, you may not even be able to access your data directly. Finally, since cloud providers host many different organizations on the same hardware, you might become a bigger target for hackers. It is much more likely for a hacker to attack AWS than a small organization. So, there are many security concerns to consider when moving to the cloud.

In addition, there are a few roadblocks that may prevent your organization from being able to fully embrace the cloud. You should carefully consider each of these as you are evaluating the cloud and how useful it would be for you. For example, does the cloud provide the appropriate architecture for your application? Are you able to integrate the cloud with your existing resources? Will the change require lots of work to retrain your employees and reshape business practices to take full advantage of the cloud? I highly encourage you to read some of the linked resources below this video to find great discussions of the cloud and how it fits into an organization’s IT infrastructure.

Regardless, many organizations are currently moving toward the cloud, and the rate of adoption is increasing each year. This infographic gives some of the predictions and trends for the cloud from 2015. For me, the big takeaways from this data are that companies are devoting more and more of their IT budgets toward the cloud, but they are hoping to turn that into cost savings down the road. At the same time, their biggest concern moving to the cloud is finding people with enough experience to manage it properly while avoiding some of the security pitfalls along the way. So, learning how to work with the cloud will help you build a very valuable skill in the IT workplace of the future.

So, back to the question from the beginning of this video: what is the cloud? According to Esteban Kolsky, a cloud marketing consultant, the cloud is really a term for many things. In one way, the cloud is the internet itself, and all of the resources available on it. From Amazon to Yelp and everything in between, each of those websites and applications provides a service to us, their consumers. At the same time, the cloud is a delivery model, or a way to get information, applications, and data into the hands of the people who need it. This class itself is being delivered via the cloud, from that point of view at least. Finally, the cloud can also be seen as a computing architecture. For many organizations today, instead of worrying about the details of their physical hardware setup, they can just use “the cloud” as their computing architecture, and discuss the advantages and disadvantages of that model just like any other system.

For me, I like to think of the cloud as just a point of view. From the consumer’s perspective, any system that they don’t have to manage themselves is the cloud. For a system administrator, it’s not so easy. For example, for many of us, we can consider the K-State CS Linux servers as the cloud, as they are always available, online, and we don’t have to manage them. For our system administrator, however, they are his or her systems to manage, and are definitely not the cloud. Similarly, for Amazon, AWS is just another service to manage. So, I think that the cloud, as a term, really just represents your point of view of the system in question and how closely you have to manage it.

What about you? If you have any thoughts or comments on how you’d define the cloud, I encourage you to post them in the course discussion forums.

In the next videos, we’ll dive into how to set up and configure your first cloud resources, using DigitalOcean as our Infrastructure as a Service provider.