HTML was designed as a stateless protocol. This means that there is no expectation for the server to keep track of prior requests made to it. Each incoming HTTP request is effectively treated as if it is the first request ever made to the server.
This was an important consideration for making the web possible. Consider what would happen if our server needed to keep track of what every visitor did on the site - especially when you have thousands of unique visitors every second? You would need a lot of memory, and a mechanism to tell which user is which. That’s a lot of extra requirements to getting a web server working.
But at the same time, you probably recognize that many of the websites you use every day must be doing just that. If you use Office 365 or Google Docs, they manage to store your documents, and not serve you someone else’s when you log in. When you visit an online store and place products in your shopping basket, they remain there as you navigate around the site, loading new pages. So how are these websites providing state?
The answer that was adopted as a W3C standard -
RFC 2695 is cookies. Cookies are nothing more than text that is sent along with HTTP Requests and Responses. A server can ask a client to store a cookie with a
Set-Cookie header. From that point on, any request the client makes of that server will include a
Cookie header with the cookie information.
It’s kind of like if your forgetful teacher gives you a note to show him the next time you come to office hours, to help remind him of what you talked about previously. This way, the server doesn’t have to track state for any visitor - it can ask the visitor’s client (their browser) to do it for them.
What is a Cookie?
Let’s dig deeper into this idea and implementation of cookies.
Much like a web request or response, a cookie is nothing more than a stream of data. It gets passed between client and server through their headers. Specifically, the
Set-Cookie header sends the cookie to the client, and the client will automatically return it in subsequent requests using the
Say we want to establish a session id, (SID), for every unique visitor to our site. We would have our server send the SID in a
The browser would store this value, and on subsequent requests to the same domain, include a corresponding
Domain cookie attribute is specified (see below), a cookie is only sent to the domain that it originated from. It will not be sent to subdomains, i.e. a cookie that came from
ksu.edu will not be sent to the server with hostname
Structure of a Cookie
As you might have guessed from the example, the string of a cookie is a set of key/value pairs using the equals sign (
=) to assign the value to the key. But we can also include multiple cookie pairs using a semicolon (
;) to separate the cookie pairs, i.e.:
Set-Cookie: SID=234; lang=en-US
Both cookie pairs are sent back in a
Cookie: SID=234; lang=en-US
We can chain multiple pairs of cookie pairs. After the last cookie pair, we need another semicolon, then any cookie attributes, also separated by semicolons. For example:
Set-Cookie: SID=234; lang=en-US; EXPIRES=Wed, 04 April 2019 011:30:00 CST
adds a cookie attribute specifying an expiration date for the cookie. The legal cookie attributes are laid out in RFC6265, but we’ll reprint them here.
There are a limited number of defined cookie attributes, and each changes the nature of the cookie and how a browser will work with it in an important way. Let’s examine each in turn:
Expires As you might expect, this sets an expiration date for the cookie. After the date, the browser will throw away the cookie, removing it from its internal cache. It will also stop sending the cookie with requests. If you need to erase a value in a cookie you’ve already sent to a user, you can respond to their next request with a new
Set-Cookie header with the
Expires value set in the past. To avoid issues with time zones, the date supplied must conform to the
HTTP-Date format set out in
Max-Age also sets an expiration date, but in terms of a number of seconds from the current time. A value of 0 or less (negative) will remove the cookie from the browser’s cache, and stop sending the cookie in future requests. IE 7, 8, and 9 do not support the
Max-Age attribute; for all other browsers,
Max-Age will take precedence over
Domain As described above, cookies are by default sent only to the domain from which they originated, and no subdomains. If the domain is explicitly set with the
Domain attribute, subdomains are included.
Path The value of this attribute further limits what requests a cookie will be sent to. If the
Path is specified, only requests whose url matches the value will be sent. This does include subpaths, i.e.
Path=/documents will match the urls
Secure Cookies including this attribute (it does not need a value) will only be sent by the browser with HTTPS requests, not HTTP requests. Note that sending the cookie over HTTPS does not encrypt the cookie data - as a header, it is sent in the clear for both HTTP and HTTPS.
Document.cookie property. Setting this attribute (it does not need a value) tells the browser not to make it accessible.
SameSite (in Draft) This new cookie attribute is intended to help a server assert that its cookies should not be sent with any cross-site requests originating from this page (i.e.
<img> tags, etc. whose source is a different site, or AJAX requests against other sites). This is intended to help combat cross-site scripting attacks. As you might expect, this is only supported in the newest browser versions.
Cookies and Security
As with all web technologies, we need to consider the security implications. Cookies are received and returned by the client - and we know that anything the client is responsible for can be manipulated by an adversarial agent. Let’s think through the implications.
All cookies received by a browser and cached there. How long they are stored depends on the
Expires attributes. Browsers are told to delete cookies older than their
Max-Age, or whose
Expires duration has passed. If neither of these are set, the cookie is considered a session cookie - it is deleted when the browser is closed (this is why you should always close a browser you use on a public computer).
While session cookies only exist in memory, cookies with expiration dates may be persisted as text files in the browser’s data directory. If this is the case, it is a simple matter for a knowledgeable adversary with access to the computer to retrieve them. Also remember access doesn’t necessarily mean physical access - malware running on your computer can effectively copy this data as well.
Moreover, cookies are just text - say for example you add the identity of your user with a cookie
Set-Cookie: user-id=735. An adversary could sign up for a legitimate account with your application, and then manipulate their cookie to return
Cookie: user-id:1. Since user ids are often incrementally generated primary keys in a database, it’s a good assumption that one or two low-numbered ones will belong to administrator accounts - and this adversary has just impersonated one!
So if we need cookies to add state to our web applications, but cookies (and therefore that state) are subject to interception and manipulation - how can we protect our users and our site?
Strategies for Making Cookies Safe(er)
Some common strategies are:
- Use HTTPS
- Store as little information in a cookie as possible
- Encrypt the cookie’s values
- Sign the cookie
- Use Short Lifespans
Using Secure HTTP helps prevent cookies from being intercepted during transit, as the request body and headers are encrypted before they move across the transport layer of the Internet. This does not protect cookies once they arrive at a user’s computer, however. Still, it is a great first step.
Storing as Little in the Cookie as Possible
A good general strategy is to store as little of importance in the cookie as possible. This also helps keep the size of a site’s cookies down, which means less traffic across the network, and less to store on a user’s computer. The RFC suggests browsers support a minimum size of 4096 bytes per cookie - but this is not a requirement.
But where do we store the session information, if not in the cookie? This is the subject of the next section, https://textbooks.cs.ksu.edu/cis526/06-dynamic-web-servers/09-sessions/. This strategy usually entails storing nothing more than a session identifier in the cookie, and storing the actual data that corresponds to that session identifier on the server. Effectively, we make our server statefull instead of stateless.
Encrypting Cookie Values
Another common strategy is encrypting the value of a cookie using a reversible encryption algorithm before sending it to the client. This kind of encryption technique is a mathematical function that can easily be reversed, provided you know the key used during the encryption process. This key should be stored in the server and never shared, only used internally to encrypt outgoing cookies and decrypt incoming ones.
This strategy only works if you don’t need to know the cookie’s value on the client (as the client will not have the key). Also, be wary of sending any details of the encryption with the encrypted data - anything that helps identify the encryption strategy or provides information like initialization vectors can help an adversary break the encryption. Also, check what algorithms are considered strong enough by current standards; this is always a moving target.
Signing Cookie Values
A third common technique is signing the cookie with a hash. A hash is the result of a one-way cryptographic function - it is easy to perform but very difficult to reverse. How it works is you hash the value of the cookie, and then send both the original data’s value and the hashed value as cookies. When you receive the cookie back from the client, you re-hash the value found in the cookie, and compare it to the hashed value in the cookie. If the two values are different, then the cookie has been modified.
Use Short Lifespans
The above strategies only keep adversaries from modifying cookies. There is still the danger of valid cookies being captured and used by an adversary - a technique known as session hijacking. Using HTTPS can prevent session hijacking from within the network, but will not protect against cookies lifted from the client’s computer.
Making sessions expire quickly can help mitigate some of the risk, especially when the adversary is using physical access to the computer to retrieve cookies - you wouldn’t expect them to do this while the user is still using the computer.
But non-physical access (such as a compromised computer) can lift a cookie while it is in use. Regenerating a session identifier (assuming you are using the first strategy) on each login can help, especially if you prompt users to log in before allowing especially risky actions. Adding some additional tracking information like the originating IP address to the session and prompting the user to validate a session by logging back in when this changes can also help.
Be aware that these strategies can also be very frustrating for users who are prompted to sign in repeatedly. As always, you need to balance security concerns with user satisfaction for the kind of activities your application supports.
Combinations of the Above
Clearly these strategies can be used in combination, and doing so will provide a better degree of safety for your clients. But the can also increase computation time, network traffic, and user frustration within your web application. Deciding just how much security your application needs is an important part of the design process.