Routing
Are we there yet?
Once again, we’ll return to the request-response pattern diagram.
We revisit this diagram because it is so central to how HTTP servers work. At heart, a server’s primary responsibility is to respond to an incoming request. Thus, in writing a web server, our primary task is to determine what to respond with. With static web servers, the answer is pretty simple - we map the virtual path supplied by the URL to a file path on the file server. But the URL supplied in a request doesn’t have to correspond to any real object on the server - we can create any object we want and send it back.
In our examples, we’ve written several functions that build specific kinds of responses. Our serveFile() function serves a response representing a static file. Our listDirectory() function generates an index for a directory or, if the directory contains an index.html file, serves that file instead. And the servePost() function from our blog served a dynamically generated HTML page that drew its data from a database.
Each of these functions created a response. But how do we know which of them to use? That is the question that we will be grappling with in this chapter - and the technical term for it is routing.
In web development, routing refers to the process of matching an incoming request to the code that generates the appropriate response. For most web servers (and definitely for Node-based ones), we abstract the process of generating the response into a function. We often call these functions endpoints, because their purpose is to serve the response, effectively ending the processing of an incoming request with an appropriate response.
With a Node webserver, endpoint functions typically take in req (an instance of http.IncomingMessage) and res (an instance of http.ServerResponse) objects. These objects form an abstraction around the HTTP request and response.
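To make the endpoint idea concrete, here is a minimal sketch. The serveHello() endpoint is hypothetical (it is not one of the chapter's functions); it simply shows an endpoint receiving req and res and finishing the response, with a handleRequest() function passed to Node's http.createServer() as the request listener:

```javascript
const http = require("node:http");

// A hypothetical endpoint: given the request and response objects,
// it ends the request by sending a complete response.
function serveHello(req, res) {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/plain");
  res.end(`Hello from ${req.url}`);
}

// The routing function decides which endpoint to invoke.
// With only one endpoint, the decision is trivial.
function handleRequest(req, res) {
  serveHello(req, res);
}

// Node calls handleRequest once for every incoming request.
// Calling server.listen(3000) would start accepting connections.
const server = http.createServer(handleRequest);
```

Because every endpoint shares the same (req, res) signature, the routing function can hand a request to any of them interchangeably.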
Routing in a Node webserver therefore consists of examining the properties of the req object and determining which endpoint function to invoke. Up to this point, we’ve performed this logic in a handleRequest() function, in a fairly ad hoc way. Consider what we might need to do for a dynamically generated blog - we’d need to serve static files (like CSS and JS files), as well as dynamically generated pages for blog posts and the home page. What might our handleRequest() look like in that case?
Let’s assume we have a serveHome() function to serve the home page, a serveFile() function to serve static files, and a servePost() function to serve dynamically generated posts. Determining if the request is for the home page is easy, as we know the path should just be a forward slash:
// Determine if the request is for the index page
if(req.url === '/') return serveHome(req, res);
But what about determining if a request is for a post or a static file? We could use fs.stat to determine if a file exists:
const fs = require('fs');
const path = require('path');

// Determine if a corresponding file exists
fs.stat(path.join("public", req.url), (err, stat) => {
  if(err) {
    // Error reading file, might be a post?
    servePost(req, res);
  }
  else {
    // File exists, serve it
    serveFile(req, res);
  }
});
This would work, but it’s a bit ugly. Also, using a filesystem method like fs.stat is a potential bottleneck in our application; we’d like to avoid it if at all possible.
If we make all our posts use the URL /posts, we could use the query string to determine which post we want to display. This approach is cleaner, though it does mean we need to separate the pathname from our URL:
function handleRequest(req, res) {
  // Separate the pathname from the url
  const pathname = new URL(req.url, "http://localhost").pathname;
  // Determine if the request is for the index page
  if(pathname === '/') return serveHome(req, res);
  // Determine if the request is for a post
  if(pathname === '/posts') return servePost(req, res);
  // Treat all other requests as a file
  serveFile(req, res);
}
This is a cleaner approach, but it still has a very ad hoc feel and requires us to incorporate query strings into some of our URLs.
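As a sketch of the endpoint side of this approach, servePost() can read the post's identifier out of the query string with the URL object's searchParams. The id key (as in /posts?id=2) and the in-memory posts map standing in for the blog's database are both assumptions made for illustration:

```javascript
// Hypothetical in-memory stand-in for the blog's database
const posts = new Map([
  ["1", { title: "Hello, Routing" }],
  ["2", { title: "Query Strings" }],
]);

// Serve a single post selected by the query string, e.g. /posts?id=2
function servePost(req, res) {
  // req.url holds only the path and query, so supply a placeholder base
  const url = new URL(req.url, "http://localhost");
  const id = url.searchParams.get("id"); // null if the key is missing
  const post = posts.get(id);
  if (!post) {
    res.statusCode = 404;
    res.end("Post not found");
    return;
  }
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/html");
  res.end(`<h1>${post.title}</h1>`);
}
```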
As the web transitioned to dynamic pages, the concept of the URL path evolved. With a static server, the path indicates an actual file that exists on the fileserver. But with dynamic pages, the URL doesn’t have to correspond to anything real. Consider the case of Placeholder.com, a service that offers image URLs you can use in a site layout to simulate images. For example, the image below is generated by the site using the URL //via.placeholder.com/350x150:
Does Placeholder.com have thousands of images of varying sizes stored on a disk somewhere? No! It generates the image data dynamically based on the request - here it reads the 350x150 and uses it to determine the size of the image needed. It also reads other parts of the URL and uses them to set other properties of the image, e.g. the URL //via.placeholder.com/350x150/C497FF/FFFFFF?text=Hello generates the image:
Clearly, we’re conveying information to the Placeholder.com server through the url. This is much like passing parameters to a function - it is how we communicate what we want the server to do with our request!
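To illustrate the "URL as function parameters" idea, the hypothetical parsePlaceholderUrl() function below pulls the size, colors, and text out of a Placeholder.com-style URL. It is not Placeholder.com's actual code; it assumes the /WIDTHxHEIGHT/BACKGROUND/FOREGROUND?text=... layout from the example above, and its default colors are arbitrary:

```javascript
// Hypothetical parser for a Placeholder.com-style URL such as
// /350x150/C497FF/FFFFFF?text=Hello
function parsePlaceholderUrl(reqUrl) {
  const url = new URL(reqUrl, "http://localhost");
  // Split the path into its non-empty segments: ["350x150", "C497FF", "FFFFFF"]
  // (the default colors are arbitrary assumptions, not Placeholder.com's)
  const [size, background = "CCCCCC", foreground = "969696"] = url.pathname
    .split("/")
    .filter((segment) => segment.length > 0);
  // The first segment must look like WIDTHxHEIGHT
  const match = /^(\d+)x(\d+)$/.exec(size ?? "");
  if (!match) return null;
  return {
    width: Number(match[1]),
    height: Number(match[2]),
    background,
    foreground,
    // Optional named parameter from the query string; default to the dimensions
    text: url.searchParams.get("text") ?? `${match[1]} x ${match[2]}`,
  };
}
```

Each path segment plays the role of a positional argument, while the query string supplies an optional named one.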
Let’s revisit our blog server. What if we adopted a similar approach there? Our URLs might then look like:
Here we use the entire title as part of the URL! This makes the URL more human-readable, which is especially helpful when searching your browser history or scanning bookmarks. But how would we handle this approach in our handleRequest()? You might recognize that we have a recurring pattern, which of course suggests we try a RegExp:
function handleRequest(req, res) {
  // Separate the pathname from the url
  const pathname = new URL(req.url, "http://localhost").pathname;
  // Determine if the request is for the index page
  if(pathname === '/') return serveHome(req, res);
  // Determine if the request is for a post: /posts/ followed by a title slug
  if(/^\/posts\/[\w-]+$/.test(pathname)) return servePost(req, res);
  // Treat all other requests as a file
  serveFile(req, res);
}
With this regular expression, we’ve effectively built a wildcard into our URL. Any URL with the pattern /posts/[postname] will be treated as a post!
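A capture group lets the same regular expression also extract the post's name, so the endpoint doesn't have to re-parse the URL. In this sketch, routePath() is a hypothetical helper that returns which endpoint to use (as a name) plus the captured slug, which keeps it easy to test; a real server would call the endpoint function instead:

```javascript
// Matches /posts/<slug>, capturing the slug (letters, digits, _ and -)
const POST_ROUTE = /^\/posts\/([\w-]+)$/;

// Hypothetical helper: decide which endpoint a pathname belongs to
function routePath(pathname) {
  if (pathname === "/") return { endpoint: "serveHome" };
  const match = POST_ROUTE.exec(pathname);
  // match[1] holds the text captured by the parentheses
  if (match) return { endpoint: "servePost", slug: match[1] };
  return { endpoint: "serveFile" };
}
```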
We’ve now seen how the request conveys information useful for selecting the appropriate endpoint through both the path and the query string. But there is more information available in our request than just those two tidbits, and all of it is available to us for determining the correct response to send. With that in mind, let’s revisit the parts of the request in terms of what kinds of information they can convey.
Now that we’re thinking of the URL as a tool for passing information to the server, let’s reexamine its parts:
The protocol is the protocol used for communication, and the host is the domain or IP address of the server. Note that these are not included in Node’s req.url property, as they are used to find the right server and talk to it using the right protocol - by the time your Node server is generating the req object, the request has already found your server and is communicating using that protocol.
This is also why, when we’ve been parsing req.url with the URL object, we’ve supplied a host and protocol of http://localhost:
var url = new URL(req.url, "http://localhost");
This gives the URL object a placeholder for the missing protocol and host, so that it can parse the supplied URL without error. Just keep in mind that, by doing so, the url.protocol and url.host properties of url will equal this “fake” protocol and host.
The path is the path to the virtual resource requested, accessible in our parsed URL as url.pathname. Traditionally, this corresponded to a file path, which is how we used it in our fileserver examples. But as we have seen with Placeholder.com, it can also be used to convey other information - in that case, the size of the image, its background color, and its foreground color.
The query or querystring is a list of key-value pairs, preceded by the ?. In our parsed URL it is available as url.search, and its individual pairs can be read through url.searchParams. Traditionally this would be used to modify some aspect of the request, such as requesting a particular data format or portion of a dataset. It is also how forms are submitted with a GET request - the form data is encoded into the query string.
The hash or fragment is preceded by a #, and traditionally indicates an element on the page that the browser should auto-scroll to. This element should have an id attribute that matches the one supplied by the hash. For example, a link to #the-url-revisited would scroll back to the header for this section (the path is left blank, so the browser assumes we want to stay on this page). Unlike the other parts, the hash is handled entirely by the browser and is not sent to the server as part of the request.
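The sketch below (describeUrl() is a hypothetical helper) parses a request URL the same way we parse req.url and returns each of the parts discussed above. Note that the protocol and host come from the placeholder base, and that a real req.url will not contain a hash, since browsers keep the fragment to themselves:

```javascript
// reqUrl is what Node puts in req.url: a path plus optional query string
function describeUrl(reqUrl) {
  // The second argument supplies the placeholder protocol and host
  const url = new URL(reqUrl, "http://localhost");
  return {
    protocol: url.protocol, // "http:", from the placeholder, not the real request
    host: url.host, // "localhost", also from the placeholder
    pathname: url.pathname, // the path to the (virtual) resource
    query: Object.fromEntries(url.searchParams), // the key-value pairs
    hash: url.hash, // "" for real requests, since browsers don't send it
  };
}
```

For example, describeUrl("/posts/5?format=json") reports a pathname of /posts/5 and a query of { format: "json" }.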
Clearly, our URL can convey a lot of data to our server. But that’s not the only information that we can use in deciding how to respond to a request.
In addition, we also receive the HTTP method used for the request (available in Node as req.method), e.g. GET, POST, PATCH, PUT, or DELETE. These again map back to the traditional role of a web server as a file server - GET retrieves a resource, POST uploads one, PATCH and PUT modify one, and DELETE removes it. But we can also create, read, update, and destroy virtual resources, i.e. resources generated and stored on the server that are not files. These play an especially important role in RESTful routes, which we’ll discuss soon.
We can also convey additional information for the server to consider in the request headers (available in Node as req.headers). Headers usually carry information tangential to the request, like authentication and cookies. We’ll also discuss this a bit further on.
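As an example of routing on a header, this hypothetical servePostList() endpoint returns JSON when the Accept header mentions application/json and HTML otherwise. Node exposes headers on req.headers with lowercased names. The substring check is a deliberate simplification of real content negotiation, which would also weigh the quality (q) values in the header:

```javascript
// A crude check: does the client say it accepts JSON?
// Node lowercases header names, so "Accept" arrives as "accept".
function wantsJson(req) {
  const accept = req.headers["accept"] || "";
  return accept.includes("application/json");
}

// A hypothetical endpoint that serves the same data in two formats
function servePostList(req, res) {
  const titles = ["Hello, Routing", "Query Strings"]; // stand-in data
  if (wantsJson(req)) {
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify(titles));
  } else {
    res.setHeader("Content-Type", "text/html");
    res.end(`<ul>${titles.map((t) => `<li>${t}</li>`).join("")}</ul>`);
  }
}
```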
We’ve already seen that with our blog, we could convey which post to display with different URL strategies, i.e.:
And that is just to display posts. What about when we want our blog software to allow the writer to submit new posts? Or edit existing ones? That’s a lot of different URLs we’ll need to keep track of.
Roy Fielding tackled this issue in Chapter 5 of his Ph.D. dissertation “Architectural Styles and the Design of Network-based Software Architectures.” He recognized that increasingly dynamic web servers were dealing with resources that could be created, updated, read, and destroyed, much like the resources in database systems (not surprisingly, many of these resources were persistently stored in such a database system). These operations are so pervasive in database systems that we have an acronym for them: CRUD.
Roy mapped these CRUD operations to an HTTP URL + Method pattern he called Representational State Transfer (or REST). This pattern provided a well-defined way to express CRUD operations as HTTP requests.
Take our blog posts example. The RESTful routes associated with posts would be:
URL | HTTP Method | CRUD Operation |
---|---|---|
posts/ | GET | Read (all) |
posts/:id | GET | Read (one) |
posts/ | POST | Create |
posts/:id | POST | Update |
posts/:id | DELETE | Delete |
The :id corresponds to an actual identifier of a specific resource - most often the id column in the database. REST also accounts for nesting relationships. Consider if we added comments to our posts. Comments would need to correspond to a specific post, so we’d nest the routes:
URL | HTTP Method | CRUD Operation |
---|---|---|
posts/:post_id/comments | GET | Read (all) |
posts/:post_id/comments/:comment_id | GET | Read (one) |
posts/:post_id/comments | POST | Create |
posts/:post_id/comments/:comment_id | POST | Update |
posts/:post_id/comments/:comment_id | DELETE | Delete |
Notice that we now have two wildcards for most routes, one corresponding to the post and one to the comment.
If we didn’t want to support an operation, for example, updating comments, we could just omit that route.
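Matching a route pattern against a request path comes down to comparing them segment by segment, treating any segment that starts with : as a wildcard. The hypothetical matchRoute() helper below is a minimal sketch of that idea (it ignores the HTTP method and does no URL decoding); it returns the wildcard values keyed by name, or null when the path doesn't fit:

```javascript
// Hypothetical helper: compare a pattern like "/posts/:post_id/comments/:comment_id"
// against a pathname, returning the wildcard values or null on a mismatch.
function matchRoute(pattern, pathname) {
  // filter(Boolean) drops empty segments, so trailing slashes are ignored
  const patternParts = pattern.split("/").filter(Boolean);
  const pathParts = pathname.split("/").filter(Boolean);
  if (patternParts.length !== pathParts.length) return null;
  const params = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    if (part.startsWith(":")) {
      // A wildcard: record the value under the token's name (minus the colon)
      params[part.slice(1)] = pathParts[i];
    } else if (part !== pathParts[i]) {
      return null; // a literal segment that doesn't match
    }
  }
  return params;
}
```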
REST was so straightforward, and corresponded so well to how many web servers were operating, that it quickly became a widely adopted technique in the web world. When you hear of a RESTful API or RESTful routes, the speaker is referring to using this pattern in the URLs of a site.
Many web development frameworks built upon this concept of routes by supplying a router, an object that stores route patterns and performs the routing operation. One popular Node library, Express, is at its heart a router. If we were to write our Node blog using Express, the syntax to create our routes would be:
const express = require('express');
const app = express();
// The endpoint functions (serveHome, servePosts, serveComment, etc.)
// are assumed to be defined elsewhere
// Home page
app.get('/', serveHome);
// Posts
app.get('/posts', servePosts);
app.get('/posts/:id', servePost);
app.post('/posts', createPost);
app.post('/posts/:id', updatePost);
app.delete('/posts/:id', deletePost);
// Comments
app.get('/posts/:post_id/comments', serveComments);
app.get('/posts/:post_id/comments/:id', serveComment);
app.post('/posts/:post_id/comments', createComment);
app.post('/posts/:post_id/comments/:id', updateComment);
app.delete('/posts/:post_id/comments/:id', deleteComment);
module.exports = app;
The app variable is an Express application: a request-handling function that can be passed to Node’s http.createServer() (its app.listen() method does exactly that). The Express application adds (among other features) routing using the route methods app.get(), app.post(), app.put(), and app.delete(). These take routes in either string or regular expression form, and the wildcard values are assigned to properties of the req.params object. For example, we might write our servePost() function as:
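// Assumes db is a better-sqlite3 Database object
function servePost(req, res) {
  const id = req.params.id;
  // prepare() compiles the query; get() binds the id and returns the first row
  const post = db.prepare("SELECT * FROM posts WHERE id = ?").get(id);
  // TODO: Render the post HTML
}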
The property name in req.params is the same as the token after the : in the wildcard, so a route like /posts/:id supplies req.params.id.
Routers can greatly simplify the creation of web applications, and provide a clear and concise way to specify routes. For this reason, they form the heart of most dynamic web development frameworks.
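To demystify what a router does, here is a minimal, hypothetical Router class. It is not how Express is implemented; it only shows the core idea of storing (method, pattern, endpoint) entries, turning :name wildcards into named RegExp capture groups, and filling in req.params before calling the first matching endpoint:

```javascript
// A minimal, hypothetical router (not Express's implementation)
class Router {
  constructor() {
    this.routes = [];
  }

  // Register a route; ":name" segments become named capture groups.
  // Literal segments are used as-is (not regex-escaped), a simplification.
  add(method, pattern, endpoint) {
    const source = pattern
      .split("/")
      .map((seg) => (seg.startsWith(":") ? `(?<${seg.slice(1)}>[^/]+)` : seg))
      .join("/");
    // Allow an optional trailing slash
    this.routes.push({ method, regex: new RegExp(`^${source}/?$`), endpoint });
  }

  get(pattern, endpoint) { this.add("GET", pattern, endpoint); }
  post(pattern, endpoint) { this.add("POST", pattern, endpoint); }
  delete(pattern, endpoint) { this.add("DELETE", pattern, endpoint); }

  // Dispatch a request to the first route whose method and path both match
  handle(req, res) {
    const { pathname } = new URL(req.url, "http://localhost");
    for (const route of this.routes) {
      const match = route.method === req.method && route.regex.exec(pathname);
      if (match) {
        req.params = { ...match.groups }; // wildcard values, keyed by name
        return route.endpoint(req, res);
      }
    }
    res.statusCode = 404;
    res.end("Not Found");
  }
}
```

To use it with a real server, one would pass (req, res) => router.handle(req, res) to http.createServer(). Because routes are checked in registration order, a more specific pattern (say /posts/new) should be registered before a wildcard one (/posts/:id) that could also match it.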
Before we step away from routing, we should take the time to discuss a specific style of web application that goes hand-in-hand with routing - an Application Programming Interface (API). An API is simply an interface for two programs to communicate, allowing one to act as a service for the other. Web APIs are APIs that use the Hypertext Transfer Protocol (HTTP) as the mechanism for this communication. Thus, the client program makes HTTP requests against the API server and receives responses. The Placeholder.com service we discussed earlier in this chapter is an example of a web API - one that generates and serves placeholder images.
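The sketch below shows both sides of a web API in one file: a tiny hypothetical JSON API (a single GET /api/time route) built with Node's http module, and a client function that consumes it over HTTP. It assumes Node 18 or later, where fetch() is available globally:

```javascript
const http = require("node:http");

// Server side: a hypothetical JSON API with a single route, GET /api/time
const api = http.createServer((req, res) => {
  if (req.method === "GET" && req.url === "/api/time") {
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ now: new Date().toISOString() }));
  } else {
    res.writeHead(404, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "Not Found" }));
  }
});

// Client side: another program consuming the API over HTTP
async function getServerTime(baseUrl) {
  const response = await fetch(`${baseUrl}/api/time`);
  if (!response.ok) throw new Error(`API responded with ${response.status}`);
  const data = await response.json();
  return data.now;
}
```

In practice the client and server would be separate programs, often on different machines; the client only needs to know the API's URLs and the format of its responses.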
In fact, an API can serve any kind of resource - binary files, image files, text files, etc. But most serve data in one of a few forms:
There are thousands of Web APIs available for you to work with. Some you might be interested in investigating include:
A lighter-weight alternative to a full-fledged API is a webhook. A webhook is simply an address that a web application is instructed to make an HTTP request against when a specific event happens. For example, you can set your GitHub repository to trigger a webhook when new commits are pushed to it. You simply provide the URL it should send a request to, and it will send a request with a payload (based on the event).
For example, the Discord chat platform allows users to create a webhook endpoint that will post messages into a channel. The webhook takes the form of a URL that specifies the Discord server and channel that the message should be sent to, as well as some authentication parameters to identify the source of the webhook. You can then configure a webhook in GitHub that sends a message to this URL when a commit is pushed to GitHub. Now, each time a user pushes new data to GitHub, you’ll be able to see a message in Discord.
A more advanced webhook endpoint can be used to trigger various actions, such as a continuous deployment routine. In that way, you can have a repository automatically deploy each time it is updated, simply through the use of webhooks!
In other words, a webhook is a technique for implementing event listeners using HTTP. One web service (GitHub in the example) lets you register an event listener in another web application by specifying the URL used to communicate with it. This represents a looser coupling than a traditional API; if the request fails for any reason, it will not cause problems for the service that triggered it. Webhooks also have the benefit of being event-driven, whereas to check for new data with an API, you would need to poll (make a request to query) the API at regular intervals.
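On the receiving end, a webhook is just another route. The hypothetical handleWebhook() endpoint below collects a POSTed JSON body and hands it to an onEvent callback (our own invention), using GitHub's X-GitHub-Event header to name the event. It is a minimal sketch: a production receiver should also verify that the request really came from the sender (GitHub, for instance, can sign deliveries with a shared secret):

```javascript
// Hypothetical webhook receiver: the sending service POSTs a JSON payload
// here whenever an event happens; onEvent is our own callback.
function handleWebhook(req, res, onEvent) {
  // Webhook deliveries arrive as POST requests
  if (req.method !== "POST") {
    res.statusCode = 405;
    res.end();
    return;
  }
  // Collect the request body, which arrives as a stream of Buffer chunks
  const chunks = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("end", () => {
    let payload;
    try {
      payload = JSON.parse(Buffer.concat(chunks).toString("utf8"));
    } catch {
      res.statusCode = 400;
      res.end("Invalid JSON");
      return;
    }
    // GitHub names the event in the X-GitHub-Event header;
    // Node exposes header names in lowercase
    const event = req.headers["x-github-event"] || "unknown";
    onEvent(event, payload);
    // Acknowledge receipt; 204 means success with no response body
    res.statusCode = 204;
    res.end();
  });
}
```

It could be mounted with http.createServer((req, res) => handleWebhook(req, res, (event, payload) => console.log(event))).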
In this chapter, we explored the idea of routes, a mechanism for mapping a request to the appropriate endpoint function to generate a response. Routes typically consist of both the request URL and the request method. RESTful routes provide a common strategy for implementing CRUD operations for a server-generated resource; any programmer familiar with REST will quickly be able to suss out the appropriate route.
We also explored routers, objects that make routing more manageable to implement. These allow you to define a route and tie it to an endpoint function used to generate the result. Most will also capture wildcards in the route and supply those to the endpoint function in some form. We briefly looked at the Express framework, a Node framework that adds a router on top of Node’s built-in http module.
Finally, we discussed using routes to serve other programs instead of users with APIs and webhooks.