Chapter 8

Routing

Are we there yet?

Subsections of Routing

Introduction

Once again, we’ll return to the request-response pattern diagram.

The Request-Response Pattern The Request-Response Pattern

We revisit this diagram because it is so central to how HTTP servers work. At the heart, a server’s primary responsibility is to respond to an incoming request. Thus, in writing a web server, our primary task is to determine what to respond with. With static web servers, the answer is pretty simple - we map the virtual path supplied by the URL to a file path on the file server. But the URL supplied in a request doesn’t have to correspond to any real object on the server - we can create any object we want, and send it back.

In our examples, we’ve built several functions that build specific kinds of responses. Our serveFile() function serves a response representing a static file. Our listDirectory() function generates an index for a directory, or if the directory contained a index.html file, served it instead. And our dynamic servePost() from our blog served a dynamically generated HTML page that drew data from a database.

Each of these functions created a response. But how do we know which of them to use? That is the question that we will be grappling with in this chapter - and the technical term for it is routing.

Request Routing

In web development, routing refers to the process of matching an incoming request with generating the appropriate response. For most web servers (and definitely for Node-based ones), we abstract the process of generating the response into a function. We often call these functions endpoints as their purpose is to serve the response, effectively ending the processing of an incoming request with an appropriate response.

With a Node webserver, endpoint functions typically take in a req (an instance of http.IncomingMessage) and res (an instance of http.ServerResponse) objects. These objects form an abstraction around the HTTP request and response.

Routing in a Node webserver therefore consists of examining the properties of the req object and determining which endpoint function to invoke. Up to this point, we’ve done this logic in a handleRequest() function, in a fairly ad-hoc way. Consider what we might need to do for a dynamically generated blog - we’d need to serve static files (like CSS and JS files), as well as dynamically generated pages for blog posts and the home page. What might our handleRequest() look like in that case?

Let’s assume we have a serveHome() function to serve the home page, a serveFile() function to serve static files, and a servePost() function to serve dynamically generated posts. Determining if the request is for the homepage is easy, as we know the path should just be a forward slash:

    // Determine if the request is for the index page
    if(req.url === '/') return serveHome(req, res);

But what about determining if a request is for a post or a static file? We could use fs.stat to determine if a file exists:

    // Determine if a corresponding file exists 
    fs.stat(path.join("public", req.url), (err, stat) => {
        if(err) {
            // Error reading file, might be a post?
            servePost(req, res);
        }
        else {
            // File exists, serve it 
            serveFile(req, res);
        }
    });

This would work, but it’s a bit ugly. Also, using a filesystem method like fs.stat is potential bottleneck in our application; we’d like to avoid it if at all possible.

If we make all our posts use the url /posts, we could use the query string to determine which post we want to display. This approach is a bit cleaner, though it does mean we need to separate the pathname from our URL:

public handleRequest(req, res) {
    
    // Separate the pathname from the url 
    const pathname = new URL(req.url, "http://localhost").pathname;

    // Determine if the request is for the index page
    if(pathname === '/') return serveHome(req, res);

    // Determine if the request is for a post 
    if(pathname === '/post') return servePost(req, res);

    // Treat all other requests as a file 
    serveFile(req, res);
}

This is a bit cleaner of an approach, but it still has a very ad-hoc feel and requires us to incorporate query strings into some of our URLs.

Resources

As the web transitioned to dynamic pages, the concept of the URL path evolved. With a static server, the path indicates an actual file that exists on the fileserver. But with dynamic pages, the url doesn’t have to correspond to anything real. Consider the case of Placeholder.com , a service that offers image URLs you can use in a site layout to simulate images. For example, the image below is generated by the site by using the URL By using the URL //via.placeholder.com/350x150:

350 x 150 pixel placeholder image 350 x 150 pixel placeholder image

Does Placeholder.com have thousands of images of varying sizes stored on a disk somewhere? No! It generates the image data dynamically based on the request - here it reads the 350x150 and uses it to determine the size of the image needed. It also reads other aspects of the URL and uses those to set other aspects, i.e. the url //via.placeholder.com/350x150/C497FF/FFFFFF?text=Hello generates the image:

Another placeholder image with custom color and text Another placeholder image with custom color and text

Clearly, we’re conveying information to the Placeholder.com server through the url. This is much like passing parameters to a function - it is how we communicate what we want the server to do with our request!

Let’s revisit our blog server. What if we adopted a similar approach there? Our URLs might then look like:

Where we use the entire title as part of the URL! This makes it more human-readable, especially helpful when searching your browser history or scanning bookmarks. But how would we handle this approach in our handleRequest()? You might recognize that we have a reoccurring pattern, which of course suggests we try a RegExp:

public handleRequest(req, res) {
    
    // Separate the pathname from the url 
    const pathname = new URL(req.url, "http://localhost").pathname;

    // Determine if the request is for the index page
    if(pathname === '/') return serveHome(req, res);

    // Determine if the request is for a post 
    if(\$/post/\w+^\g.test(pathname)) return servePost(req, res);

    // Treat all other requests as a file 
    serveFile(req, res);
}

With this regular expression, we’ve effectively built a wildcard into our URL. Any URL with a pattern /posts/[postname] will be treated as a post!

Request Revisted

We’ve seen then how the request conveys information useful to selecting the appropriate endpoint with the path, and also the query string. But there is more information available in our request than just those two tidbits. And we have all of that information available to us for determining the correct response to send. With that in mind, let’s revisit the parts of the request in terms of what kinds of information they can convey.

The URL, Revisited

Now that we’re thinking of the URL as a tool for passing information to the server, let’s reexamine its parts:

A diagram of a URL A diagram of a URL

The protocol is the protocol used for communication, and the host is the domain or IP address of the server. Note that these are not included in Node’s req.url parameter, as these are used to find the right server and talk to it using the right protocol - by the time your Node server is generating the req object, the request has already found your server and is communicating with the protocol.

This is also why when we’ve been parsing req.url with the URL object, we’ve supplied a host and protocol of http://localhost:

var url = new URL(req.url, "http://localhost");

This gives the URL object a placeholder for the missing protocol and host, so that it can parse the supplied URL without error. Just keep in mind that by doing so the url.protocol and url.host properties of url equal this “fake” host and protocol.

The path is the path to the virtual resource requested, accessible in our parsed URL as url.pathname. Traditionally, this corresponded to a file path, which is how we used it in our fileserver examples. But as we have seen with Placeholder.com, it can also be used to convey other information - in that case, the size of the image, its background color, and foreground color.

The query or querystring is a list of key-value pairs, proceeded by the ?. Traditionally this would be used to modify some aspect of the request, such as requesting a particular data format or portion of a dataset. It is also how forms are submitted with a GET request - the form data is encoded into the query string.

The hash or fragment is proceeded by a #, and traditionally indicates an element on the page that the browser should auto-scroll to. This element should have an id attribute that matches the one supplied by the hash. For example, clicking this link: #the-url-revisited will scroll back to the header for this section (note we left the path blank, so the browser assumes we want to stay on this page).

Clearly our URL can clearly convey a lot of data to our server. But that’s not the only information that we can use in deciding how to respond to a request.

Request Method, Revisited

In addition, we also receive the HTTP Method used for the request, i.e. GET, POST, PATCH, PUT, or DELETE. These again map back to the traditional role of a web server as a file server - GET retrieves a resource, POST uploads one, PATCH and PUT modify one, and DELETE removes it. But we can also create, read, update, and destroy virtual resources, i.e. resources generated and stored on the server that are not files. These play an especially important role in RESTful routes, which we’ll discuss soon.

Headers, Revisited

We can also convey additional information for the server to consider as part of the request headers. This is usually used to convey information tangential to the request, like authentication and cookies. We’ll also discuss this a bit further on.

REST

We’ve already seen that with our blog, we could convey which post to display with different URL strategies, i.e.:

And that is just to display posts. What about when we want our blog software to allow the writer to submit new posts? Or edit existing ones? That’s a lot of different URLS we’ll need to keep track of.

Representational State Transfer (REST)

Roy Fielding tackled this issue in Chapter 5 of his Ph.D. dissertation “Architectural Styles and the Design of Network-based Software Architectures.” He recognized that increasingly dynamic web servers were dealing with resources that could be created, updated, read, and destroyed, much like the resources in database systems (not surprisingly, many of these resources were persistently stored in such a database system). These operations are so pervasive in database systems that we have an acronym for them: CRUD.

Roy mapped these CRUD methods to a HTTP URL + Method pattern he called Representational State Transfer (or REST). This pattern provided a well-defined way to express CRUD operations as HTTP requests.

Take our blog posts example. The RESTful routes associated with posts would be:

URL HTTP Method CRUD Operation
posts/ GET Read (all)
posts/:id GET Read (one)
posts/ POST Create
posts/:id POST Update
posts/:id DELETE Delete

The :id corresponds to an actual identifier of a specific resource - most often the id column in the database. REST also accounts for nesting relationships. Consider if we added comments to our posts. Comments would need to correspond to a specific post, so we’d nest the routes:

URL HTTP Method CRUD Operation
posts/:post_id/comments GET Read (all)
posts/:post_id/comments/:comment_id GET Read (one)
posts/:post_id/comments POST Create
posts/:post_id/comments/:comments_id POST Update
posts/:post_id/comments/:comments_id DELETE Delete

Notice that we now have two wildcards for most routes, one corresponding to the post and one to the comment.

If we didn’t want to support an operation, for example, updating comments, we could just omit that route.

REST was so straightforward and corresponded so well to how many web servers were operating, that it quickly became a widely adopted technique in the web world. When you hear of a RESTful API or RESTful routes, we are referring to using this pattern in the URLs of a site.

Routers

Many web development frameworks built upon this concept of routes by supplying a router, and object that would store route patterns and perform the routing operation. One popular Node library express , is at its heart a router. If we were to write our Node blog using Express, the syntax to create our routes would be:

const express = require('express');
var app = express();

// Home page 
app.get('/', serveHome);

// Posts 
app.get('posts/', servePosts);
app.get('posts/:id', servePost);
app.post('posts/', createPost);
app.post('posts/:id', updatePost);
app.delete('posts/:id', deletePost);

// Comments
app.get('posts/:post_id/comments', servePosts);
app.get('posts/:post_id/comments/:id', servePost);
app.post('posts/:post_id/comments', createPost);
app.post('posts/:post_id/comments/:id', updatePost);
app.delete('posts/:post_id/comments/:id', deletePost);

module.exports = app;

The app variable is an instance of the express Application class, which itself is a wrapper around Node’s http.Server class. The Express Application adds (among other features), routing using the route methods app.get(), app.post(), app.put(), and app.delete(). These take routes either in string or regular expression form, and the wildcard values are assigned to an object in req.params. For example, we might write our servePost() method as:

servePost(req, res) {
    var id = req.params.id;
    var post = db.prepare("SELECT * FROM posts WHERE ID = ?", id);
    // TODO: Render the post HTML
}

The parameter name in params is the same as the token after the : in the wildcard.

Routers can greatly simplify the creation of web applications, and provide a clear and concise way to specify routes. For this reason, they form the heart of most dynamic web development frameworks.

APIs

Before we step away from routing, we should take the time to discuss a specific style of web application that goes hand-in-hand with routing - an Application Programming Interface (API). An API is simply a interface for two programs to communicate, to allow one to act as a service for the other. Web APIs are APIs that use hyper-text transfer protocol (http) as the mechanism for this communication. Thus, the client program makes HTTP requests against the API server and receives responses. The Placeholder.com we discussed earlier in this chapter is an example of a web API - one that generates and serves placeholder images.

In fact, an API can serve any kind of resource - binary files, image files, text files, etc. But most serve data in one of a few forms:

  • HTML that is intended to be embedded in a web page, common to advertisement revenue services, embeddable social media feeds, YouTube and other video embedding sites.
  • Data structured as JSON, a very common format for submitting or retrieving any kind of data across the web. For data that will be used by JavaScript running in the browser, it has the advantage of being the native format.
  • Data structured as XML, another common format, XML tends to be slightly larger than the equivalent JSON, but it can have a published and enforced structure.

Some Example Web APIs

There are thousands of Web APIs available for you to work with. Some you might be interested in investigating include:

  • The National Weather Service API - we used this API early in the course to retrieve weather forecast data. As the NWS is supported by tax dollars, this resource is free to use.
  • Census APIs similarly, the U.S. Census Bureau makes much of their data sets available through APIs
  • NASA APIs as does NASA.
  • Google Maps has extensive APIs for maps and geolocation, with a fee structure.
  • Open Street Maps offers an open-source alternative.
  • Facebook has a large collection of APIs for working with their data and navigating social network relationships.
  • GitHub provides an API for developing tools to work with their site.

Web Hooks

A lighter-weight alternative to a full-fledged API is a webhook . A webhook is simply an address a web application is instructed to make a HTTP request against when a specific event happens. For example, you can set your GitHub repository to trigger a webhook when a new commit is made to it. You simply provide the URL it should send a request to, and it will send a request with a payload (based on the event).

For example, the Discord chat platform allows users to create webhook endpoint that will post messages into a channel. The webhook will be the in the form lf a URL that specifies the Discord server and channel that the message should be sent to, as well as some authentication parameters to identify the source of the webhook. You can then configure a webhook in GitHub which sends a message to the URL when a commit is made to GitHub. Now, each time a user pushes new data to GitHub, you’ll be able to see a message in Discord.

A more advanced webhook endpoint can be used to trigger various actions, such as a continuous deployment routine. In that way, you can have a repository automatically deploy each time it is updated, simply through the use of webhooks!

In other words, a webhook is a technique for implementing event listeners using HTTP. One web service (GitHub in the example) provides a way for you to specify an event listener defined in another web application by the route used to communicate with it. This represents a looser coupling than a traditional API; if the request fails for any reason, it will not cause problems for the service that triggered it. It also has the benefit of being event-driven, whereas to check for new data with an API, you would need to poll (make a request to query) the API at regular intervals.

Summary

In this chapter, we exported the idea of routes, a mechanism for mapping a request to the appropriate endpoint function to generate a response. Routes typically consist of both the request URL and the request method. RESTful routes provide a common strategy for implementing CRUD methods for a server-generated resource; any programmer familiar with REST will quickly be able to suss out the appropriate route.

We also explored routers are objects that make routing more manageable to implement. These allow you to define a route and tie it to an endpoint function used to generate the result. Most will also capture wildcards in the route, and supply those to the endpoint function in some form. We briefly looked at the Express framework, a Node framework that adds a router to the vanilla Node http.Server class.

Finally, we discussed using routes to serve other programs instead of users with APIs and WebHooks.