URIs and URLs

Before a web request can be made, the browser needs to know where the resource requested can be found. This is the role that a Universal Resource Locator (a URL) plays. A URL is a specific kind of Universal Resource Indicator (URI) that specifies how a specific resource can be retrieved.

Info

URLs and URIs The terms URL and URI are often used interchangeably in practice. However, a URL is a specific subset of URIs that indicate how to retrieve a resource over a network; while a URI identifies a unique resource, it does not necessarily indicate how to retrieve it. For example, a book’s ISBN can be represented as a URI in the form urn:isbn:0130224189. But this URI cannot be put into a browser’s Location to retrieve the associated book.

A URI consists of several parts, according to the definition (elements in brackets are optional):

URI = scheme:[//[userinfo@]host[:port]]path[?query][#fragment]

Let’s break this down into individual parts:

scheme: The scheme refers to the resource is identified and (potentially) accessed. For web development, the primary schemes we deal with are http (hyper-text transfer protocol), https (secure hyper-text transfer protocol), and file (indicating a file opened from the local computer).

userinfo: The userinfo is used to identify a specific user. It consists of a username optionally followed by a colon (:) and password. We will discuss its use in the section on HTTP authentication, but note that this approach is rarely used today, and carries potential security risks.

host: The host consists of either a fully quantified domain name (i.e. google.com, cs.ksu.edu, or gdc.ksu.edu) or an ip address (i.e. 172.217.1.142 or [2607:f8b0:4004:803::200e]). IPv4 addresses must be separated by periods, and IPv6 addresses must be closed in brackets. Additionally, web developers will often use the loopback host, localhost, which refers to the local machine rather than a location on the network.

port: The port refers to the port number on the host machine. If it is not specified (which is typical), the default port for the scheme is assumed: port 80 for HTTP, and port 443 for HTTPS.

path: The path refers to the path to the desired resource on the server. It consists of segments separated by forward slashes (/).

query: The query consists of optional collection of key-value pairs (expressed as key:value), separated by ampersands (&), and preceded by a question mark (?). The query string is used to supply modifiers to the requested resource (for example, applying a filter or searching for a term).

fragment: The fragment is an optional string proceeded by a hashtag (#). It identifies a portion of the resource to retrieve. It is most often used to auto-scroll to a section of an HTML document, and also for navigation in some single-page web applications.

Thus, the URL https://google.com indicates we want to use the secure HTTP scheme to access the server at google.com using its port 443. This should retrieve Google’s main page.

Similarly, the url https://google.com/search?q=HTML will open a Google search result page for the term HTML (Google uses the key q to identify search terms).