Chapter 3

Documentation

Coding for Humans

Introduction

As part of the strategy for tackling the challenges of the software crisis, good programming practice came to include writing clear documentation to support both the end-users who will utilize your programs, as well as other programmers (and yourself) in understanding what that code is doing so that it is easy to maintain and improve.

Key Terms

Some key terms to learn in this chapter are:

User documentation
Developer documentation
Markdown
XML
Autodoc tools
Intellisense

Key Skills

The key skill to learn in this chapter is how to use C# XML code comments to document the C# code you write.

Documentation

Documentation refers to the written materials that accompany program code. Documentation plays multiple, and often critical roles. Broadly speaking, we split documentation into two categories based on the intended audience:

User Documentation is meant for the end-users of the software
Developer Documentation is meant for the developers of the software

As you might expect, the goals for these two styles of documentation are very different. User documentation instructs the user on how to use the software. Developer documentation helps orient the developer so that they can effectively create, maintain, and expand the software.

Historically, documentation was printed separately from the software. This was largely due to the limited memory available on most systems. For example, the EPIC software we discussed had two publications associated with it: a User Manual, which explains how to use it, and Model Documentation which presents the mathematic models that programmers adapted to create the software. There are a few very obvious downsides to printed manuals: they take substantial resources to produce and update, and they are easily misplaced.

User Documentation

As memory became more accessible, it became commonplace to provide digital documentation to the users. For example, with Unix (and Linux) systems, it became commonplace to distribute digital documentation alongside the software it documented. This documentation came to be known as man pages based on the man command (short for manual) that would open the documentation for reading. For example, to learn more about the linux search tool grep, you would type the command:

$ man grep

Which would open the documentation distributed with the grep tool. Man pages are written in a specific format; you can read more about it here.

While a staple of the Unix/Linux filesystem, there was no equivalent to man pages in the DOS ecosystem (the foundations of Windows) until Powershell was introduced, which has the Get-Help tool. You can read more about it here.

However, once software began to be written with graphical user interfaces (GUIs), it became commonplace to incorporate the user documentation directly into the GUI, usually under a “Help” menu. This served a similar purpose to man pages of ensuring user documentation was always available with the software. Of course, one of the core goals of software design is to make the software so intuitive that users don’t need to reference the documentation. It is equally clear that developers often fall short of that mark, as there is a thriving market for books to teach certain software.

Not to mention the thousands of YouTube channels devoted to teaching specific programs!

Developer Documentation

Developer documentation underwent a similar transformation. Early developer documentation was often printed and placed in a three-ring binder, as Neal Stephenson describes in his novel Snow Crash: ¹

Fisheye has taken what appears to be an instruction manual from the heavy black suitcase. It is a miniature three-ring binder with pages of laser-printed text. The binder is just a cheap unmarked one bought from a stationery store. In these respects, it is perfectly familiar to Him: it bears the earmarks of a high-tech product that is still under development. All technical devices require documentation of a sort, but this stuff can only be written by the techies who are doing the actual product development, and they absolutely hate it, always put the dox question off to the very last minute. Then they type up some material on a word processor, run it off on the laser printer, send the departmental secretary out for a cheap binder, and that's that.

Shortly after the time this novel was written, the internet became available to the general public, and the tools it spawned would change how software was documented forever. Increasingly, web-based tools are used to create and distribute developer documentation. Wikis, bug trackers, and autodocumentation tools quickly replaced the use of lengthy, and infrequently updated word processor files.

Neal Stephenson, “Snow Crash.” Bantam Books, 1992. ↩︎

Documentation Formats

Developer documentation often faces a challenge not present in other kinds of documents - the need to be able to display snippets of code. Ideally, we want code to be formatted in a way that preserves indentation. We also don’t want code snippets to be subject to spelling- and grammar-checks, especially auto-correct versions of these algorithms, as they will alter the snippets. Ideally, we might also apply syntax highlighting to these snippets. Accordingly, a number of textual formats have been developed to support writing text with embedded program code, and these are regularly used to present developer documentation. Let’s take a look at several of the most common.

HTML

Since its inception, HTML has been uniquely suited for developer documentation. It requires nothing more than a browser to view - a tool that nearly every computer is equipped with (in fact, most have two or three installed). And the <code> element provides a way of styling code snippets to appear differently from the embedded text, and <pre> can be used to preserve the snippet’s formatting. Thus:

<p>This algorithm reverses the contents of the array, <code>nums</code></p>
<pre>
<code>
for(int i = 0; i < nums.Length/2; i++) {
    int tmp = nums[i];
    nums[i] = nums[nums.Length - 1 - i];
    nums[nums.Length - 1 - i] = tmp;
}
</code>
</pre>

Will render in a browser as:

This algorithm reverses the contents of the array, nums


for(int i = 0; i < nums.Length/2; i++) {
    int tmp = nums[i];
    nums[i] = nums[nums.Length - 1 - i];
    nums[nums.Length - 1 - i] = tmp;
}

JavaScript and CSS libraries like highlight.js, prism, and others can provide syntax highlighting functionality without much extra work.

Of course, one of the strongest benefits of HTML is the ability to create hyperlinks between pages. This can be invaluable in documenting software, where the documentation about a particular method could include links to documentation about the classes being supplied as parameters, or being returned from the method. This allows developers to quickly navigate and find the information they need as they work with your code.

Markdown

However, there is a significant amount of boilerplate involved in writing a webpage (i.e. each page needs a minimum of elements not specific to the documentation to set up the structure of the page). The extensive use of HTML elements also makes it more time-consuming to write and harder for people to read in its raw form. Markdown is a markup language developed to counter these issues. Markdown is written as plain text, with a few special formatting annotations, which indicate how it should be transformed to HTML. Some of the most common annotations are:

Starting a line with hash (#) indicates it should be a <h1> element, two hashes (##) indicates a <h2>, and so on…
Wrapping a statement with underscores (_) or asterisks (*) indicates it should be wrapped in a <i> element
Wrapping a statement with double underscores (__) or double asterisks (**) indicates it should be wrapped in a <b> element
Links can be written as [link text](url), which is transformed to <a href="url">link text</a>
Images can be written as ![alt text](url), which is transformed to <img alt="alt text" src="url"/>

Code snippets are indicated with backtick marks (`). Inline code is written surrounded with single backtick marks, i.e. `int a = 1` and in the generated HTML is wrapped in a <code> element. Code blocks are wrapped in triple backtick marks, and in the generated HTML are enclosed in both <pre> and <code> elements. Thus, to generate the above HTML example, we would use:


This algorithm reverses the contents of the array, `nums`
```
for(int i = 0; i < nums.Count/2; i++) {
    int tmp = nums[i];
    nums[i] = nums[nums.Count - 1 - i];
    nums[nums.Count - 1 - i] = tmp;
}
```

Most markdown compilers also support specifying the language (for language-specific syntax highlighting) by following the first three backticks with the language name, i.e.:


```csharp
List = new List;
```

Nearly every programming language features at least one open-source library for converting Markdown to HTML. Microsoft even includes a C# one in the Windows Community Toolkit. In addition to being faster to write than HTML, and avoiding the necessity to write boilerplate code, Markdown offers some security benefits. Because it generates only a limited set of HTML elements, which specifically excludes some most commonly employed in web-based exploits (like using <script> elements for script injection attacks), it is often safer to allow users to contribute markdown-based content than HTML-based content. Note: this protection is dependent on the settings provided to your HTML generator - most markdown converters can be configured to allow or escape HTML elements in the markdown text

In fact, this book was written using Markdown, and then converted to HTML using the Hugo framework, a static website generator built using the Go programming language.

Additionally, chat servers like RocketChat and Discord support using markdown in posts!

GitHub even incorporates a markdown compiler into its repository displays. If your file ends in a .md extension, GitHub will evaluate it as Markdown and display it as HTML when you navigate your repo. If your repository contains a README.md file at the top level of your project, it will also be displayed as the front page of your repository. GitHub uses an expanded list of annotations known as GitHub-flavored markdown that adds support for tables, task item lists, strikethroughs, and others.

Info

It is best practice to include a README.md file at the top level of a project. This document provides an overview of the project, as well as helpful instructions on how it is to be used and where to go for more information. For open-source projects, you should also include a LICENSE file that contains the terms of the license the software is released under.

XML

Extensible Markup Language (XML) is a close relative of HTML - they share the same ancestor, Standard Generalized Markup Language (SGML). It allows developers to develop their own custom markup languages based on the XML approach, i.e. the use of elements expressed via tags and attributes. XML-based languages are usually used as a data serialization format. For example, this snippet represents a serialized fictional student:

<student>
    <firstName>Willie</firstName>
    <lastName>Wildcat</lastName>
    <wid>8888888</wid>
    <degreeProgram>BCS</degreeProgram>
</student>

While XML is most known for representing data, it is one of Microsoft’s go-to tools. For example, they have used it as the basis of Extensible Application Markup Language (XAML), which is used in Windows Presentation Foundation as well as cross-platform Xamrin development. So it shouldn’t be a surprise that Microsoft also adopted it for their autodocumentation code commenting strategy. We’ll take a look at this next.

Autodocs

One of the biggest innovations in documenting software was the development of autodocumentation tools. These were programs that would read source code files, and combine information parsed from the code itself and information contained in code comments to generate documentation in an easy-to-distribute form (often HTML). One of the earliest examples of this approach came from the programming language Java, whose API specification was generated from the language source files using JavaDoc.

This approach meant that the language of the documentation was embedded within the source code itself, making it far easier to update the documentation as the source code was refactored. Then, every time a release of the software was built (in this case, the Java language), the documentation could be regenerated from the updated comments and source code. This made it far more likely developer documentation would be kept up-to-date.

Microsoft adopted a similar strategy for the .NET languages, known as XML comments. This approach was based on embedding XML tags into comments above classes, methods, fields, properties, structs, enums, and other code objects. These comments are set off with a triple forward slash (///) to indicate the intent of being used for autodoc generation. Comments using double slashes (//) and slash-asterisk notation (/* */) are ignored in this autodoc scheme.

For example, to document an Enum, we would write:

/// <summary>
/// An enumeration of fruits used in pies 
/// </summary>
public enum Fruit {
    Cherry,
    Apple,
    Blueberry,
    Peach
}

At a bare minimum, comments should include a <summary> element containing a description of the code structure being described.

Let’s turn our attention to documenting a class:

public class Vector2 {

    public float X {get; set;}

    public float Y {get; set;}

    public Vector2(float x, float y) {
        X = x;
        Y = y;
    }

    public void Scale(float scalar) {
        X *= scalar;
        Y *= scalar;
    }

    public float DotProduct(Vector2 other) {
        return this.X * other.X + this.Y * other.Y;
    }

    public float Normalize() {
        float magnitude = Math.Sqrt(Math.Pow(this.X, 2), Math.Pow(this.Y, 2));
        if(magnitude == 0) throw new DivideByZeroException();
        X /= magnitude;
        Y /= magnitude;
    }
}

We would want to add a <summary> element just above the class declaration, i.e.:

/// <summary>
/// A class representing a two-element vector composed of floats 
/// </summary>

Properties should be described using the <summary> element, i.e.:

/// <summary>
/// The x component of the vector 
/// </summary>

And methods should use <summary>, plus <param> elements to describe parameters. It has an attribute of name that should be set to match the parameter it describes:

/// <summary>
/// Constructs a new two-element vector 
/// </summary>
/// <param name="x">The X component of the new vector</param>
/// <param name="y">The Y component of the new vector</param>

The <paramref> can be used to reference a parameter in the <summary>:

/// <summary>
/// Scales the Vector2 by the provided <paramref name="scalar"/> 
/// </summary>
/// <param name="scalar">The value to scale the vector by</param>

If a method returns a value, this should be indicated with the <returns> element:

/// <summary>
/// Computes the dot product of this and an <paramref name="other"> vector
/// </summary>
/// <param name="other">The vector to compute a dot product with</param>
/// <returns>The dot product</returns>

And, if a method might throw an exception, this should be also indicated with the <exception> element, which uses the cref attribute to indicate the specific exception:

/// <summary>
/// Normalizes the vector
/// </summary>
/// <remarks>
/// This changes the length of the vector to one unit.  The direction remains unchanged 
/// </remarks>
/// <exception cref="System.DivideByZeroException">
/// Thrown when the length of the vector is 0.
/// </exception>

Note too, the use of the <remarks> element in the above example to add supplemental information. The <example> element can also be used to provide examples of using the class, method, or other code construct. There are more elements available, like <see> and <seealso> that generate links to other documentation, <para>, and <list> which are used to format text, and so on.

Of especial interest are the <code> and <c> elements, which format code blocks and inline code, respectively.

See the official documentation for a complete list and discussion.

Thus, our completely documented class would be:

/// <summary>
/// A class representing a two-element vector composed of floats 
/// </summary>
public class Vector2 {

    /// <summary>
    /// The x component of the vector 
    /// </summary>
    public float X {get; set;}

    /// <summary>
    /// The y component of the vector 
    /// </summary>
    public float Y {get; set;}

    /// <summary>
    /// Constructs a new two-element vector 
    /// </summary>
    /// <param name="x">The X component of the new vector</param>
    /// <param name="y">The Y component of the new vector</param>
    public Vector2(float x, float y) {
        X = x;
        Y = y;
    }

    /// <summary>
    /// Scales the Vector2 by the provided <paramref name="scalar"/> 
    /// </summary>
    /// <param name="scalar">The value to scale the vector by</param>
    public void Scale(float scalar) {
        X *= scalar;
        Y *= scalar;
    }

    /// <summary>
    /// Computes the dot product of this and an <paramref name="other"> vector
    /// </summary>
    /// <param name="other">The vector to compute a dot product with</param>
    /// <returns>The dot product</returns>
    public float DotProduct(Vector2 other) {
        return this.X * other.X + this.Y * other.Y;
    }

    /// <summary>
    /// Normalizes the vector
    /// </summary>
    /// <remarks>
    /// This changes the length of the vector to one unit.  The direction remains unchanged 
    /// </remarks>
    /// <exception cref="System.DivideByZeroException">
    /// Thrown when the length of the vector is 0.
    /// </exception>
    public float Normalize() {
        float magnitude = Math.Sqrt(Math.Pow(this.X, 2), Math.Pow(this.Y, 2));
        if(magnitude == 0) throw new DivideByZeroException();
        X /= magnitude;
        Y /= magnitude;
    }
}

With the exception of the <remarks>, the XML documentation elements used in the above code should be considered the minimum for best practices. That is, every Class, Struct, and Enum should have a <summary>. Every property should have a <summary>. And every method should have a <summary>, a <param> for every parameter, a <returns> if it returns a value (this can be omitted for void) and an <exception> for every exception it might throw.

There are multiple autodoc programs that generate documentation from XML comments embedded in C# code, including open-source Sandcastle Help File Builder and the simple Docu, as well as multiple commercial products.

However, the perhaps more important consumer of XML comments is Visual Studio, which uses these comments to power its Intellisense features, displaying text from the comments as tooltips as you edit code. This intellisense data is automatically built into DLLs built from Visual Studio, making it available in projects that utilize compiled DLLs as well.

Summary

In this chapter, we examined the need for software documentation aimed at both end-users and developers (user documentation and developer documentation respectively). We also examined some formats this documentation can be presented in: HTML, Markdown, and XML. We also discussed autodocumentation tools, which generate developer documentation from specially-formatted comments in our code files.

We examined the C# approach to autodocumentation, using Microsoft’s XML code comments formatting strategy. We explored how this data is used by Visual Studio to power its Intellisense features, and provide useful information to programmers as they work with constructs like classes, properties, and methods. For this reason, as well as the ability to produce HTML-based documentation using an autodocumentation tool, it is best practice to use XML code comments in all your C# programs.

Documentation

Subsections of Documentation

Introduction

Key Terms

Key Skills

Documentation

User Documentation

Developer Documentation

Documentation Formats

HTML

Markdown

XML

Autodocs

Summary