StringBuilders

In the previous section, we saw that building large strings from small pieces by concatenating them together is very inefficient. This inefficiency is due to the fact that strings are immutable. In order to overcome the inefficiency of concatenation, we need an alternative data structure that we can modify. The StringBuilder class fills this need.

Like strings, StringBuilders implement sequences of characters, but the contents of StringBuilders can be changed. The StringBuilder class has six constructors. The simplest StringBuilder constructor takes no parameters and constructs an empty StringBuilder — i.e., a StringBuilder containing no characters:

StringBuilder sb = new();

We can then modify a StringBuilder in various ways. First, we may append a char to the end using its Append method. This method not only changes the contents of the StringBuilder, but it also returns a reference to it. Thus if we have char variables, a, b, and c, and a StringBuilder variable sb, we can write code such as:

sb.Append(a).Append(b).Append(c);

The first call to Append appends the contents of a to sb, and returns sb. Thus, the second call to Append also applies to sb - it appends the contents of b. Likewise, the third call appends the contents of c.

Because this method changes a StringBuilder, rather than constructing a new one, its implementation is very efficient - in most cases, only the appended character needs to be copied (see “Implementation of StringBuilders” for details). This class has other Append methods as well, including one that appends the contents of a given string. This method likewise only needs to copy the appended characters.

Let us now return to the problem of converting all lower-case letters in a string to upper-case, converting all upper-case letters to lower-case, and leaving all other characters unchanged. We can use a StringBuilder as an intermediate data structure to do this much more efficiently than the code presented in the previous section:

StringBuilder sb = new();
for (int i = 0; i < text.Length; i++)
{
    char c = text[i];
    if (char.IsLower(c))
    {
        sb.Append(char.ToUpper(c));
    }
    else if (char.IsUpper(c))
    {
        sb.Append(char.ToLower(c));
    }
    else
    {
        sb.Append(c);
    }
}
string result = sb.ToString();

On most iterations, the above loop only copies one character. In addition, the call to the StringBuilder’s ToString method copies each character in the result. If text is 100,000 characters long, this is a total of 200,000 character copies. Using the StringBuilder implementation described in the next section, there are some iterations that copy more than one character, but even if we account for this, it turns out that fewer than 400,000 characters are copied, as opposed to over five billion character copies when strings are used directly (see the previous section). The .NET StringBuilder implementation performs even better. In either case, the above code runs in O(n) time, where n is the length of text; i.e., as n gets large, the running time is at worst proportional to n. Thus, its performance degrades much less rapidly than the O(n2) code that uses strings directly.

A program that runs the above code and the code given in the previous section on user-provided text files can be obtained by creating a Git repository (see “Git Repositories”) using this URL. A noticeable performance difference can be seen on text files larger than 100K - for example, the full text of Lewis Carroll’s Through the Looking Glass.

StringBuilders have some features in common with strings. For example, we access individual characters in a StringBuilder by indexing; i.e., if sb is a StringBuilder variable, then sb[0] accesses its first character, sb[1] accesses its second character, etc. The main difference here is that with a StringBuilder, we may use indexing to change characters; e.g., we may do the following, provided sb contains at least 3 characters:

sb[2] = 'x';

A StringBuilder also has a Length property, which gets the number of characters contained. However, we may also set this property to any nonnegative value, provided there is enough memory available to provide this length. For example, we may write:

sb.Length = 10;

If the new length is shorter than the old, characters are removed from the end of the StringBuilder. If the new length is longer that the old, chars containing the Unicode NULL value (0 in decimal) are appended.

Warning

The Unicode NULL value is different from a null reference. Because char is a value type, a char cannot store a null reference.

StringBuilders have many other methods to allow various kinds of manipulation - see the documentation for the StringBuilder class for details. There is also a StringBuilder constructor that takes a string as its only parameter and constructs a StringBuilder containing the contents of that string.