StringBuilders
In the previous section, we saw that building large strings from small pieces by concatenating them together is very inefficient. This inefficiency is due to the fact that strings are immutable. In order to overcome the inefficiency of concatenation, we need an alternative data structure that we can modify. The StringBuilder class fills this need.
Like strings, StringBuilders implement sequences of characters, but the contents of StringBuilders can be changed. The StringBuilder class has six constructors. The simplest StringBuilder constructor takes no parameters and constructs an empty StringBuilder — i.e., a StringBuilder containing no characters:
StringBuilder sb = new();
We can then modify a StringBuilder in various ways. First, we may
append a char to the end using its
Append
method. This method not only changes the contents of the
StringBuilder, but it also returns a reference to it. Thus if we
have char variables, a
, b
, and c
, and a StringBuilder
variable sb
, we can write code such as:
sb.Append(a).Append(b).Append(c);
The first call to Append appends the contents of a
to sb
, and
returns sb
. Thus, the second call to Append also applies to sb
-
it appends the contents of b
. Likewise, the third call appends the
contents of c
.
Because this method changes a StringBuilder, rather than constructing a new one, its implementation is very efficient - in most cases, only the appended character needs to be copied (see “Implementation of StringBuilders” for details). This class has other Append methods as well, including one that appends the contents of a given string. This method likewise only needs to copy the appended characters.
Let us now return to the problem of converting all lower-case letters in a string to upper-case, converting all upper-case letters to lower-case, and leaving all other characters unchanged. We can use a StringBuilder as an intermediate data structure to do this much more efficiently than the code presented in the previous section:
StringBuilder sb = new();
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (char.IsLower(c))
{
sb.Append(char.ToUpper(c));
}
else if (char.IsUpper(c))
{
sb.Append(char.ToLower(c));
}
else
{
sb.Append(c);
}
}
string result = sb.ToString();
On most iterations, the above loop only copies one character. In
addition, the call to the StringBuilder’s ToString method copies
each character in the result. If text
is 100,000 characters long, this
is a total of 200,000 character copies. Using the StringBuilder
implementation described in the next
section, there
are some iterations that copy more than one character, but even if we
account for this, it turns out that fewer than 400,000 characters are
copied, as opposed to over five billion character copies when
strings are used directly (see the previous
section). The .NET
StringBuilder implementation performs even
better. In either case, the above code runs in O(n) time, where n
is the length of text
; i.e., as n gets large, the running time is at
worst proportional to n. Thus, its performance degrades much less
rapidly than the O(n2) code that uses strings
directly.
A program that runs the above code and the code given in the previous section on user-provided text files can be obtained by creating a Git repository (see “Git Repositories”) using this URL. A noticeable performance difference can be seen on text files larger than 100K - for example, the full text of Lewis Carroll’s Through the Looking Glass.
StringBuilders have some features in common with strings. For
example, we access individual characters in a StringBuilder by
indexing; i.e., if sb
is a StringBuilder variable, then sb[0]
accesses its first character, sb[1]
accesses its second character,
etc. The main difference here is that with a StringBuilder, we may
use indexing to change characters; e.g., we may do the following,
provided sb
contains at least 3 characters:
sb[2] = 'x';
A StringBuilder also has a Length property, which gets the number of characters contained. However, we may also set this property to any nonnegative value, provided there is enough memory available to provide this length. For example, we may write:
sb.Length = 10;
If the new length is shorter than the old, characters are removed from the end of the StringBuilder. If the new length is longer that the old, chars containing the Unicode NULL value (0 in decimal) are appended.
The Unicode NULL value is different from a null reference. Because char is a value type, a char cannot store a null reference.
StringBuilders have many other methods to allow various kinds of manipulation - see the documentation for the StringBuilder class for details. There is also a StringBuilder constructor that takes a string as its only parameter and constructs a StringBuilder containing the contents of that string.