Linked Lists

Using arrays to implement data structures has performance advantages in some cases, but this technique has its limitations. With this chapter, we begin a study of data structures that use reference types in a powerful way. Rather than forming sequences by placing data items in adjacent cells of an array, we instead use references to chain data elements together in a sequence. For some applications, this ends up being more efficient than using an array. As we will see in a later chapter, this chaining technique can be further exploited to link data items in a hierarchical way, providing even more flexible and efficient access.

Subsections of Linked Lists

Introduction to Linked Lists

To build a linked list, we first need to define a simple class, which we will call LinkedListCell<T>. Instances of this class will be the individual building blocks that we will chain together to form linked lists. T will be the type of the data item we will store in each cell - i.e., the type of the data items that we will store in the linked list.

A LinkedListCell<T> will contain two public properties, which can each be implemented using the default implementation:

  • The Data property gets or sets the data item (of type T) stored in the cell.
  • The Next property gets or sets the next LinkedListCell<T>? in the linked list. If there is no next cell, it gets null.

Because this is a class, it is a reference type; hence, the Next property will store a reference to another LinkedListCell<T>.

The only other member of this class is a public constructor. Because we don’t want to make Data nullable unless the user code specifies a nullable type for T, we need to make sure it is initialized to an appropriate value. For this purpose, we use a public constructor that takes the following parameters:

  • The initial Data value (of type T).
  • The next cell in the list (of type LinkedListCell<T>?).

It simply sets the values of the two properties to the given parameters.

Although the LinkedListCell<T> class is simple, we can use its Next property to chain together long sequences of its instances:

A linked list A linked list

In the above figure, p is a LinkedListCell<string> variable. Each box in the figure represents an instance of LinkedListCell<string>. The boxes are each divided into two regions to indicate the two public properties for each cell. Because string is a reference type, we have shown each Data property as a reference to a string. The rightmost arrow that is bent downward is used to represent null. The entire sequence of LinkedListCell<string>s is called a linked list. Given this linked list:

  • p.Data is “Now”;
  • p.Next.Data is “is”;
  • p.Next.Next.Data is “the”;
  • p.Next.Next.Next.Data is “time”; and
  • p.Next.Next.Next.Next is null (if we try to get its Data property, we will get a NullReferenceException).
Tip

When writing code for using and manipulating linked lists, it is helpful to draw pictures of the lists, as we do throughout this chapter.

Suppose we want to insert the string “Notice:” at the beginning of this linked list. We use the LinkedListCell<T> constructor to initialize a new cell:

LinkedListCell<string> cell = new("Notice", p);

This yields the following:

Linking in the cell Linking in the cell

This is what we want, unless we want p to refer to the beginning of the linked list. We can take care of this by copying the value of cell to p:

p = cell;

This yields the following (we are not showing cell because we are no longer interested in it, but it still refers to the same cell):

Completing the insertion Completing the insertion

We can also undo the above statement by copying into p the reference in the Next property of the cell to which p refers:

p = p.Next;

(If this statement occurs in a context in which the compiler cannot determine that p is not null, an ! will need to be inserted prior to .Next.) This yields the following:

Removing the first cell Removing the first cell

This has the effect of removing “Notice:” from the linked list to which p refers. Though we haven’t shown it, cell still refers to the cell containing “Notice:”; hence, we still have access to the linked list beginning with this cell. However, if the program had no references remaining to this cell, we would have no way to retrieve it, and it would be available for garbage collection. This illustrates how we must take care not to lose a part of a linked list (unless we are finished with it) when we are manipulating it.

With a little more work, cells can be inserted into or removed from arbitrary locations in a linked list. We will discuss how to do this in subsequent sections. For now let us simply observe that linked lists do not always continue until they reach a null - they can contain cycles, as follows:

A linked list with a cycle A linked list with a cycle

This is occasionally what we want, but more often, this is a programming error that leads to an infinite loop.

Implementing Stacks and Queues with Linked Lists

Because linked lists store data elements in linear sequences, they can be used to give alternative implementations of stacks and queues. One advantage to using linked lists is that we don’t have to worry about filling up something like an array - we can just keep allocating cells as long as we need to (unless we run out of memory).

Implementing a stack using a linked list is particularly easy because all accesses to a stack are at the top. One end of a linked list, the beginning, is always directly accessible. We should therefore arrange the elements so that the top element of the stack is at the beginning of the linked list, and the bottom element of the stack is at the end of the linked list. We can represent an empty stack with null.

We therefore need a private LinkedListCell<T>? field to implement a generic stack Stack<T> using a linked list. This field will refer to the cell containing the data item at the top of the stack. If the stack is empty, this field will be null; hence, this field should be null initially. A public Count property will be used to keep track of the number of elements in the stack.

The public methods Push, Peek, and Pop are then fairly straightforward to implement. For Push we need to add the given element to a new cell at the beginning of the linked list, as shown in the previous section, and update the Count. To implement Peek, if the stack is nonempty, we simply return the Data property of the cell at the beginning of the linked list; otherwise, we throw an InvalidOperationException. Note that we can determine whether the stack is empty by examining either the LinkedListCell<T>? field or the Count property; however, examining the LinkedListCell<T>? field allows the compiler to determine that the Data property of the first cell can be accessed without throwing a NullReferenceException.

To implement Pop:

  1. Using Peek, obtain the element to be returned.
  2. Remove the first element from the linked list as shown in the previous section.
  3. Update the Count.
  4. Return the retrieved value.

Note that the call to Peek in step 1 ensures that the stack is nonempty before we remove the first element; however, the compiler won’t be able to determine this.

Implementing a queue is a little more involved because we need to operate at both ends of the linked list. For efficiency, we should keep a reference to the last cell in the linked list, as this will allow us to access both ends of the linked list directly. We will therefore have the following:

Implementing a queue with a linked list Implementing a queue with a linked list

We now need to decide which end to make the front of the queue. As we saw in the previous section, both inserting and removing can be done efficiently at the beginning of a linked list. Likewise, it is easy to insert an element at the end if we have a reference to the last cell. Suppose, for example, that last refers to the last cell in a linked list, and that cell refers to a LinkedListCell<T> that we want to insert at the end. Suppose further that the linked list is not empty (that will be a special case that we’ll need to handle). Thus, we have the following:

A cell to insert at the end of a linked\nlist A cell to insert at the end of a linked\nlist

To insert this cell at the end of the linked list, we just need to copy the reference in cell to the Next property of the cell to which last refers:

last.Next = cell;

On the other hand, removing the last cell is problematic, even if we have a reference to it. The problem is that in order to remove it from the linked list, we need to change the Next property of the preceding cell. Unfortunately, the only way to obtain that cell is to start at the beginning of the list and work our way through it. If the linked list is long, this could be quite inefficient.

Note

It doesn’t help any to keep a reference to the next-to-last cell, as we encounter the same problem when we need to update this reference after removing the last cell — we don’t have a reference to its preceding cell.

Because we need to remove elements from the front of a queue, but not from the back, we conclude that it will work best to make the beginning of the linked list the front of the queue. We therefore need the following private fields to implement a generic queue Queue<T>:

  • A LinkedListCell<T>? giving the element at the front of the queue. This will be the beginning of the linked list of queue elements.
  • A LinkedListCell<T>? giving the element at the back of the queue. This will be the last cell in the linked list of queue elements.

As we mentioned earlier, adding an element to an empty queue is a special case that we will need to handle separately. For this reason, it doesn’t matter what values the two LinkedListCell<T>? fields contain when the queue is empty - we can always detect when the queue is empty by checking the Count. The initialization of the two LinkedListCell<T>? fields is therefore unimportant. It is easiest to just leave them null.

Let us now consider the implementation of the Enqueue method. We need to consider two cases. We’ll first consider the case in which the queue is empty. In this case, we need to build the following linked list:

A queue containing a single linked list cell A queue containing a single linked list cell

We therefore need to:

  1. Construct a new LinkedListCell<T> containing the element we want to enqueue and no next cell.
  2. Assign it to the field denoting the front of the queue.
  3. Assign it to the field denoting the back of the queue.
  4. Update the Count.

If the queue is nonempty, the only step that changes is Step 2. Because the queue is nonempty, we don’t want to make the new cell the front of the queue; instead, we need to insert it at the end of the linked list, as outlined above.

The implementations of the Peek and Dequeue methods are essentially the same as the implementations of the Peek and Pop methods, respectively, for a stack.

The implementations described in this section are simpler than the implementations using arrays, mainly due to the fact that we don’t need to rebuild the structure when we fill up the space available. While these implementations are also pretty efficient, it turns out that the array-based implementations tend to out-perform the linked-list-based implementations. This might be counterintuitive at first because rebuilding the structures when the array is filled is expensive. However, due to the fact that we double the size of the array each time we need a new one, this rebuilding is done so rarely in practice that it ends up having minimal impact on performance. Due to hardware and low-level software issues, the overhead involved in using arrays usually ends up being less.

Finding Prime Numbers

An integer greater than $ 1 $ is said to be prime if it is not divisible by any positive integers other than itself and $ 1 $. Thus, $ 2 $, $ 3 $, and $ 5 $ are prime, but not $ 1 $ (it is not greater than $ 1 $) or $ 4 $ (it is divisible by $ 2 $). Because every integer is divisible by itself and $ 1 $, we will call any other positive factors nontrivial factors; thus, a prime number is an integer greater than $ 1 $ that has no nontrivial factors. The study of prime numbers dates back to at least the third century BC. One of the earliest known algorithms finds all prime numbers less than a given integer $ n $. This algorithm is known as the Sieve of Eratosthenes, and is attributed to the Greek mathematician Eratosthenes of Cyrene (c. 276 BC - c. 194 BC).

The most basic version of this algorithm operates as follows:

  1. Place all integers greater than $ 1 $ and less than $ n $ in order in a list.
  2. For each element $ k $ in the list, remove all subsequent elements that are divisible by $ k $.
  3. The remaining values are the prime numbers less than $ n $.

For example, suppose $ n = 20 $. We then place the integers from $ 2 $ to $ 19 $ in a list: $$ 2\ 3\ 4\ 5\ 6\ 7\ 8\ 9\ 10\ 11\ 12\ 13\ 14\ 15\ 16\ 17\ 18\ 19 $$ We then remove all numbers following $ 2 $ that are divisible by $ 2 $: $$ \require{cancel} 2\ 3\ \cancel{4}\ 5\ \cancel{6}\ 7\ \cancel{8}\ 9\ \cancel{10}\ 11\ \cancel{12}\ 13\ \cancel{14}\ 15\ \cancel{16}\ 17\ \cancel{18}\ 19 $$ We then remove all numbers following $ 3 $ that are divisible by $ 3 $: $$ 2\ 3\ \cancel{4}\ 5\ \cancel{6}\ 7\ \cancel{8}\ \cancel{9}\ \cancel{10}\ 11\ \cancel{12}\ 13\ \cancel{14}\ \cancel{15}\ \cancel{16}\ 17\ \cancel{18}\ 19 $$ The algorithm continues, but none of the succeeding iterations finds any values to remove. Therefore, $ 2, 3, 5, 7, 11, 13, 17 $, and $ 19 $ are the prime numbers less than $ 20 $.

To see why this algorithm gives us exactly the prime numbers less than $ n $, first note that because we only remove a number when we find a nontrivial factor, we only remove non-primes from the list. What may be a little less obvious is that we remove all non-primes from the list. To see this, suppose $ m $ is a non-prime less than $ n $, and let $ a $ be its smallest nontrivial factor. Then $ a $ must be prime because any nontrivial factor of $ a $ would be less than $ a $ and would also divide $ m $. $ a $ therefore will not be removed from the list. When $ k = a $ in Step 2, $ m $ will be removed.

There is actually a good reason why the first two iterations in the above example removed all of the non-primes — once the algorithm reaches a divisor $ k $ such that $ k^2 \geq n $ (in this example, $ 5^2 = 25 \geq 20 $), all of the non-primes will have been removed. To see why this is true, let $ m $ and $ a $ be as above. We can then write $$ m = ab $$ where $ a \leq b $, and $ m $ is removed from the list when $ k = a $. We can then multiply both sides of the above equation by $ a/b $, yielding: $$ \frac{am}{b} = a^2. $$ Finally, because $ a \leq b $, $ a/b \leq 1 $. Therefore, $$ m \geq a^2. $$ We conclude that if $ m $ is a non-prime greater than $ 1 $, it is removed when the algorithm reaches some value $ k $ with $ k^2 < n $. We can therefore optimize the algorithm by stopping when $ k^2 \geq n $.

We can implement this algorithm using a linked list. A linked list is an appropriate data structure for this algorithm because once the list is built, all of the processing involves iterating through it from beginning to end — the same direction the links go.

To implement Step 1, it is easier to build the list from back to front, as we don’t need to maintain a separate reference to the end of the list. This step then consists of a loop that iterates from $ n - 1 $ down to $ 2 $, with each iteration adding to the front of the list a cell containing the loop index.

In order to be able to implement Step 2, we will need to know how to remove a cell from a linked list. Suppose, for example, that we want to remove the cell referring to “the” from the following linked list:

A linked list from which we want to remove x A linked list from which we want to remove x

To remove it, we need the cell that precedes it to be followed by the cell that follows it:

What we need to change in order to remove the\ncell What we need to change in order to remove the\ncell

In order to change that reference, we need a reference to the cell that precedes the cell we want to remove:

The additional reference we need The additional reference we need

We can then remove the cell following the cell referenced by q as follows:

q.Next = q.Next.Next;

Now that we know how to remove a cell from a linked list, let’s consider Step 2 of the algorithm. For one value of $ k $, we need to remove all subsequent values that are divisible by $ k $. In terms of the linked list, we need to start this process with the cell containing $ k $. For example, consider the second iteration from the example above — i.e., when $ k = 3 $:

The beginning of an iteration with k = 3 The beginning of an iteration with k = 3

We need to iterate p through the linked list, checking the next cell on each iteration to see whether its contents are divisible by $ k $. We can check for divisibility by $ k $ using the remainder operator — i.e., $ k $ divides $ m $ if $ m \mathbin{\texttt{%}} k $ is 0. Thus, the first iteration would see if $ 3 $ divides $ 5 $. It doesn’t, so we advance p to the next cell (containing $ 5 $). We then see if $ 3 $ divides $ 7 $. Again it doesn’t, so we advance p to the next cell (containing $ 7 $). At this point, $ 3 $ divides $ 9 $, so we remove the cell containing $ 9 $ as shown above. This gives us the following linked list:

After 9 has been removed After 9 has been removed

Note that we have not yet advanced p, and indeed we don’t want to, as $ 11 $ is the next value we want to check. Thus, on each iteration, if $ k $ divides the value in the cell following p, we remove that cell; otherwise, we advance p to that cell. We iterate this loop as long as there is a cell following p.

The loop described above represents a single iteration of the loop described for Step 2. Thus, for Step 2, we need to iterate a variable through the list, performing the above on each iteration. We stop when we either have run off the end of the list or have reached a value of $ k $ such that $ k^2 \geq n $. Note that at the end of each iteration, we want to advance to the next cell.

Warning

Make sure when iterating through a linked list that you keep a reference to the beginning of the list. Otherwise, you will lose all of your list.