Implementing a Graph :: Data Structures in C#

Implementing a Graph

Traditionally, there are two main techniques for implementing a graph. Each of these techniques has advantages and disadvantages, depending on the characteristics of the graph. In this section, we describe the implementation of the DirectedGraph<TNode, TEdgeData> class from Ksu.Cis300.Graphs.dll. This implementation borrows from both traditional techniques to obtain an implementation that provides good performance for any graph. In what follows, we will first describe the two traditional techniques and discuss the strengths and weaknesses of each. We will then outline the implementation of DirectedGraph<TNode, TEdgeData>.

The first traditional technique is to use what we call an adjacency matrix. This matrix is an $ n \times n $ bool array, where $ n $ is the number of nodes in the graph. In this implementation, each node is represented by an int value $ i $, where $ 0 \leq i \lt n $. The value at row $ i $ and column $ j $ is true if and only if there is an edge from node $ i $ to node $ j $.
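To make this layout concrete, here is a minimal C# sketch of the adjacency-matrix technique. The class AdjacencyMatrixGraph and its members are hypothetical names chosen for illustration; they are not part of Ksu.Cis300.Graphs.dll. The sketch assumes the nodes are the ints 0 through n - 1 and that edges carry no data.

```csharp
using System.Collections.Generic;

// Hypothetical illustration of an adjacency matrix; nodes are the ints 0 through n - 1.
public class AdjacencyMatrixGraph
{
    // _matrix[i, j] is true exactly when there is an edge from node i to node j.
    private readonly bool[,] _matrix;

    public AdjacencyMatrixGraph(int n)
    {
        _matrix = new bool[n, n]; // every entry starts out false, so there are no edges yet
    }

    public int NodeCount => _matrix.GetLength(0);

    public void AddEdge(int source, int dest)
    {
        _matrix[source, dest] = true;
    }

    // Determining whether an edge exists takes a single array lookup.
    public bool ContainsEdge(int source, int dest)
    {
        return _matrix[source, dest];
    }

    // Finding the outgoing edges requires scanning the entire row for source.
    public IEnumerable<int> OutgoingDestinations(int source)
    {
        for (int j = 0; j < NodeCount; j++)
        {
            if (_matrix[source, j])
            {
                yield return j;
            }
        }
    }
}
```

Note the contrast between the two queries: ContainsEdge performs one array access, while OutgoingDestinations must examine all $ n $ entries of a row, however few edges actually leave the node.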

The main advantage of this technique is that we can determine very quickly whether an edge exists: we only need to look up one element in an array. There are several disadvantages, however. First, we are forced to use a specific range of int values as the nodes; if we want a generic node type, we need an additional data structure (such as a Dictionary<TNode, int>) to map each node to its int representation. Second, the matrix provides no way to associate a value with an edge, so we would need yet another data structure (such as a TEdgeData[,] array) to store this information.
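The following sketch shows what those two extra data structures might look like: a Dictionary<TNode, int> that assigns each node an index, and a TEdgeData[,] array that holds the edge data. The class GenericMatrixGraph and its private helper IndexOf are hypothetical names. The sketch assumes the maximum number of nodes is known in advance, and it does no capacity checking.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical generic adjacency matrix, showing the extra structures it requires.
public class GenericMatrixGraph<TNode, TEdgeData> where TNode : notnull
{
    private readonly bool[,] _matrix;                  // edge existence, as before
    private readonly TEdgeData[,] _data;               // edge data, indexed like _matrix
    private readonly Dictionary<TNode, int> _indices = new Dictionary<TNode, int>(); // node -> row/column index

    // The capacity must be fixed up front because the arrays cannot grow.
    public GenericMatrixGraph(int capacity)
    {
        _matrix = new bool[capacity, capacity];
        _data = new TEdgeData[capacity, capacity];
    }

    // Returns the node's index, assigning the next unused index if the node is new.
    private int IndexOf(TNode node)
    {
        if (!_indices.TryGetValue(node, out int i))
        {
            i = _indices.Count;
            _indices.Add(node, i);
        }
        return i;
    }

    public void AddEdge(TNode source, TNode dest, TEdgeData value)
    {
        int i = IndexOf(source);
        int j = IndexOf(dest);
        _matrix[i, j] = true;
        _data[i, j] = value;
    }

    // Two dictionary lookups to find the indices, then array lookups for existence and data.
    public bool TryGetEdge(TNode source, TNode dest, out TEdgeData? value)
    {
        if (_indices.TryGetValue(source, out int i) && _indices.TryGetValue(dest, out int j) && _matrix[i, j])
        {
            value = _data[i, j];
            return true;
        }
        value = default;
        return false;
    }
}
```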

Perhaps the most serious shortcoming of the adjacency matrix, however, is that it wastes a huge amount of space when the graph contains a large number of nodes but relatively few edges. Suppose, for example, that we have a graph representing street information with about one million nodes. We might expect this graph to contain around three million edges. However, an adjacency matrix would require one trillion entries, almost all of which would be false. Similarly, finding the edges from a given node would require examining an entire row of a million elements to find the three or four outgoing edges from that node.
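The arithmetic behind this comparison, writing $ m $ for the number of edges, is:

$$ n^2 = \left(10^6\right)^2 = 10^{12} \quad \text{versus} \quad n + m \approx 10^6 + 3 \times 10^6 = 4 \times 10^6 $$

The matrix therefore has about $ 10^{12} / (4 \times 10^6) = 250{,}000 $ times as many entries as the graph has nodes and edges combined. Because each element of a C# bool array occupies one byte, the entries alone would take about $ 10^{12} $ bytes, roughly a terabyte.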

The other traditional technique involves using what we call adjacency lists. An adjacency list is simply a linked list containing descriptions of the outgoing edges from a single node. These lists are traditionally grouped together in an array of size $ n $, where $ n $ is again the number of nodes in the graph. As with the adjacency matrix technique, the nodes must be nonnegative ints less than $ n $. The linked list at location $ i $ of the array then contains the descriptions of the outgoing edges from node $ i $.
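A minimal sketch of the traditional adjacency-list technique follows, assuming int nodes 0 through n - 1 and no edge data. The Cell<T> class is a hypothetical stand-in for a linked list cell (the course's LinkedListCell<T> may have different members), and AdjacencyListGraph is likewise an illustrative name.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical singly linked list cell.
public class Cell<T>
{
    public T Data { get; }
    public Cell<T>? Next { get; }

    public Cell(T data, Cell<T>? next)
    {
        Data = data;
        Next = next;
    }
}

// Hypothetical adjacency-list graph; nodes are the ints 0 through n - 1.
public class AdjacencyListGraph
{
    // _lists[i] is the first cell of node i's list of edge destinations (null if the list is empty).
    private readonly Cell<int>?[] _lists;

    public AdjacencyListGraph(int n)
    {
        _lists = new Cell<int>?[n];
    }

    // Inserting at the front of the source's list takes constant time.
    public void AddEdge(int source, int dest)
    {
        _lists[source] = new Cell<int>(dest, _lists[source]);
    }

    // Traverses only the cells for source's outgoing edges.
    public IEnumerable<int> OutgoingDestinations(int source)
    {
        for (Cell<int>? p = _lists[source]; p != null; p = p.Next)
        {
            yield return p.Data;
        }
    }
}
```

Because AddEdge inserts at the front of the list, OutgoingDestinations returns the destinations in reverse order of insertion.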

One advantage of this technique is that the amount of space it uses is proportional to the size of the graph (i.e., the number of nodes plus the number of edges). Furthermore, obtaining the outgoing edges from a given node simply requires traversing the linked list containing the descriptions of these edges. Note also that we can store the data associated with an edge within the linked list cell describing that edge. However, this technique still requires some modification if we wish to use a generic node type. A more serious weakness is that in order to determine whether a given edge exists, we may need to search through all of the outgoing edges from its source node. If the number of edges is large in comparison to the number of nodes, this search can be expensive.
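The sketch below illustrates two of these points: each cell stores an edge's data alongside its destination, and checking whether an edge exists means walking the source node's list. EdgeCell<TEdgeData> and AdjacencyListSearch.TryGetEdge are hypothetical names, and the list passed in is assumed to be the adjacency list of the edge's source node.

```csharp
#nullable enable

// Hypothetical cell describing one outgoing edge: its destination and its data.
public class EdgeCell<TEdgeData>
{
    public int Dest { get; }
    public TEdgeData Data { get; }
    public EdgeCell<TEdgeData>? Next { get; }

    public EdgeCell(int dest, TEdgeData data, EdgeCell<TEdgeData>? next)
    {
        Dest = dest;
        Data = data;
        Next = next;
    }
}

public static class AdjacencyListSearch
{
    // Searches the source node's list for an edge to dest.
    // In the worst case, every outgoing edge of the source node is examined.
    public static bool TryGetEdge<TEdgeData>(EdgeCell<TEdgeData>? list, int dest, out TEdgeData? data)
    {
        for (EdgeCell<TEdgeData>? p = list; p != null; p = p.Next)
        {
            if (p.Dest == dest)
            {
                data = p.Data;
                return true;
            }
        }
        data = default;
        return false;
    }
}
```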

As we mentioned above, our implementation of DirectedGraph<TNode, TEdgeData> borrows from both of these traditional techniques. We start by modifying the adjacency lists technique to use a Dictionary<TNode, LinkedListCell<TNode>?> instead of an array of linked lists. This lets us accommodate a generic node type while maintaining efficient access to the adjacency lists. A dictionary lookup is not quite as efficient as an array lookup, but a dictionary would provide the most efficient way of mapping nodes of a generic type to int array indices, so with a generic node type we would need a dictionary lookup anyway. Storing the linked lists directly in the dictionary eliminates the need for a subsequent array lookup. The linked list associated with a given node in this dictionary contains the destination node of each outgoing edge from that node.
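Here is a sketch of just the dictionary part of this design, assuming a hypothetical Cell<T> stand-in for LinkedListCell<T> and using null as the empty list. NodeAdjacencyLists and its methods are illustrative names only, and AddDestination assumes source has already been added as a node.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for a linked list cell.
public class Cell<T>
{
    public T Data { get; }
    public Cell<T>? Next { get; }

    public Cell(T data, Cell<T>? next)
    {
        Data = data;
        Next = next;
    }
}

// Hypothetical illustration: adjacency lists keyed directly by generic nodes.
public class NodeAdjacencyLists<TNode> where TNode : notnull
{
    private readonly Dictionary<TNode, Cell<TNode>?> _lists = new Dictionary<TNode, Cell<TNode>?>();

    // A new node starts with an empty list, represented by null.
    public void AddNode(TNode node)
    {
        _lists.Add(node, null);
    }

    // Prepends dest to source's list; assumes source is already a key.
    public void AddDestination(TNode source, TNode dest)
    {
        _lists[source] = new Cell<TNode>(dest, _lists[source]);
    }

    // One dictionary lookup yields the list itself; no int index or array is involved.
    public IEnumerable<TNode> Destinations(TNode source)
    {
        for (Cell<TNode>? p = _lists[source]; p != null; p = p.Next)
        {
            yield return p.Data;
        }
    }
}
```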

In addition to this dictionary, we use a Dictionary<(TNode, TNode), TEdgeData> to facilitate efficient edge lookups. The notation (T1, T2) denotes a tuple type: an ordered pair whose first element is of type T1 and whose second element is of type T2. Values of this type are written with similar notation, (x, y), where x is of type T1 and y is of type T2. The elements of a tuple can be accessed using its public fields Item1 and Item2. Longer tuples, such as (T1, T2, T3), are defined similarly.
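The short example below exercises the tuple notation just described. TupleExamples and its methods are hypothetical names used only for illustration; the tuple syntax itself is standard C#, where (T1, T2) is shorthand for System.ValueTuple<T1, T2>.

```csharp
using System.Collections.Generic;

// Hypothetical helper methods demonstrating C# tuples.
public static class TupleExamples
{
    // Returns a value of the tuple type (string, int), written with the same (x, y) notation.
    public static (string, int) MakePair(string name, int count) => (name, count);

    // The components are read through the public fields Item1 and Item2.
    public static string Describe((string, int) pair) => pair.Item1 + ":" + pair.Item2;

    // A tuple with three components, of type (int, int, double).
    public static (int, int, double) MakeTriple() => (1, 2, 3.5);

    // Tuples with equal components are equal and have equal hash codes,
    // so a key can be found again by building a new tuple from the same components.
    public static bool HasKey(Dictionary<(string, string), int> d, string u, string v)
        => d.ContainsKey((u, v));
}
```

For example, Describe(MakePair("a", 3)) produces "a:3". The last method shows why tuples work well as dictionary keys: order matters, so ("u", "v") and ("v", "u") are different keys.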

This second dictionary essentially fills the role of an adjacency matrix, while accommodating a generic node type and using space more efficiently. Specifically, a tuple whose Item1 is u and whose Item2 is v is a key in this dictionary if and only if there is an edge from node u to node v, and the value associated with this key is the data associated with that edge. Thus, looking up an edge requires only a single dictionary lookup.
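A minimal sketch of this edge dictionary on its own follows. EdgeTable is a hypothetical name; the point is that both the existence test and the data lookup use the key (u, v) directly, regardless of how many edges leave u.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical wrapper around the edge dictionary.
public class EdgeTable<TNode, TEdgeData> where TNode : notnull
{
    // Key (u, v) is present exactly when there is an edge from u to v; its value is that edge's data.
    private readonly Dictionary<(TNode, TNode), TEdgeData> _edges = new Dictionary<(TNode, TNode), TEdgeData>();

    public void Add(TNode u, TNode v, TEdgeData data) => _edges.Add((u, v), data);

    // A single dictionary lookup answers both "does the edge exist?" and "what is its data?".
    public bool TryGetEdge(TNode u, TNode v, out TEdgeData? data) => _edges.TryGetValue((u, v), out data);

    public int Count => _edges.Count;
}
```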

The two dictionaries described above are the only private fields our implementation needs. We will refer to them as _adjacencyLists and _edges, respectively. Because we can initialize both fields to new dictionaries, there is no need to define a constructor. Furthermore, given these two dictionaries, most of the public methods and properties (see “Introduction to Graphs”) can be implemented using a single call to one of the members of one of these dictionaries:

void AddNode(TNode node): We can implement this method using the Add method of _adjacencyLists, associating an empty linked list (i.e., null) with this node.
void AddEdge(TNode source, TNode dest, TEdgeData value): See below.
bool TryGetEdge(TNode source, TNode dest, out TEdgeData? value): We can implement this method using the TryGetValue method of _edges.
int NodeCount: Because _adjacencyLists contains all of the nodes as keys, we can implement this property using this dictionary’s Count property.
int EdgeCount: We can implement this property using the Count property of _edges.
bool ContainsNode(TNode node): We can implement this method using the ContainsKey method of _adjacencyLists.
bool ContainsEdge(TNode source, TNode dest): We can implement this method using the ContainsKey method of _edges.
IEnumerable<TNode> Nodes: We can implement this property using the Keys property of _adjacencyLists.
IEnumerable<Edge<TNode, TEdgeData>> OutgoingEdges(TNode source): See below.
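Putting these one-line implementations together gives roughly the following sketch. It is not the library's source: DirectedGraphSketch is a hypothetical name, Cell<T> is a hypothetical stand-in for LinkedListCell<T>, and AddEdge and OutgoingEdges are omitted because they need more than a single call. Because AddNode uses Dictionary.Add, adding a node that is already present throws an ArgumentException here; whether the real class behaves this way is an assumption.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for LinkedListCell<T>.
public class Cell<T>
{
    public T Data { get; }
    public Cell<T>? Next { get; }

    public Cell(T data, Cell<T>? next)
    {
        Data = data;
        Next = next;
    }
}

// Hypothetical sketch of the two-dictionary design (AddEdge and OutgoingEdges omitted).
public class DirectedGraphSketch<TNode, TEdgeData> where TNode : notnull
{
    // Every node is a key; its value is the list of destinations of its outgoing edges (null if none).
    private readonly Dictionary<TNode, Cell<TNode>?> _adjacencyLists = new Dictionary<TNode, Cell<TNode>?>();

    // Key (u, v) is present exactly when there is an edge from u to v; its value is the edge's data.
    private readonly Dictionary<(TNode, TNode), TEdgeData> _edges = new Dictionary<(TNode, TNode), TEdgeData>();

    // Both fields are initialized above, so no constructor is needed.

    public void AddNode(TNode node) => _adjacencyLists.Add(node, null);

    public bool TryGetEdge(TNode source, TNode dest, out TEdgeData? value) =>
        _edges.TryGetValue((source, dest), out value);

    public int NodeCount => _adjacencyLists.Count;

    public int EdgeCount => _edges.Count;

    public bool ContainsNode(TNode node) => _adjacencyLists.ContainsKey(node);

    public bool ContainsEdge(TNode source, TNode dest) => _edges.ContainsKey((source, dest));

    public IEnumerable<TNode> Nodes => _adjacencyLists.Keys;
}
```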

Let’s now consider the implementation of the AddEdge method. Recall from “Introduction to Graphs” that this method adds an edge from source to dest with data item value. If either source or dest is not already in the graph, that node is added. If either source or dest is null, the method throws an ArgumentNullException. If source and dest are the same, or if the edge already exists in the graph, it throws an ArgumentException.

In order to avoid changing the graph if the parameters are bad, we should do the error checking first. However, there is no need to check explicitly whether the edge already exists, provided we update _edges using its Add method and do this before making any other changes to the graph. Because a dictionary’s Add method throws an ArgumentException if the given key is already in the dictionary, it takes care of this error check for us. The key we add is the tuple (source, dest), of type (TNode, TNode), and its associated value is the parameter value.
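The error checking described above might look like the following hypothetical helper, which performs AddEdge’s argument checks and then its first change to the graph. AddEdgeChecks.CheckAndAddEdge is an illustrative name, and comparing nodes with EqualityComparer<TNode>.Default (the comparer a Dictionary uses by default) is an assumption about how “the same” node is determined.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class AddEdgeChecks
{
    // Hypothetical helper: validates the arguments, then adds the edge to the edge dictionary.
    public static void CheckAndAddEdge<TNode, TEdgeData>(
        Dictionary<(TNode, TNode), TEdgeData> edges, TNode source, TNode dest, TEdgeData value)
        where TNode : notnull
    {
        // Explicit checks come first, so bad arguments change nothing.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (dest == null) throw new ArgumentNullException(nameof(dest));
        if (EqualityComparer<TNode>.Default.Equals(source, dest))
        {
            throw new ArgumentException("The source and destination must be different.");
        }

        // No explicit duplicate check: Add throws an ArgumentException if (source, dest)
        // is already a key, and in that case the dictionary is left unchanged.
        edges.Add((source, dest), value);
    }
}
```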

After we have updated _edges, we need to update _adjacencyLists. First, we obtain the linked list associated with the key source in _adjacencyLists; because source may not yet be a key in this dictionary, we use the TryGetValue method for this lookup. (If source is not a key in this dictionary, the out parameter is set to null, which we can interpret as an empty list.) We then construct a new linked list cell containing dest as its data, insert it at the beginning of the list we retrieved, and store the resulting list as the new value associated with source in _adjacencyLists. Finally, if _adjacencyLists doesn’t already contain dest as a key, we add it with null as its associated value.
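Combining the error checking with this update of _adjacencyLists gives a sketch of the whole AddEdge method. GraphWithAddEdge is a hypothetical class containing only the two fields, AddEdge, and a few properties for checking the result; Cell<T> is again a hypothetical stand-in for LinkedListCell<T>.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for LinkedListCell<T>.
public class Cell<T>
{
    public T Data { get; }
    public Cell<T>? Next { get; }

    public Cell(T data, Cell<T>? next)
    {
        Data = data;
        Next = next;
    }
}

public class GraphWithAddEdge<TNode, TEdgeData> where TNode : notnull
{
    private readonly Dictionary<TNode, Cell<TNode>?> _adjacencyLists = new Dictionary<TNode, Cell<TNode>?>();
    private readonly Dictionary<(TNode, TNode), TEdgeData> _edges = new Dictionary<(TNode, TNode), TEdgeData>();

    public void AddEdge(TNode source, TNode dest, TEdgeData value)
    {
        // Check the arguments before changing anything.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (dest == null) throw new ArgumentNullException(nameof(dest));
        if (EqualityComparer<TNode>.Default.Equals(source, dest))
        {
            throw new ArgumentException("The source and destination must be different.");
        }

        // Throws ArgumentException, before any other change, if the edge already exists.
        _edges.Add((source, dest), value);

        // If source is not yet a key, list is null, which we treat as an empty list.
        _adjacencyLists.TryGetValue(source, out Cell<TNode>? list);
        _adjacencyLists[source] = new Cell<TNode>(dest, list);

        // Make sure dest is also a node of the graph.
        if (!_adjacencyLists.ContainsKey(dest))
        {
            _adjacencyLists.Add(dest, null);
        }
    }

    public int NodeCount => _adjacencyLists.Count;

    public int EdgeCount => _edges.Count;

    public bool ContainsNode(TNode node) => _adjacencyLists.ContainsKey(node);
}
```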

It remains to implement the OutgoingEdges method. This method returns an IEnumerable<Edge<TNode, TEdgeData>>, so it iterates through the cells of the linked list associated with the given node in _adjacencyLists. For each of these cells, it uses yield return (see “Enumerators”) to produce an Edge<TNode, TEdgeData> describing the edge represented by that cell: the source node is the node given to this method, the destination node is the node stored in the cell, and the edge data is obtained from the dictionary _edges.
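Here is a sketch of OutgoingEdges in the same style. Cell<T> and Edge<TNode, TEdgeData> are hypothetical stand-ins whose member names (Data, Next, Source, Destination) are assumptions rather than the library’s API, and GraphSketch includes a compact AddEdge without argument checks so that it can build a graph to enumerate. What should happen when source is not in the graph is not specified above; in this sketch the dictionary indexer simply throws a KeyNotFoundException, and only once enumeration begins, because an iterator’s body runs lazily.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stand-in for LinkedListCell<T>.
public class Cell<T>
{
    public T Data { get; }
    public Cell<T>? Next { get; }

    public Cell(T data, Cell<T>? next)
    {
        Data = data;
        Next = next;
    }
}

// Hypothetical stand-in for the library's Edge<TNode, TEdgeData>.
public class Edge<TNode, TEdgeData>
{
    public TNode Source { get; }
    public TNode Destination { get; }
    public TEdgeData Data { get; }

    public Edge(TNode source, TNode destination, TEdgeData data)
    {
        Source = source;
        Destination = destination;
        Data = data;
    }
}

public class GraphSketch<TNode, TEdgeData> where TNode : notnull
{
    private readonly Dictionary<TNode, Cell<TNode>?> _adjacencyLists = new Dictionary<TNode, Cell<TNode>?>();
    private readonly Dictionary<(TNode, TNode), TEdgeData> _edges = new Dictionary<(TNode, TNode), TEdgeData>();

    // Compact AddEdge (argument checks omitted for brevity).
    public void AddEdge(TNode source, TNode dest, TEdgeData value)
    {
        _edges.Add((source, dest), value);
        _adjacencyLists.TryGetValue(source, out Cell<TNode>? list);
        _adjacencyLists[source] = new Cell<TNode>(dest, list);
        if (!_adjacencyLists.ContainsKey(dest))
        {
            _adjacencyLists.Add(dest, null);
        }
    }

    // Walks source's adjacency list, yielding one Edge per cell; the data comes from _edges.
    public IEnumerable<Edge<TNode, TEdgeData>> OutgoingEdges(TNode source)
    {
        for (Cell<TNode>? p = _adjacencyLists[source]; p != null; p = p.Next)
        {
            yield return new Edge<TNode, TEdgeData>(source, p.Data, _edges[(source, p.Data)]);
        }
    }
}
```

Since each edge’s data is fetched with a single _edges lookup, enumerating the outgoing edges of a node takes time proportional to the number of those edges.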