Graph Algorithms

Path Searches

 1function DEPTHFIRSTSEARCH(GRAPH,SRC,TAR)
 2    STACK = empty array
 3    DISCOVERED = empty set
 4    PARENT_MAP = empty dictionary
 5    append SRC to STACK
 6    while STACK is not empty
 7        CURR = top of the stack
 8        if CURR not in DISCOVERED
 9            if CURR is TAR
10                 PATH = empty array
11                 TRACE = TAR
12                 while TRACE is not SRC
13                     append TRACE to PATH
14                     set TRACE equal to PARENT_MAP[TRACE]
15                 reverse the order of PATH
16                 return PATH
17            add CURR to DISCOVERED
18            NEIGHS = neighbors of CURR
19            for EDGE in NEIGHS
20                NODE = first entry in EDGE
21                append NODE to STACK
22                if PARENT_MAP does not have key NODE
23                    in the PARENT_MAP dictionary set key NODE with value CURR
24    return nothing
  • Time: The time analysis for the depth first search can be a bit complex. Lines 1 through 5 would execute in near constant time. When we start the while loop on line 6, it is more difficult to analyze how many times this can execute as STACK can contain duplicates. In the case that we have a sparse graph, this would be bound by the number of nodes. For a dense graph however, the number of executions would be bound by the number of edges. The code within the while loop would be bound by the number of nodes because of the check that we have not already discovered the node in line 8. If we haven’t discovered it, we would take either the logic of lines 8 through 16 or lines 17 through 23 but never both in the same iteration. Both of these blocks are bound by the number of nodes in our graph. Thus the worst case time requirement would be $n^2$.
  • Space: Depending on the density of our graph, the space required will be linear with respect to the number of nodes or edges. This is due to the fact that STACK can contain duplicate nodes. If we have a sparse graph then it will be bound by the number of nodes. If we have a dense graph then the space is bound by the number of edges.
    • STACK: linear with respect to the number of edges
    • DISCOVERED: linear with respect to the number of nodes
    • PARENT_MAP: linear with respect to the number of nodes
    • CURR: 1
    • PATH: linear with respect to the number of nodes
    • TRACE: 1
    • NEIGHS: linear with respect to the number of neighbors
    • EDGE: 1
    • NODE: 1
 1function BREADTHFIRSTSEARCH(GRAPH,SRC,TAR)
 2    QUEUE = empty queue
 3    DISCOVERED = empty set
 4    PARENT_MAP = empty dictionary
 5    add SRC to DISCOVERED
 6    add SRC to QUEUE
 7    while QUEUE is not empty
 8        CURR = first element in QUEUE
 9        if CURR is TAR 
10            PATH = empty list 
11            TRACE = TAR
12            while TRACE is not SRC
13                    append TRACE to PATH
14                    set TRACE equal to PARENT_MAP[TRACE]
15                reverse the order of PATH
16                return PATH
17        NEIGHS = neighbors of CURR
18        for EDGE in NEIGHS
19            NODE = first entry in EDGE
20            if NODE is not in DISCOVERED
21                add NODE to DISCOVERED
22                if PARENT_MAP does not have key NODE
23                    in the PARENT_MAP dictionary set key NODE with value CURR
24                append NODE to QUEUE
25    return nothing
  • Time: The run time expected for breadth first search is a bit more straight forward as QUEUE will never have duplicates. Lines 1 through 6 will all execute in constant time. The while loop will occur $n$ times where $n$ is the number of nodes. Based on the logic, either 9-16 will execute or 17-24 will execute. Both of these are bound by the number of nodes in terms of time. Each iteration of the while loop will take $n$ time and we do the while loop $n$ times; thus the running time will be $n^2$.
  • Space: The space required for BFS will be linear with respect to the number of nodes. We have 5 variables which have size bound by the number of nodes. If the number of nodes doubles, then we expect the amount of space to roughly double.
    • QUEUE: linear with respect to the number of nodes
    • DISCOVERED: linear with respect to the number of nodes
    • PARENT_MAP: linear with respect to the number of nodes
    • CURR: 1
    • PATH: linear with respect to the number of nodes
    • TRACE: 1
    • NEIGHS: linear with respect to the number of nodes
    • EDGE: 1
    • NODE: 1

Minimum Spanning Trees

 1function KRUSKAL(GRAPH)
 2    MST = GRAPH without the edges attribute(s)
 3    ALLSETS = an empty list which will contain the sets
 4    for NODE in GRAPH NODES
 5        SET = a set with element NODE
 6        add SET to ALLSETS
 7    EDGES = list of GRAPH's edges
 8    SORTEDEDGES = EDGES sorted by edge weight, smallest to largest
 9    for EDGE in SORTEDEDGES
10        SRC = source node of EDGE
11        TAR = target node of EDGE
12        SRCSET = the set from SETS in which SRC is contained
13        TARSET = the set form SETS in which TAR is contained
14        if SRCSET not equal TARSET
15            UNIONSET = SRCSET union TARSET
16            add UNIONSET to ALLSETS
17            remove SRCSET from ALLSETS
18            remove TARSET from ALLSETS
19            add EDGE to MST as undirected edge
20    return MST
  • Time: The time to initialize MST would be linear with respect to the number of nodes. Regardless of the graph implementation, inserting nodes is constant time and we would do it for the number of nodes in GRAPH. Lines 4-6 would take linear time with respect to the number of nodes. Then lines 9-19 would take linear time with respect to the number of edges as it would execute $e$ times and each operation can be done in constant time, except for searching through the sets and performing set operations, which require $log_2(n)$ time. Thus, Kruskal’s algorithm will take time on the order of $e \times log_2(n)$ in the worst case.

  • Space:The required space for Kruskal’s algorithm is dependent on the implementation of the MST. A matrix graph would require $n^2$ space and a list graph we would require $n+e$ space.

    • MST: matrix graph $n^2$ or list graph $n+e$
    • ALLSETS: linear with respect to the number of nodes
    • NODE: 1
    • GRAPH NODES: linear with respect to the number of nodes
    • SET: 1
    • EDGES: linear with respect to the number of edges
    • SORTEDEDGES: linear with respect to the number of edges
    • SRC: 1
    • TAR: 1
    • SRCSET: 1
    • TARSET: 1
    • UNIONSET: 1
 1function PRIM(GRAPH, START)
 2    MST = GRAPH without the edges attribute(s)
 3    VISITED = empty set
 4    add START to VISITED
 5    AVAILEDGES = list of edges where START is the source
 6    sort AVAILEDGES
 7    while VISITED is not all of the nodes
 8        SMLEDGE = smallest edge in AVAILEDGES
 9        SRC = source of SMLEDGE
10        TAR = target of SMLEDGE
11        if TAR not in VISITED
12            add SMLEDGE to MST as undirected edge
13            add TAR to VISITED
14            add the edges where TAR is the source to AVAILEDGES
15        remove SMLEDGE from AVAILEDGES
16        sort AVAILEDGES
17    return MST
  • Time without Priority Queue: The time to initialize MST would be linear with respect to the number of nodes. Regardless of the graph implementation, inserting nodes is constant time and we would do it for the number of nodes in GRAPH. With a matrix graph, setting up AVAILEDGES would take linear time with respect to the number of nodes. With a list graph, this would happen in constant time. Then, we need to get the smallest edge from the AVAILEDGES list, which would be a linear time operation based on the number of edges, and we must do that once for up to each edge in the graph. So, the worst case running time for Prim’s algorithm is $e^2$. (Our implementation is actually a bit slower than this since we sort the list of available edges each time, but that is technically not necessary - our implementation is closer to $e^2 \times log_2(e)$!)
  • Time with Priority Queue: We do get an improvement when we choose to implement this with a priority queue. For the most part, the performance is the same. Using a priority queue, heapify would optimize the sorting to happen in linear time with respect to the number of elements. In that case, we can reduce the total running time to on the order of $e \times log_2(n)$, which is the same as Kruskal’s algorithm.
  • Space: The required space for Prim’s algorithm is also dependent on the implementation of the MST. A matrix graph would require $n^2$ space and a list graph we would require $n+e$ space.
    • MST: matrix graph $n^2$ or list graph $n+e$
    • VISITED: linear with respect to the number of nodes
    • AVAILEDGES: linear with respect to the number of edges
    • SMLEDGE: 1
    • SRC: 1
    • TAR: 1

Shortest Path

 1DIJKSTRAS(GRAPH, SRC)
 2   SIZE = size of GRAPH
 3   DISTS = array with length equal to SIZE
 4   PREVIOUS = array with length equal to SIZE
 5   set all of the entries in PREVIOUS to none
 6   set all of the entries in DISTS to infinity 
 7   DISTS[SRC] = 0 
 8   PQ = min-priority queue
 9   loop IDX starting at 0 up to SIZE
10        insert (DISTS[IDX],IDX) into PQ
11    while PQ is not empty
12        MIN = REMOVE-MIN from PQ
13        for NODE in neighbors of MIN
14            WEIGHT = graph weight between MIN and NODE
15            CALC = DISTS[MIN] + WEIGHT
16            if CALC < DISTS[NODE]
17                DISTS[NODE] = CALC
18                PREVIOUS[NODE] = MIN
19                PQIDX = index of NODE in PQ
20                PQ decrease-key (PQIDX, CALC)
21    return DISTS and PREVIOUS
  • Time: The time required for Dijkstra’s algorithm will be $n^2$ in the worst case. We would expect lines 1 through 8 to take constant time. The for loop on line 9 would be bound by the number of nodes. We would expect the for loop on line 13 to be bound by the number of nodes. This for loop will execute each time PQ is not empty (line 11), which is bound by the number of nodes. Thus, the block of code starting at 11 will take $n^2$ time to run in the worst case. This means that if we double the number of nodes, then the running time will be quadrupled. The worst case for Dijkstra’s algorithm is characterized by being a very dense graph, meaning each node has a lot of neighbors. If the graph is sparse and our priority queue is efficient, we could expect this running time to be more along the lines of $(n + e) \times log_2(n)$, where $e$ is the number of edges.
  • Space: The required space for Dijkstra’s algorithm will be linear with respect to the number of nodes. We have 4 variables which have linear size with respect to the number of nodes. We say that this is linear because if we were to double the number of nodes, we would roughly double the space requirement.
    • SIZE: 1
    • DISTS: linear with respect to the number of nodes
    • PREVIOUS: linear with respect to the number of nodes
    • PQ: linear with respect to the number of nodes
    • IDX: 1
    • MIN: 1
    • NODE: 1
    • NEIGHBORS: linear with respect to the number of nodes
    • WEIGHT: 1
    • CALC: 1
    • PQIDX: 1