Chapter 9

Trees

All about trees!

Subsections of Trees

Introduction

YouTube Video

Video Materials

For the next data structure in the course, we will cover trees, which are used to show hierarchical data. Trees can have many shapes and sizes and there is a wide variety of data that can be organized using them. Real world data that is appropriate for trees can include: family trees, management structures, file systems, biological classifications, anatomical structures and much more.


We can look at an example of a tree and the relationships they can show. Consider this file tree; it has folders and files in folders.

File Tree File Tree

If we wanted to access the file elm.txt, we would have to follow this file path: Foo/B/Q/E/elm.txt. We can similarly store the file structure as a tree like we below. As before, if we wanted to get to the file elm.txt we would navigate the tree in the order: Foo -> B -> Q -> E -> elm.txt. As mentioned before, trees can be used on very diverse data sets; they are not limited to file trees!

File Tree as Tree File Tree as Tree

What are trees?

In the last module we talked about strings which are a linear data structure. To be explicit, this means that the elements in a string form a line of characters. A tree, by contrast, is a hierarchal structure which is utilized best in multidimensional data. Going back to our file tree example, folders are not limited to just one file, there can be multiple files contained in a single folder- thus making it multidimensional.

Consider the string “abc123”; this is a linear piece of data where there is exactly one character after another. We can use trees to show linear data as well.

String as Tree String as Tree

While trees can be used for linear data, it would be excessive and inefficient to implement them for single strings. In an upcoming module, we will see how we can use trees to represent any number of strings! For example, this tree below contains 7 words: ‘a’, ‘an’, ‘and’, ‘ant’, ‘any’, ‘all’, and ‘alp’.

Small Words Small Words

In the next sections, we will discuss the properties of a tree data structure and how we would design one ourselves. Once we have a good understanding of trees and the properties of trees, we will implement our own.

Subsections of Introduction

General Terms

Warning

The video incorrectly states the degree of a tree is equal to the degree of the root. The correct definition is that the degree of a tree is the largest degree of any node in the tree.

YouTube Video

Video Materials

To get ourselves comfortable in working with trees, we will outline some standard vocabulary. Throughout this section, we will use the following tree as a guiding example for visualizing the definitions.
Blank Blank

Definitions

  • Node - the general term for a structure which contains an item, such as a character or even another data structure.
  • Edge - the connection between two nodes. In a tree, the edge will be pointing in a downward direction.

Nodes and Edges Nodes and Edges

This tree has five edges and six nodes. There is no limit to the number of nodes in a tree. The only stipulation is that the tree is fully connected. This means that there cannot be disjoint portions of the tree. We will look at examples in the next section.

Info

A rule of thumb for discerning trees is this: if you imagine holding the tree up by the root and gravity took effect, then all edges must be pointing downward. If an edge is pointing upward, we will have a cycle within our structure so it will not be a tree.


  • Root - the topmost node of the tree

Root Root

To be a tree, there must be exactly one root. Having multiple roots, will result in a cycle or a tree that is not fully connected. In short, a cycle is a loop in the tree.


  • Parent - a node with an edge that connects to another node further from the root. We can also define the root of a tree with respect to this definition; Root: a node with no parent.
  • Child - a node with an edge that connects to another node closer to the root.
    • In a general tree, the children of a node are an unordered set. There is not a fixed or defined order for generic trees. Parent and Child Parent and Child Parent and Child Parent and Child In a tree, child nodes must have exactly one parent node. If a child node has more than one parent, then a cycle will occur. If there is a node without a parent node, then this is necessarily the root node. There is no limit to the number of child nodes a parent node can have, but to be a parent node, the node must have at least one child node.

  • Leaf - a node with no children. Leaf Leaf This tree has four leaves. There is no limit to how many leaves can be in a tree.

  • Degree
    • Degree of a node - the number of children a node has. The degree of a leaf is zero.
    • Degree of a tree - the degree of an entire tree is the largest degree of any node found in the tree. Degree Degree

The degree of the nodes are shown as the values on the nodes in this example. The degree of the tree is the largest degree of any node in the tree. Thus, the degree for this tree is 3.


  • A tree is defined recursively. This means that each child of the root is the root of another tree and the children of those are roots to trees as well. Again, this is a recursive definition so it will continue to the leaves. The leaves are also trees with just a single node as the root.
    Subtrees Subtrees In our example tree, we have six trees. Each tree is outlined in a red dashed circle:
    • the main tree,
      • the tree off the left of the main root,
        • the tree off the left of this root,
        • the tree in the center of this root,
        • the tree off the right of this root, and
      • the tree off the right of the main root with a single node

Subsections of General Terms

What Makes a Tree a Tree

YouTube Video

Video Materials

Trees can come in many shapes and sizes. There are, however some constraints to making a valid tree.

  • A tree has a single root
  • A child has exactly one parent
  • A tree is fully connected (IE a single tree)
  • A tree has no cycles (IE no loops)

Valid Trees

Any combination of the following represents a valid tree:

  • A tree with a single node; just a root, Single Single

  • A tree where each node has a single child, or Linear Linear

  • A tree where nodes have many children. Linear Linear

Invalid Trees

Below are some examples of invalid trees.

  • A cycle Cycle Cycle Again, cycles are essentially loops that occur in our tree. In this example, we see that our leaf has two parents. One way to determine whether your data structure has a cycle is if there is more than one way to get from the root to any node.

  • A cycle Cycle Cycle Here we can see another cycle. In this case, the node immediately after the root has two parents, which is a clue that a cycle exists. Another test

  • Two Roots Two Roots Two Roots Trees must have a single root. In this instance, it may look like we have a tree with two roots. Working through this, we also see that the node in the center has two parents.

  • Two Trees Double Tree Double Tree This example would be considered two trees, not a tree with two parts. In this figure, we have two fully connected components. Since they are not connected to each other, this is not a single tree.

Subsections of What Makes a Tree a Tree

Tree Operations

YouTube Video

Video Materials

Along with understanding how trees work, we want to also be able to implement a tree of our own. We will now outline key components of a tree class.


MyTree

Recall that trees are defined recursively so we can build them from the leaves up where each leaf is a tree itself. Each tree will have three properties: the item it contains as an object, its parent node of type MyTree, and its children as a list of MyTrees. Upon instantiation of a new MyTree, we will set the item value and initialize the parent node to None and the children to an empty list of type MyTree.

UML UML

Suppose that we wanted to construct the following tree.

Tree Tree

We would start by initializing each node as a tree with no parent, no children, and the item in this instance would be the characters. Then we build it up level by level by add the appropriate children to the respective parent.

Info

Disclaimer: This implementation will not prevent all cycles. In the next module, we will introduce steps to prevent cycles and maintain true tree structures.


Finding a child

In this method, we will take a value as input and then check if that value is the item of a child of the current node. If the value is not the item for any of the node’s children then we should return none.

function FINDCHILD(VALUE)
    FOR CHILD in CHILDREN
        IF CHILD's ITEM is VALUE
            return  CHILD
    return NONE
end function

Getting children, item, parent, or degree

Each of these will be rather straight forward; children, item, and parent are all attributes of our node, so we can have a getter function that returns the respective values. The slightly more involved task will be getting the degree. Recall that the degree of a node is equal to the number of children. Thus, we can simply count the number of children and return this number for the degree.


Checking node type

We will have two functions to check the node type: one to determine if the node is a leaf and one to determine if it is a root. The definition of a leaf is a node that has no children. Thus, to check if a node is a leaf, we can simply check if the number of children is equal to zero. Similarly, since the definition of a root is a node with no parent, we can check that the parent attribute of the node is None.


Adding a child

YouTube Video

When we wish to add a child, we must fisrt make sure we are able to add the child.

  1. Check that the child is an instance of MyTree
  2. Make sure the child doesn’t already have a parent
  3. Make sure the child isn’t already a child of the parent

We will return true if the child was successfully added and false otherwise while raising the appropriate errors.

function ADDCHILD(CHILD)
    IF CHILD has PARENT
        throw exception
    IF CHILD is CHILD of PARENT
        return FALSE
    ELSE
        append CHILD to PARENT's children
        set CHILD's parent to PARENT
        return TRUE
end function

As an example, lets walk through the process of building the tree above:

  1. Instantiate MyTree a with item ‘A’
  2. Instantiate MyTree b with item ‘B’
  3. Instantiate MyTree c with item ‘C’
  4. Instantiate MyTree d with item ‘D’
  5. Instantiate MyTree e with item ‘E’
  6. Instantiate MyTree f with item ‘F’
  7. Instantiate MyTree g with item ‘G’
  8. Instantiate MyTree h with item ‘H’
  9. Instantiate MyTree i with item ‘I’
  10. Add child tree g to tree d
  11. Add child tree h to tree d
  12. Add child tree i to tree d
  13. Add child tree e to tree b
  14. Add child tree f to tree b

Once we have completed that, visually, we would have the tree above and in code we would have:

  • MyTree a with parent_node = None, item = ‘A’, children = {b,c,d}
  • MyTree b with parent_node = a, item = ‘B’, children = {e,f}
  • MyTree c with parent_node = a, item = ‘C’, children = { }
  • MyTree d with parent_node = a, item = ‘D’, children = {g,h,i}
  • MyTree e with parent_node = b, item = ‘E’, children = { }
  • MyTree f with parent_node = b, item = ‘F’, children = { }
  • MyTree g with parent_node = d, item = ‘G’, children = { }
  • MyTree h with parent_node = d, item = ‘H’, children = { }
  • MyTree i with parent_node = d, item = ‘I’, children = { }

Note: When adding a child we must currently be at the node we want to be the parent. Much like when you want to add a file to a folder, you must specify exactly where you want it. If you don’t, this could result in a wayward child.


Removing a child

YouTube Video

In the case of removing a child, we first need to check that the child we are attempting to remove is an instance of MyTree. We will return true if we successfully remove the child and false otherwise.

function REMOVECHILD(CHILD)
    IF CHILD in PARENT'S children
        REMOVE CHILD from PARENT's children
        SET CHILD's PARENT to NONE
        return TRUE
    ELSE
        return FALSE
end function

As with adding a child, we need to ensure that we are in the ‘right place’ when attempting to remove a child. When removing a child, we are not ’erasing’ it, we are just cutting the tie from parent to child and child to parent. Consider removing d from a. Visually, we would have two disjoint trees, shown below:

Tree 2 Tree 2

In code, we would have:

  • MyTree a with parent_node = None, item = ‘A’, children = {b,c}
  • MyTree b with parent_node = a, item = ‘B’, children = {e,f}
  • MyTree c with parent_node = a, item = ‘C’, children = { }
  • MyTree d with parent_node = None, item = ‘D’, children = {g,h,i}
  • MyTree e with parent_node = b, item = ‘E’, children = { }
  • MyTree f with parent_node = b, item = ‘F’, children = { }
  • MyTree g with parent_node = d, item = ‘G’, children = { }
  • MyTree h with parent_node = d, item = ‘H’, children = { }
  • MyTree i with parent_node = d, item = ‘I’, children = { }

Subsections of Tree Operations

Terms I

YouTube Video

Many of the terms used in trees relate to terms used in family trees. Having this in mind can help us to better understand some of the terminology involved with abstract trees. Here we have a sample family tree. Family Tree Family Tree

  • Ancestor - The ancestors of a node are those reached from child to parent relationships. We can think of this as our parents and our parent’s parents, and so on.
    • Let’s look at all of the ancestors of each of our nodes in the family tree.
      • Ava’s ancestors: Uzzi, Joe, Myra. This is because, Uzzi is the parent of Ava, Joe is the parent of Uzzi, and Myra is the parent of Joe. Try to work out the following and click the name to reveal the ancestors.
        Zia, Myra - Zia is the parent of Uma and Myra is the parent of Zia.
        None - Myra does not have a parent node.
        Myra - Myra is the parent of Raju.
        Uzzi, Joe, Myra - Uzzi is the parent of Bev, Joe is the parent of Uzzi, and Myra is the parent of Joe.
  • Descendant - The descendants of a node are those reached from parent to child relationships. We can think of this as our children and our children’s children and so on.
    • Let’s look at all of the descendants of each of our nodes in the family tree.
      • Ava’s descendants: None. Ava has no child nodes and thus, no descendants. Try to work out the following and click the name to reveal the descendants.
        Ang - Ang is the child of Uma
        Raju, Joe, Zia, Uzzi, Bert, Uma, Bev, Ava, Ang, Isla, Eoin - All of the nodes in a tree will be descendants of the root. To work it out: Raju, Joe and Zia are the children of Myra, Uma is the child of Zia, Ang is the child of Uma, and we can work the rest out for Joe’s children.
        None - Raju has no child nodes.
        Isla, Eoin - Isla is the child of Bev and Eoin is the child of Isla.
  • Siblings - Nodes which share the same parent
    • We can think about the siblings of all of our nodes in the family tree.
      • Ava’s siblings: Bev - Uzzi is the parent node of Ava; Uzzi has two child nodes, Ava and Bev. Try to work out the following and click the name to reveal the siblings.
        None - Zia is the parent node of Uma; Zia has only one child node, Uma.
        None - Myra is the root and thus does not have a parent node resulting in no siblings.
        Joe, Zia - Myra is the parent node of Raju; Myra has three child nodes, Joe, Zia, and Raju
        Ava - Uzzi is the parent node of Bev; Uzzi has two child nodes, Bev and Ava.

Terms II

YouTube Video

We can describe the sizes of trees and position of nodes using different terminology, like level, depth, and height.

Family Tree Family Tree

  • Level - The level of a node characterizes the distance between the node and the root. The root of the tree is considered level 1. As you move away from the tree, the level increases by one.
    • For our family tree example, what nodes are in the following levels? Think about the answer and then click corresponding arrow.
      Myra - Level 1 is always the root
      Raju, Joe, Zia - These are the nodes which are 1 edge away from the root.
      Uzzi, Bert, Uma - These are the nodes which are 2 edges away from the root.
      Bev, Ava, Ang - These are the nodes which are 3 edges away from the root.
      Isla - This is the only node which is 4 edges away from the root.
      Eoin - This is the only node which is 5 edges away from the root.
  • Depth - The depth of a node is its distance to the root. Thus, the root has depth zero. Level and depth are related in that: level = 1 + depth.
    • For our family tree example, what nodes have the following depths?
      Myra - The root will always be at depth 0.
      Raju, Joe, Zia - These are the nodes which are 1 edge away from the root.
      Uzzi, Bert, Uma - These are the nodes which are 2 edge away from the root.
      Bev, Ava, Ang - These are the nodes which are 3 edge away from the root.
      Isla - This is the only node which is 4 edges away from the root.
      Eoin - This is the only node which is 5 edges away from the root.
  • Height of a Node - The height of a node is the longest path to a leaf descendant. The height of a leaf is zero.
    • For our family tree example, what nodes have the following heights?
      Raju, Eoin, Ava, Bert, Ang - The leaves always have height 0.
      Isla, Uma - Isla -> Eoin and Uma -> Ang
      Bev, Zia - Bev -> Isla -> Eoin and Zia -> Uma -> Ang
      Uzzi - Uzzi -> Bev -> Isla -> Eoin
      Joe - Joe -> Uzzi -> Bev -> Isla -> Eoin
      Myra - Myra -> Joe -> Uzzi -> Bev -> Isla -> Eoin
  • Height of a Tree - The height of a tree is equal to the height of the root.
    • Our family tree would have height 5

Terms III

YouTube Video

When working with multidimensional data structures, we also need to consider how they would be stored in a linear manner. Remember, pieces of data in computers are linear sequences of binary digits. As a result, we need a standard way of storing trees as a linear structure.

Traversal Tree Traversal Tree

  • Path - a path is a sequence of nodes and edges, which connect a node with its descendant. We can look at some paths in the tree above:

    • From Q to O: QRO
      QWY
      RP
  • Traversal is a general term we use to describe going through a tree. The following traversals are defined recursively.

Preorder Traversal

  1. Access the root, record its value.
  2. Run the preorder traversal each of the children
YouTube Video
  • The Pre refers to the root, meaning the root goes before the children.
  • Remember: Root Children
  • For the above tree, the preorder traversal could result in: QWYUERIOPTA Preorder Traversal Preorder Traversal

Postorder Traversal

  1. Run the postorder traversal on each of the children
  2. Access the root, record its value
YouTube Video
  • The Post refers to the root, meaning the root goes after the children.
  • Remember: Children Root
  • For the above tree, the postorder traversal could result in: YUWEIOPRATQ Postorder Traversal Postorder Traversal

When we talk about traversals for general trees we have used the phrase ’the traversal could result in’. We would like to expand on why ‘could’ is used here. Each of these general trees are the same but their traversals could be different. The key concept in this is that for a general tree, the children are an unordered set of nodes; they do not have a defined or fixed order. The relationships that are fixed are the parent/child relationships. Traversal Tree Traversal Tree Traversal Tree1 Traversal Tree1 Traversal Tree2 Traversal Tree2

Tree Preorder Postorder
Tree 1 QWYUERIOPTA YUWEIOPRATQ
Tree 2 QETARIOPWUY EATIOPRUYWQ
Tree 3 QROPITAEWUY OPIRATEUYWQ

Binary Tree

YouTube Video

Binary Tree Binary Tree

A binary tree is a type of tree with some special conditions. First, it must follow the guidelines of being a tree:

  • There must be single root,
  • each child node must have a single parent node,
  • it must be fully connected (no disjoint parts), and
  • there can be no cycles (no loops).

The special conditions that we impose on binary trees are the following:

  • Each node has at most 2 children (nodes can have 0, 1, or 2 children), and
  • unlike general trees, the children in a binary tree are not an unordered set. The children must be ordered such that:
    • all of the descendants in the left tree are less than the parent’s value, and
    • all of the descendants in the right tree are greater than the parent’s value

To reinforce these concepts, we will look at examples of binary trees and examples that are not binary trees.

Binary Tree Examples

YouTube Video

Valid Binary Trees

Single Node Single Node

This is a valid binary tree. We have a single node, the root, with no children. As with general trees, binary trees are built recursively. Thus, each node and its child(ren) are trees themselves.

Unbalanced Unbalanced

This is also a valid binary tree. All of the left children are less than their parent. The node with item ‘10’ is also in the correct position as it is less than 12, 13, and 14 but greater than 9.

Balanced Balanced

We have the same nodes but our root is now 12 whereas before it was 14. This is also a valid binary tree.

Alphabet Binary Tree Alphabet Binary Tree

Here we have an example of a binary tree with alphabetical items. As long as we have items which have a predefined order, we can organize them using a binary tree.

Invalid Binary Trees

Alphabet Non-Binary Tree Alphabet Non-Binary Tree

We may be inclined to say that this is a binary tree: each node has 0, 1, or 2 children and amongst children and parent nodes, the left child is smaller than the parent and the right child is greater than the parent. However, in binary trees, all of the nodes in the left tree must be smaller than the root and all of the nodes in the right tree must be larger than the root. In this tree, D is out of place. Node D is less than node T but it is also less than node Q. Thus, node D must be on the right of node Q.

Too Many Children Too Many Children

In this case, we do not have a binary tree. This does fit all of the criteria for being a tree but not the criteria for a binary tree. Nodes in binary trees can have at most 2 children. Node 30 has three children.

Binary Tree Traversals

YouTube Video

In the first module we discussed two types of traversals: preorder and postorder. Within that discussion, we noted that for general trees, the preorder and postorder traversal may not be unique. This was due to the fact that children nodes are an unordered set.

Info

We are now working with binary trees which have a defined child order. As a result, the preorder and postorder traversals will be unique! These means that for a binary tree when we do a preorder traversal there is exactly one string that is possible. The same applies for postorder traversals as well.

Recall that these were defined as such:

  • Preorder Traversal (Remember: Root Children):
    1. Access the root
    2. Run the preorder traversal on the children
  • Postorder Traversal (Remember: Children Root):
    1. Run the postorder traversal on the children
    2. Access the root.

Now for binary trees, we can modify their definitions to be more explicit:

  • Preorder Traversal (Remember: Root Left Right):
    1. Access the root
    2. Run the preorder traversal on the left child
    3. Run the preorder traversal on the right child
  • Postorder Traversal (Remember: Left Right Root):
    1. Run the postorder traversal on the left child
    2. Run the postorder traversal on the right child
    3. Access the root.

Let’s practice traversals on the following binary tree.

Traversal Tree Traversal Tree

Preorder Traversal

Preorder Preorder

Postorder Traversal

Postorder Postorder

In-Order Traversal

YouTube Video

Since we have fixed order on the children, we can introduce another type of traversal: in-order traversal.

  • In-order Traversal:

    1. Run the in-order traversal on the left child
    2. Access the root, write its value
    3. Run the in-order traversal on the right child
    • Remember: Left Root Right

In-order In-order

Balance

Unbalanced Unbalanced

While this is a valid binary tree, it is not balanced. Let’s look at the following tree.

Balanced Balanced

We have the same nodes but our root is now 12 whereas before it was 14. This is a valid binary tree. We call this a balanced binary tree. A balanced binary tree looks visually even amongst the left and right trees in terms of number of nodes.

Note: Balancing is not necessary for a valid binary tree. It is, however, important in terms of time efficiency to have a balanced tree. For example, the number of actions when inserting an element is about the same as the number of levels in the tree. If we tried to add the value 11 into the unbalanced tree, we would traverse 5 nodes. If we tried to add the value 11 in to the balanced tree, we would traverse just 3 nodes.

We believe that balancing binary trees is out of the scope of this course. If you are interested in how we might balance a tree, feel free to check out these videos by Dr. Joshua Weese.

YouTube Video YouTube Video

Summary

In this module we have introduce vocabulary related to trees and what makes a tree a tree. To recap, we have introduced the following:

  • Child - a node with an edge that connects to another node closer to the root.
  • Degree
    • Degree of a node - the number of children a node has. The degree of a leaf is zero.
    • Degree of a tree - the degree of an entire tree is the largest degree of any node found in the tree.
  • Edge - connection between two nodes. In a tree, the edge will be pointing in a downward direction.
  • Leaf - a node with no children.
  • Node - the general term for a structure which contains an item, such as a character or even another data structure.
  • Parent - a node with an edge that connects to another node further from the root. We can also define the root of a tree with respect to this definition;
  • Root - the topmost node of the tree; a node with no parent.

Now we will work on creating our own implementation of a tree. These definitions will serve as a resource to us when we need refreshing on meanings; feel free to refer back to them as needed.

We discussed more terminology related to trees as well as tree traversals. To recap the new vocabulary:

  • Ancestor - The ancestors of a node are those reached from child to parent relationships. We can think of this as our parents and the parents of our parents, and so on.
  • Depth - The depth of a node is its distance to the root. Thus, the root has depth zero. Level and depth are related in that: level = 1 + depth.
  • Descendant - The descendants of a node are those reached from parent to child relationships. We can think of this as our children and our children’s children and so on.
  • Height of a Node - The height of a node is the longest path to a leaf descendant. The height of a leaf is zero.
  • Height of a Tree - The height of a tree is equal to the height of the root.
  • Level - The level of a node characterizes the distance the node is from the root. The root of the tree is considered level 1. As you move away from the tree, the level increases by one.
  • Path - a sequence of nodes and edges which connect a node with its descendant.
  • Siblings - Nodes which share the same parent
  • Traversal is a general term we use to describe going through a tree. The following traversals are defined recursively.
    • Preorder Traversal (Remember: Root Children or Root Left Right):
      1. Access the root
      2. Run the preorder traversal on the children
    • Postorder Traversal (Remember: Children Root or Left Right Root):
      1. Run the postorder traversal on the children
      2. Access the root.
    • Inorder Traversal on Binary Trees (Remember: Left Root Right)