Chapter 10

Introduction to Trees

Welcome!

This page is the main page for Introduction to Trees

Introduction

For the next data structure in the course, we will cover trees, which are used to show hierarchical data. Trees can have many shapes and sizes and there is a wide variety of data that can be organized using them. Real world data that is appropriate for trees can include: family trees, management structures, file systems, biological classifications, anatomical structures and much more.

We can look at an example of a tree and the relationships they can show. Consider this file tree; it has folders and files in folders.

If we wanted to access the file elm.txt, we would have to follow this file path: Foo/B/Q/E/elm.txt. We can similarly store the file structure as a tree like we below. As before, if we wanted to get to the file elm.txt we would navigate the tree in the order: Foo -> B -> Q -> E -> elm.txt. As mentioned before, trees can be used on very diverse data sets; they are not limited to file trees!

What are trees?

In the last module we talked about strings which are a linear data structure. To be explicit, this means that the elements in a string form a line of characters. A tree, by contrast, is a hierarchal structure which is utilized best in multidimensional data. Going back to our file tree example, folders are not limited to just one file, there can be multiple files contained in a single folder- thus making it multidimensional.

Consider the string “abc123”; this is a linear piece of data where there is exactly one character after another. We can use trees to show linear data as well.

While trees can be used for linear data, it would be excessive and inefficient to implement them for single strings. In an upcoming module, we will see how we can use trees to represent any number of strings! For example, this tree below contains 7 words: ‘a’, ‘an’, ‘and’, ‘ant’, ‘any’, ‘all’, and ‘alp’.

In the next sections, we will discuss the properties of a tree data structure and how we would design one ourselves. Once we have a good understanding of trees and the properties of trees, we will implement our own.

General Terms

YouTube Video

Video Materials

To get ourselves comfortable in working with trees, we will outline some standard vocabulary. Throughout this section, we will use the following tree as a guiding example for visualizing the definitions.

Definitions

Node - the general term for a structure which contains an item, such as a character or even another data structure.
Edge - the connection between two nodes. In a tree, the edge will be pointing in a downward direction.

This tree has five edges and six nodes. There is no limit to the number of nodes in a tree. The only stipulation is that the tree is fully connected. This means that there cannot be disjoint portions of the tree. We will look at examples in the next section.

Info

A rule of thumb for discerning trees is this: if you imagine holding the tree up by the root and gravity took effect, then all edges must be pointing downward. If an edge is pointing upward, we will have a cycle within our structure so it will not be a tree.

Root - the topmost node of the tree

To be a tree, there must be exactly one root. Having multiple roots, will result in a cycle or a tree that is not fully connected. In short, a cycle is a loop in the tree.

Parent - a node with an edge that connects to another node further from the root. We can also define the root of a tree with respect to this definition; Root: a node with no parent.
Child - a node with an edge that connects to another node closer to the root.
- In a general tree, the children of a node are an unordered set. There is not a fixed or defined order for generic trees. In a tree, child nodes must have exactly one parent node. If a child node has more than one parent, then a cycle will occur. If there is a node without a parent node, then this is necessarily the root node. There is no limit to the number of child nodes a parent node can have, but to be a parent node, the node must have at least one child node.

Leaf - a node with no children. This tree has four leaves. There is no limit to how many leaves can be in a tree.

Degree
- Degree of a node - the number of children a node has. The degree of a leaf is zero.
- Degree of a tree - the number of children the root of the tree has. The degree of the nodes are shown as the values on the nodes in this example. The degree of the tree is equal to the degree of the root. Thus, the degree for this tree is 2.

A tree is defined recursively. This means that each child of the root is the root of another tree and the children of those are roots to trees as well. Again, this is a recursive definition so it will continue to the leaves. The leaves are also trees with just a single node as the root.
In our example tree, we have six trees. Each tree is outlined in a red dashed circle:
- the main tree,
  - the tree off the left of the main root,
    - the tree off the left of this root,
    - the tree in the center of this root,
    - the tree off the right of this root, and
  - the tree off the right of the main root with a single node

What Makes a Tree a Tree

YouTube Video

Video Materials

Trees can come in many shapes and sizes. There are, however some constraints to making a valid tree.

A tree has a single root
A child has exactly one parent
A tree is fully connected (IE a single tree)
A tree has no cycles (IE no loops)

Valid Trees

Any combination of the following represents a valid tree:

A tree with a single node; just a root,
A tree where each node has a single child, or
A tree where nodes have many children.

Invalid Trees

Below are some examples of invalid trees.

A cycle Again, cycles are essentially loops that occur in our tree. In this example, we see that our leaf has two parents. One way to determine whether your data structure has a cycle is if there is more than one way to get from the root to any node.
A cycle Here we can see another cycle. In this case, the node immediately after the root has two parents, which is a clue that a cycle exists. Another test
Two Roots Trees must have a single root. In this instance, it may look like we have a tree with two roots. Working through this, we also see that the node in the center has two parents.
Two Trees This example would be considered two trees, not a tree with two parts. In this figure, we have two fully connected components. Since they are not connected to each other, this is not a single tree.

MyTree I

YouTube Video

Video Materials

Along with understanding how trees work, we want to also be able to implement a tree of our own. We will now outline key components of a tree class.

MyTree

Recall that trees are defined recursively so we can build them from the leaves up where each leaf is a tree itself. Each tree will have three properties: the item it contains as an object, its parent node of type MyTree, and its children as a list of MyTrees. Upon instantiation of a new MyTree, we will set the item value and initialize the parent node to None and the children to an empty list of type MyTree.

Suppose that we wanted to construct the following tree. We would start by initializing each node as a tree with no parent, no children, and the item in this instance would be the characters. Then we build it up level by level by add the appropriate children to the respective parent.

Info

Disclaimer: This implementation will not prevent all cycles. In the next module, we will introduce steps to prevent cycles and maintain true tree structures.

Finding a child

In this method, we will take a value as input and then check if that value is the item of a child of the current node. If the value is not the item for any of the node’s children then we should return none.

function FINDCHILD(VALUE)
    FOR CHILD in CHILDREN
        IF CHILD's ITEM is VALUE
            return  CHILD
    return NONE
end function

Getting children, item, parent, or degree

Each of these will be rather straight forward; children, item, and parent are all attributes of our node, so we can have a getter function that returns the respective values. The slightly more involved task will be getting the degree. Recall that the degree of a node is equal to the number of children. Thus, we can simply count the number of children and return this number for the degree.

Checking node type

We will have two functions to check the node type: one to determine if the node is a leaf and one to determine if it is a root. The definition of a leaf is a node that has no children. Thus, to check if a node is a leaf, we can simply check if the number of children is equal to zero. Similarly, since the definition of a root is a node with no parent, we can check that the parent attribute of the node is None.

Adding a child

YouTube Video

When we wish to add a child, we must fisrt make sure we are able to add the child.

Check that the child is an instance of MyTree
Make sure the child doesn’t already have a parent
Make sure the child isn’t already a child of the parent

We will return true if the child was successfully added and false otherwise while raising the appropriate errors.

function ADDCHILD(CHILD)
    IF CHILD has PARENT
        throw exception
    IF CHILD is CHILD of PARENT
        return FALSE
    ELSE
        append CHILD to PARENT's children
        set CHILD's parent to PARENT
        return TRUE
end function

As an example, lets walk through the process of building the tree above:

Instantiate MyTree a with item ‘A’
Instantiate MyTree b with item ‘B’
Instantiate MyTree c with item ‘C’
Instantiate MyTree d with item ‘D’
Instantiate MyTree e with item ‘E’
Instantiate MyTree f with item ‘F’
Instantiate MyTree g with item ‘G’
Instantiate MyTree h with item ‘H’
Instantiate MyTree i with item ‘I’
Add child tree g to tree d
Add child tree h to tree d
Add child tree i to tree d
Add child tree e to tree b
Add child tree f to tree b

Once we have completed that, visually, we would have the tree above and in code we would have:

MyTree a with parent_node = None, item = ‘A’, children = {b,c,d}
MyTree b with parent_node = a, item = ‘B’, children = {e,f}
MyTree c with parent_node = a, item = ‘C’, children = { }
MyTree d with parent_node = a, item = ‘D’, children = {g,h,i}
MyTree e with parent_node = b, item = ‘E’, children = { }
MyTree f with parent_node = b, item = ‘F’, children = { }
MyTree g with parent_node = d, item = ‘G’, children = { }
MyTree h with parent_node = d, item = ‘H’, children = { }
MyTree i with parent_node = d, item = ‘I’, children = { }

Note: When adding a child we must currently be at the node we want to be the parent. Much like when you want to add a file to a folder, you must specify exactly where you want it. If you don’t, this could result in a wayward child.

Removing a child

YouTube Video

In the case of removing a child, we first need to check that the child we are attempting to remove is an instance of MyTree. We will return true if we successfully remove the child and false otherwise.

function REMOVECHILD(CHILD)
    IF CHILD in PARENT'S children
        REMOVE CHILD from PARENT's children
        SET CHILD's PARENT to NONE
        return TRUE
    ELSE
        return FALSE
end function

As with adding a child, we need to ensure that we are in the ‘right place’ when attempting to remove a child. When removing a child, we are not ’erasing’ it, we are just cutting the tie from parent to child and child to parent. Consider removing d from a. Visually, we would have two disjoint trees, shown below:

In code, we would have:

MyTree a with parent_node = None, item = ‘A’, children = {b,c}
MyTree b with parent_node = a, item = ‘B’, children = {e,f}
MyTree c with parent_node = a, item = ‘C’, children = { }
MyTree d with parent_node = None, item = ‘D’, children = {g,h,i}
MyTree e with parent_node = b, item = ‘E’, children = { }
MyTree f with parent_node = b, item = ‘F’, children = { }
MyTree g with parent_node = d, item = ‘G’, children = { }
MyTree h with parent_node = d, item = ‘H’, children = { }
MyTree i with parent_node = d, item = ‘I’, children = { }

Summary

In this module we have introduce vocabulary related to trees and what makes a tree a tree. To recap, we have introduced the following:

Child - a node with an edge that connects to another node closer to the root.
Degree
- Degree of a node - the number of children a node has. The degree of a leaf is zero.
- Degree of a tree - the number of children the root of the tree has.
Edge - connection between two nodes. In a tree, the edge will be pointing in a downward direction.
Leaf - a node with no children.
Node - the general term for a structure which contains an item, such as a character or even another data structure.
Parent - a node with an edge that connects to another node further from the root. We can also define the root of a tree with respect to this definition;
Root - the topmost node of the tree; a node with no parent.

Now we will work on creating our own implementation of a tree. These definitions will serve as a resource to us when we need refreshing on meanings; feel free to refer back to them as needed.

Introduction to Trees

Welcome!

Subsections of Introduction to Trees

Introduction

What are trees?

Subsections of Introduction

General Terms

Definitions

Subsections of General Terms

What Makes a Tree a Tree

Valid Trees

Invalid Trees

Subsections of What Makes a Tree a Tree

MyTree I

MyTree

Finding a child

Getting children, item, parent, or degree

Checking node type

Adding a child

Removing a child

Subsections of MyTree I

Summary