Introduction to Trees
Welcome!
This page is the main page for Introduction to Trees
This page is the main page for Introduction to Trees
For the next data structure in the course, we will cover trees, which are used to show hierarchical data. Trees can have many shapes and sizes and there is a wide variety of data that can be organized using them. Real world data that is appropriate for trees can include: family trees, management structures, file systems, biological classifications, anatomical structures and much more.
We can look at an example of a tree and the relationships they can show. Consider this file tree; it has folders and files in folders.
If we wanted to access the file elm.txt
, we would have to follow this file path: Foo/B/Q/E/elm.txt
. We can similarly store the file structure as a tree like we below. As before, if we wanted to get to the file elm.txt
we would navigate the tree in the order: Foo -> B -> Q -> E -> elm.txt
. As mentioned before, trees can be used on very diverse data sets; they are not limited to file trees!
In the last module we talked about strings which are a linear data structure. To be explicit, this means that the elements in a string form a line of characters. A tree, by contrast, is a hierarchal structure which is utilized best in multidimensional data. Going back to our file tree example, folders are not limited to just one file, there can be multiple files contained in a single folder- thus making it multidimensional.
Consider the string “abc123”; this is a linear piece of data where there is exactly one character after another. We can use trees to show linear data as well.
While trees can be used for linear data, it would be excessive and inefficient to implement them for single strings. In an upcoming module, we will see how we can use trees to represent any number of strings! For example, this tree below contains 7 words: ‘a’, ‘an’, ‘and’, ‘ant’, ‘any’, ‘all’, and ‘alp’.
In the next sections, we will discuss the properties of a tree data structure and how we would design one ourselves. Once we have a good understanding of trees and the properties of trees, we will implement our own.
To get ourselves comfortable in working with trees, we will outline some standard vocabulary. Throughout this section, we will use the following tree as a guiding example for visualizing the definitions.
Node
- the general term for a structure which contains an item, such as a character or even another data structure.Edge
- the connection between two nodes. In a tree, the edge will be pointing in a downward direction.
This tree has five edges and six nodes. There is no limit to the number of nodes in a tree. The only stipulation is that the tree is fully connected
. This means that there cannot be disjoint portions of the tree. We will look at examples in the next section.
A rule of thumb for discerning trees is this: if you imagine holding the tree up by the root and gravity took effect, then all edges must be pointing downward. If an edge is pointing upward, we will have a cycle
within our structure so it will not be a tree.
Root
- the topmost node of the tree
To be a tree, there must be exactly one root. Having multiple roots, will result in a cycle
or a tree that is not fully connected
. In short, a cycle
is a loop in the tree.
Parent
- a node with an edge that connects to another node further from the root. We can also define the root of a tree with respect to this definition; Root
: a node with no parent.Child
- a node with an edge that connects to another node closer to the root.
cycle
will occur. If there is a node without a parent node, then this is necessarily the root node. There is no limit to the number of child nodes a parent node can have, but to be a parent node, the node must have at least one child node.Leaf
- a node with no children.
This tree has four leaves. There is no limit to how many leaves can be in a tree.Degree
Degree of a node
- the number of children a node has. The degree of a leaf is zero.Degree of a tree
- the number of children the root of the tree has.
The degree of the nodes are shown as the values on the nodes in this example. The degree of the tree is equal to the degree of the root. Thus, the degree for this tree is 2.tree
is defined recursively. This means that each child of the root is the root of another tree and the children of those are roots to trees as well. Again, this is a recursive definition so it will continue to the leaves. The leaves are also trees with just a single node as the root.Trees can come in many shapes and sizes. There are, however some constraints to making a valid tree.
Any combination of the following represents a valid tree:
Below are some examples of invalid trees.
A cycle Again, cycles are essentially loops that occur in our tree. In this example, we see that our leaf has two parents. One way to determine whether your data structure has a cycle is if there is more than one way to get from the root to any node.
A cycle Here we can see another cycle. In this case, the node immediately after the root has two parents, which is a clue that a cycle exists. Another test
Two Roots Trees must have a single root. In this instance, it may look like we have a tree with two roots. Working through this, we also see that the node in the center has two parents.
Two Trees This example would be considered two trees, not a tree with two parts. In this figure, we have two fully connected components. Since they are not connected to each other, this is not a single tree.
Along with understanding how trees work, we want to also be able to implement a tree of our own. We will now outline key components of a tree class.
Recall that trees are defined recursively so we can build them from the leaves up where each leaf is a tree itself. Each tree will have three properties: the item it contains as an object, its parent node of type MyTree, and its children as a list of MyTrees. Upon instantiation of a new MyTree, we will set the item value and initialize the parent node to None and the children to an empty list of type MyTree.
Suppose that we wanted to construct the following tree. We would start by initializing each node as a tree with no parent, no children, and the item in this instance would be the characters. Then we build it up level by level by add the appropriate children to the respective parent.
Disclaimer: This implementation will not prevent all cycles. In the next module, we will introduce steps to prevent cycles and maintain true tree structures.
In this method, we will take a value as input and then check if that value is the item of a child of the current node. If the value is not the item for any of the node’s children then we should return none.
function FINDCHILD(VALUE)
FOR CHILD in CHILDREN
IF CHILD's ITEM is VALUE
return CHILD
return NONE
end function
Each of these will be rather straight forward; children, item, and parent are all attributes of our node, so we can have a getter function that returns the respective values. The slightly more involved task will be getting the degree. Recall that the degree of a node is equal to the number of children. Thus, we can simply count the number of children and return this number for the degree.
We will have two functions to check the node type: one to determine if the node is a leaf and one to determine if it is a root. The definition of a leaf is a node that has no children. Thus, to check if a node is a leaf, we can simply check if the number of children is equal to zero. Similarly, since the definition of a root is a node with no parent, we can check that the parent attribute of the node is None.
When we wish to add a child, we must fisrt make sure we are able to add the child.
MyTree
We will return true if the child was successfully added and false otherwise while raising the appropriate errors.
function ADDCHILD(CHILD)
IF CHILD has PARENT
throw exception
IF CHILD is CHILD of PARENT
return FALSE
ELSE
append CHILD to PARENT's children
set CHILD's parent to PARENT
return TRUE
end function
As an example, lets walk through the process of building the tree above:
a
with item ‘A’b
with item ‘B’c
with item ‘C’d
with item ‘D’e
with item ‘E’f
with item ‘F’g
with item ‘G’h
with item ‘H’i
with item ‘I’g
to tree d
h
to tree d
i
to tree d
e
to tree b
f
to tree b
Once we have completed that, visually, we would have the tree above and in code we would have:
a
with parent_node = None, item = ‘A’, children = {b
,c
,d
}b
with parent_node = a
, item = ‘B’, children = {e
,f
}c
with parent_node = a
, item = ‘C’, children = { }d
with parent_node = a
, item = ‘D’, children = {g
,h
,i
}e
with parent_node = b
, item = ‘E’, children = { }f
with parent_node = b
, item = ‘F’, children = { }g
with parent_node = d
, item = ‘G’, children = { }h
with parent_node = d
, item = ‘H’, children = { }i
with parent_node = d
, item = ‘I’, children = { }Note: When adding a child we must currently be at the node we want to be the parent. Much like when you want to add a file to a folder, you must specify exactly where you want it. If you don’t, this could result in a wayward child.
In the case of removing a child, we first need to check that the child we are attempting to remove is an instance of MyTree
. We will return true if we successfully remove the child and false otherwise.
function REMOVECHILD(CHILD)
IF CHILD in PARENT'S children
REMOVE CHILD from PARENT's children
SET CHILD's PARENT to NONE
return TRUE
ELSE
return FALSE
end function
As with adding a child, we need to ensure that we are in the ‘right place’ when attempting to remove a child. When removing a child, we are not ’erasing’ it, we are just cutting the tie from parent to child and child to parent. Consider removing d
from a
. Visually, we would have two disjoint trees, shown below:
In code, we would have:
a
with parent_node = None, item = ‘A’, children = {b
,c
}b
with parent_node = a
, item = ‘B’, children = {e
,f
}c
with parent_node = a
, item = ‘C’, children = { }d
with parent_node = None, item = ‘D’, children = {g
,h
,i
}e
with parent_node = b
, item = ‘E’, children = { }f
with parent_node = b
, item = ‘F’, children = { }g
with parent_node = d
, item = ‘G’, children = { }h
with parent_node = d
, item = ‘H’, children = { }i
with parent_node = d
, item = ‘I’, children = { }In this module we have introduce vocabulary related to trees and what makes a tree a tree. To recap, we have introduced the following:
Child
- a node with an edge that connects to another node closer to the root.Degree
Degree of a node
- the number of children a node has. The degree of a leaf is zero.Degree of a tree
- the number of children the root of the tree has.Edge
- connection between two nodes. In a tree, the edge will be pointing in a downward direction.Leaf
- a node with no children.Node
- the general term for a structure which contains an item, such as a character or even another data structure.Parent
- a node with an edge that connects to another node further from the root. We can also define the root of a tree with respect to this definition;Root
- the topmost node of the tree; a node with no parent.Now we will work on creating our own implementation of a tree. These definitions will serve as a resource to us when we need refreshing on meanings; feel free to refer back to them as needed.