Sets
Welcome!
This page is the main page for Sets
This page is the main page for Sets
A set is a collection of elements that are usually related to each other. They can be sets of numbers, letters, people, cars, or even sets themselves! Thus, the elements stored in a stack, queue or list can all be considered as sets. When we define the elements in a set, we typically enclose them in curly brackets as follows.
$$ \text{A} = \{ \text{Emily}, \text{Don}, \text{Mohammed}, \text{Huichen} \} $$Here, $\text{A}$ is the name of the set and Emily, Don, Mohammed, and Huichen are members of set $\text{A}$.
Sets do have a few key properties:
There are also a couple of important sets that we need to know.
{}
is the set that contains, well nothing! There is exactly one empty set across all “universes of interest”.If we want to visualize sets and the relationships between them, we can view them using a Venn Diagram as shown below.
The drawing is equivalent to saying the following.
$$ \text{A} = \{ 17, 9, 32, 15, 3, 5 \} \\ \text{B} = \{ 32, 1, 22, 15, 4, 6, 14 \} $$If we overlap the two sets so that the common elements (in this case, 15 and 32) are in the overlapping section, we see that we do not get two copies of 15 and 32, but that we have just one copy of each. Thus, we have preserved the property that sets do not have duplicates. This overlapping area is called the intersection of the two sets.
In the following operations on sets, we will use variants of the diagram below, which shows set $\text{A}$ and set $\text{B}$ overlapping. The yellow part of set $\text{A}$ denotes elements of set $\text{A}$ that are not in set $\text{B}$, while the green part of $\text{B}$ denotes elements of set $\text{B}$ that are not in set $\text{A}$. The orange part of the diagram denotes the intersection of $\text{A}$ and $\text{B}$, which includes elements of the sets that exist in both set $\text{A}$ and set $\text{B}$.
Now that we can create and show sets of various types of elements, there are a few interesting operations that we can perform on sets. Given two sets, $\text{A}$ and $\text{B}$, we can perform the following operations:
We will describe each of these operations in more detail below.
The union of two sets, $\text{A}$ and $\text{B}$, is the set of elements that belongs to either $\text{A}$ or to $\text{B}$. The following figure shows the Venn Diagram for the union of set $\text{A}$ and set $\text{B}$. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \cup \text{B}$.
For our example sets $\text{A}$ and $\text{B}$ above, the union would be equivalent to the following.
$$ \text{A} \cup \text{B} = \{ 17, 9, 32, 15, 3, 5, 1, 22, 4, 6, 14 \} $$The intersection of two sets, $\text{A}$ and $\text{B}$, includes all the elements that belong to both $\text{A}$ and $\text{B}$. The following figure shows the Venn Diagram for the intersection of set $\text{A}$ and set B. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \cap \text{B}$.
For our example sets $\text{A}$ and $\text{B}$ above, the union would be equivalent to the following.
$$ \text{A} \cap \text{B} = \{ 15, 32 \} $$The difference of two sets, $\text{A}$ and $\text{B}$, includes all the elements in $\text{A}$ that are not in $\text{B}$. The following figure shows the Venn Diagram for the difference between set $\text{A}$ and set $\text{B}$. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \setminus \text{B}$.
For our example sets $\text{A}$ and $\text{B}$ above, $\text{A} \setminus \text{B}$ would be equivalent to the following.
$$ \text{A} \setminus \text{B} = \{ 17, 9, 3, 5 \} $$It is important to realize that the set difference is not symmetrical, which means that the operation $\text{A} \setminus \text{B}$ does not result in the same set as $\text{B} \setminus \text{A}$. $\text{B} \setminus \text{A}$ is the set composed of the elements that belong to $\text{B}$ but not to $\text{A}$. The following figure shows the Ven Diagrams for $\text{B} \setminus \text{A}$.
For our example sets $\text{A}$ and $\text{B}$ above, $\text{B} \setminus \text{A}$ would be equivalent to the following.
$$ \text{B} - \text{A} = \{ 1, 22, 4, 6, 14 \} $$We say two sets are disjoint it they have no common elements. The isDisjoint
operation returns true
if two sets are disjoint and false
if they are not. For example, sets $\text{A}$ and $\text{B}$ below are disjoint since they do not share any of the same numbers. A simple way of computing the value of isDisjoint
is just to check to see if the intersection of the two sets is the empty set. If it is, then the two sets are disjoint.
We say a set $\text{B}$ is a subset of set $\text{A}$ if all of the elements of set $\text{B}$ are also elements of set A. The isSubset
operation is an operation that determines if the subset relationship is true
or false
for any two sets. In the example below, set $\text{B}$ is a subset of set $\text{A}$ since all the elements of $\text{B}$ are also in the set A. Notice also that if set $\text{A}$ and $\text{B}$ are equal, then $\text{A}$ is a subset of $\text{B}$ and $\text{B}$ is a subset of $\text{A}$ as well!
Like the isDisjoint
operation above, we can compute the isSubset
operation using the intersection of sets $\text{A}$ and $\text{B}$. If set $\text{B}$ is equal to the intersection of sets $\text{A}$ and $\text{B}$, then $\text{B}$ is a subset of $\text{A}$.
We say a set $\text{A}$ is the superset of set $\text{B}$ if all the elements of $\text{B}$ are also elements of $\text{A}$. You may be thinking to yourself, “this sounds a lot like the subset operation”. And you would be correct. If $\text{B}$ is the subset of $\text{A}$, then $\text{A}$ is a superset of $\text{B}$. It is saying the same thing two different ways.
Of course, if we can compute the isSubset
operation, we can use that directly to compute the isSuperset
operation. Or, we can use the intersection operation as well. If set $\text{B}$ is equal to the intersection of sets $\text{A}$ and $\text{B}$, then $\text{A}$ is a superset of $\text{B}$.
The product, or Cartesian product, of two sets results in a set that includes all pairs of elements such that the first element of the pair is from the first set and the second is from the second set. Let’s give a quick example to make this easier to understand. We denote a pair of elements as $(a,b)$, where $a$ is the first element in the pair and $b$ is the second element in the pair.
We start with two sets, $\text{A}$ and $\text{B}$ as defined below.
$$ \text{A} = \{ 1, 2 \} \\ \text{B} = \{ a, b, c \} $$Then the Cartesian product of $\text{A}$ and $\text{B}$ is
$$ \text{A} \times \text{B} = \{ (1,a), (1,b), (1,c), (2,a), (2,b), (2,c) \} $$And the Cartesian product of $\text{B}$ and $\text{A}$ is
$$ \text{B} \times \text{A} = \{ (a,1), (a, 2), (b,1), (b,2), (c,1), (c,2) \} $$Another way to think about the product is to consider a matrix where the columns are the elements of the first set and the rows are elements of the second set. Then, each cell in the matrix contains the ordered pair where the first element is the column and the second element is the row. The matrix for our example would look like the following.
The powerset of set $\text{A}$ is the set of all possible subsets of $\text{A}$ and is denoted as $\wp(\text{A})$. If we start with the set $\text{A} = \{ 1, 2, 3 \}$, then its powerset is
$$ \{ \emptyset, \{1 \}, \{2 \}, \{3 \}, \{1, 2 \}, \{1, 3 \}, \{2, 3 \}, \{1, 2, 3 \} \} $$Notice that the empty set, $\emptyset$, is a subset of the powerset of $\text{A}$ as well.
Different data structures can be used to represents sets and efficiently support their operations. Among the data structures considered so far, it is possible to use both lists and arrays to represent sets. Even though sets by definition are unordered, we could store the elements in order, which has its pros and cons. Storing elements in order will speed up the contains
operation since we won’t have to check each element in the set. However, it will also slow down insertions since we have to insert the elements in order.
Further, storing elements in order will improve the efficiency of the union and intersection operations since we could compute both the union and the intersection in one pass (similar to merge sort). Likewise, we can speed up our searching for elements in the superset and subset operations by taking advantage of the ordering of elements.
Whether or not keeping the elements in order makes sense for your application depends on how many times you will actually insert data into the set, versus the number of times you will be using the other set operations.
We can use arrays to store sets, much like we did for array-based stacks or queues. In fact, if we implemented an array-based list, we could implement sets directly on top of that structure. Unfortunately, if we need to keep the set elements ordered for efficiency, we would compound the inefficiency of inserting or removing items from an array since we would have to move other elements in the array each time to keep it sorted. Even if we’re not interested in sorting our set, we would still face the fixed-capacity issue and would have to manage the size of the set container using the doubleCapacity
and halveCapacity
operations. On the positive side, if we kept the set elements sorted in the array, we could use binary search to implement a very efficient contains
operation.
We can also use linked lists to implement sets. Lists provide all of the operations we need to implement the basic set operations such as insert, remove, and list iterator. The only property we need to implement is that sets do not include duplicate elements. This is easily taken care of by checking to see if the set contains an element before we insert it. Since linked lists are very flexible and don’t require us to worry about capacity issues, we will use them as the basis for our set operations in this module.
In this section, we will walk through the pseudocode for some basic set operations. We will build our set class using the doubly linked list class. In this way, we can build from the functionality that is already developed. So, in the set class, we will declare mySet
as a doubly linked list and use that throughout our operations.
mySet = new doubleLinkedList()
The first operation we will look at is the contains
operation, which checks to see if a given object already exists in our set. This operation is generally very useful, which we will see in the operations below.
The pseudocode for the contains
operation is shown below. We will use the list iterator built into our doubly linked list class to help us walk through the list searching for the object o
. So, in line 1, we call the reset
function and then the getNext
operation in line 2 to get the first item in the mySet
list.
function contains(object o) returns Boolean
mySet.reset() (1)
obj = mySet.getNext() (2)
while obj != null (3)
if (obj == o) (4)
return true (5)
end if
obj = mySet.getNext() (6)
end while
return false (7)
end function
We then enter a while
loop in line 3, where we will stay until either we find the object in our list, or we reach the end of the list. Once inside the loop, we check to see if the object from the list is equal to the object we are searching for in line 4. If it is, we return true
in line 5 and we are done. If they are not equal, we get the next object from mySet
and return to the top of the loop. If we reach the end of the list without finding our object, we will exit the loop and return false
in line 7.
The add
operation is relatively straightforward. The only thing we need to check is whether or not the object is already in mySet
. In this case, instead of raising an exception when we try to add a duplicate object, we will return true
if we add the object and false
if we do not. This is because often, the user only really cares that the object is in the set after calling add
and does not care if it was already there.
As shown, the pseudocode for the add
operation is fairly straightforward. In lines 1 and 2, we call the contains
operation to determine if the object already exists in mySet
and return the value false
if it does. Otherwise, we simply call the list operation append
to insert the object into the list.
function add(object o) returns Boolean
if (contains(o)) (1)
return false (2)
else
mySet.append(o) (3)
return true (4)
end if
end function
The remove
operation is very similar to the contains
operation, except this time we want to remove the item if it exists in the set. We can use the removeCurrent()
operation to remove the current item that our iterator is pointing to from the list. Again, like the add
operation, we will return false
if the operation does not actually remove an item from the list. Otherwise, the operation will return true
.
function remove(object o) returns Boolean
mySet.reset() (1)
obj = mySet.getNext() (2)
while obj != null (3)
if (obj == o) (4)
mySet.removeCurrent() (5)
return true (6)
end if
obj = mySet.getNext() (7)
end while
return false (8)
end function
The intersection
operation creates a new set that has only elements that exist in both sets under consideration. In code, we basically accomplish this by looping through the elements in one set and then checking to see if they exist in the other set. If they do, then we include them in the intersection.
To follow that basic algorithm, the pseudocode below uses the list iterator operations to loop through and look at each element in mySet
. We will create a new set, result,
to hold the intersection of mySet
and set2
in line 1. Then we get the first element from mySet
by calling the list reset
operation in line 2 and the getNext
operation in line 3.
function intersection(set2) returns set
result = new set() (1)
mySet.reset() (2)
obj = mySet.getNext() (3)
while obj != null (4)
if (set2.contains(obj)) (5)
result.add(obj) (6)
end if
obj = mySet.getNext() (7)
end while
return result (8)
end function
Lines 4 – 7 implement the loop that walks through each element in mySet
. If the current object is contained in set2
(line 5), the operation calls the add
operation to insert obj
into the result
set. Line 7 gets the next element from mySet
and loops back to the top.
Eventually, we will look at each element in mySet
and fall out of the loop. When that happens, the intersection operation is complete, and it returns the result
set in line 8
The union
operation is similar to the intersection
operation in that we will need to use the list iterator operations to walk through our two sets. The difference lies in what we include in the new result
set. While we only walked through mySet
in the intersection
operation, picking only those objects that existed in set2
, here we will walk through both sets adding all their elements to the result
set.
The code starts in line 1 by creating our result
set. Then, lines 2 and 3 reset the mySet
iterator and get the first item from mySet
. Lines 4 – 6 form the while
loop that we use to walk through the entire set of objects in mySet
. This time, however, we simply add every element we find in line 5 before getting the next object in line 6.
function union(set2) returns set
result = new set() (1)
mySet.reset() (2)
obj = mySet.getNext() (3)
while obj != null (4)
result.add(obj) (5)
obj = mySet.getNext() (6)
end while
set2.reset() (7)
obj = set2.getNext() (8)
while obj != null (9)
result.add(obj) (10)
obj = set2.getNext() (11)
end while
return result (12)
end function
Lines 7 – 11 duplicate the loop function above, only using set2
instead of mySet
. We don’t have to worry about adding duplicate items in line 10, since as we saw above, our add
operation will not allow duplicates to be added to the set. Finally, we return the result
set, which holds the union of mySet
and set2
.
Instead of using the intersection
operation directly to compute the isSubset
operation, we will compute the value of isSubset
directly to make it more efficient. If we use intersection
directly, we will end up looping through the members of one of the sets at least two times (once to compute the intersection and once to check equality). Computing isSubset
directly only loops through one of the sets once.
The isSubset
operation below is very much like the intersection
operation as we have a loop in lines 3 - 6 that checks each element of set2
and checks to see if it is in mySet
. The difference between the two operations is that in the isSubset
operation, we do not build a third result
set. Instead, if any element in set2
is not found in mySet
, then we return false
since not all elements of set2
are contained in mySet
. If we get all the way through the loop, we have checked that each element in set2
was found in mySet
and we can return true
; set2
is a subset of mySet
.
function isSubset(set2) returns boolean
set2.reset() (1)
obj = set2.getNext() (2)
while obj != null (3)
if (! mySet.contains(obj)) (4)
return false (5)
end if
obj = set2.getNext() (6)
end while
return true (7)
end function
One of the most common uses of set operations is in database systems. While we will not create a full database system here, we will show you the basics of how they work using our set operations. Assume you have three sets of people:
$$ \text{Students} = \{ \text{"Vanessa"}, \text{"Travis"}, \text{"Lu"}, \text{"Cole"}, \text{"Jordan"}, \text{"Elaine"}, \text{"Caleb"} \} \\ \text{Family} = \{ \text{"Amy"}, \text{"Scott"}, \text{"Vanessa"}, \text{"Lauren"}, \text{"Zachary"}, \text{"Jordan"}, \text{"Caleb"} \} \\ \text{Workers} = \{\text{"Lauren"}, \text{"Amy"}, \text{"Felix"}, \text{"Zachary"}, \text{"Quinn"}, \text{"Jordan"}, \text{"Gabrielle"} \} $$The goal of our application is to be able to answer the following types of questions about the data.
The first step in our program is to add all the data to sets as shown below.
Set students = new Set()
students.add("Vanessa")
students.add("Travis")
students.add("Lu")
students.add("Cole")
students.add("Jordan")
students.add("Elaine")
students.add("Caleb")
Set family = new Set()
family.add("Amy")
family.add("Scott")
family.add("Vanessa")
family.add("Lauren")
family.add("Zachary")
family.add("Jordan")
family.add("Caleb")
Set workers = new Set()
workers.add("Lauren")
workers.add("Amy")
workers.add("Felix")
workers.add("Zachary")
workers.add("Quinn")
workers.add("Jordan")
workers.add("Gabrielle")
Once we have all the data in sets, we can use our set operations to find the information we are after. If we wanted to get a count of students and workers to answer questions 1 and 2, we can use the getter operation getSize
in the following expressions.
students.getSize()
workers.getSize()
Question 3 is a little more interesting. Obviously, we will use the getSize
operation again, but we can’t use it directly on an existing set. We need to compute the set that holds only people who are both in the worker and student sets. Luckily, we have the intersection
operation to help us do this. We simply need to compute the intersection of the workers and students and then take the size of that set as shown below.
workers.intersection(students).getSize()
Question 4 asks who is in the family set but is not in the student set. We can use the difference
operation to compute this set. Again, we use the getSize
operation to obtain the size.
family.difference(students).getSize()
Question 5 asks a slightly different question than question 3. It asks “who” works and goes to school, not just the number. Now, we compute the set of student-workers the same way, we just use the toString
operation to produce a list of people in the set instead of just reporting the size.
workers.intersection(students).toString()
Question 6 asks who works or goes to school. To answer this question, we need a set of people which includes anyone who is either a worker or a student. This requires the union
operation. We can compute the union of the two sets as follows.
workers.union(students).toString()
Question 7 wants to know who works and does not go to school. Again, this is everyone in workers who is not in students. This uses the difference
operation as shown below.
workers.difference(students).toString()
Question 8 is similar and asks who in the family does not go to school. Obviously, we can use the previous expression and just replace the workers set with the family set.
family.difference(students).toString()
Finally, question 9 asks who in the family does not work or go to school. This is basically asking who is in the family set that is not in the work set or the school set. There are actually a couple of ways to approach this. We could take the difference between the family and the workers and then take the difference between the resulting set and the students as shown below.
family.difference(workers).difference(students).toString()
Alternatively, we could take the difference between the family and the union of the worker and student sets. This solution is shown below.
family.difference(workers.union(students)).toString()
If we put all this into a program with the appropriate statements to write the solutions to the console, we would get the following output.
How many students are there?
7
How many workers are there?
7
How many workers are students too?
1
Who in my family does not go to school?
4
Who works and goes to school?
Jordan
Who works or goes to school?
Lauren, Amy, Felix, Zachary, Quinn, Gabrielle, Vanessa, Travis, Lu, Cole, Jordan, Elaine, Caleb
Who works and does not go to school?
Lauren, Amy, Felix, Zachary, Quinn, Gabrielle
Who in my family does not go to school?
Amy, Scott, Lauren, Zachary
Who in my family does not work or go to school?
Scott
As you can see, we can use set operations to quickly and easily compute solutions to database type applications. While we don’t often create special set-based solutions to data applications like this, we do use existing database applications that work on the same principle.
In this module, we introduced a new data structure, the Set. A set is a collection of values that does not contain any duplicate items. For simplicity, we showed an implementation of a set using a doubly linked list, but we could easily use an array to store a set as well.
Sets also include several unique operations, such as union, intersection, and difference. We looked at the pseudocode for some of these operations, but a few of them are left as exercises for us to complete in the project for this module.
While we rarely use sets in practice, the concepts we learned are instrumental in the development of more complex data storage techniques such as relational databases.