Chapter 999

Sets

Welcome!

This page is the main page for Sets

What are Sets?

A set is a collection of elements that are usually related to each other. They can be sets of numbers, letters, people, cars, or even sets themselves! Thus, the elements stored in a stack, queue or list can all be considered as sets. When we define the elements in a set, we typically enclose them in curly brackets as follows.

$$ \text{A} = \{ \text{Emily}, \text{Don}, \text{Mohammed}, \text{Huichen} \} $$

Here, $\text{A}$ is the name of the set and Emily, Don, Mohammed, and Huichen are members of set $\text{A}$.

Sets do have a few key properties:

There are no duplicates allowed; we can’t add another Emily to our set above, and
The elements in a set are unordered; Don doesn’t come before or after Huichen in the set.

There are also a couple of important sets that we need to know.

The universal set, $\mathbb U$, is the set of all elements in what we call the “universe of interest”. However, this “universe of interest” is relative to the problem at hand; in our example above it could be the set of all people, the set of all students, etc.
The empty set, usually written as $\emptyset$ or {} is the set that contains, well nothing! There is exactly one empty set across all “universes of interest”.

Visualizing Sets

If we want to visualize sets and the relationships between them, we can view them using a Venn Diagram as shown below.

The drawing is equivalent to saying the following.

$$ \text{A} = \{ 17, 9, 32, 15, 3, 5 \} \\ \text{B} = \{ 32, 1, 22, 15, 4, 6, 14 \} $$

If we overlap the two sets so that the common elements (in this case, 15 and 32) are in the overlapping section, we see that we do not get two copies of 15 and 32, but that we have just one copy of each. Thus, we have preserved the property that sets do not have duplicates. This overlapping area is called the intersection of the two sets.

In the following operations on sets, we will use variants of the diagram below, which shows set $\text{A}$ and set $\text{B}$ overlapping. The yellow part of set $\text{A}$ denotes elements of set $\text{A}$ that are not in set $\text{B}$, while the green part of $\text{B}$ denotes elements of set $\text{B}$ that are not in set $\text{A}$. The orange part of the diagram denotes the intersection of $\text{A}$ and $\text{B}$, which includes elements of the sets that exist in both set $\text{A}$ and set $\text{B}$.

Operations on Sets

Now that we can create and show sets of various types of elements, there are a few interesting operations that we can perform on sets. Given two sets, $\text{A}$ and $\text{B}$, we can perform the following operations:

Union $\cup$ — a set with all the elements from both sets,
Intersection $\cap$ — a set with only elements that are in both sets,
Difference $\setminus$ — a set with the elements in one set that are not in the other,
Disjoint $\text{A} \cap \text{B} = \emptyset$ — two sets don’t share any elements in common
Subset $\subseteq$ — one set is completely contained in another set
Superset $\supseteq$ — one set completely contains another set
Product $\times$ — a set of all the ordered pairs where the first element is from the first set and second element is from the second set, and
Powerset $\wp$ — a set of all the subsets of a given set.

We will describe each of these operations in more detail below.

Union $\cup$

The union of two sets, $\text{A}$ and $\text{B}$, is the set of elements that belongs to either $\text{A}$ or to $\text{B}$. The following figure shows the Venn Diagram for the union of set $\text{A}$ and set $\text{B}$. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \cup \text{B}$.

For our example sets $\text{A}$ and $\text{B}$ above, the union would be equivalent to the following.

$$ \text{A} \cup \text{B} = \{ 17, 9, 32, 15, 3, 5, 1, 22, 4, 6, 14 \} $$

Intersection $\cap$

The intersection of two sets, $\text{A}$ and $\text{B}$, includes all the elements that belong to both $\text{A}$ and $\text{B}$. The following figure shows the Venn Diagram for the intersection of set $\text{A}$ and set B. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \cap \text{B}$.

For our example sets $\text{A}$ and $\text{B}$ above, the union would be equivalent to the following.

$$ \text{A} \cap \text{B} = \{ 15, 32 \} $$

Difference $\setminus$

The difference of two sets, $\text{A}$ and $\text{B}$, includes all the elements in $\text{A}$ that are not in $\text{B}$. The following figure shows the Venn Diagram for the difference between set $\text{A}$ and set $\text{B}$. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \setminus \text{B}$.

For our example sets $\text{A}$ and $\text{B}$ above, $\text{A} \setminus \text{B}$ would be equivalent to the following.

$$ \text{A} \setminus \text{B} = \{ 17, 9, 3, 5 \} $$

It is important to realize that the set difference is not symmetrical, which means that the operation $\text{A} \setminus \text{B}$ does not result in the same set as $\text{B} \setminus \text{A}$. $\text{B} \setminus \text{A}$ is the set composed of the elements that belong to $\text{B}$ but not to $\text{A}$. The following figure shows the Ven Diagrams for $\text{B} \setminus \text{A}$.

For our example sets $\text{A}$ and $\text{B}$ above, $\text{B} \setminus \text{A}$ would be equivalent to the following.

$$ \text{B} - \text{A} = \{ 1, 22, 4, 6, 14 \} $$

isDisjoint $\text{A} \cap \text{B} = \emptyset$

We say two sets are disjoint it they have no common elements. The isDisjoint operation returns true if two sets are disjoint and false if they are not. For example, sets $\text{A}$ and $\text{B}$ below are disjoint since they do not share any of the same numbers. A simple way of computing the value of isDisjoint is just to check to see if the intersection of the two sets is the empty set. If it is, then the two sets are disjoint.

$$ \text{A} \cap \text{B} = \emptyset \implies \text{A and B are disjoint} $$

isSubset $\subseteq$

We say a set $\text{B}$ is a subset of set $\text{A}$ if all of the elements of set $\text{B}$ are also elements of set A. The isSubset operation is an operation that determines if the subset relationship is true or false for any two sets. In the example below, set $\text{B}$ is a subset of set $\text{A}$ since all the elements of $\text{B}$ are also in the set A. Notice also that if set $\text{A}$ and $\text{B}$ are equal, then $\text{A}$ is a subset of $\text{B}$ and $\text{B}$ is a subset of $\text{A}$ as well!

Like the isDisjoint operation above, we can compute the isSubset operation using the intersection of sets $\text{A}$ and $\text{B}$. If set $\text{B}$ is equal to the intersection of sets $\text{A}$ and $\text{B}$, then $\text{B}$ is a subset of $\text{A}$.

$$ \text{A} \cap \text{B} = \text{B} \implies \text{B} \subseteq \text{A} $$

isSuperset $\supseteq$

We say a set $\text{A}$ is the superset of set $\text{B}$ if all the elements of $\text{B}$ are also elements of $\text{A}$. You may be thinking to yourself, “this sounds a lot like the subset operation”. And you would be correct. If $\text{B}$ is the subset of $\text{A}$, then $\text{A}$ is a superset of $\text{B}$. It is saying the same thing two different ways.

Of course, if we can compute the isSubset operation, we can use that directly to compute the isSuperset operation. Or, we can use the intersection operation as well. If set $\text{B}$ is equal to the intersection of sets $\text{A}$ and $\text{B}$, then $\text{A}$ is a superset of $\text{B}$.

$$ \text{A} \cap \text{B} = \text{B} \implies \text{A} \supseteq \text{B} $$

Product $\times$

The product, or Cartesian product, of two sets results in a set that includes all pairs of elements such that the first element of the pair is from the first set and the second is from the second set. Let’s give a quick example to make this easier to understand. We denote a pair of elements as $(a,b)$, where $a$ is the first element in the pair and $b$ is the second element in the pair.

We start with two sets, $\text{A}$ and $\text{B}$ as defined below.

$$ \text{A} = \{ 1, 2 \} \\ \text{B} = \{ a, b, c \} $$

Then the Cartesian product of $\text{A}$ and $\text{B}$ is

$$ \text{A} \times \text{B} = \{ (1,a), (1,b), (1,c), (2,a), (2,b), (2,c) \} $$

And the Cartesian product of $\text{B}$ and $\text{A}$ is

$$ \text{B} \times \text{A} = \{ (a,1), (a, 2), (b,1), (b,2), (c,1), (c,2) \} $$

Another way to think about the product is to consider a matrix where the columns are the elements of the first set and the rows are elements of the second set. Then, each cell in the matrix contains the ordered pair where the first element is the column and the second element is the row. The matrix for our example would look like the following.

Powerset

The powerset of set $\text{A}$ is the set of all possible subsets of $\text{A}$ and is denoted as $\wp(\text{A})$. If we start with the set $\text{A} = \{ 1, 2, 3 \}$, then its powerset is

$$ \{ \emptyset, \{1 \}, \{2 \}, \{3 \}, \{1, 2 \}, \{1, 3 \}, \{2, 3 \}, \{1, 2, 3 \} \} $$

Notice that the empty set, $\emptyset$, is a subset of the powerset of $\text{A}$ as well.

Sets in Code

YouTube Video

Different data structures can be used to represents sets and efficiently support their operations. Among the data structures considered so far, it is possible to use both lists and arrays to represent sets. Even though sets by definition are unordered, we could store the elements in order, which has its pros and cons. Storing elements in order will speed up the contains operation since we won’t have to check each element in the set. However, it will also slow down insertions since we have to insert the elements in order.

Further, storing elements in order will improve the efficiency of the union and intersection operations since we could compute both the union and the intersection in one pass (similar to merge sort). Likewise, we can speed up our searching for elements in the superset and subset operations by taking advantage of the ordering of elements.

Whether or not keeping the elements in order makes sense for your application depends on how many times you will actually insert data into the set, versus the number of times you will be using the other set operations.

Arrays

We can use arrays to store sets, much like we did for array-based stacks or queues. In fact, if we implemented an array-based list, we could implement sets directly on top of that structure. Unfortunately, if we need to keep the set elements ordered for efficiency, we would compound the inefficiency of inserting or removing items from an array since we would have to move other elements in the array each time to keep it sorted. Even if we’re not interested in sorting our set, we would still face the fixed-capacity issue and would have to manage the size of the set container using the doubleCapacity and halveCapacity operations. On the positive side, if we kept the set elements sorted in the array, we could use binary search to implement a very efficient contains operation.

Linked Lists

We can also use linked lists to implement sets. Lists provide all of the operations we need to implement the basic set operations such as insert, remove, and list iterator. The only property we need to implement is that sets do not include duplicate elements. This is easily taken care of by checking to see if the set contains an element before we insert it. Since linked lists are very flexible and don’t require us to worry about capacity issues, we will use them as the basis for our set operations in this module.

Set Operations

In this section, we will walk through the pseudocode for some basic set operations. We will build our set class using the doubly linked list class. In this way, we can build from the functionality that is already developed. So, in the set class, we will declare mySet as a doubly linked list and use that throughout our operations.

mySet = new doubleLinkedList()

Contains

The first operation we will look at is the contains operation, which checks to see if a given object already exists in our set. This operation is generally very useful, which we will see in the operations below.

The pseudocode for the contains operation is shown below. We will use the list iterator built into our doubly linked list class to help us walk through the list searching for the object o. So, in line 1, we call the reset function and then the getNext operation in line 2 to get the first item in the mySet list.

function contains(object o) returns Boolean
    mySet.reset()	            (1)
	obj = mySet.getNext()	    (2)
	while obj != null	        (3)
		if (obj == o)	        (4)
			return true	        (5)
		end if
		obj = mySet.getNext()	(6)
	end while
	return false	            (7)
end function

We then enter a while loop in line 3, where we will stay until either we find the object in our list, or we reach the end of the list. Once inside the loop, we check to see if the object from the list is equal to the object we are searching for in line 4. If it is, we return true in line 5 and we are done. If they are not equal, we get the next object from mySet and return to the top of the loop. If we reach the end of the list without finding our object, we will exit the loop and return false in line 7.

Add

The add operation is relatively straightforward. The only thing we need to check is whether or not the object is already in mySet. In this case, instead of raising an exception when we try to add a duplicate object, we will return true if we add the object and false if we do not. This is because often, the user only really cares that the object is in the set after calling add and does not care if it was already there.

As shown, the pseudocode for the add operation is fairly straightforward. In lines 1 and 2, we call the contains operation to determine if the object already exists in mySet and return the value false if it does. Otherwise, we simply call the list operation append to insert the object into the list.

function add(object o) returns Boolean
	if (contains(o))	   (1)
		return false	   (2)
	else
        mySet.append(o)	   (3)
		return true	       (4)
	end if
end function

Remove

The remove operation is very similar to the contains operation, except this time we want to remove the item if it exists in the set. We can use the removeCurrent() operation to remove the current item that our iterator is pointing to from the list. Again, like the add operation, we will return false if the operation does not actually remove an item from the list. Otherwise, the operation will return true.

function remove(object o) returns Boolean
    mySet.reset()	                (1)
	obj = mySet.getNext()	        (2)
	while obj != null	            (3)
		if (obj == o)	            (4)
            mySet.removeCurrent()	(5)
			return true	            (6)
		end if
		obj = mySet.getNext()	    (7)
	end while
	return false	                (8)
end function

Intersection

The intersection operation creates a new set that has only elements that exist in both sets under consideration. In code, we basically accomplish this by looping through the elements in one set and then checking to see if they exist in the other set. If they do, then we include them in the intersection.

To follow that basic algorithm, the pseudocode below uses the list iterator operations to loop through and look at each element in mySet. We will create a new set, result, to hold the intersection of mySet and set2 in line 1. Then we get the first element from mySet by calling the list reset operation in line 2 and the getNext operation in line 3.

function intersection(set2) returns set
	result = new set()	            (1)

    mySet.reset()	                (2)
	obj = mySet.getNext()	        (3)

	while obj != null	            (4)
		if (set2.contains(obj))	    (5)
			result.add(obj)	        (6)
        end if
		obj = mySet.getNext()	    (7)
	end while

	return result	                (8)
end function

Lines 4 – 7 implement the loop that walks through each element in mySet. If the current object is contained in set2 (line 5), the operation calls the add operation to insert obj into the result set. Line 7 gets the next element from mySet and loops back to the top.

Eventually, we will look at each element in mySet and fall out of the loop. When that happens, the intersection operation is complete, and it returns the result set in line 8

Union

The union operation is similar to the intersection operation in that we will need to use the list iterator operations to walk through our two sets. The difference lies in what we include in the new result set. While we only walked through mySet in the intersection operation, picking only those objects that existed in set2, here we will walk through both sets adding all their elements to the result set.

The code starts in line 1 by creating our result set. Then, lines 2 and 3 reset the mySet iterator and get the first item from mySet. Lines 4 – 6 form the while loop that we use to walk through the entire set of objects in mySet. This time, however, we simply add every element we find in line 5 before getting the next object in line 6.

function union(set2) returns set
	result = new set()	           (1)

    mySet.reset()	               (2)
	obj = mySet.getNext()	       (3)
    
	while obj != null	           (4)
		result.add(obj)	           (5)
		obj = mySet.getNext()	   (6)
	end while

    set2.reset()	               (7)
	obj = set2.getNext()	       (8)
    
	while obj != null	           (9)
		result.add(obj)	           (10)
		obj = set2.getNext()	   (11)
	end while

	return result	               (12)
end function

Lines 7 – 11 duplicate the loop function above, only using set2 instead of mySet. We don’t have to worry about adding duplicate items in line 10, since as we saw above, our add operation will not allow duplicates to be added to the set. Finally, we return the result set, which holds the union of mySet and set2.

isSubset

Instead of using the intersection operation directly to compute the isSubset operation, we will compute the value of isSubset directly to make it more efficient. If we use intersection directly, we will end up looping through the members of one of the sets at least two times (once to compute the intersection and once to check equality). Computing isSubset directly only loops through one of the sets once.

The isSubset operation below is very much like the intersection operation as we have a loop in lines 3 - 6 that checks each element of set2 and checks to see if it is in mySet. The difference between the two operations is that in the isSubset operation, we do not build a third result set. Instead, if any element in set2 is not found in mySet, then we return false since not all elements of set2 are contained in mySet. If we get all the way through the loop, we have checked that each element in set2 was found in mySet and we can return true; set2 is a subset of mySet.

function isSubset(set2) returns boolean
    set2.reset()	                (1)
	obj = set2.getNext()	        (2)

	while obj != null	            (3)
		if (! mySet.contains(obj))	(4)
			return false	        (5)
        end if
		obj = set2.getNext()	    (6)
	end while

	return true	                    (7)
end function

Using Sets

YouTube Video

One of the most common uses of set operations is in database systems. While we will not create a full database system here, we will show you the basics of how they work using our set operations. Assume you have three sets of people:

$$ \text{Students} = \{ \text{"Vanessa"}, \text{"Travis"}, \text{"Lu"}, \text{"Cole"}, \text{"Jordan"}, \text{"Elaine"}, \text{"Caleb"} \} \\ \text{Family} = \{ \text{"Amy"}, \text{"Scott"}, \text{"Vanessa"}, \text{"Lauren"}, \text{"Zachary"}, \text{"Jordan"}, \text{"Caleb"} \} \\ \text{Workers} = \{\text{"Lauren"}, \text{"Amy"}, \text{"Felix"}, \text{"Zachary"}, \text{"Quinn"}, \text{"Jordan"}, \text{"Gabrielle"} \} $$

The goal of our application is to be able to answer the following types of questions about the data.

How many students are there?
How many workers are there?
How many workers are students too?
How many in my family do not go to school?
Who works and goes to school?
Who works or goes to school?
Who works and does not go to school?
Who in my family does not go to school?
Who in my family does not work or go to school?

The first step in our program is to add all the data to sets as shown below.

Set students = new Set()
students.add("Vanessa")
students.add("Travis")
students.add("Lu")
students.add("Cole")
students.add("Jordan")
students.add("Elaine")
students.add("Caleb")

Set family = new Set()
family.add("Amy")
family.add("Scott")
family.add("Vanessa")
family.add("Lauren")
family.add("Zachary")
family.add("Jordan")
family.add("Caleb")

Set workers = new Set()
workers.add("Lauren")
workers.add("Amy")
workers.add("Felix")
workers.add("Zachary")
workers.add("Quinn")
workers.add("Jordan")
workers.add("Gabrielle")

Once we have all the data in sets, we can use our set operations to find the information we are after. If we wanted to get a count of students and workers to answer questions 1 and 2, we can use the getter operation getSize in the following expressions.

students.getSize()
workers.getSize()

Question 3 is a little more interesting. Obviously, we will use the getSize operation again, but we can’t use it directly on an existing set. We need to compute the set that holds only people who are both in the worker and student sets. Luckily, we have the intersection operation to help us do this. We simply need to compute the intersection of the workers and students and then take the size of that set as shown below.

workers.intersection(students).getSize()

Question 4 asks who is in the family set but is not in the student set. We can use the difference operation to compute this set. Again, we use the getSize operation to obtain the size.

family.difference(students).getSize()

Question 5 asks a slightly different question than question 3. It asks “who” works and goes to school, not just the number. Now, we compute the set of student-workers the same way, we just use the toString operation to produce a list of people in the set instead of just reporting the size.

workers.intersection(students).toString()

Question 6 asks who works or goes to school. To answer this question, we need a set of people which includes anyone who is either a worker or a student. This requires the union operation. We can compute the union of the two sets as follows.

workers.union(students).toString()

Question 7 wants to know who works and does not go to school. Again, this is everyone in workers who is not in students. This uses the difference operation as shown below.

workers.difference(students).toString()

Question 8 is similar and asks who in the family does not go to school. Obviously, we can use the previous expression and just replace the workers set with the family set.

family.difference(students).toString()

Finally, question 9 asks who in the family does not work or go to school. This is basically asking who is in the family set that is not in the work set or the school set. There are actually a couple of ways to approach this. We could take the difference between the family and the workers and then take the difference between the resulting set and the students as shown below.

family.difference(workers).difference(students).toString()

Alternatively, we could take the difference between the family and the union of the worker and student sets. This solution is shown below.

family.difference(workers.union(students)).toString()

If we put all this into a program with the appropriate statements to write the solutions to the console, we would get the following output.

How many students are there?
7

How many workers are there?
7

How many workers are students too?
1

Who in my family does not go to school?
4

Who works and goes to school?
Jordan

Who works or goes to school?
Lauren, Amy, Felix, Zachary, Quinn, Gabrielle, Vanessa, Travis, Lu, Cole, Jordan, Elaine, Caleb

Who works and does not go to school?
Lauren, Amy, Felix, Zachary, Quinn, Gabrielle

Who in my family does not go to school?
Amy, Scott, Lauren, Zachary

Who in my family does not work or go to school?
Scott

As you can see, we can use set operations to quickly and easily compute solutions to database type applications. While we don’t often create special set-based solutions to data applications like this, we do use existing database applications that work on the same principle.

Sets Summary

In this module, we introduced a new data structure, the Set. A set is a collection of values that does not contain any duplicate items. For simplicity, we showed an implementation of a set using a doubly linked list, but we could easily use an array to store a set as well.

Sets also include several unique operations, such as union, intersection, and difference. We looked at the pseudocode for some of these operations, but a few of them are left as exercises for us to complete in the project for this module.

While we rarely use sets in practice, the concepts we learned are instrumental in the development of more complex data storage techniques such as relational databases.