### Welcome!

This page is the main page for Sets

This page is the main page for Sets

A set is a collection of elements that are usually related to each other. They can be sets of numbers, letters, people, cars, or even sets themselves! Thus, the elements stored in a stack, queue or list can all be considered as sets. When we define the elements in a set, we typically enclose them in curly brackets as follows.

$$ \text{A} = { \text{Emily}, \text{Don}, \text{Mohammed}, \text{Huichen} } $$

Here, $\text{A}$ is the name of the set and Emily, Don, Mohammed, and Huichen are members of set $\text{A}$.

Sets do have a few key properties:

- There are no duplicates allowed; we can’t add another Emily to our set above, and
- The elements in a set are unordered; Don doesn’t come before or after Huichen in the set.

There are also a couple of important sets that we need to know.

- The universal set, $\mathbb U$, is the set of all elements in what we call the “universe of interest”. However, this “universe of interest” is relative to the problem at hand; in our example above it could be the set of all people, the set of all students, etc.
- The empty set, usually written as $\emptyset$ or
`{}`

is the set that contains, well nothing! There is exactly one empty set across all “universes of interest”.

If we want to visualize sets and the relationships between them, we can view them using a Venn Diagram as shown below.

The drawing is equivalent to saying the following.

$$ \text{A} = { 17, 9, 32, 15, 3, 5 } \ \text{B} = { 32, 1, 22, 15, 4, 6, 14 } $$

If we overlap the two sets so that the common elements (in this case, 15 and 32) are in the overlapping section, we see that we do not get two copies of 15 and 32, but that we have just one copy of each. Thus, we have preserved the property that sets do not have duplicates. This overlapping area is called the intersection of the two sets.

In the following operations on sets, we will use variants of the diagram below, which shows set $\text{A}$ and set $\text{B}$ overlapping. The yellow part of set $\text{A}$ denotes elements of set $\text{A}$ that are not in set $\text{B}$, while the green part of $\text{B}$ denotes elements of set $\text{B}$ that are not in set $\text{A}$. The orange part of the diagram denotes the intersection of $\text{A}$ and $\text{B}$, which includes elements of the sets that exist in both set $\text{A}$ and set $\text{B}$.

Now that we can create and show sets of various types of elements, there are a few interesting operations that we can perform on sets. Given two sets, $\text{A}$ and $\text{B}$, we can perform the following operations:

- Union $\cup$ — a set with all the elements from both sets,
- Intersection $\cap$ — a set with only elements that are in both sets,
- Difference $\setminus$ — a set with the elements in one set that are not in the other,
- Disjoint $\text{A} \cap \text{B} = \emptyset$ — two sets don’t share any elements in common
- Subset $\subseteq$ — one set is completely contained in another set
- Superset $\supseteq$ — one set completely contains another set
- Product $\times$ — a set of all the ordered pairs where the first element is from the first set and second element is from the second set, and
- Powerset $\wp$ — a set of all the subsets of a given set.

We will describe each of these operations in more detail below.

The union of two sets, $\text{A}$ and $\text{B}$, is the set of elements that belongs to either $\text{A}$ or to $\text{B}$. The following figure shows the Venn Diagram for the union of set $\text{A}$ and set $\text{B}$. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \cup \text{B}$.

For our example sets $\text{A}$ and $\text{B}$ above, the union would be equivalent to the following.

$$ \text{A} \cup \text{B} = { 17, 9, 32, 15, 3, 5, 1, 22, 4, 6, 14 } $$

The intersection of two sets, $\text{A}$ and $\text{B}$, includes all the elements that belong to both $\text{A}$ and $\text{B}$. The following figure shows the Venn Diagram for the intersection of set $\text{A}$ and set B. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \cap \text{B}$.

For our example sets $\text{A}$ and $\text{B}$ above, the union would be equivalent to the following.

$$ \text{A} \cap \text{B} = { 15, 32 } $$

The difference of two sets, $\text{A}$ and $\text{B}$, includes all the elements in $\text{A}$ that are not in $\text{B}$. The following figure shows the Venn Diagram for the difference between set $\text{A}$ and set $\text{B}$. In this diagram, the blue color denotes the elements of the set that are in $\text{A} \setminus \text{B}$.

For our example sets $\text{A}$ and $\text{B}$ above, $\text{A} \setminus \text{B}$ would be equivalent to the following.

$$ \text{A} \setminus \text{B} = { 17, 9, 3, 5 } $$

It is important to realize that the set difference is not symmetrical, which means that the operation $\text{A} \setminus \text{B}$ does not result in the same set as $\text{B} \setminus \text{A}$. $\text{B} \setminus \text{A}$ is the set composed of the elements that belong to $\text{B}$ but not to $\text{A}$. The following figure shows the Ven Diagrams for $\text{B} \setminus \text{A}$.

For our example sets $\text{A}$ and $\text{B}$ above, $\text{B} \setminus \text{A}$ would be equivalent to the following.

$$ \text{B} - \text{A} = { 1, 22, 4, 6, 14 } $$

We say two sets are disjoint it they have no common elements. The `isDisjoint`

operation returns `true`

if two sets are disjoint and `false`

if they are not. For example, sets $\text{A}$ and $\text{B}$ below are disjoint since they do not share any of the same numbers. A simple way of computing the value of `isDisjoint`

is just to check to see if the intersection of the two sets is the empty set. If it is, then the two sets are disjoint.

$$ \text{A} \cap \text{B} = \emptyset \implies \text{A and B are disjoint} $$

We say a set $\text{B}$ is a subset of set $\text{A}$ if all of the elements of set $\text{B}$ are also elements of set A. The `isSubset`

operation is an operation that determines if the subset relationship is `true`

or `false`

for any two sets. In the example below, set $\text{B}$ is a subset of set $\text{A}$ since all the elements of $\text{B}$ are also in the set A. Notice also that if set $\text{A}$ and $\text{B}$ are equal, then $\text{A}$ is a subset of $\text{B}$ and $\text{B}$ is a subset of $\text{A}$ as well!

Like the `isDisjoint`

operation above, we can compute the `isSubset`

operation using the intersection of sets $\text{A}$ and $\text{B}$. If set $\text{B}$ is equal to the intersection of sets $\text{A}$ and $\text{B}$, then $\text{B}$ is a subset of $\text{A}$.

$$ \text{A} \cap \text{B} = \text{B} \implies \text{B} \subseteq \text{A} $$

We say a set $\text{A}$ is the superset of set $\text{B}$ if all the elements of $\text{B}$ are also elements of $\text{A}$. You may be thinking to yourself, “this sounds a lot like the subset operation”. And you would be correct. If $\text{B}$ is the subset of $\text{A}$, then $\text{A}$ is a superset of $\text{B}$. It is saying the same thing two different ways.

Of course, if we can compute the `isSubset`

operation, we can use that directly to compute the `isSuperset`

operation. Or, we can use the intersection operation as well. If set $\text{B}$ is equal to the intersection of sets $\text{A}$ and $\text{B}$, then $\text{A}$ is a superset of $\text{B}$.

$$ \text{A} \cap \text{B} = \text{B} \implies \text{A} \supseteq \text{B} $$

The product, or Cartesian product, of two sets results in a set that includes all pairs of elements such that the first element of the pair is from the first set and the second is from the second set. Let’s give a quick example to make this easier to understand. We denote a pair of elements as $(a,b)$, where $a$ is the first element in the pair and $b$ is the second element in the pair.

We start with two sets, $\text{A}$ and $\text{B}$ as defined below.

$$ \text{A} = { 1, 2 } \ \text{B} = { a, b, c } $$

Then the Cartesian product of $\text{A}$ and $\text{B}$ is

$$ \text{A} \times \text{B} = { (1,a), (1,b), (1,c), (2,a), (2,b), (2,c) } $$

And the Cartesian product of $\text{B}$ and $\text{A}$ is

$$ \text{B} \times \text{A} = { (a,1), (a, 2), (b,1), (b,2), (c,1), (c,2) } $$

Another way to think about the product is to consider a matrix where the columns are the elements of the first set and the rows are elements of the second set. Then, each cell in the matrix contains the ordered pair where the first element is the column and the second element is the row. The matrix for our example would look like the following.

The powerset of set $\text{A}$ is the set of all possible subsets of $\text{A}$ and is denoted as $\wp(\text{A})$. If we start with the set $\text{A} = { 1, 2, 3 }$, then its powerset is

$$ { \emptyset, {1 }, {2 }, {3 }, {1, 2 }, {1, 3 }, {2, 3 }, {1, 2, 3 } } $$

Notice that the empty set, $\emptyset$, is a subset of the powerset of $\text{A}$ as well.

Different data structures can be used to represents sets and efficiently support their operations. Among the data structures considered so far, it is possible to use both lists and arrays to represent sets. Even though sets by definition are unordered, we could store the elements in order, which has its pros and cons. Storing elements in order will speed up the `contains`

operation since we won’t have to check each element in the set. However, it will also slow down insertions since we have to insert the elements in order.

Further, storing elements in order will improve the efficiency of the union and intersection operations since we could compute both the union and the intersection in one pass (similar to merge sort). Likewise, we can speed up our searching for elements in the superset and subset operations by taking advantage of the ordering of elements.

Whether or not keeping the elements in order makes sense for your application depends on how many times you will actually insert data into the set, versus the number of times you will be using the other set operations.

We can use arrays to store sets, much like we did for array-based stacks or queues. In fact, if we implemented an array-based list, we could implement sets directly on top of that structure. Unfortunately, if we need to keep the set elements ordered for efficiency, we would compound the inefficiency of inserting or removing items from an array since we would have to move other elements in the array each time to keep it sorted. Even if we’re not interested in sorting our set, we would still face the fixed-capacity issue and would have to manage the size of the set container using the `doubleCapacity`

and `halveCapacity`

operations. On the positive side, if we kept the set elements sorted in the array, we could use binary search to implement a very efficient `contains`

operation.

We can also use linked lists to implement sets. Lists provide all of the operations we need to implement the basic set operations such as insert, remove, and list iterator. The only property we need to implement is that sets do not include duplicate elements. This is easily taken care of by checking to see if the set contains an element before we insert it. Since linked lists are very flexible and don’t require us to worry about capacity issues, we will use them as the basis for our set operations in this module.

In this section, we will walk through the pseudocode for some basic set operations. We will build our set class using the doubly linked list class. In this way, we can build from the functionality that is already developed. So, in the set class, we will declare `mySet`

as a doubly linked list and use that throughout our operations.

`mySet = new doubleLinkedList()`

The first operation we will look at is the `contains`

operation, which checks to see if a given object already exists in our set. This operation is generally very useful, which we will see in the operations below.

The pseudocode for the `contains`

operation is shown below. We will use the list iterator built into our doubly linked list class to help us walk through the list searching for the object `o`

. So, in line 1, we call the `reset`

function and then the `getNext`

operation in line 2 to get the first item in the `mySet`

list.

```
function contains(object o) returns Boolean
mySet.reset() (1)
obj = mySet.getNext() (2)
while obj != null (3)
if (obj == o) (4)
return true (5)
end if
obj = mySet.getNext() (6)
end while
return false (7)
end function
```

We then enter a `while`

loop in line 3, where we will stay until either we find the object in our list, or we reach the end of the list. Once inside the loop, we check to see if the object from the list is equal to the object we are searching for in line 4. If it is, we return `true`

in line 5 and we are done. If they are not equal, we get the next object from `mySet`

and return to the top of the loop. If we reach the end of the list without finding our object, we will exit the loop and return `false`

in line 7.

The `add`

operation is relatively straightforward. The only thing we need to check is whether or not the object is already in `mySet`

. In this case, instead of raising an exception when we try to add a duplicate object, we will return `true`

if we add the object and `false`

if we do not. This is because often, the user only really cares that the object is in the set after calling `add`

and does not care if it was already there.

As shown, the pseudocode for the `add`

operation is fairly straightforward. In lines 1 and 2, we call the `contains`

operation to determine if the object already exists in `mySet`

and return the value `false`

if it does. Otherwise, we simply call the list operation `append`

to insert the object into the list.

```
function add(object o) returns Boolean
if (contains(o)) (1)
return false (2)
else
mySet.append(o) (3)
return true (4)
end if
end function
```

The `remove`

operation is very similar to the `contains`

operation, except this time we want to remove the item if it exists in the set. We can use the `removeCurrent()`

operation to remove the current item that our iterator is pointing to from the list. Again, like the `add`

operation, we will return `false`

if the operation does not actually remove an item from the list. Otherwise, the operation will return `true`

.

```
function remove(object o) returns Boolean
mySet.reset() (1)
obj = mySet.getNext() (2)
while obj != null (3)
if (obj == o) (4)
mySet.removeCurrent() (5)
return true (6)
end if
obj = mySet.getNext() (7)
end while
return false (8)
end function
```

The `intersection`

operation creates a new set that has only elements that exist in both sets under consideration. In code, we basically accomplish this by looping through the elements in one set and then checking to see if they exist in the other set. If they do, then we include them in the intersection.

To follow that basic algorithm, the pseudocode below uses the list iterator operations to loop through and look at each element in `mySet`

. We will create a new set, `result,`

to hold the intersection of `mySet`

and `set2`

in line 1. Then we get the first element from `mySet`

by calling the list `reset`

operation in line 2 and the `getNext`

operation in line 3.

```
function intersection(set2) returns set
result = new set() (1)
mySet.reset() (2)
obj = mySet.getNext() (3)
while obj != null (4)
if (set2.contains(obj)) (5)
result.add(obj) (6)
end if
obj = mySet.getNext() (7)
end while
return result (8)
end function
```

Lines 4 – 7 implement the loop that walks through each element in `mySet`

. If the current object is contained in `set2`

(line 5), the operation calls the `add`

operation to insert `obj`

into the `result`

set. Line 7 gets the next element from `mySet`

and loops back to the top.

Eventually, we will look at each element in `mySet`

and fall out of the loop. When that happens, the intersection operation is complete, and it returns the `result`

set in line 8

The `union`

operation is similar to the `intersection`

operation in that we will need to use the list iterator operations to walk through our two sets. The difference lies in what we include in the new `result`

set. While we only walked through `mySet`

in the `intersection`

operation, picking only those objects that existed in `set2`

, here we will walk through both sets adding all their elements to the `result`

set.

The code starts in line 1 by creating our `result`

set. Then, lines 2 and 3 reset the `mySet`

iterator and get the first item from `mySet`

. Lines 4 – 6 form the `while`

loop that we use to walk through the entire set of objects in `mySet`

. This time, however, we simply add every element we find in line 5 before getting the next object in line 6.

```
function union(set2) returns set
result = new set() (1)
mySet.reset() (2)
obj = mySet.getNext() (3)
while obj != null (4)
result.add(obj) (5)
obj = mySet.getNext() (6)
end while
set2.reset() (7)
obj = set2.getNext() (8)
while obj != null (9)
result.add(obj) (10)
obj = set2.getNext() (11)
end while
return result (12)
end function
```

Lines 7 – 11 duplicate the loop function above, only using `set2`

instead of `mySet`

. We don’t have to worry about adding duplicate items in line 10, since as we saw above, our `add`

operation will not allow duplicates to be added to the set. Finally, we return the `result`

set, which holds the union of `mySet`

and `set2`

.

Instead of using the `intersection`

operation directly to compute the `isSubset`

operation, we will compute the value of `isSubset`

directly to make it more efficient. If we use `intersection`

directly, we will end up looping through the members of one of the sets at least two times (once to compute the intersection and once to check equality). Computing `isSubset`

directly only loops through one of the sets once.

The `isSubset`

operation below is very much like the `intersection`

operation as we have a loop in lines 3 - 6 that checks each element of `set2`

and checks to see if it is in `mySet`

. The difference between the two operations is that in the `isSubset`

operation, we do not build a third `result`

set. Instead, if any element in `set2`

is not found in `mySet`

, then we return `false`

since not all elements of `set2`

are contained in `mySet`

. If we get all the way through the loop, we have checked that each element in `set2`

was found in `mySet`

and we can return `true`

; `set2`

is a subset of `mySet`

.

```
function isSubset(set2) returns boolean
set2.reset() (1)
obj = set2.getNext() (2)
while obj != null (3)
if (! mySet.contains(obj)) (4)
return false (5)
end if
obj = set2.getNext() (6)
end while
return true (7)
end function
```

One of the most common uses of set operations is in database systems. While we will not create a full database system here, we will show you the basics of how they work using our set operations. Assume you have three sets of people:

$$ \text{Students} = { \text{“Vanessa”}, \text{“Travis”}, \text{“Lu”}, \text{“Cole”}, \text{“Jordan”}, \text{“Elaine”}, \text{“Caleb”} } \ \text{Family} = { \text{“Amy”}, \text{“Scott”}, \text{“Vanessa”}, \text{“Lauren”}, \text{“Zachary”}, \text{“Jordan”}, \text{“Caleb”} } \ \text{Workers} = {\text{“Lauren”}, \text{“Amy”}, \text{“Felix”}, \text{“Zachary”}, \text{“Quinn”}, \text{“Jordan”}, \text{“Gabrielle”} } $$

The goal of our application is to be able to answer the following types of questions about the data.

- How many students are there?
- How many workers are there?
- How many workers are students too?
- How many in my family do not go to school?
- Who works and goes to school?
- Who works or goes to school?
- Who works and does not go to school?
- Who in my family does not go to school?
- Who in my family does not work or go to school?

The first step in our program is to add all the data to sets as shown below.

```
Set students = new Set()
students.add("Vanessa")
students.add("Travis")
students.add("Lu")
students.add("Cole")
students.add("Jordan")
students.add("Elaine")
students.add("Caleb")
Set family = new Set()
family.add("Amy")
family.add("Scott")
family.add("Vanessa")
family.add("Lauren")
family.add("Zachary")
family.add("Jordan")
family.add("Caleb")
Set workers = new Set()
workers.add("Lauren")
workers.add("Amy")
workers.add("Felix")
workers.add("Zachary")
workers.add("Quinn")
workers.add("Jordan")
workers.add("Gabrielle")
```

Once we have all the data in sets, we can use our set operations to find the information we are after. If we wanted to get a count of students and workers to answer questions 1 and 2, we can use the getter operation `getSize`

in the following expressions.

```
students.getSize()
workers.getSize()
```

Question 3 is a little more interesting. Obviously, we will use the `getSize`

operation again, but we can’t use it directly on an existing set. We need to compute the set that holds only people who are both in the worker and student sets. Luckily, we have the `intersection`

operation to help us do this. We simply need to compute the intersection of the workers and students and then take the size of that set as shown below.

`workers.intersection(students).getSize()`

Question 4 asks who is in the family set but is not in the student set. We can use the `difference`

operation to compute this set. Again, we use the `getSize`

operation to obtain the size.

`family.difference(students).getSize()`

Question 5 asks a slightly different question than question 3. It asks “who” works and goes to school, not just the number. Now, we compute the set of student-workers the same way, we just use the `toString`

operation to produce a list of people in the set instead of just reporting the size.

`workers.intersection(students).toString()`

Question 6 asks who works or goes to school. To answer this question, we need a set of people which includes anyone who is either a worker or a student. This requires the `union`

operation. We can compute the union of the two sets as follows.

`workers.union(students).toString()`

Question 7 wants to know who works and does not go to school. Again, this is everyone in workers who is not in students. This uses the `difference`

operation as shown below.

`workers.difference(students).toString()`

Question 8 is similar and asks who in the family does not go to school. Obviously, we can use the previous expression and just replace the workers set with the family set.

`family.difference(students).toString()`

Finally, question 9 asks who in the family does not work or go to school. This is basically asking who is in the family set that is not in the work set or the school set. There are actually a couple of ways to approach this. We could take the difference between the family and the workers and then take the difference between the resulting set and the students as shown below.

`family.difference(workers).difference(students).toString()`

Alternatively, we could take the difference between the family and the union of the worker and student sets. This solution is shown below.

`family.difference(workers.union(students)).toString()`

If we put all this into a program with the appropriate statements to write the solutions to the console, we would get the following output.

```
How many students are there?
7
How many workers are there?
7
How many workers are students too?
1
Who in my family does not go to school?
4
Who works and goes to school?
Jordan
Who works or goes to school?
Lauren, Amy, Felix, Zachary, Quinn, Gabrielle, Vanessa, Travis, Lu, Cole, Jordan, Elaine, Caleb
Who works and does not go to school?
Lauren, Amy, Felix, Zachary, Quinn, Gabrielle
Who in my family does not go to school?
Amy, Scott, Lauren, Zachary
Who in my family does not work or go to school?
Scott
```

As you can see, we can use set operations to quickly and easily compute solutions to database type applications. While we don’t often create special set-based solutions to data applications like this, we do use existing database applications that work on the same principle.

In this module, we introduced a new data structure, the Set. A set is a collection of values that does not contain any duplicate items. For simplicity, we showed an implementation of a set using a doubly linked list, but we could easily use an array to store a set as well.

Sets also include several unique operations, such as union, intersection, and difference. We looked at the pseudocode for some of these operations, but a few of them are left as exercises for us to complete in the project for this module.

While we rarely use sets in practice, the concepts we learned are instrumental in the development of more complex data storage techniques such as relational databases.