Python Collections
Collections in Python
Collections in Python
Python includes an extensive library of classes that can be included in any Python program. These classes are included as part of the Python Standard Library, which is included in most installations of the Python programming language.
Thankfully, the developers of the Python programming language have also written an extensive online manual that explains all of the features of the Python Standard Library in detail. Many developers refer to this manual as the Python API, or sometimes simply the documentation for the Python Standard Library.
To explore the Python Standard Library, we can start on this webpage. That link goes directly to the documentation for the latest version of Python 3, but it is very easy to find the manual for other versions of Python as well. There is a more technical version of the documentation found here that covers the specifics of the Python programming language itself..
On the home page of the Python Standard Library documentation, we’ll see several items. To the left is an index of where we are in the documentation, though there isn’t much to show on this home page. On the main page itself is a list of all of the packages included as part of the Python Standard Library. A package is a set of classes that are all related somehow, giving us a nice layer of organization to our code.
Here are some Python packages that we may want to explore as we continue to develop more advanced programs using Python:
string
—methods for manipulating stringsdatetime
—types and methods for dealing with dates and timescollections
—types and methods for storing datamath
—functions for mathematicsrandom
—pseudo-random number generatorspathlib
—working with file system pathsos
—interfaces for working with the operating systemio
—tools for working with input and output streamsIn this module, we’ll mostly explore a few of the collection classes that are built-in to the Python language itself, but are related to those found in the collections
package.
The Python Standard Library, as well as the Python programming language itself, contains several different classes specifically designed to store data. All together, they are generically referred to as collections since they are used to store a “collection” of data.
Python contains implementations for several different data structures, which are classes used to store and manipulate data in specific ways. We’ll learn more about how each of these implementations works and differs in a later course, but in this course, we’ll quickly describe a few of the most common ones. Specifically, we’ll discuss these three data structures:
Properly, Python refers to these types of items as “containers”, not collections. Containers are all instances of the collections.Container
interface. The collections
module contains several additional containers.
Sometimes we just want a collection of associated methods – think math.pi. In Python we could accomplish this by having a class that has no class or instance attributes and only static methods. Raymond Hettinger, a Python Language developer goes so far as to say the purpose of @staticmethod
is to attach methods to classes1
So in Python a “bucket” of
class my_math:
@staticmethod
def pi():
return 3.1416
@staticmethod
def circle_area(radius):
return radius * radius * my_math.pi()
Here we have a static method that returns a value for π. This is my personal method for establishing a constant2. This avoids the situation where pi can be overwritten.
.WIN2>python
Python 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)] on win32
>>> import math
>>> print(math.pi)
3.141592653589793
>>> print(math.sin(math.pi))
1.2246467991473532e-16 # essentially the correct answer (0.0)
>>> math.pi =2.0
>>> print(math.sin(math.pi))
0.9092974268256817 # very much not zero
>>>
Python Class Development Toolkit at https://www.youtube.com/watch?v=HTLu2DFOdTg&ab_channel=NextDayVideo. ↩︎
Python has been described as a language for consenting adults–there is no privacy and there are no constants. Every value and method can be accessed changed by a determined programmer ↩︎
First, let’s explore the simplest of these collections, the list. A list is defined as an ordered collection or sequence of elements, meaning that the order in which the elements were added to the list is preserved, but that order can be updated through the use of sorting algorithms.
We’ve already worked with lists in this course, but in previous modules we included one important restriction: we would know the maximum size of the list ahead of time. This is because most other programming languages use statically sized arrays as the basic collection data type.
Thankfully, lists in Python do not require a static size, and can be resized to store as much data as needed. In addition, lists are much more flexible than arrays, so in this module we’ll discuss some of the more useful features of lists.
List are generally used when:
None
To create a list, we can simply declare it using square brackets, either containing a list of elements to be included or an empty set of brackets to create an empty list:
# empty list
list1 = []
# list containing 4 elements
list2 = [1, 2, 3, 4]
However, we can also create lists in Python using list comprehensions, which allow us to build lists from a range or another list. For example, if we want to build a list that contains the first 10 perfect squares, we could do the following:
list3 = [x**2 for x in range(1, 11)]
We could even expand that with a filter that will only include even values:
list4 = [x**2 for x in range(1, 11) if x % 2 == 0]
To learn more about list comprehensions, we can consult the Python Tutorial for Data Structures.
The list
data type in Python defines several operations that it can perform. The full list can be found on the Data Structures page of the Python documentation. Here are a few of the most important ones:
append(e)
—adds element e
to the end of the listinsert(index, e)
—adds element e
to the specified index
position in the list. All elements at that position and after are shifted toward the end of the list.x in list
—returns true if the object x
is contained in the list list
index(o)
—returns the first index that this object can be found in the list, or raises a ValueError
not foundremove(o)
—removes the first occurrence of the given object from the list, if one exists. If not, a ValueError
is raisedlen(list)
—returns the number of elements in the listRecall from an earlier module that we can also use slicing to get chunks of a list. For example, here are some different ways to slice a small list:
a = [1, 2, 3, 4, 5]
b = a[:3] # [1, 2, 3]
c = a[2:] # [3, 4, 5]
d = a[:] # [1, 2, 3, 4, 5]
e = a[:-2] # [1, 2, 3]
We can even assign to a slice of a list! For more information, we can consult An Informal Introduction to Python: Lists or Data Structures: More on Lists.
What if we want to iterate through a list? Thankfully, we can use a for each loop to do this, just as we’ve learned to do previously
iter_list = [5, 3, 7, 2, 4]
for x in iter_list:
print(x)
To sort a list in Python, we can use the sort()
method on the list itself. This will sort the list in ascending order by default, but we can provide an optional parameter to sort in descending order as well:
list5 = [5, 3, 7, 2, 4]
# sort in ascending order
list5.sort()
# sort in descending order
list5.sort(reverse=True)
To explore how to use a list in a real program, let’s look at a quick example program. Here’s a short problem statement:
Write a program that will accept a single integer as input from the terminal. If there are any errors r parsing the input, simply print “Invalid Input!” and terminate the program. If the provided integer is less than 3, it should also print “Invalid Input!”.
Using the integer provided as input, the program should generate a list of numbers representing the Fibonacci sequence containing that many entries.
Before printing the list, the program should perform two additional operations on the list. First, if the integer provided as input is contained in the list, it should be removed. Secondly, the list should be presented in descending order, with the largest value first.
The program should print the list to the terminal, with each entry separated by a space.
The program should be implemented as two Methods - a
main()
Method that handles reading input, and amake_list()
Method that accepts a single integer as a parameter and returns the completed list of integers.
The Fibonacci sequence is produced by adding the two previous numbers together to generate a new number. Traditionally, the first two Fibonacci numbers are 0
and 1
. So, the next number is 0 + 1 = 1
, and the number following that is 1 + 1 = 2
. The sequence continues indefinitely.
Formally, the Fibonacci sequence is defined as:
$$F_0 = 0$$ $$F_1 = 1$$ $$F_n = F_{n-1} + F_{n-2}$$You can find more information about the Fibonacci Sequence on Wikipedia.
For example, if the input provided is 8
, the program would begin by generating the first eight Fibonacci numbers, starting with 0
and 1
:
[0, 1, 1, 2, 3, 5, 8, 13]
Then, since 8
is contained in the list, it should be removed:
[0, 1, 1, 2, 3, 5, 13]
Finally, the list should be sorted in descending order, so the final list returned will be:
[13, 5, 3, 2, 1, 1, 0]
main
MethodSo, let’s build the program. We can start with this simple skeleton:
import sys
class ListExample:
@staticmethod
def main(args):
reader = sys.stdin
try:
count = int(reader.readline())
if count < 3:
print("Invalid Input!")
return
l1 = make_list(count)
l1 = [str(l) for l in l1]
print(" ".join(l1))
except Exception as e:
print("Invalid Input!")
return
def make_list(x):
# MORE CODE GOES HERE
# main guard
if __name__ == "__main__":
ListExample.main(sys.argv)
This program contains a simple main()
method that will handle reading and parsing the input from either the terminal or a file provided as a command-line argument. It will also verify that the input is an integer, and it is a value that is at least 3
or greater. If will then call the make_list()
Method using the input as a parameter to create the list. It uses list comprehension to reassign the variable l1
to a list of strings. Finally it will print the result using the String.join()
method to place a space between each element of the list. Pretty nifty, right?
So, all we need to worry about implementing is the make_list()
Method.
make_list
MethodNote: video unnecessarily marks the method `make list` as static method. We will follow the convention that only the start-up 'main' method shall be static.
Let’s dive into the make_list()
method. First, we’ll need to create a list that contains the first two Fibonacci numbers. So, the code would be as follows:
def make_list(x):
l1 = [0, 1]
Following that, we’ll use a simple for loop to generate the next few numbers. We can either have separate variables to represent the previous values, or we can just read them directly from the list using square brackets:
def make_list(x):
l1 = [0, 1]
for i in range(2, x):
new_number = list[i-1] + list[i-2]
list.append(new_number)
Once we’ve generated our list, we need to make sure x
is not in the list. If so, we can just remove it using the remove()
method. We must be careful to only call the remove()
method if we are sure it is in the list, otherwise it will raise a ValueError
:
def make_list(x):
l1 = [0, 1]
for i in range(2, x):
new_number = list[i-1] + list[i-2]
list.append(new_number)
if x in list:
list.remove(x)
Finally, we can use the sort()
method to sort the list. We’ll need to remember to include the optional parameter to sort in descending order. Once we’ve sorted the list, we can return it.
def make_list(x):
l1 = [0, 1]
for i in range(2, x):
new_number = list[i-1] + list[i-2]
list.append(new_number)
if x in list:
list.remove(x)
list.sort(reverse=True)
return list
That’s all there is to it! See if you can complete the code in ListExample.py
to the left, then use the two assessments below to check your program.
This content is presented in the course directly through Codio. Any references to interactive portions are only relevant for that interface. This content is included here as reference only.
Next, let’s explore another important collection, the dictionary. Some programming languages, such as Java, refer to this collection as a map as well.
A map is a collection that associates, or maps, a key to a value. To store something in a dictionary, we must provide a unique key for each value. If we provide a key that is already in use, the value stored by that key is overwritten. Then, to retrieve a value, we simply provide the key that is associated with that value.
This is very similar to how lists work, where each element in the list can be accessed using a list index. The biggest difference is that keys in a map can be any value, not just an integer, and likewise the values can be any type of value.
Some types of objects cannot be used as keys. Lists and Dictionaries cannot be used as keys. Pretty much any type where you can assign by index cannot be used as a key. If you can go x[0] = ...
it is out.
In Python, dictionaries are another example of a collection that is included directly in the language.
Dictionaries are generally used when:
To create a dictionary, we can simply declare it using curly brackets {}
, either containing a set of keys and elements to be included or an empty set of brackets to create an empty dictionary:
# empty dictionary
dict1 = {}
# dictionary with some items
dict2 = {"one" : 1, "two": 2, "three": 3}
That’s all there is to it!
Elements in a dictionary can be accessed and updated by providing the key in square brackets:
# dictionary with some items
dict2 = {"one" : 1, "two": 2, "three": 3}
x = dict2["one"]
print(x) # 1
dict2["two"] = 5
print(dict2["two"]) # 5
The dict
data type in Python defines several operations that it can perform. The full list can be found on the Data Structures page of the Python documentation. Here are a few of the most important ones:
dict.get(k)
—return the value associated with the key k
in the dictionarylen(dict)
—returns the number of key and value associations in the dictionary dict
k in dict
—returns True
if the dictionary contains an associated value for the given key k
dict.pop(k)
—removes the key k
and its associated value from the dictionaryWhat if we want to iterate through a dictionary? That is a bit tricky, but it can be done. To do this, we’ll use a for each loop. The code for doing this is shown here:
dict_iter = {"One":"Uno", "Two":"Dos", "Three":"Tres", "Four":"Quatro"}
for key in dict_iter:
print("Key: {}".format(key))
print("Value: {}".format(dict_iter[key]))
However, in most cases, it doesn’t make much sense to iterate through a dictionary, since that isn’t what it is designed for. Instead, we may want to consider using some sort of a list instead.
To explore how to use a dictionary in a real program, let’s look at a quick example program. Here’s a short problem statement:
Write a program that will read multiple lines of input from the terminal. The program should terminate if a blank line is entered.
The program will store a dictionary that associates lines of input with random integers.
When a line of input is read, the program will check to see if that line has already been used as a key in the dictionary. If so, it will return the value associated with that key.
If the dictionary does not contain an entry for that key, the program should generate a new random integer and store it in the dictionary, using the input line as the key.
The program should be implemented as two methods - a
main()
method that handles reading input, and agetOutput()
method that accepts a string as a parameter and returns the associated number for that string. It should also use a global variable to store the dictionary.
For example, let’s say the program receives the following input:
cat
dog
horse
dog
cat
One possible output could be:
12334512345
239487234
234234
239487234
12334512345
Notice that the lines of output correspond with the lines of input, such that the program returns the same value each time it reads the word cat
.
main
MethodSo, let’s build the program. We can start with this simple skeleton:
import sys
import random
class DictExample:
dictionary = {}
@staticmethod
def main(args):
reader = sys.stdin
line = "test"
while len(line.strip()) > 0:
line = reader.readline()
if len(line.strip())>0:
output = DictExample.get_output(line)
print(output)
def get_output(line):
# MORE CODE GOES HERE
# main guard
if __name__ == "__main__":
DictExample.main(sys.argv)
This program contains a simple main()
method that will handle reading and parsing the input. When it reads a line of input, it will use the
get_output()
method to get the associated output for that input, and then print it to the terminal.
The program also includes a class attribute to store the dictionary. In that way, we can use the same object in both methods without having to pass it around as an argument.
So, all we need to worry about implementing is the get_output()
method.
get_output
MethodNote: video unnecessarily marks the method `get output` as static method. We will follow the convention that only the start-up 'main' method shall be static.
Let’s dive into the get_output()
method. First, we’ll need to see if the dictionary contains a value for the string provided as input. If so, we can just return that value:
def get_output(line):
if line in DictExample.dictionary:
return DictExample.dictionary[line]
However, if the dictionary does not contain a value for that key, we must generate a new one, store it in the dictionary, then return it:
def get_output(line):
if line in DictExample.dictionary:
return DictExample.dictionary[line]
else:
new_number = random.randint(-1000000, 1000000)
DictExample.dictionary[line] = new_number
return new_number
That’s all there is to it! See if you can complete the code in DictExample.py
to the left, then use the two assessments below to check your program.
This content is presented in the course directly through Codio. Any references to interactive portions are only relevant for that interface. This content is included here as reference only.
The last data structure we will explore is the tuple. A tuple in mathematics is defined as a “finite ordered list” of items. So, it is a list of items that is not infinite, and the ordering of the items in that list matters.
Thankfully, Python includes support for immutable tuples directly in the language, so we can use them in our programs quickly and easily.
Tuples are generally used when:
None
To create a tuple in Python, we can simply assign multiple values to a variable, separated by commas:
tuple_one = 123, 456, 789
In Python, assigning values to a tuple in this way is referred to as packing. As with lists and dictionaries, we can even store different types of data in the same tuple, as seen below:
tuple_two = 876, "Hello", True
Tuples can be nested, using parentheses to enclose any inner tuples for clarity:
tuple_three = 123, ("Hello", True)
To create a tuple containing fewer than two elements, we must use some special syntax:
# empty tuple uses empty parentheses
tuple_empty = ()
# single item tuple uses a comma after the item
tuple_single = "Hello", # alterante t_single = ("Hello")
To access individual elements of a tuple, we can use the same square bracket notation []
we’ve learned with arrays:
tuple_access = 123, 456, "Hello"
print(tuple_access[0]) # 123
print(tuple_access[1]) # 456
print(tuple_access[2]) # Hello
We can also unpack values in a tuple, storing them in individual variables:
tuple_unpack = 123, 456, "Hello"
x, y, message = tuple_unpack
print(x) # 123
print(y) # 456
print(message) # "Hello"
We can even do the same for a method that returns a tuple:
x, y = get_coordinates();
As we can see, tuples appear to be very similar to lists in Python. However, tuples have one important property that sets them apart from lists—they are immutable. This means that once a tuple is created, the items it contains cannot be changed or updated:
tuple_modify = 123, 456, "Hello"
tuple_modify[0] = 789 # raises TypeError
So, we should always remember that items in a tuple cannot be modified, whereas a list can be modified. Because of that, we typically use tuples when returning data from a function or method, or when storing associated data in a list.
To really explore how we could use a tuple in our code, let’s build a simple program. Here’s a problem statement:
Write a program that will read lines of input from the terminal. Valid input is a single line with two space separated integers between 0-9.
The program should implement a simple search game. Players will guess coordinates on a 10x10 grid, trying to find a hidden item. So, input will take the form of two integers, separated by a space. If the program is unable to parse an input for any reason, it should print “Invalid Input!” and terminate the program.
When a player makes a guess, the program should respond with one of the four cardinal directions: “North”, “South”, “East”, “West”, indicating the direction of the hidden item from the guess. For simplicity, the program will only respond “North” or “South” if the player’s guess is not on the same row, or
x
coordinate, as the location, and “East” or “West” only if the player’s guess is on the same row as the location. If the player guesses correctly, simply print “Found!” and terminate the program. If the player repeats a guess, the program should print “Repeat!” instead of a cardinal direction.
The hidden location should be randomly generated and stored in a tuple as a global variable, and the program should maintain a global list of tuples representing the locations already guessed.
The program should be implemented as two functions:
main()
that handles reading input and printing output, and amake_guess()
function that accepts two integers,x
andy
as parameters, and responds with a string representing the result of that guess.
Let’s see if we can build this program using tuples.
main
MethodFirst, let’s look at the main function. Once again, we’ll start with skeleton code that is very similar to the other examples in this chapter:
import sys
import random
class TupleExample:
guesses = []
location = None
@staticmethod
def main(args):
reader = sys.stdin
lines = "0 0"
while len(lines.strip()) > 0:
lines = reader.readline()
if len(lines.strip()) > 0:
try:
val = lines.split(" ")
x = int(val[0])
y = int(val[1])
output = TupleExample.make_guess(x, y)
print(output)
if output == "Found!":
return
except Exception as e:
print("Invalid Input!")
return
def make_guess(x, y):
# MORE CODE GOES HERE
# main guard
if __name__ == "__main__":
TupleExample.main(sys.argv)
At the top of the class, we have included two fields to store our list of previous guesses and the location that should be guessed. For now, we’ll set that value to None
until we initialize it in our make_guess()
function.
In this code, we are simply reading a line of input, splitting it into two tokens, and then parsing each token to an integer to get the x
and y
values of the guess. After that, we can call the make_guess()
function with those values, and print the result. Finally, we must check to see if the result is "Found!"
. If so, we can terminate the program using the return
keyword.
make_guess
FunctionNote: video unnecessarily marks the method `make guess` as a static method. We will follow the convention that only the start-up 'main' method shall be static.
Now that we have our main()
Method written, let’s work on the make_guess()
function. For starters, we must make sure our random location has been established. So, we’ll do that in the code shown below:
def make_guess(x, y):
if TupleExample.location is None:
TupleExample.location = random.randint(0, 9), random.randint(0, 9)
Here, we are using the randint()
method of the random
library to generate a number between 0
and 9
, inclusive.
Now we must handle producing the output. First, we need to create a new tuple for the guess, and then check and see if the guess is correct, or if it has already been guessed:
def make_guess(x, y):
if TupleExample.location is None:
TupleExample.location = random.randint(0, 9), random.randint(0, 9)
guess = x, y
if guess == TupleExample.location:
return "Found!"
if guess in TupleExample.guesses:
return "Repeat!"
Here, we are taking advantage of the fact that Python can automatically handle checking for equality between tuples as easily as any other value. As long as both tuples contain the same values in the same order, they are considered equal.
If we find out that the guess is not the hidden location, then we should store it in our list of guesses:
def make_guess(x, y):
if TupleExample.location is None:
TupleExample.location = random.randint(0, 9), random.randint(0, 9)
guess = x, y
if guess == TupleExample.location:
return "Found!"
if guess in TupleExample.guesses:
return "Repeat!"
TupleExample.guesses.append(guess)
Finally, we can handle printing the hints for cases where the guess is not on the same row:
def make_guess(x, y):
if TupleExample.location is None:
TupleExample.location = random.randint(0, 9), random.randint(0, 9)
guess = x, y
if guess == TupleExample.location:
return "Found!"
if guess in TupleExample.guesses:
return "Repeat!"
TupleExample.guesses.append(guess)
if guess[0] > TupleExample.location[0]:
return "North"
if guess[0] < TupleExample.location[0]:
return "South"
Below that, we can assume that the guess is on the same row, so we’ll have to handle the cases where the guess is east or west of the location:
def make_guess(x, y):
if TupleExample.location is None:
TupleExample.location = random.randint(0, 9), random.randint(0, 9)
guess = x, y
if guess == TupleExample.location:
return "Found!"
if guess in TupleExample.guesses:
return "Repeat!"
TupleExample.guesses.append(guess)
if guess[0] > TupleExample.location[0]:
return "North"
if guess[0] < TupleExample.location[0]:
return "South"
if guess[1] > TupleExample.location[1]:
return "West"
else:
return "East"
Tuples are useful, but limited. Conventional Python style is to use tuple when returning multiple values. However, using a tuple signals the grouping is only transitory. If a tuple represents a data aggregation that should be persistent, it is generally recommended that you make a class with properties and return an instance of t class instead. Classes can be better documented and extended.
def fie():
...
return a,b
def fee():
...
t1= fee()
foe(t1)
z = t1[0].operation()
...
def foe(x):
...
In this instance, it looks like the data returned by fie
is persistent, and maybe there should be a class for it. This is particularly true when a tuple contains lots of elements or it the tuple’s elements themselves are complex objects.
There we go! That method will handle producing the output for any guess provided by the user. See if you can complete the code in TupleExample.py
to the left, then use the two assessments below to check your program.
This content is presented in the course directly through Codio. Any references to interactive portions are only relevant for that interface. This content is included here as reference only.
As we continue to write larger and more complex programs, it is important to remember that we can include comments and documentation in our code. Comments are lines of text in our program’s source code that are ignored by the compiler or interpreter, allowing us to add information to the program beyond the code itself.
These help explain our code to anyone who might read it, but can even be useful to help us remember exactly how it works or what it does, especially if we end up coming back to a program written a long time ago that we do not remember very well.
As we’ve seen before, we can add single-line comments to our Python programs using a hash symbol #
before a line in our source file:
# this is a comment
x = 5
# this is also a comment
b = True
Finally, Python also includes a secondary type of comment that spans multiple lines, specifically for creating documentation. A docstring is usually the first line of text inside of a class or method definition, and is surrounded by three double quotes """
with one set of three on each end.
These comments are specifically designed to provide information about classes and methods in our code. Here’s a quick example using a simple class:
class IntTuple:
""" Represents a tuple containing two integer values
This class is an adaptation of a class developed for Java
that mimics the built-in tuples in Python
Attributes
----------
first : int
the first element in the tuple
second : int
the second element in the tuple
"""
def __init__(self, one, two):
""" Initializes a new IntTuple object
Parameters
----------
one : int
the first element in the new tuple
two : int
the second element in the new tuple
"""
self.first = one
self.second = two
Unfortunately, Python does not enforce a particular style for these docstrings, so there are many different formats used in practice. To learn more, we can consult the following references.
References