Chapter 915

Programming by Contract and Introduction to Performance

Welcome!

This page is the main page for Programming by Contract and Introduction to Performance

Subsections of Programming by Contract and Introduction to Performance

Invariants

YouTube Video

Another concept related to preconditions and postconditions is the loop invariant. A loop invariant is a statement should be true after each iteration of a loop, provided the loop’s preconditions are all true before the start of the loop. Yes, that’s right—we can think of a loop as a miniature method within a method, with its own set of preconditions and postconditions.

Example - Maximum

For this example, let’s consider a method maximum(numbers) that gets an array of numbers as input, and returns the maximum value stored in that list. So, we can start by listing the preconditions of the method:

  1. numbers is an array containing at least one numerical value, either a floating point or an integer

Thankfully, that one precondition covers all of our bases. Likewise, we can define our postcondition pretty easily as well:

  1. The method returns the maximum value stored in the numbers array
  2. The numbers array is not modified by the method

There we go! Hopefully it is easy to see how the preconditions and postconditions help us build a better definition of what operations should be performed by the method.

Method Pseudocode

Now that we’ve built our method’s preconditions and postconditions, let’s quickly write the method in pseudocode so we can see what it would look like.

function MAXIMUM(NUMBERS)
    MAX = NUMBERS[0]
    loop I from 1 to length of NUMBERS
        if NUMBERS[I] > MAX
            MAX = NUMBERS[I]
        end if
    end loop
    return MAX
end function

We’ve seen methods like this several times already, since calculating the maximum value from an array of values is a very common task. This time, however, let’s discuss the preconditions, postconditions and invariants of the loop inside of this method.

Preconditions

First, we can establish an important precondition at the beginning of the loop. In this case, the precondition that best describes the value stored in MAX is:

  1. MAX contains the maximum value stored in NUMBERS up to and including index $0$.

In this way, we’ve directly tied our precondition to the values that exist in the method already, and we’ve accurately described the value that MAX is currently storing.

Tip

Confused?

Don’t worry if this doesn’t make sense right now - you won’t be expected to write your own preconditions and invariants at this point. We’re just introducing the concepts so you’ll understand how they work when you see them later in the projects in this course

Loop Invariant

Next, we can determine what the loop invariant should be. This is a statement that should be true after each iteration of the loop, provided the precondition is true. Usually we want to try and relate it to the precondition somehow, and how it changes after each iteration of the loop. So, let’s consider the following loop invariant:

  1. MAX contains the maximum value stored in NUMBERS up to and including index $I$.

Hmm, that’s almost exactly the same as the precondition, isn’t it? That’s what we’re going for here. In this way, we can easily describe what the loop does after each iteration based on how it updates the value in MAX, keeping the loop invariant true. We call that “maintaining the loop invariant.”

Postcondition

Finally, we can define the postcondition of the loop. This is pretty simple, since it is the same as the loop invariant, but this time we can say that I is equal to the end of the loop. So, our postcondition is simply:

  1. MAX contains the maximum value stored in NUMBERS up to and including the last index.

That’s all there is to it! By defining a precondition, invariant, and postcondition for our loop, we can very accurately describe exactly what that loop should do. On the next page, we’ll see how we can put all of those together to show that our method works correctly.

Correctness

YouTube Video

If we have defined the correct preconditions, postconditions, and invariants for our code, we can then use those to prove that our code correctly performs its intended operation.

In this course, we won’t ask you to do any of this yourself, but it is important to understand what is going on in the background and how this process works. We can use the concepts of preconditions and postconditions when grading your code using our Autograder—in fact, that’s really how it works!

Proving Correctness

To prove correctness of a method, we can generally follow this process:

  1. Establish (or assume) that the method’s preconditions are all true
  2. Use the preconditions and the code to demonstrate that any invariants within the code are always true
    1. If the code contains a loop, we can repeat this process for preconditions and postconditions of the loop
  3. Use the preconditions and invariants to show that the postcondition is true

Let’s do this for our maximum() method described on the previous page. Here is the pseudocode once again:

function MAXIMUM(NUMBERS)
    MAX = NUMBERS[0]
    loop I from 1 to length of NUMBERS
        if NUMBERS[I] > MAX
            MAX = NUMBERS[I]
        end if
    end loop
    return MAX
end function

Here are the associated conditions we established as well:

Method Precondition: NUMBERS is an array containing at least one numerical value, either a floating point or an integer Loop Precondition: MAX contains the maximum value stored in NUMBERS up to and including index $0$. Loop Invariant: MAX contains the maximum value stored in NUMBERS up to and including index $I$. Loop Postcondition: MAX contains the maximum value stored in NUMBERS up to and including the last index. Method Postconditions: The method returns the maximum value stored in the NUMBERS array, and the NUMBERS array is not modified by the method

Now that we have that information, we can discuss the correctness of the method.

A Simple Proof

To begin, we assume that the method’s preconditions are true. Therefore, we know that NUMBERS is an array containing at least one numerical value. Earlier in this chapter, we learned that we can’t assume that the method works if the preconditions are false, so we can always begin our proof by assuming they are true.

Next, we have to use the code as well as the method’s preconditions to establish that the loop’s preconditions are true. In the code, we see that we set the value of MAX equal to the first element in the NUMBERS array. So, since there is only that value in the NUMBERS array up to and including index $0$, it is easy to say that it is indeed the maximum value. So, we’ve shown that our loop’s precondition is true.

After that, we have to show that the loop invariant is true after each iteration of the loop. A proper proof would involve a technique called proof by induction, which is more advanced than we want to cover in this course. However, a simple way to think about it is to say that the value in MAX contains the largest value in the list that we’ve seen so far. On each loop iteration, we look at one more value. If it is larger than MAX, then it becomes the new MAX value. Otherwise, we know that the maximum value we’ve seen so far hasn’t changed. So, we can show that our loop invariant is always true after each loop iteration.

Finally, once we reach the end of the loop, we’ve looked at every element including the last index, and found the MAX value, so it is easy to see that our loop postcondition is true.

Then, we can quickly use that loop postcondition to understand that the MAX value is really the largest in the NUMBERS array, which means that the first part of our method’s postcondition is also true.

But wait! What about the second part, that insists that we did not modify the contents of the numbers array? Thankfully, we can quickly look at our code and determine that we are not assigning a value into the array, nor do we call any functions on the array, so it is easy to show that our code has not modified it in any way.

There we go! This is a quick example for how we can use preconditions, postconditions, and invariants to help understand our code and whether it works correctly.

On the next page, we’ll learn about unit testing and how we can write our own code to verify that our code works correctly.

Performance of Algorithms

The performance of our algorithms is very important, since the difference between good algorithms and bad algorithms on very large data sets can often be measured in terms of days of execution time. Thus, efficiency will be one of the key issues we will look at when designing algorithms.

For example, one simple problem involves finding the largest sum of contiguous elements in an array. So, if we have the array:

[βˆ’2, 1, βˆ’3, 4, βˆ’1, 2, 1, βˆ’5, 4]

we could find that the contiguous sequence of:

[4, βˆ’1, 2, 1]

sums to $6$, which is the largest sum of any contiguous subsequences of this array.

A simple solution to this problem might involve finding all possible subsequences of the array and adding them, which could take a very long time. In fact, if the array contains $n$ elements, it might take $n^3$ steps to solve it. However, with a little ingenuity, we can actually solve this problem using many fewer steps, even as few as $n$ steps itself.

When we try to solve a problem, it is often helpful to look at multiple solutions to the problem and compare them before choosing our final design. Just the act of trying to find multiple ways to solve the same problem stimulates creativity and promotes mental elasticity and speed of thought. Selecting a solution from several different choices following rigorous and objective criteria enables us to improve fundamental life skills such as simply knowing how to make decisions! In fact, the very attempt at solving a problem in a variety of ways forces us to look our problems from different perspectives, which is the catalyst of all scientific discoveries.

Throughout this section, we will develop two solutions to the problem of finding the maximum max and minimum min of a list of N numbers (defined as list[N]) as an example. We will develop both solutions and then evaluate their performances in terms of both execution time and memory space.

Max and Min - Linear

YouTube Video

Let’s start by considering one number from the list at a time.

Base Case

When we have received just one number, this number is both the maximum and the minimum. In the initial state, you have max holding the maximum, and min holding the minimum. The invariant is the following:

  1. The variable max holds the maximum of all the numbers considered so far
  2. The variable min holds the minimum of all the numbers considered so far

The algorithm is depicted by the following flowchart and pseudocode:

Min Max Flowchart Base Min Max Flowchart Base

print "Enter a Number:"
input X
MAX = X
MIN = X

The Next Number

Then, our program will enter a loop to read 10 more numbers from the user. So, we’ll need to perform the following process during each iteration of the loop:

  1. compare the value in max with this new number and update the max value if the new number is greater. In this way, the invariant is preserved.
  2. compare the value in min with this new number and update the min value if the new number is smaller. In this way, the invariant is preserved.

This part of the program is depicted by the following flowchart and pseudocode:

Min Max Flowchart Loop Min Max Flowchart Loop

loop I from 1 to 10
    print "Enter a Number:"
    input X
    if X > MAX
        MAX = X
    end if
    if X < MIN
        MIN = X
    end if
end loop

After you’ve considered the second number, you end up in the same situation at the beginning: you have a maximum and a minimum value of the numbers input by the user so far. You have found an invariant if you verify that the preconditions before executing an iteration of the loops are the same as the conditions at the end of the loop, known as postconditions:

  • Loop preconditions:
    1. The variable max holds the maximum value among all the numbers considered so far
    2. The variable min holds the minimum value among all the numbers considered so far
    3. The new input considered is a number
  • Loop postconditions:
    1. The variable max holds the maximum value among all the numbers considered so far
    2. The variable min holds the minimum value among all the numbers considered so far
  • Loop invariant:
    1. The variable max holds the maximum value among all the numbers considered so far
    2. The variable min holds the minimum value among all the numbers considered so far

You can then generalize the solution for the nth input: when you consider the nth number, compare it with the values in max and min, updating them if necessary. In each step, we can show that the invariant holds.

A full flowchart of this program can be found by clicking the following link:

Min Max Flowchart

It is helpful to have this diagram available in a second browser tab for review on the next few pages.

Max and Min - Pairs

YouTube Video

Another solution consists of comparing the numbers in pairs, instead of one at a time.

Base Case

When we have received just one number, this number is both the maximum and the minimum. In the initial state, you have max holding the maximum, and min holding the minimum. The invariant is the following:

  1. The variable max holds the maximum of all the numbers considered so far
  2. The variable min holds the minimum of all the numbers considered so far

The algorithm is depicted by the following flowchart and pseudocode:

Min Max Flowchart Base Min Max Flowchart Base

print "Enter a Number:"
input X
MAX = X
MIN = X

The Next Two Numbers

In this program, instead of just considering one number at a time, we’ll ask the user to input two numbers. Then, we can determine which of those two inputs is larger (we’ll call it lastmax), and compare it to the value in max. Similarly, we can do the same for the smaller value (called lastmin) and min. Would this program be more efficient?

The algorithm is depicted by the following flowchart and pseudocode:

Min Max Pairs Loop Min Max Pairs Loop

loop I from 1 to 10 step by 2:
    output "Enter a Number:"
    input X
    output "Enter a Number:"
    input Y
    if X > Y
        LASTMAX = X
        LASTMIN = Y
    else
        LASTMAX = Y
        LASTMIN = X
    end if
    if LASTMAX > MAX
        MAX = LASTMAX
    end if
    if LASTMIN < MIN
        MIN = LASTMIN
    end if
end loop

Once again, we can easily show that the same loop preconditions, postconditions, and invariants work for this loop:

  • Loop preconditions:
    1. The variable max holds the maximum value among all the numbers considered so far
    2. The variable min holds the minimum value among all the numbers considered so far
    3. The new inputs considered are a number
  • Loop postconditions:
    1. The variable max holds the maximum value among all the numbers considered so far
    2. The variable min holds the minimum value among all the numbers considered so far
  • Loop invariant:
    1. The variable max holds the maximum value among all the numbers considered so far
    2. The variable min holds the minimum value among all the numbers considered so far

A full flowchart of this program can be found by clicking the following link:

Min Max Flowchart

It is helpful to have this diagram available in a second browser tab for review on the next few pages.

Time Complexity

YouTube Video

It is very useful to compare the two solutions to choose one. In general, you can compare the two solutions by considering:

  1. The time required to execute it
  2. The amount of memory it requires
  3. The number of instructions used, and also the understandability of the code

Time Complexity

To compare programs in terms of the time we can estimate the running time, or time it takes to complete its work. If the program performs operations on a large set of data, you can also check the run time by using the timer available in your programming language. This type of algorithm profiling is called experimental algorithm evaluation.

You can also estimate the time by counting the number of comparisons and assignments performed by the two programs since comparison and assignments are the fundamental operations that allow you to find the smallest and the largest element.

Let’s do this analysis for both the linear and pairs solution to the problem of finding the minimum and maximum values in a list of numbers. For this analysis, we’ll just look at the code inside the loop, and ignore any code before the loop that initializes variables, since both programs have the same code there.

Comparisons

The linear program performs two comparisons, x > max and x < min each time a new number is considered. If the algorithm considers N numbers, the total number of comparisons is $N * 2$ or $2N$ comparisons.

The pairs program makes three comparisons every time it considers two elements: x > y, lastmax > max and lastmin < min If the algorithm considers N numbers, the total number of comparison is $N/2 * 3$ or $3/2 N$ comparisons.

If we assume that our list is holding a large number of values, then we can see a major difference in the time required to complete the program. Put another way, we can observe that the difference in efficiency between the two programs increases as N increases. Assuming N is $1,000$, the linear program makes $2,000$ comparisons, while the pairs program makes $1,500$ comparisons.

Of course, data sizes in real programs can be much larger. For example, Google currently indexes around 50 billion web pages! So, a list that contains $1,000$ numbers looks pretty small by comparison.

Assignments

For the assignments, you can count them. For the linear solution, the following situations could occur:

  1. The new number updates the maximum. If the new number is the maximum it will be greater than the minimum, so no update is required for the minimum
  2. The new number update the minimum. If the new number is the minimum it will be lower than the maximum so no updated are required for the maximum
  3. If the new number is between the minimum and the maximum value no updates are required

So, in the worst-case, the algorithm performs $N$ assignments. For example, consider a situation where the list is already sorted in increasing order. In that case, we’ll have to update the maximum value each time, resulting in $N$ assignments.

In the pairs solution, for every two numbers considered causes the following assignments to be computed:

  1. Two assignments for updating the lastmax and lastmin. These two assignments are done in any case.
  2. From zero to two assignments to update the min and the max, depending on the results of the comparisons.

Therefore, the number of assignments is $2$ in the best case, $4$ in the worst case, and $3$ in the average case. Hence, with $N$ numbers, the number of assignments is $N/2 * 2$ or $N$ in the best case, $2N$ in the worst case and $3/2N$ in the average case. The number of assignments is greater with the second algorithm. But, could we do better in this last case?

Which is Better?

So, how can we determine which is better? Let’s look at a couple of situations.

For example, if we have a list of 1000 numbers, we can find the following number of steps for each program. First, let’s consider the worst case performance, taking the largest values for each.

  1. Linear: $2N$ or $2000$ comparisons + $N$ or $1000$ assignments, so $3000$ total steps
  2. Pairs: $3/2 * N$ or $1500$ comparisons + $2N$ or $2000$ assignments, to $3500$ total steps

As we can see, the pairs program requires more steps in total than the linear program. So, it appears that it might be the best choice.

However, not every program will be a worst case. So, let’s look at the best case performance for each one:

  1. Linear: $2N$ or $2000$ comparisons + $0$ assignments (ignoring the small number of initial assignments), so $2000$ total steps
  2. Pairs: $3/2 * N$ or $1500$ comparisons + $N$ or $1000$ assignments, to $2500$ total steps

Again, we see that the pairs program still requires more steps than the linear program, though both programs run faster in the best case than the worst case.

Finally, we can do the same for the average case performance:

  1. Linear: $2N$ or $2000$ comparisons + $N/2$ assignments, so $2500$ total steps
  2. Pairs: $3/2 * N$ or $1500$ comparisons + $3/2 * N$ or $1500$ assignments, to $3000$ total steps

There we go! As we can see, in each case the linear program actually performs better than the pairs program, even though we know that the pairs program will only run the loop half as many times as the linear program. The extra comparisons and assignments make the program take more time!

Space Complexity

We can also look at a program based on the amount of space, or memory, that it uses. In this case, we are looking at the number of variables that are needed, and also the size of any lists, arrays, or more advanced data structures used.

The comparison in terms of space leads us to observe that the linear solution uses, in addition to the input value x for each iteration of the loop, also the max and min variables. So, there are just 3 more variables to keep track of.

The pairs solution uses, in addition to the two input values x and y for each iteration of the loop, two variables for the lastmax and lastmin as well as the global max and min variables. So, there are 6 more variables to keep track of in this solution

Therefore, the linear solution is more space-efficient since it requires only three variables.

A Bigger Example

Let’s consider a bigger example program, just to see the impact of space complexity when analyzing a program. Consider a program that will compute the result of multiplying two numbers from $1$ through $10$. So, there are 10 numbers we need to consider as both operands.

One possible solution would be to pre-compute all possible answers in a 10 by 10 array, which would contain 100 elements. A diagram of this is shown below.

Multiplication Table Multiplication Table^[Wikipedia contributors. (2020, January 25). Multiplication table. In Wikipedia, The Free Encyclopedia. Retrieved 02:06, January 28, 2020, from https://en.wikipedia.org/w/index.php?title=Multiplication_table&oldid=937470066]

This is a very inefficient program in terms of space complexity, since it requires $N^2$ spaces in memory for $N$ possible operand values.

Of course, we already know that we could write a program that could simply calculate and return the answer based on the input values, so this example is a bit worthless in practice. However, on some small, embedded systems such as a smart watch or nano-machine, we might discover that it is better to use memory than to spend time calculating a result on such a slow processor, so there are times where having higher space complexity helps save time in the long run.

Code Complexity

Lastly, when analyzing a program, we must also consider the complexity of the code used to write the program. Code complexity can refer to many things, but in general we use it to describe how many lines of code are included the program, as well as how easy it is to understand what the program does.

Lines of Code

One of the most common ways to measure the size of a program is the number of lines of code, or LOC, of the program. This can be a very rough estimate of the size of the program, since, in general, a longer program with more lines of code may be more complex than a shorter program with fewer lines.

So, if we can find a solution to a problem that requires many fewer lines of code than another solution, that is one factor to consider. Of course, the solution with fewer lines of code may be more complex in terms of time or space!

Understandability

The other important measure of code complexity deals with how easy it is to understand what the program does. Sometimes we can write a program that only takes a few lines of code, but it is so complex that it is difficult to really understand how it works.

For example, consider this line of pseudocode - can you determine what it does?

return (X > 0) and ((X % 10) + (INT((X % 100) / 10))) < 5

It is pretty difficult to understand. Can we rewrite this program to make it easier to understand? Consider the following code:

if X <= 0
    return FALSE
else
    ONES_PLACE = X % 10
    TENS_PLACE = X % 100
    SUM_LAST_TWO_DIGITS = TENS_PLACE + ONES_PLACE
    if SUM_LAST_TWO_DIGITS < 5
        return TRUE
    else
        return FALSE
    end if
end if

This program is pretty easy to follow. First, it will determine if X is 0 or a negative number, and return FALSE if so. Then, it will find the last two digits of the number, corresponding to the ones and the tens place, and sum them. Finally, if the sum of the last two digits is less than $5$, it will return TRUE. Otherwise, it will return FALSE.

As it turns out, these two programs do the exact same thing! However, from the single line of code in the first program, it is very difficult to decipher exactly what it does. The second program, even though it is much longer, is much easier to understand. In addition, when run on a real computer, both of these programs will take nearly the same amount of time to run.

Complexity Triad

As we’ve seen, we can describe the complexity of a program in terms of three values: the time it takes to run, the space it requires in memory, and the size and understandability of the code. However, how do we know which of those measures is the most important?

Unfortunately, that is a difficult question to answer. In general, we want to reduce each of these levels of complexity, but sometimes there are trade-offs. For example, a program that runs quickly may require lots of extra memory, or it could even require really complex code.

The world of business uses a simple triad to understand these trade-offs, as shown in the diagram below:

Business Triad Diagram Business Triad Diagram^[File:Project-triangle.svg. (2020, January 12). Wikimedia Commons, the free media repository. Retrieved 02:41, January 28, 2020 from https://commons.wikimedia.org/w/index.php?title=File:Project-triangle.svg&oldid=386979544.]

Along with this diagram comes the saying: “Fast. Good. Cheap. Pick any two.” It helps express the difficulty in finding a perfect program that does not contain at least one level of complexity.

Thankfully, most modern computers have a sufficiently fast processor and ample memory that most programs can run easily, even without worrying about reducing the space and time complexity of the program. So, it may seem that the most important measure to focus on is code complexity: we should strive to write readable and understandable code.

However, as we build larger programs and deal with more input data, time and space complexity quickly become more important. So, it is important to consider each measure independently, and do the best we can to build programs that run quickly, use a minimal amount of memory, and aren’t so complex that they are difficult to understand.

Programming by Contract and Performance Summary

In this module, we covered two major topics that will help us understand the data structures and algorithms we’ll learn in this course. First, we learned about the use of preconditions, postconditions, and loop invariants to help describe the exact specifications of how methods and loops should operate in code. We’ll use these as the basis of unit tests in this course to help prove the correctness of data structures and algorithms we are asked to develop throughout the course.

In addition, we were introduced to the concepts of time complexity, space complexity, and code complexity, three ways to measure the performance and usefulness of a computer program. We also learned that it may be difficult to find a perfect program that has very little time, space, and code complexity, so many times we’ll have to consider a trade-off between programs that operate quickly, use less memory, and are easier to understand.

Which these tools, we’re in a much better place to understand the specifics of the data structures and algorithms we’ll encounter in this module. Next, we’ll do a short project to explore how we can build programs based on a set of preconditions, postconditions, and invariants instead of the plain language descriptions we’ve used up to this point.