Hello Real World
Hello World, but like the pros do it!
Welcome to CC 410 - Advanced Programming. This course is designed to be a capstone experience at the end of the Computational Core program, building upon our prior knowledge and experience to help us become truly effective programmers. In this course, we’ll not only learn new skills and techniques, but we’ll try to pull back the curtain and explain the history of programming and why we do some of the things we do.
In this course, we’re going to cover a lot of content. However, it can be grouped into a few big ideas in programming.
We’ll spend some time covering each of these in more detail as we go through the course. In this module, we’ll start working on the first two - writing professional code and minimizing bugs through testing and debugging.
Before we dive too deeply into this topic, let’s take a step back and examine some of the history of programming that led to our current state of the art that revolves around object-oriented programming. To do that, we’ll need to explore the software crisis and the topic of structured programming.
The content on this page was adapted from Nathan Bean’s CIS 400 course at K-State, with the author’s permission. That content is licensed under a Creative Commons BY-NC-SA license.
By this point, you should be familiar enough with the history of computers to be aware of the evolution from the massive room-filling vacuum tube implementations of ENIAC, UNIVAC, and other first-generation computers to transistor-based mainframes like the PDP-1, and the eventual introduction of microcomputers (the desktop computers that became the basis of the modern PC) in the late 1970s. Along with their declining size, each generation of these machines also cost less:
| Machine | Release Year | Cost at Release | Adjusted for Inflation |
|---|---|---|---|
| ENIAC | 1945 | $400,000 | $5,288,143 |
| UNIVAC | 1951 | $159,000 | $1,576,527 |
| PDP-1 | 1963 | $120,000 | $1,010,968 |
| Commodore PET | 1977 | $795 | $5,282 |
| Apple II (4K RAM model) | 1977 | $1,298 | $8,624 |
| IBM PC | 1981 | $1,565 | $4,438 |
| Commodore 64 | 1982 | $595 | $1,589 |
This increase in affordability was also coupled with an increase in computational power. Consider the ENIAC, which computed at 100,000 cycles per second. In contrast, the relatively inexpensive Commodore 64 ran at 1,000,000 cycles per second, while the pricier IBM PC ran at 4,770,000 cycles per second.
Not surprisingly, governments, corporations, schools, and even individuals purchased computers in larger and larger quantities, and the demand for software to run on these platforms and meet these customers’ needs likewise grew. Moreover, the sophistication expected from this software also grew. Edsger Dijkstra described it in these terms:
> The major cause of the software crisis is that the machines have become several orders of magnitude more powerful! To put it quite bluntly: as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.
>
> – Edsger Dijkstra, The Humble Programmer (EWD340), Communications of the ACM
Coupled with this rising demand for programs was a demand for skilled software developers, as reflected in the following chart of graduation rates in programming-centric degrees (the dashed line represents the growth of all bachelor’s degrees, not just computer-related ones):

Chart omitted from print view.
Unfortunately, this graduation rate often lagged far behind the demand for skilled graduates, and was marked by several periods of intense growth: 1965-1985, 1995-2003, and the current surge beginning around 2010. During these surges, it was not uncommon to see students hired directly into the industry after only a course or two of learning programming (coding boot camps are a modern equivalent of this trend).
All of these trends contributed to what we now call the Software Crisis.
At the 1968 NATO Software Engineering Conference held in Garmisch, Germany, the term “Software Crisis” was coined to describe the state of the software development industry, where common problems included:

- Projects that ran over budget
- Projects that ran over time
- Software that was inefficient or of low quality
- Software that did not meet requirements
- Projects that were unmanageable, with code that was difficult to maintain
- Software that was never delivered
The software development industry sought to counter these problems through a variety of efforts: new programming languages and paradigms, more disciplined development processes and methodologies, and better tools for testing, documenting, and analyzing code.
This course will seek to instill many of these ideas and approaches into your programming practice by adopting them in our everyday work. It is important to understand that unless these practices are used, the same problems that defined the software crisis will continue to occur!
In fact, some software engineering experts suggest the software crisis isn’t over, pointing to recent failures like the Denver Airport Baggage System in 1995, the Ariane 5 Rocket Explosion in 1996, the German Toll Collect system canceled in 2003, the rocky healthcare.gov launch in 2013, and the massive vulnerabilities known as the Meltdown and Spectre exploits discovered in 2018.
One of the strategies that computer scientists employed to counter the software crisis was the development of new programming languages. These new languages would often 1) adopt new techniques intended to make errors harder to make while programming, and 2) remove problematic features that had existed in earlier languages.
Let’s take a look at a working (and in current use) program built using Fortran, one of the most popular programming languages at the onset of the software crisis. This software is the Environmental Policy Integrated Climate (EPIC) Model, created by researchers at Texas A&M:
> Environmental Policy Integrated Climate (EPIC) model is a cropping systems model that was developed to estimate soil productivity as affected by erosion as part of the Soil and Water Resources Conservation Act analysis for 1980, which revealed a significant need for improving technology for evaluating the impacts of soil erosion on soil productivity. EPIC simulates approximately eighty crops with one crop growth model using unique parameter values for each crop. It predicts effects of management decisions on soil, water, nutrient and pesticide movements, and their combined impact on soil loss, water quality, and crop yields for areas with homogeneous soils and management.
>
> – EPIC Homepage
You can download the raw source code and the accompanying documentation. Unzip the source code, then open a file at random using your favorite code editor. See if you can determine what it does, and how it fits into the overall application.
Try this with a few other files. What do you think of the organization? Would you be comfortable adding a new feature to this program?
You probably found the Fortran code in the example difficult to wrap your mind around - and that’s not surprising, as more recent languages have moved away from many of the practices employed in Fortran. Additionally, our computing environment has changed dramatically since then.
One clear example is symbol names for variables and procedures (functions) - notice that in the Fortran code they are typically short and cryptic: `RT`, `HU`, `IEVI`, `HUSE`, and `NFALL`, for example. You’ve been told since your first class that variable and function names should clearly express what the variable represents or what the function does. Would `rainFall`, `dailyHeatUnits`, `cropLeafAreaIndexDevelopment`, `CalculateWaterAndNutrientUse()`, and `CalculateConversionOfStandingDeadCropResidueToFlatResidue()` be easier to decipher? (Hint: the documentation contains some of the variable notations in a list starting on page 70, and some in-code documentation of global variables occurs in MAIN_1102.f90.)
Believe it or not, there was an actual reason for short names in these early programs. A six-character name would fit into a 36-bit register, allowing for fast dictionary lookups - accordingly, early versions of FORTRAN enforced a limit of six characters for variable names [1]. However, it is easy to replace a symbol name with an automatically generated symbol during compilation, allowing for both fast lookup and human readability at the cost of some extra computation during compilation. This step is built into the compilation process of most current programming languages, allowing for arbitrary-length symbol names with no runtime performance penalty.
Another common change to programming languages was the removal of the `GOTO` statement, which allowed program execution to jump to an arbitrary point in the code (much like a choose-your-own-adventure book directs you to jump to a page). The GOTO came to be considered too primitive, and too easy for a programmer to misuse [2].

However, the actual functionality of a `GOTO` statement remains in higher-order programming languages, abstracted into control-flow structures like conditionals, loops, and switch statements. This is the basis of structured programming, a paradigm adopted by all modern higher-order programming languages. Each of these control-flow structures can be represented by careful use of `GOTO` statements (and, in fact, the assembly code that results from compiling these languages does just that). The benefit is that structured programming promotes “reliability, correctness, and organizational clarity” by clearly defining the circumstances and effects of code jumps [3].
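To make this concrete, here is a minimal Java sketch (a hypothetical example, not taken from the EPIC code) showing a structured loop alongside the jump-based logic a compiler might lower it to:

```java
public class Countdown {
    public static void main(String[] args) {
        // Structured version: the loop has one entry point and one exit point.
        int count = 3;
        while (count > 0) {
            System.out.println(count);
            count--;
        }
        System.out.println("Liftoff!");

        // A compiler lowers the loop above into conditional jumps, roughly:
        //   start: if count <= 0, jump to end
        //          print count
        //          count = count - 1
        //          jump to start
        //   end:   print "Liftoff!"
        // The jumps are still there - the structure just constrains
        // where they are allowed to go.
    }
}
```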
The object-orientation paradigm was similarly developed to make programming large projects easier and less error-prone. We’ll examine just how it seeks to do so in the next few chapters. But before we do, you might want to see how language popularity has fared since the onset of the software crisis, and how new languages have appeared and grown in popularity in this animated chart from Data is Beautiful:
YouTube video omitted from print view.

Interestingly, the four top languages in 2019 (Python, JavaScript, Java, and C#) all adopt the object-oriented paradigm - though the exact details of how they implement it vary dramatically.
The term “Object Orientation” was coined by Alan Kay while he was a graduate student in the late 60s. Alan Kay, Dan Ingalls, Adele Goldberg, and others created the first object-oriented language, Smalltalk, which became a very influential language from which many ideas were borrowed. To Alan, the essential core of object-orientation was three properties a language could possess [4]:

- Encapsulation
- Message passing
- Dynamic binding
We’ll take a look at each of these in the next few chapters.
1. Weisert, Conrad (2010). “How Long Can a Data Name Be?”
2. Dijkstra, Edsger (1968). “Go To Statement Considered Harmful”
3. Wirth, Niklaus (1974). “On the Composition of Well-Structured Programs”
4. Elliot, Eric (2018). “The Forgotten History of Object-Oriented Programming,” Medium, Oct. 31, 2018.
As we saw earlier in this module, the software development industry adopted many new processes and ideas to help combat the issues that arose during the software crisis. One of the major things they focused on was how to write code that is easy to understand, easy to maintain, and works as intended with a minimal number of bugs. Let’s review a few of the concepts that came from those efforts, which we’ll learn more about throughout this semester.
The use of object-oriented programming languages was one major outcome of the software crisis. An object-oriented language allows developers to build code that represents real-world concepts and ideas, making it easier to reason about large software programs. In addition, the concept of encapsulation helped ensure data stored and manipulated by one part of the program wasn’t inadvertently changed by a bug in another part. Finally, through message passing and dynamic binding, we could write more advanced functions that allowed our code to be very modularized, flexible, and highly reusable. We’ll spend the next several modules in this course covering object-oriented programming in much greater detail.
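As a small preview, here is a minimal Java sketch of encapsulation (a hypothetical class invented purely for illustration):

```java
// A minimal sketch of encapsulation: the balance field is private,
// so no other part of the program can modify it directly.
public class BankAccount {
    private double balance;  // hidden internal state

    public double getBalance() {
        return balance;
    }

    public void deposit(double amount) {
        // Because every change goes through this method, the class
        // can reject invalid updates before they corrupt its state.
        if (amount <= 0) {
            throw new IllegalArgumentException("Deposit must be positive");
        }
        balance += amount;
    }
}
```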
Another major movement in the software industry was toward the use of automated testing frameworks and the use of unit testing. Unit testing involves writing detailed tests for small units of a program’s source code, often individual functions, that exercise the expected functionality of the code as well as checking for any edge cases or expected errors.
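For example, here is a minimal sketch of what a unit test might look like in Java using the JUnit 5 framework (one common choice; the `Calculator` class here is hypothetical, invented just for illustration):

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

public class CalculatorTest {

    // Exercise the expected functionality of the unit under test.
    @Test
    public void addShouldReturnTheSumOfItsArguments() {
        assertEquals(5, Calculator.add(2, 3));
    }

    // Check an edge case: adding zero should leave the value unchanged.
    @Test
    public void addingZeroShouldNotChangeTheValue() {
        assertEquals(7, Calculator.add(7, 0));
    }

    // Check an expected error: integer division by zero should throw.
    @Test
    public void divideByZeroShouldThrow() {
        assertThrows(ArithmeticException.class, () -> Calculator.divide(1, 0));
    }
}
```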
In theory, if the unit tests are properly written and perform all possible operations that the code should perform, then any code passing the tests should be considered complete and ready for use. Of course, coming up with a set of unit tests that accounts for all possible scenarios is just as impossible as writing software that doesn’t contain any bugs, but it can be a great step toward writing better software.
A common software development methodology today is test-driven development, or TDD. In test-driven development, the unit tests are developed first, based on the software specification, before the source code is ever written. That way, it is easy to know whether the software actually does what the requirements say it should, instead of the tests simply being written to match the code that exists. (It is shockingly common for unit tests to be written based on the code they are meant to test, which is the equivalent of looking at the answers when doing a word scramble - you’ll find what you expect to find, but you won’t actually learn anything useful from it.)
Another useful feature of unit tests is the ability to re-run tests on the program after an update has been developed, which is known as regression testing. If the program previously passed all available unit tests, then failed some of those tests after an update, we know that we introduced some unintended bugs in the code that can be repaired before publishing an update. In that way, we can avoid sending out an update that ends up making things even worse.
Along with unit testing, another useful technique is calculating the code coverage of a set of tests. Ideally, you’d like to make sure that each and every line of code in the program is executed by at least one test - otherwise, how can you really say that that line does what it should? This is especially difficult in programs that contain multiple conditional statements and loops, or any code that checks for and handles exceptions.
There are various ways to measure code coverage, including this list adapted from Wikipedia:

- Function coverage - has each function in the program been called?
- Statement coverage - has each statement in the program been executed?
- Branch coverage - has each branch of each control structure been executed?
- Condition coverage - has each Boolean sub-expression evaluated to both `true` and `false`?

We’ll discuss these measures in more detail later in this course, but for now we’ll just look at statement coverage. Thankfully, there are some great tools for computing the code coverage of a set of unit tests. Our goal is always to get as close to 100% coverage as possible.
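To see why even 100% statement coverage can be deceptive, consider this small Java sketch (a hypothetical example):

```java
public class Pricing {
    // Applies a 10% discount to orders of $100 or more.
    public static double finalPrice(double price) {
        double result = price;
        if (price >= 100) {
            result = price * 0.9;
        }
        return result;
    }
}
```

A single test calling `finalPrice(150)` executes every statement, giving 100% statement coverage, yet the case where the discount does not apply is never tested. Branch coverage would catch this, since the `if` statement’s false branch never runs - we’d also need a test like `finalPrice(50)`.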
Another major focus among professional coders is the inclusion of documentation directly in the source code itself. Many languages, such as Java, Python, and C#, include standards for documenting what various pieces of the code are for. This includes each individual source code file, classes, functions, attributes, and more. In many cases, this is done by including specially structured code comments in various places throughout the source code.
To make those comments easier to read and understand, many languages also include tools to automatically create developer documents based on those comments. A prime example of this is the Java API Documentation, which is nearly entirely generated directly from comments in the Java source code. In fact, you can compare the source code for the ArrayList class and the ArrayList Documentation in the Java API to get an idea of how this works.
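As a small example, here is a sketch of a Javadoc comment on a hypothetical method - the `javadoc` tool can turn comments like this directly into HTML documentation pages:

```java
public class Greeter {

    /**
     * Builds a greeting for the given name.
     *
     * @param name the name of the person to greet; must not be null or empty
     * @return a greeting string containing the given name
     * @throws IllegalArgumentException if name is null or empty
     */
    public String greet(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        return "Hello, " + name + "!";
    }
}
```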
Finally, there are many tools available today that can perform static code analysis of source code, helping developers find and fix errors without ever compiling or running the code. Some static code analysis tools are quite powerful, able to find logic errors or even completely validate that the software meets a specification. These tools are commonly used in the development of critical software components, such as medical devices and avionics for aircraft, but they are also quite difficult to use.
In this course, we’re going to focus on a simpler form of static code analysis that will help us maintain good coding style. These tools are commonly referred to as “linters,” named for the old Unix “lint” tool that performed this task for code written in the C programming language. Of course, the term “lint” is a reference to the tiny bits of fiber and fuzz shed by clothing, the idea being that by removing the “lint” that makes our code messy, we can have code that is cleaner and easier to read and maintain.
In fact, you may have already encountered these tools in your programming experience. Development environments such as the one used by Codio, as well as other integrated development environments (IDEs) such as Visual Studio Code, PyCharm, IntelliJ, and others all include support for static code analysis. Usually it takes the form of helpful error messages that show simple syntax and usage errors.
In this course, we’ll learn how to use some more powerful static code analysis tools to enforce a standard coding style across all of our source code. A coding style can be thought of as roughly equivalent to a dialect of a spoken or written language - it deals with common conventions and usage, beyond just the simple definitions and syntax rules of the language itself. By following a standardized style, our code will be easier to read and maintain for any developer who is familiar with that style.
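As a small illustration, here is a hypothetical Java snippet showing the kind of issues a style-focused linter might flag - the exact rules depend on which style guide is configured:

```java
public class StyleExample {
    // Before (commented out): this compiles, but most style guides
    // reject it - non-standard naming, no braces, cramped spacing:
    //   int Total_value=0; for(int i=0;i<10;i++) Total_value+=i;

    // After: the same logic following common Java conventions -
    // camelCase names, spaces around operators, braces on every block.
    public static int sumBelowTen() {
        int totalValue = 0;
        for (int i = 0; i < 10; i++) {
            totalValue += i;
        }
        return totalValue;
    }
}
```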
Based on the previous page, it sounds like writing professional code can be quite difficult. There are so many tools and concepts to keep track of, and, in fact, you may end up spending just as much time working with everything around your code as you do writing the code itself. The benefit of all this work comes later, when you have to update or maintain the code. If you’ve done a good job writing unit tests, checking for coverage, and documenting and styling your code, you’ll end up with fewer bugs overall, and hopefully the code will be easier to patch and update for as long as it remains in use.
Thankfully, in this course, we’re going to start small in this module with a new project we’re calling “Hello Real World.”
Most programmers can recall the simple “Hello World” program they wrote when learning to program. For many of us, it is the first program we learned to write, and usually the first thing we write when learning a new language. It is almost a sacred tradition!
We’re going to build upon that in this module by learning to write a “Hello World” program of our own, but one that meets the following requirements:

- Follows the object-oriented paradigm
- Contains documentation directly in the code
- Includes unit tests that achieve full code coverage
- Conforms to a standardized coding style
That’s quite a tall order, but this is really how a professional software developer would approach writing good and maintainable code. In some languages, such as Java, a few parts of this process are pretty straightforward - Java is already fully object-oriented by default, and Java uses a common standard for creating in-code documentation. Other languages, such as Python, end up becoming more complex to work with as more requirements are added. For Python developers, a simple “Hello World” program is a single line of code, whereas this set of requirements requires multiple files to properly create a Python package. In addition, the Python language itself does not define a common standard for in-code documentation, so we must rely on external resources to determine what coding style we should follow.
Thankfully, we’ll go through this entire process step by step in the example portion of this module, and you’ll be able to follow along and build your own version of “Hello Real World.”
Portions of the content on this page were adapted from Nathan Bean’s CIS 400 course at K-State, with the author’s permission. That content is licensed under a Creative Commons BY-NC-SA license.
In this chapter, we’ve discussed the environment in which object-orientation emerged. Early computers were limited in their computational power, and languages and programming techniques had to work around these limitations. Similarly, these computers were very expensive, so their purchasers were very concerned about getting the largest possible return on their investment. In the words of Niklaus Wirth:
> Tricks were necessary at this time, simply because machines were built with limitations imposed by a technology in its early development stage, and because even problems that would be termed “simple” nowadays could not be handled in a straightforward way. It was the programmers’ very task to push computers to their limits by whatever means available.
As computers became more powerful and less expensive, the demand for programs (and therefore programmers) grew faster than universities could train new programmers. Unskilled programmers, unwieldy programming languages, and programming approaches developed to address the problems of older technology led to what became known as the “software crisis” where many projects failed or floundered.
This led to the development of new programming techniques, languages, and paradigms to make the process of programming easier and less error-prone. Among the many new programming paradigms was the structured programming paradigm, which introduced control-flow structures into programming languages to help programmers reason about the order of program execution in a clear and consistent manner. Also developed during this time was the object-oriented paradigm, which we will be studying in this course.
Today, many software developers have adopted techniques designed to produce high quality code. These include the use of automated unit testing and test-driven development, as well as standardized use of code comments and linters to maintain good coding style and ample documentation for future developers. In the project for this module, we’ll explore what this looks like by building a simple “Hello World” program that uses all of these techniques.
Check your understanding of the new content introduced in this chapter below - this quiz is not graded and you can retake it as many times as you want.
Quizdown quiz omitted from print view.