Chapter 9

Strings

Text as Data!

From Numbers to Text

Up to this point, we’ve learned how to store data in many different forms in our programs. We’ve used integers and floating point numbers to store various numerical values, and Booleans to store true and false in our programs. We have learned how to make arrays of those data types when we need to store several of them together. We can even create multidimensional arrays when we want to store data in a structured way, such as a grid for representing 2D data in a game.

However, we haven’t covered how to handle one of the most important types of data that our programs can handle: text. Text is a fundamentally important type of data in the world today, because it is really what enables us to share broad ideas across the entire Internet. Therefore, our programs need to be able to work with that type of data clearly and effectively.

As we’ve seen so far, computers are well suited to working with numerical data. In fact, all the data stored on a computer are just numbers, written in a binary format. So, how can we store text in our computer, using that binary format?

ASCII and Unicode

[Figure: ASCII Table]¹

In the 1960s, the American Standard Code for Information Interchange, abbreviated as ASCII and usually pronounced “az-skee”, was developed as a way to represent text on a system that was designed to store numerical data. Each character in the English alphabet was assigned a value. So, on a computer, the decimal value $ 65 $ represents the character A, while the decimal value $ 116 $ represents t. In addition to the printed characters in English, several other characters were added to ASCII, representing items such as a tab character or newline in printed text. In that way, an entire document of printed text could be represented as a list of numbers using ASCII.

However, ASCII can only handle simple characters in English, and wasn’t a very good system for storing text in other languages. So, another encoding system, known as Unicode, was developed in the 1980s to handle text in a variety of writing systems. As the Internet grew, Unicode became much more prevalent, and by the late 2000s, a majority of websites on the Internet were using UTF-8, an encoding of Unicode, to store and represent their data.

UTF-8 was chosen as a global standard because it is also backwards compatible with ASCII. So, any text written in ASCII will automatically work in UTF-8 as well, bypassing any required conversion step.

In this book, we’ll generally be using ASCII to refer to the encoding system used to handle text in our programs. However, behind the scenes, it is important to know that most of the text may actually be stored and represented in Unicode, specifically UTF-8, depending on how we are using it.


  1. File:ASCII-Table-wide.svg. (2019, January 19). Wikimedia Commons, the free media repository. Retrieved 23:04, February 21, 2019 from https://commons.wikimedia.org/w/index.php?title=File:ASCII-Table-wide.svg&oldid=335449197

Characters & Strings

Using ASCII, we can now store and manipulate text in our programs. There are a couple of different ways our programs store and interact with text, so let’s cover both of them here before moving on.

Characters

In programming, the smallest piece of text is referred to as a character. We sometimes use the term letter to refer to the smallest parts of text, since a word in text is made up of many letters, but technically symbols such as &, ( and # are not letters, so we should refer to them as characters instead. This helps avoid any confusion.

A character, therefore, is a single symbol in written text. Using ASCII, it would represent a single entry on that table, so it could be stored as a single integer in a computer.

In fact, many programming languages include a special data type for storing characters, sometimes referred to as the char data type. This is really handy if we need to store a single character of text, or want to convert between the character’s textual value and the underlying number used in ASCII to store the character.

Typically, single characters are denoted using a single quotation mark, both in code and when written. So, the first character of the alphabet would be written as 'a' or 'A'.
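
To make the link between a character and its number concrete, here is a minimal Java sketch. The class name CharCodes and its two helper methods are made up for this illustration; the only language feature it relies on is casting between char and int.

```java
// Sketch: converting between characters and their numeric codes.
// CharCodes, codeOf and charOf are hypothetical names used only here.
class CharCodes {

    // Returns the number that is stored for a character.
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the character that a number stands for.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));  // 65
        System.out.println(codeOf('t'));  // 116
        System.out.println(charOf(65));   // A
        System.out.println(charOf(116));  // t
    }
}
```

The values 65 and 116 are the same ones given for A and t in the ASCII discussion earlier in this chapter.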

Strings

So, how can we store larger pieces of text, such as words and sentences? In a programming language, we refer to these longer pieces of text as strings. So, a string can refer to any text stored in a computer. Most programming languages also include a specific data type named String, just for storing and interacting with strings of text. We’ll learn all about how to use these data types in this chapter. Strings are usually written inside of double quotation marks. We’ve already seen examples of this, such as “Hello World” in our very first program.

So, what is a string? In essence, a string is just an aggregation of characters, like an array or list, with each character representing one symbol in the text stored in the string.

[Figure: String Array]

The image above shows how the text “Hello World” would be stored on our computer as a string. It is simply an array of characters, with each character representing a single symbol from the text, stored in order. As we learn more about using strings in our programs, we’ll see how useful it is to know that each character in the string has an index, just like elements in an array.

[Figure: ASCII Array]

Behind the scenes, our computer is actually storing the numerical value for each character, so it would actually be seeing these values. However, by using special data types for storing text as strings, it will display the characters for us instead of the numbers. This is why it is important for our programming languages to have different data types: it allows us to store all of our data as numbers, specifically binary numbers, and the programming language uses the data type to tell the computer how to interpret that data when we use it in our programs. Pretty neat, right?
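
As a rough illustration of the idea that a string is stored as a sequence of numbers, the following Java sketch lists the numeric code of every character in “Hello World”. The class and method names are invented for this example, and it uses charAt() and length(), which are covered later in this chapter.

```java
// Sketch: showing the numbers behind each character of a string.
// StringAsNumbers and codesOf are hypothetical names used only here.
class StringAsNumbers {

    // Builds a space-separated list of the numeric code of every character.
    static String codesOf(String text) {
        StringBuilder codes = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                codes.append(' ');
            }
            // Casting the char to int gives its numeric code.
            codes.append((int) text.charAt(i));
        }
        return codes.toString();
    }

    public static void main(String[] args) {
        // Prints: 72 101 108 108 111 32 87 111 114 108 100
        System.out.println(codesOf("Hello World"));
    }
}
```

Notice that the space between the two words is stored as well, as the number 32.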

Parsing Strings

One of the most common things our programs will do with strings is use them to collect data from the user. When we do, we’ll need to parse the data in the string into a format our computer can recognize.

Parsing typically involves two steps: tokenization and conversion.

Tokenizing Strings

The first step in parsing a string is to tokenize the string, or separate it into its individual elements. For example, let’s say we want our user to be able to input two numbers, representing the coordinates of a square in Tic-Tac-Toe. So, we could prompt the user to input those two numbers on the same line, separated by a single space, like this:

2 1

When we read that line of input from the user, we’ll create a string variable that stores "2 1". So, we want to be able to separate that string into two parts, representing each number.

Most programming languages include a method to split a string into parts, using a specific character as a delimiter, marking where one part ends and another begins. In many cases, we’ll use the space character as the delimiter, but sometimes we’ll use other characters such as the comma or semicolon as well.

So, once we split the string, we’ll have an array of strings that stores {"2", "1"}, with each element representing a part of the string. That’s the first step.

Converting Strings

Once we’ve split our string into parts, we may need to convert the strings to another data type, such as integers or floating point values. Thankfully, each programming language includes a variety of methods we can use to convert to and from strings and other data types. We’ll see how to do that in our chosen programming language later in this chapter.
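
As a preview, the two steps might look like this in Java for the Tic-Tac-Toe input above. This is only a sketch: the class name CoordinateParser and the method parseCoordinates are made up for illustration, and the code assumes every part of the input really is a whole number.

```java
// Sketch: tokenizing "2 1" and converting each token to an integer.
// CoordinateParser and parseCoordinates are hypothetical names.
class CoordinateParser {

    static int[] parseCoordinates(String line) {
        // Step 1: tokenize, using a single space as the delimiter.
        String[] parts = line.split(" ");

        // Step 2: convert each token from a string to an int.
        int[] coordinates = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            coordinates[i] = Integer.parseInt(parts[i]);
        }
        return coordinates;
    }

    public static void main(String[] args) {
        int[] square = parseCoordinates("2 1");
        System.out.println(square[0]); // 2
        System.out.println(square[1]); // 1
    }
}
```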

String Operations

We can also perform many different operations on strings in our programs. These operations allow us to search within strings, edit them, compare them, and more.

Length

First, we can get the length of any string stored in our programs. From the example earlier:

[Figure: String Array]

This string has length 11, because it contains 11 characters. We must make sure we count the space as a character, as well as any punctuation or other symbols. Since the index of the last character is 10, we know that the length is 11, just as we saw when working with arrays.

Comparison

We can also compare two strings to see if they are equal. Two equal strings would contain exactly the same characters in exactly the same order. So, we would say that “Hello” and “Hello” are equal, but “Hello” and “hello” are not.

We can also use comparison operators to determine if one string comes before another lexicographically. This is similar to alphabetical order, but it also encompasses all of the other characters in either ASCII or Unicode and handles capitalization by sorting uppercase letters before lowercase letters.

Searching

There are also methods we can use to search within a string. For example, we could see if the string “Hello World” contains “lo” using a find method. We could even determine if a string starts with or ends with a particular sequence of characters.

Finally, we can find the location of a character or sequence within a larger string. As an example, if we wanted to find the location of “lo” in “Hello World”, our program would tell us that it begins at index 3, counting from 0 just as we do with arrays.

“Modifying” Strings

In many programming languages, including Python and Java, strings are immutable: the values in the memory locations containing the string cannot be changed. However, there are many methods that provide new copies of old strings with modifications. These typically include methods to make a string entirely lowercase or uppercase, as well as a method to remove any extra whitespace from the beginning or end of a string.

There are also methods to replace one character with another in a new string. So, we could replace all spaces in “Hello World” with commas, resulting in a new string “Hello,World”.

Immutable Strings

Consider the scenario below:

[Figure: Immutable Strings]

  1. On Line 2, calling the method toUpperCase() does not change the memory location holding s. It provides a copy of the string, with all letters as capitals. But since this value is not captured in a variable, it never goes into the variable storage, and it is lost when the program moves to Line 3.
  2. Line 3 makes t an alias of s. They are two different variables referring to the same thing.
  3. Line 4 reassigns s to the new string “HELLO”, but because strings are immutable, the memory space holding “hello” is not reused (as is the case with mutable data types). Instead, a new memory location is used and s is redirected there.
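
The exact code for this scenario is not reproduced in the text, so the following Java sketch is a reconstruction based only on the three descriptions above. The first line, the class name ImmutableDemo, and the method run are assumptions made for this illustration.

```java
// Sketch: a four-line scenario matching the descriptions above.
// ImmutableDemo and run are hypothetical names; Line 1 is assumed.
class ImmutableDemo {

    // Runs the four steps and returns the final values of s and t.
    static String[] run() {
        String s = "hello";  // Line 1: s refers to the string "hello"
        s.toUpperCase();     // Line 2: a new string is created, then lost
        String t = s;        // Line 3: t is an alias for the same "hello"
        s = "HELLO";         // Line 4: s is redirected to a different string
        return new String[] {s, t};
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println(result[0]); // HELLO
        System.out.println(result[1]); // hello
    }
}
```

Because t still prints hello at the end, we can see that the original string was never changed. Only the variable s was pointed somewhere else.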

Substring

Finally, we can also get a substring from our original string. A substring is simply a consecutive portion of the original string. For example, if we want the substring from character 3 through character 7 of “Hello World”, the result would be “lo Wo”. So, we can get smaller parts of our original string using a method that creates a substring.
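
One detail worth sketching in Java: its substring() method takes a starting index and an ending index, and the ending index is not included in the result. So “character 3 through character 7” corresponds to substring(3, 8). The helper method between below is a made-up name used only for this illustration.

```java
// Sketch: an inclusive "first through last" substring in Java.
// SubstringDemo and between are hypothetical names.
class SubstringDemo {

    // Returns the characters from index first through index last, inclusive.
    static String between(String text, int first, int last) {
        // substring() excludes its second index, so we add 1.
        return text.substring(first, last + 1);
    }

    public static void main(String[] args) {
        System.out.println(between("Hello World", 3, 7)); // lo Wo
    }
}
```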

Later in this chapter we’ll see how to use each of these methods in our chosen programming language.

Formatted Strings

We’ve already seen one way that we can create strings of output for our users. In nearly all of the programs we’ve written so far, we have simply placed variables into our print statements, along with strings inside of quotation marks, and then combined them together. However, most programming languages support a way to create formatted output strings.

The syntax varies for each programming language, but in essence, we can create a string with placeholders for variables, and we can also include information about how those variables should be formatted.

For these examples, I’ll be using the most common syntax, which comes from the C programming language. Both Java and Python use a similar syntax, but each language works a bit differently, so we’ll want to consult the documentation linked later in this chapter for our language of choice.

Consider this string as an example:

Welcome %s! You have loaded this program %d days so far.

In that string, we have “%s” and “%d” as placeholders. The first one, “%s”, specifies that it should be replaced by a string since it includes the letter “s”, while the second one, “%d”, should be replaced by an integer (also known as a decimal, hence the use of the letter “d”).¹

We can also specify details such as the number of leading 0s or decimal places in these format strings.

Here’s another example:

Your balance is $%8.3f

In this example, the format string “%8.3f” specifies that we should create an output that is 8 characters wide, including 3 decimal places. Finally, the use of the character “f” tells us that it expects a floating point value. So, if we replaced that format string with the value 1.23, the resulting string would be:

Your balance is $   1.230

Formatted output strings are a great way to make sure our output is formatted exactly the way we want.
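
The two format strings above can be tried out in Java with String.format(). This is a sketch under a few assumptions: the class and method names are invented, the name “Ada” and the number 3 are sample values, and Locale.US is passed explicitly so that the decimal separator is a period no matter how the computer is configured.

```java
import java.util.Locale;

// Sketch: filling in the placeholders from the two examples above.
// FormatDemo, welcome and balance are hypothetical names.
class FormatDemo {

    // %s is replaced by the name, %d by the whole number of days.
    static String welcome(String name, int days) {
        return String.format(Locale.US,
            "Welcome %s! You have loaded this program %d days so far.",
            name, days);
    }

    // %8.3f pads the value to 8 characters with 3 decimal places.
    static String balance(double amount) {
        return String.format(Locale.US, "Your balance is $%8.3f", amount);
    }

    public static void main(String[] args) {
        // Welcome Ada! You have loaded this program 3 days so far.
        System.out.println(welcome("Ada", 3));
        // Your balance is $   1.230
        System.out.println(balance(1.23));
    }
}
```

The value 1.230 only needs 5 characters, so three spaces are added in front of it to reach the requested width of 8.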


  1. These particular formatting codes originate in C (s: string, d: decimal, x: hexadecimal, f: floating point), which probably borrowed the idea from Fortran. In older versions of Fortran, “D” is used for double precision floats.

Chapter 6.J

Java Strings

Strings in Java

Making Strings

String variables in Java can be created just like any other variable type we’ve seen so far. To declare a string variable, we can use the following syntax:

String s;

Notice that the keyword String in Java is capitalized. This is because we are actually referring to a class named String that is a part of the Java programming language and not a simple data type. This tends to cause new programmers quite a few problems, so it is important to remember that this particular data type is capitalized in Java.

We can then initialize our variable by assigning a value to it, as in this example:

String s;
s = "This is a string!";

The text itself must be placed in double quotation marks as seen in this example. This allows the Java compiler to determine what part of the source code file should be treated as text instead of code.

As always, we can do both steps on a single line as well:

String s = "This is a string!";

Special Characters

Java supports several special characters that we can use in our strings to represent specific symbols. For example, we know that strings must be surrounded by double quotation marks. So, what if we want to include quotation marks in our string?

We can use \" as a special character to represent a double quote in our string. Here’s an example:

String s = "This is \"a quote\"";
System.out.println(s);

This code segment would produce the following output:

This is "a quote"

There are several special characters we can include in our strings. Here are a few of the more common ones:

  • \' - Single Quotation Mark (usually not required)
  • \" - Double Quotation Mark
  • \n - New Line
  • \t - Tab
  • \\ - The backslash
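
To see several of these special characters working together, here is a small sketch. The class name EscapeDemo and the sample text are made up for this illustration.

```java
// Sketch: a string that uses tab, newline, quote and backslash escapes.
// EscapeDemo and sample are hypothetical names.
class EscapeDemo {

    static String sample() {
        // Each escape sequence below is stored as ONE character.
        return "Tab:\t|\nQuote: \"hi\"\nBackslash: \\";
    }

    public static void main(String[] args) {
        // Prints three lines: one with a tab before the |,
        // one containing "hi" in quotes, and one ending in a backslash.
        System.out.println(sample());
    }
}
```

Even though the source code contains a single string literal, the two \n characters cause the output to appear on three separate lines.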

Character variables are created using the char data type in a similar way:

char c;
c = 'a';

char d = 'b';

In Java, characters are placed in single quotation marks as seen above.

Finally, we can also create a string from an array of characters, as in this example:

char[] c = {'H', 'e', 'l', 'l', 'o', '!'};
String s = new String(c);

System.out.println(s); // Hello!

Here, we are using the new keyword to create a new String object. Then, we are using the variable c as input to that object’s constructor. We’ll learn more about creating objects and using constructors in a later chapter, but it is important to know that it is possible to create a string from an array of characters quickly.

Reference

Java Strings

Parsing Strings

In many programs, we’ll be reading input from the user into a string variable, and then parsing that input into the various data types we need. Parsing is a two-step process: tokenization and conversion.

Let’s explore parsing by starting with a program that reads a line from the keyboard.

// Load required classes
import java.util.Scanner;

public class Example{
  
  public static void main(String[] args) throws Exception{
    
    // Scanner variable
    Scanner reader = new Scanner(System.in);
    
    /* -=-=-=-=- MORE CODE GOES HERE -=-=-=-=- */
    
  }
  
}

This code creates a variable named reader, which is a Scanner object in Java. We recommend always reading strings with the nextLine() method. Once we’ve read in a line, we split it into tokens (parts).

Tokenization

Tokenization refers to splitting a large input into smaller parts, its tokens. Tokens are separated by special characters called delimiters. In normal text, the words (the tokens) are delimited by so-called “white space” (a term that derives from the blank spaces on standard paper). In a computer String variable these spaces are not blank, but rather contain “unprintable” characters, as shown in the following table:

ASCII code   char    Name
32           ' '     space
9            '\t'    tab
13           '\r'    carriage return
10           '\n'    new line

In general the problem statement or program specification will provide some clue as to appropriate delimiters.

Split

For example, let’s say we’d like to tokenize the following input from the user:

This 1 2.0 true
The second line

We’ll assume that we’ve already created our reader variable using the skeleton code given above. So, to parse this input, we could use the following code:

String line1 = reader.nextLine();
  // line1 == "This 1 2.0 true"
String[] line1Parts = line1.split(" ");
  // line1Parts == {"This", "1", "2.0", "true"}
String line2 = reader.nextLine();
  // line2 ==
String[] line2Parts = line2.split(" ");
  // line2Parts ==

Let’s go through this code and see how it works. First, we read a single line of input from the user using the reader.nextLine() method. Then, we split that line into individual parts using the split method of the string variable line1. Inside of the split method, we need to give the string that we’d like to use as our delimiter. So, in this case, we’ll just provide a string that contains a single space to use the space character as our delimiter.

That will create an array of strings named line1Parts, which will contain four elements. In this case, the first element will be “This”, the second element will be “1” and so on.

We then do the same process again for the second line. What will the values of line2 and line2Parts be?

String.split() and Regular Expressions

Technically, the String.split() method we are using relies on a regular expression to perform the split operation. A regular expression is a specially formatted string that is used to define a search pattern in a string. For example, we could write a regular expression to match words that begin with a digit, contain at least three more characters, and then end with the letter “a”. That regular expression would be “\b\d.{3,}a\b”, by the way.

So, as input, we are not just providing a delimiter as a string, but we are actually creating a regular expression that the computer uses to determine where to split the string. Thankfully, if we provide a single ordinary character in a string as input, such as a space or a comma, it will simply look for that character in the string, and split the string anywhere that it finds the character we provide. A few characters, such as the period and the vertical bar, have a special meaning in regular expressions and need extra care.

So, we can just pretend we are providing a single delimiter string to this operation for now, but behind the scenes it is capable of doing so much more.
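
Here is a short sketch of where the difference between a plain delimiter and a regular expression starts to matter. The class name SplitDemo, the helper splitOn, and the sample inputs are all made up for illustration. The pattern "\\s+" means “one or more whitespace characters”, and "\\." means “a literal period”.

```java
import java.util.Arrays;

// Sketch: how split() behaves with a few different delimiters.
// SplitDemo and splitOn are hypothetical names.
class SplitDemo {

    // Thin wrapper so that each case below reads the same way.
    static String[] splitOn(String text, String regex) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        // Two spaces in a row produce an empty token between them.
        System.out.println(Arrays.toString(splitOn("2  1", " ")));
        // [2, , 1]

        // "\\s+" treats any run of spaces or tabs as one delimiter.
        System.out.println(Arrays.toString(splitOn("2 \t 1", "\\s+")));
        // [2, 1]

        // A period must be escaped to be treated as a plain character.
        System.out.println(Arrays.toString(splitOn("192.168.0.1", "\\.")));
        // [192, 168, 0, 1]
    }
}
```

For the inputs used in this chapter, where tokens are separated by exactly one space or one comma, the simple single-character delimiter is all we need.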

You can learn more about regular expressions in Java here:

Java Regular Expressions

Conversion (finally)

First, we determine if a token can be converted into a literal value of a certain type (often you will know the intended type of the input). Consider the token “2.0”.

  • Can you parse this into a double? Yes, double temp = 2.0; will compile just fine.
  • How about an int? No, int temp = 2.0; does not compile.

What about the token “2”? Can it be converted to a double? To an int?

Once we determine the token can be converted, we do the conversion.

For example, let’s say the user has provided the following text as input:

1 This 2.0 is true

We could parse that input into individual variables using this block of code:

String line = reader.nextLine();
String[] tokens = line.split(" ");  
int i1 = Integer.parseInt(tokens[0]);        // 1
String s1 = tokens[1];                       // This
double d = Double.parseDouble(tokens[2]);    // 2.0
String s2 = tokens[3];                       // is
boolean b = Boolean.parseBoolean(tokens[4]); // true
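
The code above assumes that every token has the expected type. If a token cannot be converted, Integer.parseInt() and Double.parseDouble() report the problem with a NumberFormatException. One possible way to check a token first is sketched below. The class SafeConvert and its methods are hypothetical names, and the try-catch statement it uses has not been covered in this chapter, so treat this as a preview rather than the required approach.

```java
// Sketch: testing whether a token can be converted before using it.
// SafeConvert, isInt and isDouble are hypothetical names.
class SafeConvert {

    // Returns true if the token can be converted to an int.
    static boolean isInt(String token) {
        try {
            Integer.parseInt(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns true if the token can be converted to a double.
    static boolean isDouble(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt("2"));       // true
        System.out.println(isDouble("2"));    // true
        System.out.println(isInt("2.0"));     // false
        System.out.println(isDouble("2.0"));  // true
        System.out.println(isInt("This"));    // false
    }
}
```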

Reading More Input

When using a while loop to read from the terminal, we must use a sentinel value to “signal” the end of input. Typically, an empty line (the user just presses the Return/Enter key) is used. Scanner.nextLine() returns the empty string in this case. Then, we can use an If-Then statement to determine if the user is finished providing input.

Here’s a great way to handle this situation in Java:

String line = " "; // a space
while(line.length() >0){
  line = reader.nextLine();
  
  if(line.length() > 0){
      // parse the input
  }

}

In this case, the program will continue to read input from the user until the user enters a blank line of input by just pressing the Enter key on the keyboard.
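
To show the loop doing real work, here is a sketch that adds up one integer per line until it reaches a blank line. The class SentinelSum and the method sumUntilBlank are made-up names. To keep the example self-contained, the Scanner reads from a string that stands in for what a user would type. In a real program we would create it from System.in as in the skeleton code.

```java
import java.util.Scanner;

// Sketch: reading lines until a blank line, the sentinel, appears.
// SentinelSum and sumUntilBlank are hypothetical names.
class SentinelSum {

    // Assumes each non-blank line holds exactly one integer.
    static int sumUntilBlank(Scanner reader) {
        int total = 0;
        String line = " "; // a space, so the loop runs at least once
        while (line.length() > 0) {
            line = reader.nextLine();

            if (line.length() > 0) {
                // parse the input
                total += Integer.parseInt(line);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Three lines of numbers followed by a blank line.
        Scanner reader = new Scanner("10\n20\n12\n\n");
        System.out.println(sumUntilBlank(reader)); // 42
    }
}
```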

Practice

Let’s take a minute to get some practice parsing strings of input for our programs.

Complete ‘StringParsing.java’ to meet the following problem statement:

Write a program that can find the sum of an undetermined number of inputs provided on two lines. The first line of input will contain one or more integers, separated by spaces. The second line of input will contain one or more floating point numbers, separated by commas. The program should output the sum of all inputs provided.

So, for example, if our program receives the following input:

1 2 3 4 5
1.25,2.5,3.75

we should print out “22.5” as the result.

Assuming we already have our skeleton code, we can quickly work through this problem statement. First, we’ll need to read a line of input and split it using the space character as our delimiter:

String input = reader.nextLine();
String[] splits = input.split(" ");

That’s simple enough. Now, since we don’t know how many inputs might have been provided, we’ll have to use a FOREACH loop (called an enhanced for loop in Java) to iterate over the inputs:

String input = reader.nextLine();
String[] splits = input.split(" ");

for(String s : splits){
  
}

Then, inside of the FOREACH loop, we can just convert each input to an integer and then add it to a sum variable. We’ll have to create the sum variable outside of the FOREACH loop, because we’ll want it available outside of the loop. We’ll make that variable a floating point data type, since we’ll be adding floating point numbers to it from the second line of input.

String input = reader.nextLine();
String[] splits = input.split(" ");

double sum = 0.0;
for(String s : splits){
  sum += Integer.parseInt(s);
}

Next, we can read the next line of input from the user, and then split it using a comma as a delimiter.

String input = reader.nextLine();
String[] splits = input.split(" ");

double sum = 0.0;
for(String s : splits){
  sum += Integer.parseInt(s);
}

input = reader.nextLine();
splits = input.split(",");

Notice that we are able to reuse the variables input and splits here. This is handy, so we only have to manage one set of variables as we parse multiple lines of input.

Finally, we can use another FOREACH loop to iterate across the second set of inputs, parse each one into a floating point value, and add it to the sum variable. At the end, we’ll print out the value of the sum variable.

String input = reader.nextLine();
String[] splits = input.split(" ");

double sum = 0.0;
for(String s : splits){
  sum += Integer.parseInt(s);
}

input = reader.nextLine();
splits = input.split(",");

for(String s : splits){
  sum += Double.parseDouble(s);
}

System.out.println(sum);

String Operations

The string data type includes many built-in operations that we can use to compare, manipulate, and search within strings. We’ll cover several of them on this page, and we’ll also include links at the bottom to additional resources where all of them are listed.

Length

First and foremost is the length() method. It allows us to find the number of characters in a string.

String s = "This";
System.out.println(s.length()); // 4

String t = "This \"is\" that";
System.out.println(t.length()); // 14

Notice that the second string, stored in variable t, only contains 14 characters. That is because \" only counts as a single character in the output, so it is stored as a single character in the string. The same applies to any of the special characters we’ve seen so far in this chapter.

Comparison

Next, we can use special methods in Java to compare two strings. First, we must use the equals() method to determine if two strings are equal (meaning they contain exactly the same characters in the same order), as in this example:

String s1 = "This";
String s2 = "This";
String s3 = "this";

System.out.println(s1.equals(s2)); // true
System.out.println(s1.equals(s3)); // false

Don’t Use == with Strings!

When comparing two strings in Java, we should not use the equality operator == to check whether they contain the same text. This is because Java stores strings as objects, and not as a primitive data type such as the integers and floating point numbers we’ve seen so far.

When applied to two strings, the equality operator tests whether they are exactly the same object, not whether their contents match.

Here’s an example:

String s1 = "This";
String s2 = s1;
String s3 = new String("This");

System.out.println(s1 == s2); // true
System.out.println(s1 == s3); // false

In this case, even though all three strings contain the same data, they may not be the same objects in memory. So, we must always use the equals() method instead.

Similarly, we can use the compareTo() method to compare two strings and see which one should be placed first in lexicographic order. Consider this example:

String s1 = "This";
String s2 = "That";

int x = s1.compareTo(s2);

In this example, x will be a negative number if s1 should come before s2, a positive number if s2 should come before s1, and exactly 0 if the two strings are the same.

While this may seem a bit complex, there is actually a great way to remember how this works. Whenever we would normally want to say s1 < s2, we’ll instead say s1.compareTo(s2) < 0. In effect, we replace the left side with s1.compareTo(s2), and then replace the right side with 0, leaving the sign the same. This simple conversion works for all comparison operators:

  • s1 < s2 → s1.compareTo(s2) < 0
  • s1 <= s2 → s1.compareTo(s2) <= 0
  • s1 > s2 → s1.compareTo(s2) > 0
  • s1 >= s2 → s1.compareTo(s2) >= 0
  • s1 == s2 → s1.compareTo(s2) == 0
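
As a quick illustration of this pattern, the sketch below uses compareTo() to decide which of two strings comes first. The class CompareDemo and the method first are hypothetical names, and the sample words are chosen to show that uppercase letters sort before lowercase ones.

```java
// Sketch: using compareTo() in place of the <= operator.
// CompareDemo and first are hypothetical names.
class CompareDemo {

    // Returns whichever string comes first in lexicographic order.
    static String first(String a, String b) {
        // We would like to write a <= b, so we write this instead:
        if (a.compareTo(b) <= 0) {
            return a;
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(first("This", "That"));   // That
        System.out.println(first("apple", "Apple")); // Apple
        System.out.println(first("Zebra", "apple")); // Zebra
    }
}
```

The last line may be surprising: “Zebra” comes before “apple” because every uppercase letter has a smaller numeric value than every lowercase letter.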

Concatenation

Another common string operation is concatenation, or joining two strings together. This operation is actually very simple, and there are multiple ways to do it.

First, we can use the + operator to concatenate any two strings together. In addition, if at least one of the operands is a string, Java will automatically convert the other operand to a string, if possible.

Here are a few examples:

String s1 = "This";
String s2 = "That";
int x = 42;

String s3 = s1 + s2;
String s4 = "" + x;

System.out.println(s3); // ThisThat
System.out.println(s4); // 42

As we can see, one neat way to convert any primitive data type to a string is to simply concatenate it with an empty string literal, represented by empty double quotation marks in the code above.

Strings also include a method named concat() that will also perform concatenation. However, it does not modify the original string, so we’ll have to remember to store the result in a string variable in order to use it.

String s1 = "This";
String s2 = "That";

String s3 = s1.concat(s2); // we can store it in a new variable, and the original is unchanged!

System.out.println(s1); // This
System.out.println(s3); // ThisThat

s2 = s2.concat(s1); // we can store it in the same variable!

System.out.println(s2); // ThatThis

Either method works well for concatenating two strings together.
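
One thing to watch for when mixing strings and numbers with the + operator is that Java works from left to right. The sketch below, using made-up class and method names, shows how parentheses change the result.

```java
// Sketch: the + operator is evaluated from left to right.
// ConcatOrder, leftToRight and grouped are hypothetical names.
class ConcatOrder {

    // "Sum: " + a happens first, so b is appended as text.
    static String leftToRight(int a, int b) {
        return "Sum: " + a + b;
    }

    // The parentheses make the numbers get added first.
    static String grouped(int a, int b) {
        return "Sum: " + (a + b);
    }

    public static void main(String[] args) {
        System.out.println(leftToRight(1, 2)); // Sum: 12
        System.out.println(grouped(1, 2));     // Sum: 3
    }
}
```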

Searching Within Strings

Java also includes several methods that can be used to search within one string for another. We can even specify if we’d like to find the string at the beginning or the end of the string, and it includes methods to give us the location of the string we are searching for. Here’s a great example of several of those methods in action:

String s1 = "abc123abc123";

System.out.println(s1.contains("123")); // true
System.out.println(s1.contains("321")); // false

System.out.println(s1.indexOf("123")); // 3  (the index of the first character)
System.out.println(s1.indexOf("321")); // -1 (it returns -1 if it can't find it)

System.out.println(s1.lastIndexOf("123")); // 9  (it returns the beginning of the last instance)
System.out.println(s1.lastIndexOf("321")); // -1 (it returns -1 if it can't find it)

System.out.println(s1.startsWith("abc")); // true
System.out.println(s1.startsWith("123")); // false

System.out.println(s1.endsWith("abc")); // false
System.out.println(s1.endsWith("123")); // true

Manipulating Strings

Finally, Java includes methods that can be used to manipulate strings in unique ways. It is important to remember that none of these methods modify the original string, so we’ll need to store the result back in a string variable in order to use it. In these examples, we’ll just print the output so we can see the result:

String s1 = "abc123abc123";

// replace takes two characters as input, and replaces all 
// instances of the first character with the second
System.out.println(s1.replace('b', ' ')); // a c123a c123

// substring takes two integers as input, and returns
// all characters starting at the first index up to
// but not including the second index
System.out.println(s1.substring(3, 9)); // 123abc

String s2 = "UPPERlower";

System.out.println(s2.toLowerCase()); // upperlower
System.out.println(s2.toUpperCase()); // UPPERLOWER

String s3 = "  \t Some String  \n \n ";

// trim removes all whitespace characters from the beginning
// and end of the string, including special characters
// such as newlines and tabs. 
String s4 = s3.trim();
    
System.out.println(s4); // Some String
System.out.println(s4.length()); // 11

In Java, we can also get a single character from a string using the charAt method. This is similar to getting a substring of length 1, but in this case it returns a char data type:

String s1 = "abc123";

char c1 = s1.charAt(0);
char c2 = s1.charAt(5);

System.out.println(c1); // a
System.out.println(c2); // 3

This is just a small list of the many operations that can be performed on strings in Java. For more information, consult the official Java documentation linked below.

References

Java String

String Formatting

There are also a couple of different approaches we can take to formatting output strings in Java. Let’s take a minute to review both of those and see how they work.

Concatenation

We’ve already seen this approach in several programs in this course. In effect, we can simply build an output string by concatenating strings and the variables we’d like to include in those strings.

For example, if we’d like to create an output string that gives both the sum and the average of a set of numbers, we could do something like this:

int sum = 123;
double avg = 1.23;

System.out.println("The sum is " + sum + " and the average is " + avg + ".");

In this code, we are using the plus symbol + to concatenate strings and variables together into a single output string. In Java, the concatenation operator will automatically convert any primitive data type to a string for us, so we don’t have to worry about that. In many cases, this is the quickest and easiest way to present output to the user.

Formatted Strings

Java also includes a special string method, the format() method, which allows us to use placeholders in our output string, and then replace those placeholders with the values stored in variables.

Here’s an example of how to use that method in Java:

int sum = 123;
double avg = 1.23;
String name = "Student";

String output = "%s: Your score is %d with an average of %f.";

System.out.println(String.format(output, name, sum, avg));

When we run this program, the output will be:

Student: Your score is 123 with an average of 1.230000.

There are several unique parts to this code, so let’s break it down and see how this works.

First, instead of using an existing string variable, we are actually using the String class when we use the format() method. This is because the format() method is a static method. Static methods do not require an existing variable to use them, and can be used directly from the class where they are defined. We’ll learn more about how classes and methods work in a later chapter. For now, just remember that we’ll use String.format() whenever we want to use this method.

Inside of the method, the first input is the string that contains the placeholders. In this case, we are using three different placeholders:

  • %s - This placeholder can be replaced by any string, or any variable which can be converted to a string.
  • %d - This placeholder can be replaced by any integer data type, including int, short, byte, or long.
  • %f - This placeholder can be replaced by any floating-point data type, including double and float.

Following that, the rest of the inputs are the variables which should be placed in each placeholder, given in the order they appear in the format string. So, in this example, we want the first placeholder, %s, replaced by the second input, the name variable.
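To see how the inputs line up with the placeholders, here is a small sketch that uses one placeholder of each kind with data types other than int and double. The class PlaceholderDemo and its helpers are hypothetical names made up for this illustration. It also shows what happens when the type of an input does not match its placeholder: the format() method throws an exception instead of producing output.

```java
import java.util.IllegalFormatException;

class PlaceholderDemo {

  // Hypothetical helper: inputs are matched to placeholders in order,
  // so item fills %s, count fills %d, and price fills %f
  public static String describe(String item, long count, float price) {
    return String.format("%s x %d at %f each", item, count, price);
  }

  // Returns true if format() rejects a double given to an integer placeholder
  public static boolean rejectsMismatch() {
    try {
      String.format("%d", 1.23);
      return false;
    } catch (IllegalFormatException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(describe("Pencil", 12L, 0.25f)); // Pencil x 12 at 0.250000 each
    System.out.println(rejectsMismatch());              // true
  }
}
```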

In addition, many of the placeholders can also specify the width and precision of each output. Here’s an updated example using these formatting options:

int sum = 123;
double avg = 1.23;
String name = "Student";

String output = "%s: Your score is %5d with an average of %8.4f.";

System.out.println(String.format(output, name, sum, avg));

When we run this program, we’ll see the output is now this:

Student: Your score is   123 with an average of   1.2300.

So, what happened? First, we updated the second placeholder to %5d. This means that we want the output of that variable to have a width of 5. Since the value of the sum variable only has 3 characters, the format() method adds two additional spaces in front of the number.

Secondly, we updated the last placeholder to %8.4f. Once again, the number 8 is used to give the width of the output. In addition, we added a 4 after the decimal point to indicate how many digits we’d like to include after the decimal point in the output. So, the number is printed as 1.2300, which includes four digits after the decimal point, with an additional two spaces in front. All told, the output is 8 characters in length, including the decimal point.
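Width settings are especially handy for lining up columns of output. The sketch below is one possible illustration, using a hypothetical class WidthDemo and helper row. It also uses the - flag, which is not covered above but is part of the standard formatter syntax: it pads on the right instead of the left, so the text is left-aligned.

```java
class WidthDemo {

  // Hypothetical helper: builds one row of a two-column table
  public static String row(String label, double value) {
    // %-8s : at least 8 characters wide, padded on the right
    // %6.2f: at least 6 characters wide, 2 digits after the decimal point
    return String.format("%-8s|%6.2f|", label, value);
  }

  public static void main(String[] args) {
    System.out.println(row("apple", 3.14159)); // apple   |  3.14|
    System.out.println(row("kiwi", 12.5));     // kiwi    | 12.50|
  }
}
```

Since every row uses the same widths, the | characters end up in the same columns no matter how long each label or number is, as long as they fit within the widths given.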

There are many more ways that a formatted string can be used to create output that meets our needs. We can find more information on using the placeholders and associated settings by reading the official Java documentation linked below.

Reference

Java Formatter Syntax

A Worked Example

Now that we’ve explored all of the different ways we can use strings in our programs, let’s walk through a worked example to see how we would go about building a useful program that uses everything we’ve learned so far.

Problem Statement

Consider the following problem statement:

Write a program that will calculate weighted grades for students in a college course. This program should only have a main method.

The input will be given in a comma-delimited format. The first line will contain a number of weights as floating-point numbers, separated by commas. The first entry should be ignored.

All input will be via the keyboard.

Each subsequent line of input will contain information for a student. The first entry on that line will contain that student’s name. The rest of the line will contain that student’s scores on each assignment as an integer value, separated by commas. Input will be terminated by the end of the input file, or by a blank line when input is provided via the terminal.

It is guaranteed that at least two lines of input will be provided, the first containing the weights and at least one additional line containing data for a student. In addition, it is guaranteed that each line of input will contain the same number of parts.

The program should output the student’s name, followed by a colon, and a space, and then the student’s score. The score should be formatted to be exactly 5 characters wide, with exactly two characters after the decimal point.

Complete your solution to this example in Example.java.

Sample Inputs & Outputs

Here’s an example of the expected input for the program:

Name,0.125,0.125,0.25,0.50
StudentA,75,80,85,90
StudentB,5,15,75,20
StudentC,85,90,70,75

Here is the correct output for that input:

StudentA: 85.63
StudentB: 31.25
StudentC: 76.88

Think and Design Before Coding

Start by sketching the control flow. What kinds of loops are appropriate? What variables and arrays will be necessary? What packages will need to be imported?

Handling Input

Next, start with our standard program preamble that we’ve worked with previously in this course:

// Load required classes
import java.util.Scanner;


public class Example{
  
  public static void main(String[] args) throws Exception{
    
    // Scanner variable
    Scanner reader;
    reader = new Scanner(System.in);
   
    /* -=-=-=-=- MORE CODE GOES HERE -=-=-=-=- */
    
  }
  
}

For the rest of this example, we’ll look at a smaller portion of the code. That code can be placed where the MORE CODE GOES HERE comment is in the skeleton above.

Parsing Weights

Next, we’ll need to parse the weights provided on the first line of the input. So, we can begin by reading that line of input:

String weightLine = reader.nextLine();

Then, we can separate that line into its individual parts using the split() method:

String weightLine = reader.nextLine();
String[] weightParts = weightLine.split(",");

Once we’ve done that, we can populate an array of floating point numbers containing the weights. We know that the number of weights is one less than the size of the weightParts array. However, to make things simpler, we’ll create an array with the same size and leave the first element unused, so it keeps its default value of 0.0. This keeps the index of each weight lined up with the index of the matching score on each student’s line, which will help us when we calculate the grades below.

String weightLine = reader.nextLine();
String[] weightParts = weightLine.split(",");

double[] weights = new double[weightParts.length];

Next, we can iterate through the weightParts array, and parse each entry to a floating point value and store it in the weights array. In this case, we’ll use a For loop, but this time we’ll start iterating at 1 instead of 0. In this way, we’ll skip the first entry in weightParts, which cannot be converted to a floating point value.

String weightLine = reader.nextLine();
String[] weightParts = weightLine.split(",");

double[] weights = new double[weightParts.length];

for(int i = 1; i < weights.length; i++){
  weights[i] = Double.parseDouble(weightParts[i].trim());
}

Inside of the For loop, we are simply converting each element of weightParts to a floating point value, and then storing the result in the corresponding element in weights.

Also, notice that we’re using weights.length in the Boolean condition of this For loop. In this case, we know that both arrays are the same size, so we can use either weights.length or weightParts.length here.
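To check our work on this step, we can run the parsing code on the sample weights line and print the resulting array. In the sketch below, the code is wrapped in a hypothetical helper method, parseWeights, inside a hypothetical class WeightParser, purely so that it can be run on its own without typing any input. In the actual solution, this code stays inside the main method.

```java
import java.util.Arrays;

class WeightParser {

  // Hypothetical helper: same steps as above, applied to a given line of text
  public static double[] parseWeights(String weightLine) {
    String[] weightParts = weightLine.split(",");
    double[] weights = new double[weightParts.length];
    // Start at 1 to skip the first entry, which is not a number
    for (int i = 1; i < weights.length; i++) {
      weights[i] = Double.parseDouble(weightParts[i].trim());
    }
    return weights;
  }

  public static void main(String[] args) {
    double[] weights = parseWeights("Name,0.125,0.125,0.25,0.50");
    System.out.println(Arrays.toString(weights));
    // [0.0, 0.125, 0.125, 0.25, 0.5]
  }
}
```

Notice that the first element is 0.0. We never assigned a value to it, so it keeps the default value that Java gives to every element of a new double array.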

String.trim()

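It is generally a good habit to .trim() your inputs before parsing if leading/trailing whitespace is unimportant. In our example, a student line such as

StudentA, 75, 80, 85, 90

would crash the program with a NumberFormatException if .trim() were not used, because Integer.parseInt() does not accept leading or trailing spaces. Double.parseDouble() ignores surrounding whitespace on its own, so a weights line written with spaces would still parse, but trimming every entry keeps the code consistent.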
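The effect of .trim() on integer parsing can be seen in a short experiment. The sketch below is only an illustration, with a hypothetical class TrimDemo and helper parses. It splits a line that contains spaces after the commas, and then tries to parse one of the scores both with and without trimming it first.

```java
class TrimDemo {

  // Hypothetical helper: returns true if the text can be parsed as an int
  public static boolean parses(String text) {
    try {
      Integer.parseInt(text);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String[] parts = "StudentA, 75, 80".split(",");
    // parts[1] is " 75", including the leading space
    System.out.println(parses(parts[1]));        // false
    System.out.println(parses(parts[1].trim())); // true
  }
}
```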

Parsing Each Student

Once we’ve read the weights, we can parse the data for each student, calculate the result, and print the output, all in a single step.

First, since we are reading an unknown number of lines of input, we’ll need to use a While loop. We saw this loop earlier in this chapter, when we learned about how to handle parsing input of an unknown length.

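String line = " ";
// hasNextLine() ends the loop at the end of the input file,
// while the length check ends it at a blank line
while(line.length() > 0 && reader.hasNextLine()){
  line = reader.nextLine();
  if(line.length() > 0){
     // parse the input
  }
}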
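This loop can be tested without typing anything, since a Scanner can also read from a string instead of the keyboard. The sketch below uses a hypothetical class ReadLoopDemo and helper countLines to count how many non-blank lines are read before the loop stops. It also checks reader.hasNextLine() in the loop condition, on the assumption that we want the loop to end cleanly when the input runs out without a blank line.

```java
import java.util.Scanner;

class ReadLoopDemo {

  // Hypothetical helper: counts lines read before a blank line or the end of input
  public static int countLines(Scanner reader) {
    int count = 0;
    String line = " ";
    while (line.length() > 0 && reader.hasNextLine()) {
      line = reader.nextLine();
      if (line.length() > 0) {
        count++; // a real program would parse the line here
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Two student lines, then a blank line, then a line that is never read
    Scanner reader = new Scanner("StudentA,75\nStudentB,5\n\nignored");
    System.out.println(countLines(reader)); // 2
  }
}
```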

Inside of that loop, once we’ve determined that we’ve indeed read a valid line of input, we can use the same split() method as before to split the input into parts:

  
  String[] parts = line.split(",");

Then, we want to calculate the student’s final grade. So, once again, we’ll create a variable to hold a running sum, named totalScore, and iterate through all of the parts. As before, we’ll start the For loop at 1 to skip the first element, which contains the student’s name:

  String[] parts = line.split(",");
  
  double totalScore = 0.0;
  for(int j = 1; j < parts.length; j++){
    totalScore += weights[j] * Integer.parseInt(parts[j].trim());
  }

Inside of the For loop, we’ll multiply the weight of the assignment by the score. Since we don’t need to store the integer value of each score, we can simply convert it to an integer and then directly use it in our expression.
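We can confirm the arithmetic by working through StudentA from the sample input. The sketch below wraps the loop in a hypothetical helper method, weightedScore, in a hypothetical class WeightedScoreDemo, just so it can be run by itself. The weights array is written out by hand here, under the assumption that it matches what the weight parsing step produces.

```java
class WeightedScoreDemo {

  // Hypothetical helper: the same loop as above, applied to one line of input
  public static double weightedScore(double[] weights, String line) {
    String[] parts = line.split(",");
    double totalScore = 0.0;
    // Start at 1 to skip the student's name in parts[0]
    for (int j = 1; j < parts.length; j++) {
      totalScore += weights[j] * Integer.parseInt(parts[j].trim());
    }
    return totalScore;
  }

  public static void main(String[] args) {
    // Index 0 is unused so that weights[j] lines up with parts[j]
    double[] weights = {0.0, 0.125, 0.125, 0.25, 0.50};
    // 0.125 * 75 + 0.125 * 80 + 0.25 * 85 + 0.50 * 90
    //   = 9.375 + 10.0 + 21.25 + 45.0
    System.out.println(weightedScore(weights, "StudentA,75,80,85,90")); // 85.625
  }
}
```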

Formatting Output

Finally, we’ll need to provide our output as a formatted string. Since we want to make sure the output of the totalScore variable is exactly 5 characters wide, with 2 characters after the decimal point, we’ll use the placeholder %5.2f in the format string:

  System.out.println(String.format("%s: %5.2f", parts[0], totalScore));

In the output line, we are providing "%s: %5.2f" as the first input to the String.format() method. In this way, we don’t have to create a separate variable to store the format string, simplifying our code. Then, the second input is the first element in the parts array, which will contain the student’s name. Finally, the last input is the totalScore variable, giving the student’s total score.
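It can be useful to try this format string on a few different scores. The sketch below uses a hypothetical class ResultFormatDemo and helper formatResult, and the names StudentD and StudentE are made up for this illustration. It shows that the width in a placeholder is a minimum: shorter values are padded with spaces, while longer values are printed in full.

```java
class ResultFormatDemo {

  // Hypothetical helper: formats one line of output for a student
  public static String formatResult(String name, double totalScore) {
    return String.format("%s: %5.2f", name, totalScore);
  }

  public static void main(String[] args) {
    System.out.println(formatResult("StudentA", 85.625)); // StudentA: 85.63
    System.out.println(formatResult("StudentD", 7.5));    // StudentD:  7.50
    System.out.println(formatResult("StudentE", 100.0));  // StudentE: 100.00
  }
}
```

In the second line, 7.50 is only 4 characters long, so one space is added in front of it. In the third line, 100.00 is 6 characters long, which is wider than 5, so it is printed without any padding.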

Summary

Strings are one of the most useful data types in many computer programs. They allow us to work with real text in our programs, and our users can provide more flexible forms of input that we can parse using string operations in our code.

For the rest of this course, we’ll be using increasingly complex forms of input and output, so knowing how to handle parsing and converting those inputs into the data types we need is a very important skill to practice.