Java Strings
Strings in Java
String variables in Java can be created just like any other variable type we’ve seen so far. To declare a string variable, we can use the following syntax:
String s;
Notice that the type name String in Java is capitalized. This is because we are actually referring to a class named String that is part of the Java programming language, not a primitive data type. This tends to cause new programmers quite a few problems, so it is important to remember that this particular data type is capitalized in Java.
We can then initialize our variable by assigning a value to it, as in this example:
String s;
s = "This is a string!";
The text itself must be placed in double quotation marks, as seen in this example. This allows the Java compiler to determine what part of the source code file should be treated as text instead of code.
As always, we can do both steps on a single line as well:
String s = "This is a string!";
Java supports several special characters that we can use in our strings to represent specific symbols. For example, we know that strings must be surrounded by double quotation marks. So, what if we want to include quotation marks in our string?
We can use \" as a special character to represent a double quote in our string. Here’s an example:
String s = "This is \"a quote\"";
System.out.println(s);
This code segment would produce the following output:
This is "a quote"
There are several special characters we can include in our strings. Here are a few of the more common ones:
\' - Single Quotation Mark (usually not required)
\" - Double Quotation Mark
\n - New Line
\t - Tab
\\ - The backslash
Character variables are created using the char data type in a similar way:
char c;
c = 'a';
char d = 'b';
In Java, characters are placed in single quotation marks as seen above.
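As a small sketch of the special characters listed above, the following program builds one string that uses several of them and also stores two of them in char variables. The class name EscapeDemo and the method name buildSample are made up for this illustration.

```java
public class EscapeDemo {
    // Builds a two-line string using the tab, double quote,
    // newline, and backslash special characters.
    public static String buildSample() {
        return "Name:\t\"Ada\"\nPath:\tC:\\temp";
    }

    public static void main(String[] args) {
        // Prints two lines; each label is followed by a tab.
        System.out.println(buildSample());

        // The same special characters also work inside char literals.
        char tab = '\t';
        char singleQuote = '\'';
        System.out.println("[" + tab + "]" + singleQuote);
    }
}
```

Each special character is written with two symbols in the source code, but it is stored as a single character in the string.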
Finally, we can also create a string from an array of characters, as in this example:
char[] c = {'H', 'e', 'l', 'l', 'o', '!'};
String s = new String(c);
System.out.println(s); // Hello!
Here, we are using the new keyword to create a new String object. Then, we are using the variable c as input to that object’s constructor. We’ll learn more about creating objects and using constructors in a later chapter, but it is important to know that it is possible to create a string from an array of characters quickly.
In many programs, we’ll be reading input from the user into a string variable, and then parsing that input into the various data types we need. Parsing is a two-step process: tokenization and conversion.
Let’s explore parsing by starting with a program that reads a line from the keyboard.
// Load required classes
import java.util.Scanner;
public class Example{
    public static void main(String[] args) throws Exception{
        // Scanner variable
        Scanner reader = new Scanner(System.in);
        /* -=-=-=-=- MORE CODE GOES HERE -=-=-=-=- */
    }
}
This code creates a variable named reader, which is a Scanner object in Java. We recommend always reading in strings using the .nextLine() method. Once we’ve read in a line, we split it into tokens (parts).
Tokenization refers to splitting a large input into smaller parts, called tokens. Each token is delimited by special characters, called delimiters. In normal text, words (the tokens) are delimited by so-called “white space”^[which derives from the blank spaces on standard paper]. In a computer String variable these spaces are not blank, but rather contain “unprintable” characters, as shown in the following table:
ASCII code | char literal | Name |
---|---|---|
32 | ' ' | space |
9 | '\t' | tab |
13 | '\r' | carriage return |
10 | '\n' | new line |
In general the problem statement or program specification will provide some clue as to appropriate delimiters.
For example, let’s say we’d like to tokenize the following input from the user:
This 1 2.0 true
The second line
We’ll assume that we’ve already created our reader variable using the skeleton code given above. So, to parse this input, we could use the following code:
String line1 = reader.nextLine();
// line1 == "This 1 2.0 true"
String[] line1Parts = line1.split(" ");
// line1Parts == {"This", "1", "2.0", "true"}
String line2 = reader.nextLine();
// line2 ==
String[] line2Parts = line2.split(" ");
// line2Parts ==
Let’s go through this code and see how it works. First, we read a single line of input from the user using the reader.nextLine() method. Then, we split that line into individual parts using the split method of the string variable line1. Inside of the split method, we need to give the string that we’d like to use as our delimiter. So, in this case, we’ll just provide a string that contains a single space to use the space character as our delimiter.
That will create an array of strings named line1Parts, which will contain four elements. In this case, the first element will be “This”, the second element will be “1” and so on.
We then do the same process again for the second line. What will the values of line2 and line2Parts be?
Technically, the String.split() method we are using actually uses a regular expression to perform the split operation. A regular expression is a specially formatted string that is used to define a search pattern in a string. For example, we could write a regular expression to match words that begin with a digit, contain at least three letters, and then end with the letter “a”. That regular expression would be “\b\d[a-zA-Z]{3,}a\b”, by the way.
So, as input, we are not just providing a delimiter as a string, but we are actually creating a regular expression that the computer uses to determine where to split the string. Thankfully, if we provide a single ordinary character in a string as input, such as a space or a comma, it will simply look for that character in the string, and split the string anywhere that it finds the character we provide. A few characters, such as the period and the vertical bar, have special meanings in regular expressions and will not behave this way.
So, we can just pretend we are providing a single delimiter string to this operation for now, but behind the scenes it is capable of doing so much more.
You can learn more about regular expressions in Java in the official documentation for the java.util.regex.Pattern class.
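To make the regular expression behavior a little more concrete, here is a minimal sketch of two situations where the delimiter matters. The class name SplitDemo and its two helper methods are invented for this illustration; String.split() and Pattern.quote() are standard library methods.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Splits on runs of whitespace (spaces, tabs, and so on), so
    // repeated spaces do not produce empty tokens.
    public static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    // Splits on a literal delimiter, even if that delimiter has a
    // special meaning in a regular expression.
    public static String[] splitOnLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String spaced = "This  1 \t 2.0   true";
        // A single-space delimiter leaves empty tokens between repeated spaces.
        System.out.println(Arrays.toString(spaced.split(" ")));
        // [This, 1, 2.0, true]
        System.out.println(Arrays.toString(splitOnWhitespace(spaced)));
        // [192, 168, 0, 1]
        System.out.println(Arrays.toString(splitOnLiteral("192.168.0.1", ".")));
        // 0, because an unquoted "." matches every character
        System.out.println("192.168.0.1".split(".").length);
    }
}
```

If the input is guaranteed to use exactly one space between tokens, the simpler split(" ") used in this chapter is enough.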
First, we determine if a token can be converted into a literal value of a certain type^[Often you will know the intended type of the input]. Consider the token “2.”.
double temp = 2.0; will compile just fine.
int temp = 2.0; does not compile.
What about the token “2”? Can it be converted to a double? To an int?
Once we determine the token can be converted, we do the conversion.
For example, let’s say the user has provided the following text as input:
1 This 2.0 is true
We could parse that input into individual variables using this block of code:
String line = reader.nextLine();
String[] tokens = line.split(" ");
int i1 = Integer.parseInt(tokens[0]); // 1
String s1 = tokens[1]; // This
double d = Double.parseDouble(tokens[2]); // 2.0
String s2 = tokens[3]; // is
boolean b = Boolean.parseBoolean(tokens[4]); // true
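One possible way to check whether a token can be converted before relying on it is to attempt the conversion and catch the NumberFormatException that the parse methods throw on bad input. This is only a sketch; the class name ConversionCheck and the helper names isInt and isDouble are made up here, and exceptions are covered in more detail elsewhere.

```java
public class ConversionCheck {
    // Returns true if the token can be parsed as an int.
    public static boolean isInt(String token) {
        try {
            Integer.parseInt(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns true if the token can be parsed as a double.
    public static boolean isDouble(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isInt("2"));       // true
        System.out.println(isDouble("2"));    // true
        System.out.println(isInt("2.0"));     // false
        System.out.println(isDouble("2.0"));  // true
        System.out.println(isDouble("This")); // false
    }
}
```

Note that Boolean.parseBoolean() behaves differently: it never throws an exception, and simply returns false for any token other than "true" (ignoring case).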
When using a while-loop to read from the terminal, we must use a sentinel value to “signal” the end of input. Typically, an empty line^[just hit the return/enter key] is used. Scanner.nextLine() returns the empty string in this case. Then, we can use an If-Then statement to determine if the user is finished providing input.
Here’s a great way to handle this situation in Java:
String line = " "; // a space
while(line.length() > 0){
    line = reader.nextLine();
    if(line.length() > 0){
        // parse the input
    }
}
In this case, the program will continue to read input from the user until the user enters a blank line of input by just pressing the Enter key on the keyboard.
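The loop above assumes a blank line always arrives. When input is redirected from a file, the input may simply end instead, and nextLine() will then throw an exception. A minimal sketch of one way to handle both cases is to check Scanner.hasNextLine() first. The class name ReadUntilBlank and the method readLines are invented for this example, and it uses an ArrayList to collect the lines, which is an assumption beyond what this chapter has covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadUntilBlank {
    // Collects lines until a blank line or the end of input,
    // whichever comes first.
    public static List<String> readLines(Scanner reader) {
        List<String> lines = new ArrayList<>();
        while (reader.hasNextLine()) {
            String line = reader.nextLine();
            if (line.length() == 0) {
                break; // sentinel: an empty line ends the input
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        Scanner reader = new Scanner(System.in);
        for (String line : readLines(reader)) {
            System.out.println("Read: " + line);
        }
    }
}
```

Because a Scanner can also be created from a string, the method can be tried without typing anything, for example with new Scanner("a\nb\n\nc"), which yields only the two lines before the blank one.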
Let’s take a minute to get some practice parsing strings of input for our programs.
Complete ‘StringParsing.java’ to meet the following problem statement:
Write a program that can find the sum of an undetermined number of inputs provided on two lines. The first line of input will contain one or more integers, separated by spaces. The second line of input will contain one or more floating point numbers, separated by commas. The program should output the sum of all inputs provided.
So, for example, if our program receives the following input:
1 2 3 4 5
1.25,2.5,3.75
we should print out “22.5” as the result.
Assuming we already have our skeleton code, we can quickly work through this problem statement. First, we’ll need to read a line of input and split it using the space character as our delimiter:
String input = reader.nextLine();
String[] splits = input.split(" ");
That’s simple enough. Now, since we don’t know how many inputs might have been provided, we’ll have to use a FOREACH^[the Java nomenclature is enhanced-For loop] loop to iterate over the inputs:
String input = reader.nextLine();
String[] splits = input.split(" ");
for(String s : splits){
}
Then, inside of the FOREACH loop, we can just convert each input to an integer and then add it to a sum variable. We’ll have to create the sum variable outside of the FOREACH loop, because we’ll want it available outside of the loop. We’ll make that variable a floating point data type, since we’ll be adding floating point numbers to it from the second line of input.
String input = reader.nextLine();
String[] splits = input.split(" ");
double sum = 0.0;
for(String s : splits){
    sum += Integer.parseInt(s);
}
Next, we can read the next line of input from the user, and then split it using a comma as a delimiter.
String input = reader.nextLine();
String[] splits = input.split(" ");
double sum = 0.0;
for(String s : splits){
    sum += Integer.parseInt(s);
}
input = reader.nextLine();
splits = input.split(",");
Notice that we are able to reuse the variables input and splits here. This is handy, so we only have to manage one set of variables as we parse multiple lines of input.
Finally, we can use another FOREACH loop to iterate across the second set of inputs, parse them into floating point values, and then add them to the sum variable. At the end, we’ll print out the value of the sum variable.
String input = reader.nextLine();
String[] splits = input.split(" ");
double sum = 0.0;
for(String s : splits){
    sum += Integer.parseInt(s);
}
input = reader.nextLine();
splits = input.split(",");
for(String s : splits){
    sum += Double.parseDouble(s);
}
System.out.println(sum);
The string data type includes many built-in operations that we can use to compare, manipulate, and search within strings. We’ll cover several of them on this page, and we’ll also include links at the bottom to additional resources where all of them are listed.
First and foremost is the length() method. It allows us to find the number of characters in a string.
String s = "This";
System.out.println(s.length()); // 4
String t = "This \"is\" that";
System.out.println(t.length()); // 14
Notice that the second string, stored in variable t, only contains 14 characters. That is because \" only counts as a single character in the output, so it is stored as a single character in the string. The same applies to any of the special characters we’ve seen so far in this chapter.
Next, we can use special methods in Java to compare two strings. First, we must use the equals() method to determine if two strings are equal (meaning they contain exactly the same characters in the same order), as in this example:
String s1 = "This";
String s2 = "This";
String s3 = "this";
System.out.println(s1.equals(s2)); // true
System.out.println(s1.equals(s3)); // false
When comparing the contents of two strings in Java, we should not use the equality operator ==. This is because Java stores strings as objects, not as a primitive data type such as the integers and floating point numbers we’ve seen so far.
When used on objects, the equality operator tests whether the two variables refer to exactly the same object in memory, not whether the strings contain the same characters.
Here’s an example:
String s1 = "This";
String s2 = s1;
String s3 = new String("This");
System.out.println(s1 == s2); // true
System.out.println(s1 == s3); // false
In this case, even though all three strings contain the same data, they may not be the same objects in memory. So, we must always use the equals() method instead.
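This difference tends to show up when a string comes from parsed input rather than from a literal in the source code. The following sketch illustrates that situation; the class name EqualsDemo and the helper firstToken are made up for the example, and the result of == reflects how typical Java implementations create the tokens returned by split().

```java
public class EqualsDemo {
    // Returns the first space-delimited token of a line.
    public static String firstToken(String line) {
        return line.split(" ")[0];
    }

    public static void main(String[] args) {
        String token = firstToken("yes please");
        // Same characters in the same order, so equals() reports true.
        System.out.println(token.equals("yes")); // true
        // The token is a different object than the literal, so on
        // typical Java implementations == reports false.
        System.out.println(token == "yes");
    }
}
```

So a test such as token == "yes" can silently fail even when the user typed exactly the expected word.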
Similarly, we can use the compareTo() method to compare two strings and see which one should be placed first in lexicographic order. Consider this example:
String s1 = "This";
String s2 = "That";
int x = s1.compareTo(s2);
In this example, x will be a negative number if s1 should come before s2, a positive number if s2 should come before s1, and exactly 0 if the two strings are the same.
While this may seem a bit complex, there is actually a great way to remember how this works. Whenever we would normally want to say s1 < s2, we’ll instead say s1.compareTo(s2) < 0. In effect, we replace the left side with s1.compareTo(s2), and then replace the right side with 0, leaving the sign the same. This simple conversion works for all comparison operators:
s1 < s2 → s1.compareTo(s2) < 0
s1 <= s2 → s1.compareTo(s2) <= 0
s1 > s2 → s1.compareTo(s2) > 0
s1 >= s2 → s1.compareTo(s2) >= 0
s1 == s2 → s1.compareTo(s2) == 0
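As a brief sketch of the conversion rule above, the following program uses compareTo() to pick whichever of two strings comes first. The class name CompareDemo and the method earlier are hypothetical names chosen for this example.

```java
public class CompareDemo {
    // Returns whichever string comes first in lexicographic order.
    public static String earlier(String a, String b) {
        // Read this as "a <= b".
        if (a.compareTo(b) <= 0) {
            return a;
        } else {
            return b;
        }
    }

    public static void main(String[] args) {
        // "That" comes before "This" because 'a' comes before 'i'.
        System.out.println(earlier("This", "That"));          // That
        System.out.println("This".compareTo("That") > 0);     // true
        // Uppercase letters come before lowercase letters.
        System.out.println(earlier("apple", "Banana"));       // Banana
    }
}
```

The second result may be surprising: the comparison is based on character codes, so every uppercase letter sorts before every lowercase letter.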
Another common string operation is concatenation, or joining two strings together. This operation is actually very simple, and there are multiple ways to do it.
First, we can use the + operator to concatenate any two strings together. In addition, if at least one of the operands is a string, Java will automatically convert the other operand to a string, if possible.
Here are a few examples:
String s1 = "This";
String s2 = "That";
int x = 42;
String s3 = s1 + s2;
String s4 = "" + x;
System.out.println(s3); // ThisThat
System.out.println(s4); // 42
As we can see, one neat way to convert any primitive data type to a string is to simply concatenate it with an empty string literal, represented by empty double quotation marks in the code above.
Strings also include a method named concat() that will also perform concatenation. However, it does not modify the original string, so we’ll have to remember to store the result in a string variable in order to use it.
String s1 = "This";
String s2 = "That";
String s3 = s1.concat(s2); // we can store it in a new variable, and the original is unchanged!
System.out.println(s1); // This
System.out.println(s3); // ThisThat
s2 = s2.concat(s1); // we can store it in the same variable!
System.out.println(s2); // ThatThis
Either method works well for concatenating two strings together.
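One detail worth illustrating is that the + operator is evaluated from left to right, so mixing numbers and strings in one expression can give a different result than expected. This is a small sketch of that behavior; the class name ConcatDemo and its two methods are made up for the example.

```java
public class ConcatDemo {
    // Concatenates left to right, so each int is appended as text.
    public static String appendDigits(int a, int b) {
        return "Sum: " + a + b;
    }

    // Parentheses force the addition to happen before the concatenation.
    public static String appendSum(int a, int b) {
        return "Sum: " + (a + b);
    }

    public static void main(String[] args) {
        System.out.println(appendDigits(1, 2)); // Sum: 12
        System.out.println(appendSum(1, 2));    // Sum: 3
        // Here the two ints are added first, since no string is involved yet.
        System.out.println(1 + 2 + " total");   // 3 total
    }
}
```

When in doubt, wrapping the arithmetic in parentheses makes the intent clear.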
Java also includes several methods that can be used to search within one string for another. We can even specify if we’d like to find the string at the beginning or the end of the string, and it includes methods to give us the location of the string we are searching for. Here’s a great example of several of those methods in action:
String s1 = "abc123abc123";
System.out.println(s1.contains("123")); // true
System.out.println(s1.contains("321")); // false
System.out.println(s1.indexOf("123")); // 3 (the index of the first character)
System.out.println(s1.indexOf("321")); // -1 (it returns -1 if it can't find it)
System.out.println(s1.lastIndexOf("123")); // 9 (it returns the beginning of the last instance)
System.out.println(s1.lastIndexOf("321")); // -1 (it returns -1 if it can't find it)
System.out.println(s1.startsWith("abc")); // true
System.out.println(s1.startsWith("123")); // false
System.out.println(s1.endsWith("abc")); // false
System.out.println(s1.endsWith("123")); // true
Finally, Java includes methods that can be used to manipulate strings in unique ways. It is important to remember that none of these methods modify the original string, so we’ll need to store the result back in a string variable in order to use it. In these examples, we’ll just print the output so we can see the result:
String s1 = "abc123abc123";
// replace takes two characters as input, and replaces all
// instances of the first character with the second
System.out.println(s1.replace('b', ' ')); // a c123a c123
// substring takes two integers as input, and returns
// all characters starting at the first index up to
// but not including the second index
System.out.println(s1.substring(3, 9)); // 123abc
String s2 = "UPPERlower";
System.out.println(s2.toLowerCase()); // upperlower
System.out.println(s2.toUpperCase()); // UPPERLOWER
String s3 = " \t Some String \n \n ";
// trim removes all whitespace characters from the beginning
// and end of the string, including special characters
// such as newlines and tabs.
String s4 = s3.trim();
System.out.println(s4); // Some String
System.out.println(s4.length()); // 11
In Java, we can also get a single character from a string using the charAt method. This is similar to getting a substring of length 1, but in this case it returns a char data type:
String s1 = "abc123";
char c1 = s1.charAt(0);
char c2 = s1.charAt(5);
System.out.println(c1); // a
System.out.println(c2); // 3
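Combining charAt() with length() lets us visit every character of a string in a loop. As a minimal sketch, the following program counts how many times one character appears in a string. The class name CharCount and the method count are invented for this illustration.

```java
public class CharCount {
    // Counts how many times target appears in text.
    public static int count(String text, char target) {
        int total = 0;
        for (int i = 0; i < text.length(); i++) {
            // char is a primitive type, so == compares the values directly.
            if (text.charAt(i) == target) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count("abc123abc123", 'b')); // 2
        System.out.println(count("abc123abc123", 'z')); // 0
    }
}
```

Valid indexes run from 0 to length() - 1, which is why the loop condition uses < rather than <=.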
This is just a small list of the many operations that can be performed on strings in Java. For more information, consult the official Java documentation linked below.
There are also a couple of different approaches we can take to formatting output strings in Java. Let’s take a minute to review both of those and see how they work.
The first approach is concatenation, which we’ve already seen in several programs in this course. In effect, we can simply build an output string by concatenating strings and the variables we’d like to include in those strings.
For example, if we’d like to create an output string that gives both the sum and the average of a set of numbers, we could do something like this:
int sum = 123;
double avg = 1.23;
System.out.println("The sum is " + sum + " and the average is " + avg + ".");
In this code, we are using the plus symbol + to concatenate strings and variables together into a single output string. In Java, the concatenate operator will automatically convert any primitive data type to a string for us, so we don’t have to worry about that. In many cases, this is the quickest and easiest way to present output to the user.
Java also includes a special string method, the format() method, which allows us to use placeholders in our output string, and then replace those placeholders with the values stored in variables.
Here’s an example of how to use that method in Java:
int sum = 123;
double avg = 1.23;
String name = "Student";
String output = "%s: Your score is %d with an average of %f.";
System.out.println(String.format(output, name, sum, avg));
When we run this program, the output will be:
Student: Your score is 123 with an average of 1.230000.
There are several unique parts to this code, so let’s break it down and see how this works.
First, instead of using an existing string variable, we are actually using the String class when we use the format() method. This is because the format() method is a static method. Static methods do not require an existing variable to use them, and can be used directly from the class where they are defined. We’ll learn more about how classes and methods work in a later chapter. For now, just remember that we’ll use String.format() whenever we want to use this method.
Inside of the method, the first input is the string that contains the placeholders. In this case, we are using three different placeholders:
%s - This placeholder can be replaced by any string, or any variable which can be converted to a string.
%d - This placeholder can be replaced by any integer data type, including int, short, byte, or long.
%f - This placeholder can be replaced by any floating-point data type, including double and float.
Following that, the rest of the inputs are the variables which should be placed in each placeholder, given in the order they appear in the format string. So, in this example, we want the first placeholder, %s, replaced by the second input, the name variable.
In addition, many of the placeholders can also specify the width and precision of each output. Here’s an updated example using these formatting options:
int sum = 123;
double avg = 1.23;
String name = "Student";
String output = "%s: Your score is %5d with an average of %8.4f.";
System.out.println(String.format(output, name, sum, avg));
When we run this program, we’ll see the output is now this:
Student: Your score is   123 with an average of   1.2300.
So, what happened? First, we updated the second placeholder to %5d. This means that we want the output of that variable to have a width of 5. Since the sum variable would only have 3 characters, the format() method adds two additional spaces in front of the number.
Secondly, we updated the last placeholder to %8.4f. Once again, the number 8 is used to give the width of the output. In addition, we added a 4 after a decimal point to indicate how many digits we’d like to include after the decimal point in the output. So, the total output is 1.2300, which includes four digits after the decimal point, and an additional two spaces in the front. All told, the output is 8 characters in length, including the decimal point.
There are many more ways that a formatted string can be used to create output that meets our needs. We can find more information on using the placeholders and associated settings by reading the official Java documentation linked below.
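As one illustration of those additional settings, the sketch below lines up values in columns using a few other flags supported by Java format strings: a minus sign for left justification and a leading zero for zero padding. The class name FormatDemo and the method row are hypothetical. The example also passes Locale.US, an addition not used elsewhere in this chapter, so that the decimal separator is always a period.

```java
import java.util.Locale;

public class FormatDemo {
    // Formats one table row:
    //   %-10s  left-justifies the name in 10 columns
    //   %05d   pads the count with zeros to 5 digits
    //   %8.2f  right-justifies the price in 8 columns, 2 decimal places
    public static String row(String name, int count, double price) {
        return String.format(Locale.US, "%-10s|%05d|%8.2f|", name, count, price);
    }

    public static void main(String[] args) {
        System.out.println(row("Widget", 42, 3.14159));
        // Widget    |00042|    3.14|
    }
}
```

Fixed widths like these are handy when several lines of output should line up as a table.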
Now that we’ve explored all of the different ways we can use strings in our programs, let’s walk through a worked example to see how we would go about building a useful program that uses everything we’ve learned so far.
Consider the following problem statement:
Write a program that will calculate weighted grades for students in a college course. This program should only have a main method.
The input will be given in a comma-delimited format. The first line will contain a number of weights as floating-point numbers, separated by commas. The first entry should be ignored.
All input will be via the keyboard.
Each subsequent line of input will contain information for a student. The first entry on that line will contain that student’s name. The rest of the line will contain that student’s scores on each assignment as an integer value, separated by commas. Input will be terminated by the end of the input file, or by a blank line when input is provided via the terminal.
It is guaranteed that at least two lines of input will be provided, the first containing the weights and at least one additional line containing data for a student. In addition, it is guaranteed that each line of input will contain the same number of parts.
The program should output the student’s name, followed by a colon, and a space, and then the student’s score. The score should be formatted to be exactly 5 characters wide, with exactly two characters after the decimal point.
Complete your solution to this example in Example.java.
Here’s an example of the expected input for the program:
Name,0.125,0.125,0.25,0.50
StudentA,75,80,85,90
StudentB,5,15,75,20
StudentC,85,90,70,75
Here is the correct output for that input:
StudentA: 85.63
StudentB: 31.25
StudentC: 76.88
Start by sketching the control flow. What kinds of loops are appropriate? What variables and arrays will be necessary? What packages will need to be imported?
Next, start with our standard program preamble that we’ve worked with previously in this course:
// Load required classes
import java.util.Scanner;
public class Example{
    public static void main(String[] args) throws Exception{
        // Scanner variable
        Scanner reader;
        reader = new Scanner(System.in);
        /* -=-=-=-=- MORE CODE GOES HERE -=-=-=-=- */
    }
}
For the rest of this example, we’ll look at a smaller portion of the code. That code can be placed where the MORE CODE GOES HERE comment is in the skeleton above.
Next, we’ll need to parse the weights provided on the first line of the input. So, we can begin by reading that line of input:
String weightLine = reader.nextLine();
Then, we can separate that line into its individual parts using the split() method:
String weightLine = reader.nextLine();
String[] weightParts = weightLine.split(",");
Once we’ve done that, we can populate an array of floating point numbers containing the weights. To do this, we know that the number of weights is one less than the size of the weightParts array. However, to make things simpler, we’ll simply create an array with the same size and leave the first element unused, so it keeps its default value of 0.0. This will help us when we perform the second step below.
String weightLine = reader.nextLine();
String[] weightParts = weightLine.split(",");
double[] weights = new double[weightParts.length];
Next, we can iterate through the weightParts array, and parse each entry to a floating point value and store it in the weights array. In this case, we’ll use a For loop, but this time we’ll start iterating at 1 instead of 0. In this way, we’ll skip the first entry in weightParts, which cannot be converted to a floating point value.
String weightLine = reader.nextLine();
String[] weightParts = weightLine.split(",");
double[] weights = new double[weightParts.length];
for(int i = 1; i < weights.length; i++){
    weights[i] = Double.parseDouble(weightParts[i].trim());
}
Inside of the For loop, we are simply converting each element of weightParts to a floating point value, and then storing the result in the corresponding element in weights.
Also, notice that we’re using weights.length in the Boolean condition of this For loop. In this case, we know that both arrays are the same size, so we can use either weights.length or weightParts.length here.
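It is generally a good habit to always .trim() your inputs before parsing if leading/trailing whitespace is unimportant. For example, Integer.parseInt() does not accept surrounding whitespace, so a student line written as
StudentA, 75, 80, 85, 90
would crash the program if .trim() were not used. Double.parseDouble() happens to ignore leading and trailing whitespace on its own, so a weights line such as
Name, 0.125, 0.125, 0.25, 0.50
would still parse, but trimming every token keeps the code consistent.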
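To see what trimming changes in practice, here is a small sketch that splits a line containing spaces after each comma. As far as the Java documentation describes, Integer.parseInt() rejects a token with surrounding whitespace, while Double.parseDouble() ignores leading and trailing whitespace. The class name TrimDemo and the helper parseScore are made up for this example.

```java
public class TrimDemo {
    // Parses a score token, ignoring any surrounding whitespace.
    public static int parseScore(String token) {
        return Integer.parseInt(token.trim());
    }

    public static void main(String[] args) {
        String[] parts = "StudentA, 75, 80".split(",");
        // parts[1] is " 75", with a leading space.
        try {
            Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            System.out.println("Cannot parse \"" + parts[1] + "\" without trimming");
        }
        System.out.println(parseScore(parts[1]));         // 75
        // Double.parseDouble tolerates the surrounding whitespace.
        System.out.println(Double.parseDouble(" 0.125 ")); // 0.125
    }
}
```

Trimming every token before parsing means we do not have to remember which parse methods are strict about whitespace.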
Once we’ve read the weights, we can parse the data for each student, calculate the result, and print the output, all in a single step.
First, since we are reading an unknown number of lines of input, we’ll need to use a While loop. We saw this loop earlier in this chapter, when we learned about how to handle parsing input of an unknown length.
String line = " ";
while(line.length() > 0){
    line = reader.nextLine();
    if(line.length() > 0){
        // parse the input
    }
}
Inside of that loop, once we’ve determined that we’ve indeed read a valid line of input, we can use the same split() method as before to split the input into parts:
String[] parts = line.split(",");
Then, we want to calculate the student’s final grade. So, once again, we’ll create a sum variable and iterate through all of the parts. As before, we’ll start the For loop at 1, just to skip the first element for now:
String[] parts = line.split(",");
double totalScore = 0.0;
for(int j = 1; j < parts.length; j++){
    totalScore += weights[j] * Integer.parseInt(parts[j].trim());
}
Inside of the For loop, we’ll multiply the weight of the assignment by the score. Since we don’t need to store the integer value of each score, we can simply convert it to an integer and then directly use it in our expression.
Finally, we’ll need to provide our output as a formatted string. Since we want to make sure the output of the totalScore variable is exactly 5 characters wide, with 2 characters after the decimal point, we’ll use the placeholder %5.2f in the format string:
System.out.println(String.format("%s: %5.2f", parts[0], totalScore));
In the output line, we are providing "%s: %5.2f" as the first input to the String.format() method. In this way, we don’t have to create a separate variable to store the format string, simplifying our code. Then, the second input is the first element in the parts array, which will contain the student’s name. Finally, the last input is the totalScore variable, giving the student’s total score.
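Putting the pieces of this walkthrough together, one possible complete program is sketched below. It is an assembly of the steps above rather than the only correct solution. The class name WeightedGrades is chosen for this sketch, and two small changes are assumptions added here: the loop checks hasNextLine() so that it also stops cleanly at the end of an input file, and a guard at the top returns immediately if no input is provided at all.

```java
import java.util.Scanner;

public class WeightedGrades {
    public static void main(String[] args) {
        Scanner reader = new Scanner(System.in);

        // Defensive guard (an addition): do nothing if there is no input.
        if (!reader.hasNextLine()) {
            return;
        }

        // Parse the weights; index 0 holds the "Name" label and is skipped.
        String[] weightParts = reader.nextLine().split(",");
        double[] weights = new double[weightParts.length];
        for (int i = 1; i < weights.length; i++) {
            weights[i] = Double.parseDouble(weightParts[i].trim());
        }

        // Read student lines until a blank line or the end of input.
        while (reader.hasNextLine()) {
            String line = reader.nextLine();
            if (line.length() == 0) {
                break;
            }
            String[] parts = line.split(",");
            double totalScore = 0.0;
            for (int j = 1; j < parts.length; j++) {
                totalScore += weights[j] * Integer.parseInt(parts[j].trim());
            }
            System.out.println(String.format("%s: %5.2f", parts[0], totalScore));
        }
    }
}
```

Given the sample input shown earlier, this sketch should print the three expected lines, one per student.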