Split

The Java split method also breaks a string into pieces, much as StringTokenizer does. You can use either technique to break apart a string. The split method is similar to the technique we will use in C# for this process, so it is especially good to be familiar with both. Suppose you have the following string that you want to break apart:

String fullString = "Bob, Joe, Lisa, Katie";

As before, suppose you want to extract and print each name. We can specify that the names are separated by a comma followed by a space, and then get a string array of the resulting tokens (in this case, the names):

String[] tokens = fullString.split(", ");

Now, we can loop through the tokens array and print each value:

for (int i = 0; i < tokens.length; i++) 
{
    System.out.println(tokens[i]);
}

This will print:

Bob
Joe
Lisa
Katie

One key difference between StringTokenizer and split is how the delimiters work. In StringTokenizer, the delimiters are individual characters that separate the information we’re interested in. When we reach a delimiter, we keep stepping through the string, discarding characters until we reach a non-delimiter.
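As a rough illustration of that character-by-character behavior, the sketch below runs StringTokenizer over a deliberately messy string. The class name TokenizerDelimiters, the helper tokenize, and the sample string are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the name is not from the course materials.
public class TokenizerDelimiters {
    // Collects every token StringTokenizer produces, treating each
    // character in 'delimiters' as its own separate delimiter.
    static List<String> tokenize(String text, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        // Commas and spaces appear in different orders and amounts.
        String messy = "Bob,Joe ,  Lisa";
        // Any run of commas and/or spaces is skipped between tokens.
        System.out.println(tokenize(messy, ", ")); // prints [Bob, Joe, Lisa]
    }
}
```

Because each delimiter character counts on its own, the order and number of commas and spaces between the names does not matter here.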

In split, we specify what separates the tokens using a regular expression. In our example, it looked for the string ", " (comma then space) in our list of names and used each occurrence as a separator. If we had put " ," instead (space then comma), it would not have separated the names at all, because the string contains no space followed by a comma. So for split, when we provide a string as a separator, we look for that ENTIRE string.
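The difference between the two separators can be checked with a short sketch like the following. The class name SplitSeparator and the helper countPieces are hypothetical names chosen for this example.

```java
// Hypothetical demo class; the name is not from the course materials.
public class SplitSeparator {
    // Returns how many pieces split produces for the given separator.
    static int countPieces(String text, String separator) {
        return text.split(separator).length;
    }

    public static void main(String[] args) {
        String fullString = "Bob, Joe, Lisa, Katie";

        // Comma then space occurs three times, giving four names.
        System.out.println(countPieces(fullString, ", ")); // prints 4

        // Space then comma never occurs, so the whole string comes back
        // as a single piece.
        System.out.println(countPieces(fullString, " ,")); // prints 1
        System.out.println(fullString.split(" ,")[0]);     // prints Bob, Joe, Lisa, Katie
    }
}
```

When the separator is never found, split returns an array holding just the original string rather than an empty array.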

There is a lot more we can do with split using regular expressions, including a generic way to process all numbers, or all words matching a particular pattern. That is beyond the scope of this course, although you will likely use regular expressions in later CIS classes. In this class, we will just give split the exact separator string that appears between the pieces we want to process.
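For the curious, here is one small taste of what a regular expression separator can do. This is only a sketch and is not needed for this class. The class name RegexSplitPreview, the helper splitOnCommas, and the pattern itself are illustrative choices rather than anything from the course.

```java
import java.util.Arrays;

// Hypothetical demo class; the name is not from the course materials.
public class RegexSplitPreview {
    // Splits on a comma with any amount of whitespace on either side.
    // In the pattern, \\s* means "zero or more whitespace characters".
    static String[] splitOnCommas(String text) {
        return text.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        // Spacing around the commas is inconsistent on purpose.
        String messy = "Bob,Joe ,  Lisa";
        System.out.println(Arrays.toString(splitOnCommas(messy))); // prints [Bob, Joe, Lisa]
    }
}
```

An exact separator such as ", " would not handle this string cleanly, since the spacing around each comma is different.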