Limitations in Python
As a result of being immutable, strings can be cumbersome to work with in certain applications. When working with long strings or strings that we are continually appending to, such as in the memory example, we end up creating a lot of sizable copies.
Recall from the memory example the block of pseudocode.
1. function APPENDER(NUMBER, BASE)
2. RESULT = ""
3. loop I from 1 to NUMBER
4. RESULT = RESULT + BASE
5. if I MOD 2 = 0
6. RESULT = RESULT + " "
7. else
8. RESULT = RESULT + ", "
9. end loop
10. return RESULT
11. end function
In this example, what if we changed RESULT
to a mutable type, such as a StringBuilder in Java. Once the loop is done, we can cast RESULT
to a string. By changing just the one aspect of the code, we would make only one copy of RESULT
and have far less character copies.
Java specifically has a StringBuilder class which was created for this precise reason.
Consider the following, and note the slight changes on lines 2, 4, 6, 8 and the additional line 10.
1. function APPENDER_LIST(NUMBER, BASE)
2. RESULT = []
3. loop I from 1 to NUMBER
4. RESULT.APPEND(BASE)
5. if I MOD 2 = 0
6. RESULT.APPEND(" ")
7. else
8. RESULT.APPEND(", ")
9. end loop
10. RESULT = "".JOIN(RESULT)
11. return RESULT
12. end function
Now consider APPENDER_LIST(4,‘abc’)
-
Initialization: We start by initializing the empty array.
RESULT
will hold the pointer0x1
. -
I = 1: Now we have entered the loop and on line 4, we add more characters to our array. At this point, we would have only entry
0x1
in our heap and our variableRESULT
would have the pointer0x1
. Continuing through the code, line 5 determines ifI
is divisible by 2. In this iterationI = 1
, so we take the else branch. We again add characters to our array. In total, we have written 5 characters. We then increment I and move to the next iteration.
-
I = 2: We continue the loop and on line 4, we add more characters to our array. We still have just one entry in memory and our pointer is still
0x1
. In this iteration, we have written 4 characters. We then incrementI
and move to the next iteration of the loop. -
I = 3: We continue the loop and on line 4, we add more characters to our array. In this iteration, we have written 5 characters. We then increment
I
and thusI = 4
breaking out of the loop. -
Post-Loop: Once the loop breaks, we join the array to create the final string. This creates a new place in memory and changes RESULT to contain the pointer
0x2
.
We can do some further analysis of the memory that is required for this particular block.
Iteration | Memory Entries | Character Copies |
---|---|---|
1 | 2 | 8 |
2 | 2 | 17 |
3 | 2 | 26 |
4 | 2 | 35 |
. | . | . |
n | 2 | 9n - 1 |
Again, you need not worry about creating these equations for this course. To illustrate the improvement even more explicitly, let’s consider our previous example with 100K iterations. For APPENDER there were (2x100,000 - 1) = 200,001 memory entries and (9x100,0002 + 7x100,000)/2 = 45 billion character copies. For APPENDER_LIST we now have just 2 memory entries and (9x100,000 - 1) = 899,999 character copies. This dramatic improvement was a result of changing our data structure ever so slightly.
Reference: http://people.cs.ksu.edu/~rhowell/DataStructures/strings/stringbuilders.html