Be Careful When Modifying Data While Using a Java Iterator

Be Careful When Modifying Data While Using a Java Iterator Featured Image

As this semester begins to wrap up, I figured I’d share a little story of how I got very, very familiar with Java iterators.

Table of Contents

Real World Context

For context, I teach a second-year software components course which serves as the last hurdle for students trying to get into the major. Naturally, this course is very stressful for students, and I often have to work extra hard to give them every opportunity to succeed.

Unfortunately, this semester we got swept up in the pandemic and had to convert to online teaching. As a result, we had to make some quick decisions about instruction that changed the way students were going to learn. In particular, we converted all our paper exams to online quizzes.

For some students, this was a major blessing. After all, these quizzes weren’t any more difficult than the exams, and we made them open book. In other words, we made the class just a little bit easier for them to pass.

Of course, students were all over the globe, and they weren’t able to get the help they needed. In addition, students weren’t taking their studies as seriously as they would for an exam. This combination created some pretty abysmal quiz scores.

By the time we hit the fourth quiz, students were pretty upset. In fact, I heard from several instructors that their students were tired of the “trick questions.” As an instructor, this was a little frustrating to hear because these were pretty typical exam questions. We weren’t exactly ramping up the difficulty just for them, yet this was the first time I was hearing these complaints.

Example Problem

Then, something weird happened. We gave them a question that I didn’t really know the answer to, and it went a little something like the following:

What is the value of the Set<NaturalNumber> nums variable after the following code fragment?

Set<NaturalNumber> nums = new SomeSetImplementation<>();
nums.add(new NaturalNumber2(1));
nums.add(new NaturalNumber2(5));
nums.add(new NaturalNumber2(6));
for (NaturalNumber n : nums) {
    n.increment();
}

Naturally, the students’ options are as follows:

  • nums = {1, 5, 6, 2, 6, 7}
  • nums = {2, 6, 7}
  • nums = {1, 5, 6}
  • Cannot tell from the information provided.

Now, for context, there are a few in-house components in this example.

First, a NaturalNumber is a mutable class which represents an unbounded non-negative integer. In other words, a NaturalNumber can range from zero to infinity. In addition, a NaturalNumber can be modified using a series of basic math operations like the following:

  • increment(): adds 1 to this
  • add(NaturalNumber n): adds n to this

In addition, this question makes use of a Set which is akin to a mathematical set. The idea here is that a Set has two main properties:

  1. A Set lacks duplicates (i.e., {1, 2, 1} is not a legal set).
  2. A Set is unordered (i.e., {1, 2, 3} and {3, 2, 1} are equivalent).

For reference, both of these components are thoroughly documented on the course websiteOpens in a new tab., if you’re interested in reading more details. All components are written using Design by Contract, so each method will include a proper contract where the precondition is denoted by @requires and the postcondition is denoted by @ensures.

In addition, we label each parameter using parameter modes like @restores, @updates, @clears, and @replaces. Of course, that’s a bit out of the scope of this piece.

Breaking Down the Problem

Now, I’ll reiterate that I wasn’t sure exactly which answer was correct at first. Obviously, the first answer (i.e., {1, 5, 6, 2, 6, 7}) is incorrect as incrementing the underlying value doesn’t add new values to the Set—or so I thought. Using that same logic, I also assumed the third set (i.e., {1, 5, 6}) was obviously incorrect because we’re clearly mutating the underlying values.

At this point, I was fairly confident that the second answer (i.e., {2, 6, 7}) was correct, as were 87% of my students. Of course, I had the answer key, so I had to challenge myself to understand why the correct answer was actually the final answer (i.e. “Cannot tell from the information provided.”).

Now, based on the title of this article, you might already be way ahead of me. That’s fine! However, I didn’t jump to that conclusion right away. Instead, I took a step back and decided to actually draw out the Set.

Of course, there are a couple major hiccups when you try to do that. First, as I mentioned previously, a Set has no order. As a result, how do we reason about which element goes first during iteration? Do we try all possible configurations?

These were questions that I wasn’t ready to grapple with. Luckily, as it turns out, iterating by order of appearance saves us a lot of time. Take a look:

{1, 5, 6} // Initial state
{2, 5, 6}  // After incrementing the first element
{2, 6, 6}  // After incrementing the second element

Uh oh! We broke our first rule: a Set must not contain duplicates. Therefore, we cannot tell what the resulting Set will look like. My final answer is D: “Cannot tell from the information provided.”

Unfortunately, this explanation wasn’t exactly satisfying to me. Like, I get that a Set cannot contain duplicates, but what are the practical ramifications of breaking that rule? In other words, if it’s so bad, why do we even give the user access to the underlying data?

In my opinion, users should only be able to access the data when they remove it. In general, I think the library does a great job of doing that. If Set didn’t implement Iterable, we’d be in the clear.

Introducing Java Iterators

Which brings me to an even weirder problem: Java iterators. In order for this code to work, Set has to implement Iterable which means defining an Iterator for the underlying architecture.

Now, if you’ve ever written your own Iterator, you know that you need to do something like the following:

new Iterator<T>() {
  @Override
  public boolean hasNext() { ... }
  @Override
  public T next() { ... }
  @Override
  public void remove() { ... }
}

Here, the basic idea is that we define some sort of structure which can serve as a lazy data structure. If you’re familiar with generator expressions from other languages like Python, it’s the same idea: we create an object which can return one item at a time from some sequence of items.

In practice, an Iterator works by continuing to provide items through the next() method until there is nothing left to return (which may never happenOpens in a new tab.). In bounded sequences, we know when to stop because the hasNext() method will return false. Together, these methods can serve as the core of a looping mechanism:

while (iter.hasNext()) {
  T item = next();
}

By making a class implement Iterable, we can then leverage a bit of Java syntactic sugar called the for-each loop:

for (T item: collection) { ... }

Java Iterator Caveats

In the problem defined above, we were able to loop over the Set because it implements Iterable.

Of course, just because we’re able to loop over the data structure doesn’t mean we won’t run into problems. After all, the Iterator class has a few rules of its own. Perhaps the most important rule can be found in the description of the remove() method:

Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next()Opens in a new tab.. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

Java 8 DocsOpens in a new tab. (captured 04/23/2020)

Remember how I said that modifying a NaturalNumber is bad because it could result in duplicates. Well, based on this definition, modifying a Set could result in unpredictable behavior regardless.

Of course, this raises a question for me: what does it mean to modify the underlying collection. For Java Collections, for-each loops disallow the addition or removal of an item from a collection. In those cases, we can expect to see a ConcurrentModificationException (docsOpens in a new tab.).

Now, that error isn’t universal. After all, how could an Iterator possibly know if a collection had been modified? As it turns out, that behavior is custom baked into the next() method for each collection. With the List collection, for example, the ConcurrentModificationException is thrown when the size of the list changesOpens in a new tab.. In other words, the integrity of the data structure is checked on every invocation of next().

Since the collections leverage generic types, it’s impossible to account for all the different types of situations that could arise. As a result, there’s no way for next() to detect if any data was mutated without tracking state. For example, checking if any values have changed in a list might require storing a copy of the previous state and checking against that previous state regularly. That ain’t cheap!

To make matters worse, we haven’t really talked about what effects modifying underlying data could have on the actual iteration process. For example, if next() somehow depends on the underlying data, changing it would clearly change what comes next.

Imagine for a second that we had an Iterator for a list whose items must implement Comparable. Then, we made this Iterator in such a way that it always returned the next value in sorted order. If we were to then modify underlying values, we could create a loop which never traverses the entire list:

[1, 2, 3]  // next() returns 1 which we scale by 5
[5, 2, 3]  // hasNext() claims there are no other values

Now, that’s not ideal. Typically, you’d expect a for-each loop to actually traverse an entire data structure, and this simply isn’t doing that.

Revisiting the Set Problem

At this point, we’ve had a chance to talk about the Set problem from two different angles:

  1. What happens if we invalidate a Set by generating duplicates?
  2. What happens if we invalidate a for-each loop by modifying the underlying data structure?

Now, I want to take an opportunity to talk about what can actually happen while executing the problem snippet:

Set<NaturalNumber> nums = new SomeSetImplementation<>();
nums.add(new NaturalNumber2(1));
nums.add(new NaturalNumber2(5));
nums.add(new NaturalNumber2(6));
for (NaturalNumber n : nums) {
    n.increment();
}

Assuming the Iterator for Set has no fancy modification detection, one possible outcome is the same Set most people would expect: {2, 6, 7}.

Another possible outcome is we get a Set where only some of the values are incremented. Perhaps, as I stated before, the next() method depends on underlying data to make its decision about what comes next.

In this scenario, we could end up with any combination of incremented outputs:

  • {2, 5, 6}
  • {1, 6, 6}
  • {1, 5, 7}
  • {2, 6, 6}
  • {2, 5, 7}
  • {1, 6, 7}

In either scenario, we’re not exactly safe. Sure, the Set looks the same, but is it really the same?

Let’s imagine for a second that the Set is implemented using a hash table. This offers the advantage of being able to quickly check for duplicates, but it requires a bit more maintenance. For example, if we want to change a value in the Set, we have to recompute the hash and check for collisions.

When we modify the NaturalNumber directly, we skip this maintenance phase. As a result, our hash table will still contain the original three hashes. When someone checks if the Set contains two, for example, the method will incorrectly return false.

Of course, this is an implementation detail. It’s very possible that no issues are detected at all. The program continues to run smoothly, and no one bats an eye. As with all implementation details, however, we can’t depend on their assumed behavior. In other words, the program is still unpredictable.

As a minor aside, the Java implementation of Set actually calls out this exact issue:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

Java Set DocumentationOpens in a new tab. (view on 04/24/2020)

Looks like it’s pretty tough to put together a Set implementation that doesn’t have problems with mutable types. I wonder what that says about mutable types…

What’s the Takeaway?

In the end, I think the Iterator documentation is written in a way that leaves it up to the user to play nice. In other words, when it says:

The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.

It truly means “in any way.” Of course, I was never able to confirm these suspicions, so I’d be interested to see what other folks have to say.

In the meantime, if you liked this article, I’d love it if you took the opportunity learn how you can help grow the site a bit. In that article, you’ll learn about my mailing list as well as my Patreon.

Otherwise, here are a few related posts just for you:

Likewise, here are some helpful resources on Amazon (ad):

Otherwise, thanks for sticking around. Hopefully, my late night grad school ramblings were useful to you!

Coding Tangents (45 Articles)—Series Navigation

As a lifelong learner and aspiring teacher, I find that not all subjects carry the same weight. As a result, some topics can fall through the cracks due to time constraints or other commitments. Personally, I find these lost artifacts to be quite fun to discuss. That’s why I’ve decided to launch a whole series to do just that. Welcome to Coding Tangents, a collection of articles that tackle the edge case topics of software development.

In this series, I’ll be tackling topics that I feel many of my own students have been curious about but never really got the chance to explore. In many cases, these are subjects that I think deserve more exposure in the classroom. For instance, did you ever receive a formal explanation of access modifiers? How about package management? Version control?

In some cases, students are forced to learn these subjects on their own. Naturally, this forms a breeding ground for misconceptions which are made popular in online forums like Stack Overflow and Reddit. With this series, I’m hoping to get back to the basics where these subjects can be tackled in their entirety.

Jeremy Grifski

Jeremy grew up in a small town where he enjoyed playing soccer and video games, practicing taekwondo, and trading Pokémon cards. Once out of the nest, he pursued a Bachelors in Computer Engineering with a minor in Game Design. After college, he spent about two years writing software for a major engineering company. Then, he earned a master's in Computer Science and Engineering. Today, he pursues a PhD in Engineering Education in order to ultimately land a teaching gig. In his spare time, Jeremy enjoys spending time with his wife and kid, playing Overwatch 2, Lethal Company, and Baldur's Gate 3, reading manga, watching Penguins hockey, and traveling the world.

Recent Code Posts