Missing the Forest for the Trees: Why You Struggle to Solve Problems

Today, I have a messy one for you. What started as a simple article about my students struggling to process feedback turned into an exploration of a famous idiom: missing the forest for the trees. It also turned into another excuse for me to hate on AI. If you’ll humor me, I’d love it if you followed along on this one.

Reviewing Student Code
Writing a Better Solution
When the Fix to Multiple Return Statements Is Worse
Missing the Forest for the Trees
LLMs Also Miss the Forest for the Trees
How an LLM “Helps” Students
Don’t Let AI Be the Reason You Can’t Solve Problems

Reviewing Student Code

You may recall that I previously gave my take on whether it’s bad to have multiple return statements in a method. Having since seen a lot of students write code with multiple return statements, I’m starting to understand why someone might ban the practice. Let me show you.

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";
    for (int i = 0; i < this.hashTable.length; i++) {
        if (this.hashTable[i].size() > 0) {
            this.size--;
            return this.hashTable[i].removeAny();
        }
    }
    return null;
}

For the record, this is a piece of code that a team of students submitted for an assignment. While working through submissions, I saw several variants of this exact solution—some of which had very interesting commentary:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    for (int i = 0; i < this.hashTable.length; i++) {
        if (this.hashTable[i].size() > 0) {
            // When a key found, decrement size and removeAny
            this.size--;
            return this.hashTable[i].removeAny();
        }
    }

    // This is unreachable since reaching this means contract is violated
    // But we need to make compiler happy ig
    return null;
}

For a bit of context, this method is meant to remove and return some arbitrary pair from a hash table. It doesn’t really matter to me which pair you choose, so the most sensible solution is to iterate through the hash table until you’ve found a non-empty cell. If you care about performance, maybe you could use a variable to track the last item added to the hash table, so you could quickly remove it. That said, I’m satisfied with a linear runtime.

Writing a Better Solution

Hopefully, when you look at these solutions, you get the feeling that the code could be a bit better. For me, I get the sense that, as my catchphrase goes, “there has to be a better way.”

If you’re wondering what bothers me about the solutions above, it’s not even really the multiple return statements (though, more on that later). Usually, having a return statement in a loop is a good approach because it allows you to end the loop early, which saves some time.

The problem I have is that the method returns null if the hash table is empty. By discipline (as in the one we teach), we tend not to care about the client if they violate any preconditions. It’s on the client to make sure they respect the preconditions.

However, I still care about the client. As a result, I think it’s important that we crash before the method returns. If that seems like a silly distinction, it’s because I think returning null to the client is unkind. It may be that they don’t use the return value for a while, which makes it a lot harder to find the source of the null pointer exception when it finally shows itself.

Think about it this way: when we teach design by contract, we tend to frame it as the client’s fault when they violate the contract. If that’s the case, why not booby trap our methods? If the goal is to ensure the client obeys the contract, we can easily penalize bad inputs (e.g., put the main thread to sleep for a few seconds).

I would assume you’d find that to be an egregious way to write code, but I find returning null or any other unexpected value to be just as bad. Why not just crash the program? The client will know immediately that the arguments they provided were bad in some way.
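
To make the difference concrete, here’s a minimal sketch of the two behaviors. Everything in it is hypothetical (FailFastDemo, removeAnyQuietly, and removeAnyLoudly are names I made up, and a plain Deque stands in for the hash table), and the explicit exception is just one way to crash on the spot:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class; none of these names come from the assignment.
final class FailFastDemo {

    // "Unkind" version: hands the client a null when the precondition is violated.
    static String removeAnyQuietly(Deque<String> items) {
        return items.isEmpty() ? null : items.pop();
    }

    // "Kind" version: crashes at the call that violated the precondition.
    static String removeAnyLoudly(Deque<String> items) {
        if (items.isEmpty()) {
            throw new IllegalStateException("Violation of: items /= empty");
        }
        return items.pop();
    }

    public static void main(String[] args) {
        Deque<String> empty = new ArrayDeque<>();

        String value = removeAnyQuietly(empty); // no complaint here...
        try {
            value.length(); // ...the null only blows up later, far from the real mistake
        } catch (NullPointerException e) {
            System.out.println("Late failure: " + e.getClass().getSimpleName());
        }

        try {
            removeAnyLoudly(empty); // fails right where the contract was broken
        } catch (IllegalStateException e) {
            System.out.println("Immediate failure: " + e.getMessage());
        }
    }
}
```

In the quiet version, the stack trace points at whatever line finally touches the null. In the loud version, it points at the call that broke the contract.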

If you’re wondering at this point how I would write the code to be kinder to the client, I’ll do you one better. How about a solution that is both kinder and cleaner?

Personally, I would write a while loop that searches the array for a non-empty cell. Assuming you store an index as you search, you can access the appropriate cell in your single return statement. It might look like this:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    int i = 0;
    // length check not needed due to precondition
    while (this.hashTable[i].size() == 0) {
        i++;
    }

    // keep the entry count in sync, as the student versions do
    this.size--;
    return this.hashTable[i].removeAny();
}

Hopefully, it’s clear why I believe this solution is better. From a kindness perspective, if the hash table is empty (i.e., the client violates the precondition), the client only pays for it with a few wasted iterations before receiving an ArrayIndexOutOfBoundsException.
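
If you’d like to watch that behavior happen, here’s a self-contained sketch. ToyTable is a made-up stand-in for the course’s hash table (which isn’t shown in this article), it stores strings instead of pairs, and it leaves the assert out to mimic running with assertions disabled, which is Java’s default:

```java
import java.util.ArrayDeque;

// Toy stand-in for the course's hash table, which this article never shows.
// Every name here is hypothetical.
final class ToyTable {
    private final ArrayDeque<String>[] hashTable;
    private int size = 0;

    @SuppressWarnings({"unchecked", "rawtypes"})
    ToyTable(int numBuckets) {
        this.hashTable = new ArrayDeque[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            this.hashTable[i] = new ArrayDeque<>();
        }
    }

    void add(String value) {
        int bucket = Math.floorMod(value.hashCode(), this.hashTable.length);
        this.hashTable[bucket].push(value);
        this.size++;
    }

    int size() {
        return this.size;
    }

    String removeAny() {
        int i = 0;
        // No length check: if the table is empty, this walks off the end
        // of the array and throws ArrayIndexOutOfBoundsException.
        while (this.hashTable[i].isEmpty()) {
            i++;
        }
        this.size--; // keep the count in sync with the buckets
        return this.hashTable[i].pop();
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable(4);
        table.add("forest");
        System.out.println(table.removeAny()); // the only element in the table
        try {
            table.removeAny(); // precondition violated: the table is empty
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Crashed inside removeAny: " + e.getClass().getSimpleName());
        }
    }
}
```

The second call never gets the chance to hand anything back to the client. The exception is thrown from inside removeAny itself.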

I also believe this solution is better because it uses the appropriate loop syntax (i.e., a while loop). See, this comes back to the distinction between definite and indefinite loops. If you know how many iterations you need, use a definite loop (i.e., a for loop). If you don’t, use an indefinite loop (i.e., a while loop). In this case, we don’t know how many iterations we want, so we use a while loop.
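
As a rough illustration of that distinction, here’s a sketch with one loop of each kind. LoopKinds and both method names are invented for this example, and the second method assumes (as a precondition) that the array contains at least one negative value:

```java
// Hypothetical demo; the class and method names are made up for illustration.
final class LoopKinds {

    // Definite: exactly values.length iterations, known before the loop starts.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Indefinite: we stop as soon as the condition we care about holds.
    // Assumed precondition: values contains at least one negative number.
    static int indexOfFirstNegative(int[] values) {
        int i = 0;
        while (values[i] >= 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] values = {4, 7, -2, 9};
        System.out.println(sum(values));
        System.out.println(indexOfFirstNegative(values));
    }
}
```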

If you’re still not satisfied, I think you might get some value out of a piece I wrote ages ago on the concept of a flag. I think the same principles apply here, where it’s just cleaner to make a loop satisfy the condition you care about. Sure, there are no explicit flags in the student solutions, but you’ll see how they start manifesting when the nested return statement is removed.

When the Fix to Multiple Return Statements Is Worse

Anyway, back to this idea of multiple return statements. See, I’m not the one that grades my students’ assignments—at least not their first submissions. I have undergraduate graders.

When my graders see implementations of removeAny() containing multiple return statements, many of them will mark off like half a point. While I generally don’t agree that the presence of multiple return statements alone is bad, I think it’s clear that there is a much better solution that does not make use of multiple return statements (as shown in the previous section).

Fortunately, I let students address the grader’s feedback, which is when I step in to issue regrades. Unfortunately, my solution is not what I usually see in their resubmission. Instead, I often see something like this:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";
    Pair<K, V> removed = null;
    int i = 0;
    while (removed == null && i < this.hashTable.length) {
        if (this.hashTable[i].size() > 0) {
            removed = this.hashTable[i].removeAny();
            this.size--;
        }
        i++;
    }
    return removed;
}

When I see a solution like this, a few thoughts come to mind.

On a positive note, I’m somewhat happy because they’re now using the appropriate loop syntax. Perhaps making the loop condition longer convinces them to swap to a while loop.

With that said, the additional complexity makes this solution feel worse. You need a variable to track the value you want to return, and you need to introduce additional logic to your loop condition. That’s a lot of new ways for your code to go wrong. I would almost prefer they had just kept their code as-is.

Likewise, because the code is worse, I have to wonder how the student got there and simultaneously never thought that there might be a better way. Of course, in order for a student to do that, they would have to zoom out and reflect on their thought process. My guess is that the reflection never happens: they just try to solve the immediate problem without looking at the bigger picture—like plugging holes in a leaky boat.

In this case, when we ask them to change their code to only have a single return statement, they start by changing their code to accommodate a new variable. Then, they update the method’s return statement, and perhaps they even remember to update the loop condition. At no point do they think that maybe their current solution needs more drastic changes. Instead, they treat problem solving like a hill climbing algorithm destined for local optima.

Missing the Forest for the Trees

In terms of sage wisdom, I think some variant of the idiom “missing the forest for the trees” applies here. I see it all the time as an educator.

When students first start learning how to code, they really get trapped in the details. I know it seems absurd, but I’ve always questioned whether we should even let students loose on an IDE that early in their learning journey. It seems bad to let students get in the habit of treating syntax errors like leaks in a boat: surely one more keystroke will satisfy the compiler. Of course, I’ll never win that argument in the age of “my IDE spews garbage code recommendations at me with every keystroke.”

More recently, I see students miss the forest for the trees every single time I teach heaps. One of the algorithms we ask students to think about is heap sort. By the time we get to the heap sort algorithm, students are exposed to the global shape property and the local order property. We also walk them through how to build a heap and how the sift down algorithm works.

Yet, when I ask them what node they would replace the root with once it’s been removed, 9 out of 10 of them will choose one of the children (i.e., “the root must be the smallest node, so we must replace it with the next smallest node”). This solves the immediate problem of getting the smallest node to the root, but it very quickly creates a new problem with the global shape property. Some students will have the foresight to see this but very few will come up with another solution.
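
For reference, here’s a sketch of the usual textbook answer: move the last leaf to the root, then sift it down. It assumes an array-backed min-heap of ints where the children of index i live at 2i + 1 and 2i + 2, which may not match the representation we use in class, and all of the names are made up:

```java
// Hypothetical sketch of root removal in an array-backed min-heap.
final class MinHeapSketch {

    // Removes and returns the smallest value of the heap stored in
    // heap[0..size-1]. Afterward, the heap occupies heap[0..size-2].
    static int removeMin(int[] heap, int size) {
        int min = heap[0];
        // Use the LAST leaf as the new root so the tree stays complete
        // (the global shape property survives).
        heap[0] = heap[size - 1];
        // Then repair the local order property from the top down.
        siftDown(heap, 0, size - 1);
        return min;
    }

    // Swaps the value at root downward until neither child is smaller.
    static void siftDown(int[] heap, int root, int size) {
        int i = root;
        int smallest = smallestOf(heap, i, size);
        while (smallest != i) {
            int tmp = heap[i];
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;
            smallest = smallestOf(heap, i, size);
        }
    }

    // Index of the smallest value among node i and its children,
    // considering only heap[0..size-1].
    static int smallestOf(int[] heap, int i, int size) {
        int smallest = i;
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        if (left < size && heap[left] < heap[smallest]) {
            smallest = left;
        }
        if (right < size && heap[right] < heap[smallest]) {
            smallest = right;
        }
        return smallest;
    }
}
```

Promoting a child instead would leave a hole in the middle of the tree. Taking the last leaf keeps the shape intact and leaves only the ordering to repair.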

Hell, I even see this in my own life. As someone learning Japanese, I’ve sometimes gotten frustrated with my instructors for teaching details without sharing the broader picture. For example, I told one of my Japanese instructors that I didn’t really understand a concept. He looked at me completely bewildered and said, “but you answered all of the questions correctly.” Yeah, of course I did. I memorized the mechanics of the language, but I had no idea why those mechanics worked. I wanted to understand how those mechanics fit into the bigger picture.

It’s truly unfortunate because we really have this extreme focus on details in education broadly. When I was in undergrad, I got through most of my math classes by pattern recognition. I knew all of the algorithms needed to solve most integrals, for instance. Why those techniques worked or where they might fit in other contexts was a complete mystery to me.

I suspect this “missing the forest for the trees” mentality is partly to blame for why so many kids think they’re bad at math. They know the algorithms for arithmetic, but they have no idea how those skills fit together in the bigger picture (e.g., how are addition, subtraction, multiplication, and division related?). Then, algebra and geometry come along, and they’re expected to make the leap in logic without a holistic understanding of the fundamentals.

Personally, I love taking details and putting them in the context of the bigger picture. That’s how the “aha!” moments happen. That’s how you will get students to say things like:

“I wish someone would have told me this sooner.”
“Oh, that’s why experts do that. I always wondered.”
“Oh, that’s what that’s for!?”

These are moments where students are piecing together the bigger picture for themselves, and they’ll be better off for it.

LLMs Also Miss the Forest for the Trees

If you’re wondering why I went on a preachy tangent there, it’s because a really weird thing happened when I was putting this article together. I had the genius idea to ask ChatGPT to remove the multiple return statements from the student’s code.

You can probably guess why I did this. After seeing several seemingly identical fixes for the multiple return statements problem, I had a concern: are all of my students missing the forest for the trees or are they getting outside help?

While it’s basically impossible to answer this question, I got pretty damn close with a sloppy query to ChatGPT:

[Screenshot: a response from ChatGPT to a request to reduce a code snippet to a single return statement. ChatGPT "solves" this problem by, in its words, "using a temporary variable to hold the value to return."]

Trust me: it physically hurts me to use genAI tools. I wouldn’t have an entire series ranting about the tech if it didn’t. However, given how painful it was to see the same “solution” repeatedly, the output of an LLM might as well have been just one more assignment to grade. And, as you’ll see shortly, this was good fun for me.

One of the things that should be immediately obvious to you is that this is almost identical to the “solution” one of the teams of students submitted. I don’t think that’s necessarily proof that the student just asked ChatGPT to fix their code, but it does draw up a couple of scenarios for me:

1. It’s possible that students are not doing their work honestly. They are simply taking my graders’ feedback and feeding it directly into an LLM (something I’ve ranted about before).
2. It’s possible that students are all producing similar code because they all tend to “miss the forest for the trees,” as I sort of argued previously. Perhaps that’s even a trend that goes far beyond students and is simply the most common way of solving this problem. LLMs then reproduce this behavior from their training sets (i.e., all known code ever written).

In either case, I don’t really like where things are headed from an educational standpoint. If (like at my university) students are encouraged to use generative AI tools in their education, it doesn’t really matter which scenario we’re talking about.

In scenario 1, students aren’t learning at all. Any work they should be doing themselves is being outsourced to a bot. Therefore, they aren’t learning, and they aren’t developing expertise.

In scenario 2, students do their own work, but they might consult the LLM for help. People always tout this as the “ethical” or “correct” approach to using chat bots for work (i.e., it’s fine if you only use it for brainstorming). I think this might be just as bad as scenario 1 if the LLM is going to “teach” students to “miss the forest for the trees.” Right? Like, suppose a student asks the chat bot how to address some feedback. Do you think the solution provided is helpful? I don’t. I think it steers them in the wrong direction.

But, maybe you don’t agree, so let me take you down the rabbit hole with ChatGPT.

How an LLM “Helps” Students

In the previous screenshot, you can see the code and a small explanation ChatGPT gives for its solution. Because it’s still fundamentally a chat bot, it’s not like it gave its response and terminated itself like Mr. Meeseeks. It tried to keep the conversation going. In this case, it gave me the most insane offer right after explaining its solution:

If you want, I can also show a slightly cleaner version using a break statement that’s even more idiomatic in Java while still having one return. Do you want me to do that?

Naturally, I said “yes” because I needed to see how break could possibly improve the solution. It’s, uh, interesting:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    Pair<K, V> removedPair = null;

    for (int i = 0; i < this.hashTable.length; i++) {
        if (this.hashTable[i].size() > 0) {
            removedPair = this.hashTable[i].removeAny();
            this.size--;
            break; // exit the loop immediately after removing an element
        }
    }

    return removedPair;
}

Again, I don’t know how else to stress this, but this solution is virtually identical to the one with multiple return statements. You’ve just swapped the return out for break.

Of course, in its infinite wisdom, ChatGPT offered the following:

If you want, I can also show a version without even a temporary variable, which is possible with clever indexing—but it might be overkill.

Do you want me to show that version too?

You can imagine this had me intrigued. Make sure you’re not drinking anything. I wouldn’t want there to be a spit-take when you see this solution:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    int index = -1;

    // Find the first non-empty bucket
    for (int i = 0; i < this.hashTable.length; i++) {
        if (this.hashTable[i].size() > 0) {
            index = i;
            break;
        }
    }

    // Remove the element and update size
    if (index >= 0) {
        this.size--;
        return this.hashTable[index].removeAny();
    } else {
        return null; // theoretically unreachable because of the assert
    }
}

These LLMs always seem to think the solution to any coding problem is more code. It actually drives me insane.

I’m sure you can see where this is going. It then offered me an “elegant one-liner” using streams. I was already in this deep, so I figured I might as well see what it could come up with next:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    Optional<Pair<K, V>> removedPair = Arrays.stream(this.hashTable)
        .filter(bucket -> bucket.size() > 0)   // Find non-empty buckets
        .findFirst()                            // Take the first one
        .map(bucket -> {
            this.size--;
            return bucket.removeAny();          // Remove the element
        });

    return removedPair.orElse(null);           // One return statement
}

I’m sure you are as happy as me to find that this is not, in fact, a one-liner. Even excluding the return statement, the lambda expression uses multiple lines of code. Pedantry aside, I’m starting to feel insane for suggesting that the solution is a single while loop with a counter.

Luckily, ChatGPT offered me “a version without Optional or streams that’s still one return and only 3–4 lines—super compact and efficient for production code.” Would it surprise you if it literally just gave me the original code back? Because that’s what it did:

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    for (int i = 0; i < this.hashTable.length; i++) {
        if (this.hashTable[i].size() > 0) {
            this.size--;
            Pair<K, V> result = this.hashTable[i].removeAny();
            return result; // Only one return in method's flow
        }
    }

    return null; // theoretically unreachable due to assert
}

How do people use these tools without going completely insane? Would you be surprised if I told you that it somehow kept coming up with more and more absurd solutions?

If you want, I can make a true single-return version that doesn’t even need a result variable, which is as minimal as it gets without breaking readability. That one is a bit tricky but very neat.

@Override
public final Pair<K, V> removeAny() {
    assert this.size() > 0 : "Violation of: this /= empty_set";

    int index = -1;

    // Find the first non-empty bucket
    for (int i = 0; i < this.hashTable.length; i++) {
        if (this.hashTable[i].size() > 0) {
            index = i;
            break;
        }
    }

    // Only one return at the end
    return (index >= 0) ? (this.size--, this.hashTable[index].removeAny()) : null;
}

Yeah, so uh, this is horrible, right? That last line does something I don’t think I’ve ever seen in Java code, and I’m not talking about the ternary. What the hell is (this.size--, this.hashTable[index].removeAny())? Does Java let you write comma-separated lines of code in a ternary? What value does it even return?

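After some digging, it turns out this isn’t legal Java at all. Java has no comma operator like C and C++ do. Commas between expressions are only allowed in the initialization and update parts of a for loop header, so that last line won’t even compile. So much for adding it to my obfuscation arsenal. Y’all know how much I love horrible language features, but this one belongs to other languages.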
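
For comparison, the one place where I’m certain Java’s grammar accepts comma-separated expressions is the header of a for statement. Here’s a minimal sketch (CommaDemo and reverse are names I made up for illustration):

```java
import java.util.Arrays;

// Hypothetical demo; the names are made up for illustration.
final class CommaDemo {

    // Reverses the array in place. The for header declares two variables
    // and updates both, with each pair separated by a comma.
    static void reverse(int[] values) {
        for (int lo = 0, hi = values.length - 1; lo < hi; lo++, hi--) {
            int tmp = values[lo];
            values[lo] = values[hi];
            values[hi] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        reverse(values);
        System.out.println(Arrays.toString(values));
    }
}
```

Outside of a for header, the plain way to sequence a decrement and a method call is two separate statements.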

Don’t Let AI Be the Reason You Can’t Solve Problems

Anyway, I didn’t mean to go down this rabbit hole. I was literally just planning to talk about how students tend to struggle with rigid style rules. With their limited coding skills, they tend to opt for simple mechanical manipulations (e.g., replace the return statement with a flag). Instead, they need to zoom out and think about alternative ways of solving the problem.

What I didn’t expect was to see an LLM make the exact same mistakes my students make when asked to approach a problem differently. My fear then is that if students are using these tools to help them code or even help them think about coding, then they’re not going to improve.

Since you made it this far, I won’t bore you with the usual asks. That said, if you’d like to keep reading, here are some of my favorite pieces from recent memory:

Of course, you’re welcome to check out my list of ways to grow the site. I’m happy to continue doing this for free, but a little bit of support wouldn’t hurt. Anyway, take care!
