As the semester starts to wrap up, I’m a bit busy and a bit burnt out. As a result, I thought I’d have a little fun writing about nasty little bugs I’ve run into over time. I’ll keep it short; don’t worry!
Table of Contents
Software Bugs
Since the beginning of computing, software has had bugs. It’s sort of an inevitability of anything sufficiently complex.
For the purposes of this article, I’m interested in discussing subtle bugs that require you to have some underlying knowledge of the tools you’re using. In my mind, these kinds of bugs are leaky abstractions that expose the inner workings of the technology and challenge our intuition. That’s what makes them nasty.
So without further ado, let’s talk about some of them!
String Interning
In programming languages like Java, strings are “interned.” Interning is the process of keeping exactly one copy of each string in a program in some pool for lookup. The purpose of this feature is to save space as duplicate strings can share the same reference. It also allows for quicker comparisons as strings can be compared by their references rather than their values.
Unfortunately, I would imagine most folks in the field, especially new folks, are unaware that string interning is happening. This, of course, can lead to truly nasty bugs in languages where there is a distinction between equality and identity operators. For example, in Java, folks are used to using ==
for identity and equals()
for equality. However, ==
is also used for equality with primitive types, so it’s not uncommon to see something like this:
int x = 5; int y = 5; System.out.println(x == y);
Sometimes, folks internalize ==
as the equality operator as a result. Or, maybe they don’t but instead internalize strings as primitive types. In either case, the following code will happen:
String x = "Hello"; String y = "Hello"; System.out.println(x == y);
At a glance, you would probably assume this would print false
, but it doesn’t! It prints true
. The reason is that “Hello” is interned, so both variables end up being aliased.
From this knowledge, you might assume that every string is interned, at least in Java, and that’s not true. Only literals are interned. If you manage to create a string at runtime, the strings will no longer be equal:
String x = "Hello"; String y = new String("Hello"); System.out.println(x == y);
And if this seems a bit impractical, the following code will return false
for strings passed in on the command line.
String x = "Hello"; String y = args[0]; System.out.println(x == y);
Which, of course, is solved by using equals()
:
String x = "Hello"; String y = args[0]; System.out.println(x.equals(y));
Aliasing
Another comically common bug is aliasing. Aliasing in and of itself isn’t really a bug; it’s a feature. However, because it’s so different from how we think about real world entities, it causes so, so many bugs. In fact, we almost outright ban it in our software design courses. You would be surprised by the lengths we go to avoid it (e.g., forcing our clients to remove items from our data structures rather than providing an alias).
At any rate, I think aliasing is most commonly seen when we talk about copying data. For example, it’s not uncommon to want to copy a list (say, in Python). Initially, folks tend to think that all they need to do to copy data is to create another variable, like this:
x = [1, 2, 3] y = x
I think people tend to internalize the assignment operator in this way because it works for immutable data types like numbers and strings. Of course, we know this doesn’t create a copy of the list. It creates a copy of the reference to that list. Therefore, anything we do to y
will happen to x
:
x = [1, 2, 3] y = x y[2] = 4 print(x, y) # prints [1, 2, 4] [1, 2, 4]
Unfortunately, because of this reality, copying becomes somewhat of a nightmare due to aliasing. How do you know you’ve copied all of the values and not just their references?
Similar problems arise in other programming languages like Java, which features aliasing heavily in its APIs. Would you like a subset of this ArrayList
? Well, here’s a “view” of that subset. Don’t make any changes to that view unless you want to modify the original list.
Immutability
When you finally get used to the concept of references in programming, a library will come along that will attempt to eliminate references altogether—instead, opting for immutability. A library that comes to mind for me is pandas, which typically returns copies of your data after every method call.
Of course, when you’re used to mutability, you get used to methods making changes to your data directly. When this isn’t true, you end up with code like the following which will have you banging your head against a wall:
pd.to_numeric(education_df["Grade"])
If you’re not sure what’s wrong with the code above, the problem is that it doesn’t do anything. You think it’s updating the “Grade” column to be a numeric column, but instead it’s creating a new “Grade” column that is numeric. To overwrite the existing “Grade” column, you have to explicitly replace it:
education_df["Grade"] = pd.to_numeric(education_df["Grade"])
This leads to a lot of goofy looking code where the return values of methods are mapped back onto the existing data. It also leads to a lot of duplicated data throughout the code. With that said, pandas does provide an in_place
parameter for a lot of their methods, which treats the data as mutable.
What Are Some Nasty Bugs You Can Think Of?
If I had more time to write, I could probably expand this list indefinitely. As a result, I’m sure you have some experiences with bugs that you’d love to share. Why not let me know?
In the meantime, here are some related pieces you might browse:
- Trust Me! Your Code Isn’t That Bad
- How to Compare Strings in Python: Equality and Identity
- Be Careful When Copying Mutable Data Types
And of course, you can take your support to the next level by heading over to my list of ways to grow the site. Otherwise, take care!
Recent Posts
Recently, I was thinking about the old Pavlov's dog story and how we hardly treat our students any different. While reflecting on this idea, I decided to write the whole thing up for others to read....
In the world of programming languages, expressions are an interesting concept that folks tend to implicitly understand but might not be able to define. As a result, I figured I'd take a crack at...