It’s a special day when I cover a Java topic. In this one, we’re talking about Enums, and the problem(s) they are intended to solve.
Table of Contents
- Introducing the Problem: Storing Categorical Data
- Using Strings to Track Categorical Data
- Using Integers to Track Categorical Data
- Using Constants to Track Categorical Data
- Using Enums to Track Categorical Data
- Should You Use Enums?
Introducing the Problem: Storing Categorical Data
Recently, I was giving my students an introduction to Enums (which I pronounce as “ee-numb”) because we use them in some of our APIs. However, I didn’t give us enough time to actually cover the topic, so this article is intended to go more in depth for the folks who were interested.
Typically, the way that I introduce enums as a concept is to describe a scenario where enums might be useful and ask students for their approaches. For example, imagine we want to make an object to store a hockey player. There is a lot of information that we might want to store about that player, but for the sake of argument, how might you store the player’s position?
In the world of data science, the player’s position would be referred to as categorical data or nominal data. Categorical data is generally limited in what can be done with it, but in the data visualization space, categories are useful for doing things like coloring points in a scatterplot or grouping data for analysis. As a result, I’m just going to refer to this as the problem of storing categorical data.
Therefore, given the hockey player example, how might you store categorical data? Generally, there are two answers that most folks (i.e., my students) give. The “obvious” answer is to use a string while the slightly less obvious answer is to use an integer. Next, we’ll take a look at these options in detail.
Using Strings to Track Categorical Data
Let’s say we use a String to store our player’s position. That might look as follows in Java:
public class HockeyPlayer {
    private String position;
}
After all, we could store the positions easily as strings (e.g., “Goaltender”, “Center”, “Left Winger”, etc.). But, what’s the problem? In general, I try to stay away from strings whenever possible because of a really common problem: typos. As a silly example of where things can go wrong, consider the following method:
public class HockeyPlayer {
    private String position;
    private int goals;
    private int assists;
    private int blocks;
    private int saves;

    public int getPrimaryMetric() {
        switch (this.position) {
            case "Left Winger":
            case "Right Winger":
            case "Centre":
                return this.goals;
            case "Left Defenseman":
            case "Right Defenseman":
                return this.blocks;
            case "Goaltender":
                return this.saves;
            default:
                return this.blocks;
        }
    }
}
Like I said, it’s a silly example, but let’s imagine that forwards, defensemen, and goalies have some primary statistic that they’re categorized by. It’s silly because defensemen aren’t really categorized by their blocked shots, but you get the idea.
Anyway, have you spotted the bug yet? If not, it’s the “Centre” typo, and it’s a subtle one. After all, “Centre” is a reasonable spelling in British English. Typically, however, you will see it spelled as “Center” in the context of hockey. As a consequence, a player whose position is stored as “Center” matches none of the cases and falls through to the default branch, so we return blocks for centers rather than goals.
Now, how long do you think it would take you to spot this bug? If you do proper testing, it might not take you very long to uncover it. However, it’s more likely that this is a private helper method that you’re using throughout your code. In that case, how long do you think it will take you to notice this bug?
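To make the failure concrete, here is a minimal sketch of the same switch pulled out into a static helper. The class name, the method signature, and the sample numbers are my own assumptions for illustration, not part of the original class:

```java
// Hypothetical demo class: the switch from getPrimaryMetric, with the
// player's stats passed in as parameters so it can run on its own.
class TypoBugDemo {

    static int primaryMetric(String position, int goals, int blocks, int saves) {
        switch (position) {
            case "Left Winger":
            case "Right Winger":
            case "Centre": // the typo: callers will pass "Center"
                return goals;
            case "Left Defenseman":
            case "Right Defenseman":
                return blocks;
            case "Goaltender":
                return saves;
            default:
                return blocks;
        }
    }

    public static void main(String[] args) {
        // A center with 30 goals and 5 blocks (made-up numbers).
        // "Center" matches no case, so the default branch returns blocks.
        System.out.println(primaryMetric("Center", 30, 5, 0)); // prints 5
    }
}
```

The compiler accepts this without complaint, which is exactly why the bug can hide for a while.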
Of course, typos aren’t the only problems strings have. You also need to remember to use .equals() rather than the equality operator (==) when comparing them. Likewise, there are plenty of ways that strings can end up with zero-width characters in them or other strange characters that resemble more common characters (e.g., the Greek question mark, which looks like a semicolon). As a result, I generally try to avoid strings whenever possible.
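As a quick sketch of both pitfalls, the following compares two strings that look identical. The class and method names are hypothetical, and the zero-width space is just one example of an invisible character:

```java
// Hypothetical demo class; none of these names come from the article.
class StringEqualityDemo {

    // Compares references: true only if both variables point to the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Compares contents: true if both strings hold the same characters.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "Center";
        // new String(...) creates a distinct object, much like a string that
        // was built at runtime from user input or a file.
        String runtime = new String("Center");

        System.out.println(sameReference(literal, runtime)); // false
        System.out.println(sameContent(literal, runtime));   // true

        // A trailing zero-width space is invisible when printed, but it
        // still makes the two strings unequal.
        String invisible = "Center\u200B";
        System.out.println(sameContent(literal, invisible)); // false
    }
}
```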
Using Integers to Track Categorical Data
So, what should we do instead? Another common option is to use integers. The idea is that there is less room for mistakes with integers. After all, it’s possible to use the wrong number, but it’s not as easy to use an invalid number. Here’s the same code with integers instead:
public class HockeyPlayer {
    private int position;
    private int goals;
    private int assists;
    private int blocks;
    private int saves;

    public int getPrimaryMetric() {
        switch (this.position) {
            case 0:
            case 1:
            case 2:
                return this.goals;
            case 3:
            case 4:
                return this.blocks;
            case 5:
                return this.saves;
            default:
                return this.blocks;
        }
    }
}
Of course, we now have a new problem. What do these numbers mean? Surely, you could litter your code with comments to make it clear what each number means, but you’d have to do that everywhere. Also, comments tend to go out-of-date, meaning they might be lying about what a number means.
Using Constants to Track Categorical Data
Ultimately, both strings and integers have problems on their own, so you might cleverly realize that the solution is to use constants. That way, you can still use “strings” in the sense that you have descriptive variable names. And, you can also continue to use integers if you’d like:
public class HockeyPlayer {
    private int position;
    private int goals;
    private int assists;
    private int blocks;
    private int saves;

    // Declared static so the constants are shared by every player
    // instead of being copied into each object.
    private static final int LEFT_WINGER = 0;
    private static final int RIGHT_WINGER = 1;
    private static final int CENTER = 2;
    private static final int LEFT_DEFENSEMAN = 3;
    private static final int RIGHT_DEFENSEMAN = 4;
    private static final int GOALTENDER = 5;

    public int getPrimaryMetric() {
        switch (this.position) {
            case LEFT_WINGER:
            case RIGHT_WINGER:
            case CENTER:
                return this.goals;
            case LEFT_DEFENSEMAN:
            case RIGHT_DEFENSEMAN:
                return this.blocks;
            case GOALTENDER:
                return this.saves;
            default:
                return this.blocks;
        }
    }
}
Generally, I think this is a fine compromise, but it does mean that the player’s position is still just an integer. Therefore, it is still possible to litter your code with magic numbers. It’s also possible to use the wrong constants by accident just because they’re the same type (e.g., LEFT_CIRCLE referring to the area of the ice in place of LEFT_WINGER).
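Here is a small sketch of that mix-up. The LEFT_CIRCLE constant, its value, and the describePosition helper are assumptions I made up for the demonstration:

```java
// Hypothetical demo class showing that int constants are interchangeable.
class ConstantMixUpDemo {

    // Player positions (a subset, to keep the sketch short).
    static final int LEFT_WINGER = 0;
    static final int GOALTENDER = 5;

    // A made-up constant for an area of the ice. It is also just an int.
    static final int LEFT_CIRCLE = 0;

    static String describePosition(int position) {
        switch (position) {
            case LEFT_WINGER:
                return "Left Winger";
            case GOALTENDER:
                return "Goaltender";
            default:
                return "Unknown";
        }
    }

    public static void main(String[] args) {
        // Compiles fine even though LEFT_CIRCLE is not a position at all.
        System.out.println(describePosition(LEFT_CIRCLE)); // prints Left Winger

        // Magic numbers are still accepted too.
        System.out.println(describePosition(42)); // prints Unknown
    }
}
```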
Using Enums to Track Categorical Data
To solve this problem, we introduce enums. Like constants, enums give descriptive names to a fixed set of values. In Java, each value has an integer ordinal behind the scenes, but the values themselves are objects of their own type. That brings a new benefit: type checking. So, not only do we eliminate typos and readability issues by using enums, but we also eliminate the issue of using the wrong constants. Here’s what that might look like:
public class HockeyPlayer {
    private Position position;
    private int goals;
    private int assists;
    private int blocks;
    private int saves;

    public enum Position {
        LEFT_WINGER,
        RIGHT_WINGER,
        CENTER,
        LEFT_DEFENSEMAN,
        RIGHT_DEFENSEMAN,
        GOALTENDER
    }

    public int getPrimaryMetric() {
        switch (this.position) {
            case LEFT_WINGER:
            case RIGHT_WINGER:
            case CENTER:
                return this.goals;
            case LEFT_DEFENSEMAN:
            case RIGHT_DEFENSEMAN:
                return this.blocks;
            case GOALTENDER:
                return this.saves;
            default:
                return this.blocks;
        }
    }
}
Now, anywhere you want to specify a player’s position, you have a type you can request (i.e., Position). Suddenly, positions are type checkable, which makes it much harder to make mistakes. Only Position values can be passed in, so the only mistake you can make is passing the wrong position.
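As a rough sketch of what that type checking buys you, consider the following. The isForward and isValidName helpers are hypothetical names of my own; only the Position values come from the example above:

```java
// Hypothetical demo class; the helper methods are illustrative only.
class EnumTypeCheckDemo {

    enum Position {
        LEFT_WINGER, RIGHT_WINGER, CENTER,
        LEFT_DEFENSEMAN, RIGHT_DEFENSEMAN, GOALTENDER
    }

    // Only a Position can be passed here. Calls like isForward(2) or
    // isForward("Center") are rejected by the compiler.
    static boolean isForward(Position position) {
        switch (position) {
            case LEFT_WINGER:
            case RIGHT_WINGER:
            case CENTER:
                return true;
            default:
                return false;
        }
    }

    // When positions arrive as text, Position.valueOf throws an
    // IllegalArgumentException for unknown names, so typos fail loudly.
    static boolean isValidName(String name) {
        try {
            Position.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isForward(Position.CENTER));     // true
        System.out.println(isForward(Position.GOALTENDER)); // false
        System.out.println(isValidName("CENTER"));          // true
        System.out.println(isValidName("CENTRE"));          // false
    }
}
```

Note that a misspelled constant in code, such as Position.CENTRE, would not even compile, which is the main difference from the string version.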
Enums are also pretty cool because they’re just classes. As a result, you can add a constructor, fields, and even methods. That way, if you want to store any additional data alongside the Enum values, you can. For example, maybe you want to store a human-readable version of the constant name (e.g., “Left Winger”) for places where you might display it. Likewise, abbreviations are common (e.g., “LW”).
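For instance, a version of Position that carries a display name and an abbreviation might look something like the sketch below. The field and getter names are my own choices, and apart from “LW” the abbreviations are assumptions:

```java
// Hypothetical demo class wrapping an enum that stores extra data.
class EnumWithFieldsDemo {

    enum Position {
        LEFT_WINGER("Left Winger", "LW"),
        RIGHT_WINGER("Right Winger", "RW"),
        CENTER("Center", "C"),
        LEFT_DEFENSEMAN("Left Defenseman", "LD"),
        RIGHT_DEFENSEMAN("Right Defenseman", "RD"),
        GOALTENDER("Goaltender", "G");

        private final String displayName;
        private final String abbreviation;

        // Enum constructors are implicitly private; they run once per value.
        Position(String displayName, String abbreviation) {
            this.displayName = displayName;
            this.abbreviation = abbreviation;
        }

        public String getDisplayName() {
            return displayName;
        }

        public String getAbbreviation() {
            return abbreviation;
        }
    }

    public static void main(String[] args) {
        Position p = Position.LEFT_WINGER;
        System.out.println(p.getDisplayName() + " (" + p.getAbbreviation() + ")");
        // prints Left Winger (LW)
    }
}
```

Keeping the display text inside the enum means there is one place to update it, rather than a separate lookup table that can drift out of sync.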
Finally, the last thing I’ll say to sell Enums is that they usually play really nicely with modern IDEs. In other words, Enums give you the gift of autocomplete, and your IDE might even suggest the correct one automatically. This happens because you give the IDE a lot more semantic information with Enums than you do with constants or strings. Likewise, an IDE might even help you generate the Enum in the first place—sort of like how Excel will generate rows of data by inferring a pattern.
Should You Use Enums?
Broadly speaking, Enums are one of those niche features that you’ll rarely use. More often than not, they’re going to add complexity to code that may otherwise be only a short script. That said, if you have categorical data that you’re using a lot, it’s probably a good idea to look into Enums.
When students are working on their own projects, I sometimes recommend Enums for situations where categorical data is stored, such as:
- Days of the Week (e.g., MONDAY, TUESDAY, etc.)
- Months in a Year (e.g., JANUARY, FEBRUARY, etc.)
- Cardinal Directions (e.g., NORTH, SOUTH, EAST, WEST)
- Planets in the Solar System (e.g., EARTH, VENUS, MARS, etc.)
- Status of Processes/Events (e.g., ERROR, READY, RUNNING, etc.)
- Sports Teams (e.g., PENGUINS, RANGERS, BLUE_JACKETS, etc.)
- Suits for Cards (e.g., HEARTS, DIAMONDS, etc.)
- Colors (e.g., RED, GREEN, BLUE)
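It’s worth knowing that Java already ships Enums for a couple of these in the java.time package (DayOfWeek and Month), so you may not need to write your own. Here is a brief sketch; the demo class and its helper methods are hypothetical:

```java
import java.time.DayOfWeek;
import java.time.Month;

// Hypothetical demo class using the standard library's built-in Enums.
class BuiltInEnumDemo {

    // DayOfWeek.plus wraps around the end of the week.
    static DayOfWeek dayAfter(DayOfWeek day) {
        return day.plus(1);
    }

    // Month.maxLength gives the longest possible length of the month in days.
    static int longestLength(Month month) {
        return month.maxLength();
    }

    public static void main(String[] args) {
        System.out.println(dayAfter(DayOfWeek.SUNDAY));    // prints MONDAY
        System.out.println(longestLength(Month.FEBRUARY)); // prints 29
    }
}
```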
However, sometimes there are dozens of categories, and it doesn’t make sense to list them all out. Likewise, sometimes there are a few categories you have in mind but maybe the user wants to make some custom categories (e.g., categories for line items in a bank account). In that case, I would stick with strings or at the very least provide an OTHER value in the Enum. Finally, Enums seem to be unpopular with folks who need to save as much space as possible, which makes sense because, in Java at least, each Enum value is a full object rather than a bare integer.
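If you go the OTHER route, one way it might look is sketched below. The Category values and the fromUserInput helper are made up for illustration; a real budgeting app would likely need to keep the user’s original text as well:

```java
import java.util.Locale;

// Hypothetical demo class: mapping free-form text onto a small Enum.
class CategoryFallbackDemo {

    enum Category { GROCERIES, RENT, UTILITIES, OTHER }

    // Returns the matching Category, or OTHER when the text is missing
    // or does not name one of the known categories.
    static Category fromUserInput(String text) {
        if (text == null) {
            return Category.OTHER;
        }
        try {
            return Category.valueOf(text.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Category.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromUserInput(" rent "));       // prints RENT
        System.out.println(fromUserInput("Pet Supplies")); // prints OTHER
    }
}
```

The trade-off is that everything unrecognized collapses into a single bucket, which is why plain strings can be the better fit when custom categories matter.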
Of course, because I personally use Enums so infrequently, I don’t really have many experiences where they went wrong. As a result, I’ll point you to some other folks who say to avoid them (e.g., Why you shouldn’t use Enums in your Code and Why you shouldn’t use Enums!). Be aware that Enums are different in every programming language, so the critiques you’ll see in the links above might only apply to specific languages.
With that said, let’s call it a day here. As always, if you liked this article, there are definitely more where that came from:
- Java Has A Remainder Operator—Not a Mod Operator
- Why Does == Sometimes Work on Strings in Java?
- Java Lambda Expressions Are a Scam
Likewise, feel free to show your support by heading over to my list of ways to grow the site. Otherwise, we’ll see you next time!