Be Careful with String’s Substring Method in Java

Be Careful with String's Substring Method in Java Featured Image

Every once in awhile, I’ll come across a well-established library in a programming language that has its quirks. As an instructor, I have to make sure I’m aware of these quirks when I’m teaching. For instance, last time I talked a bit about the various Scanner input methods and how they don’t all behave the same way. Well today, I want to talk about the substring method from Java’s String library.

Table of Contents

Documentation

When using a library for the first time, I find it useful to check out the documentation. But with a library so established, it sometimes feels silly to dig into the documentation. After all, a lot of languages support strings. Personally, all I need to know is the name of the command before I can figure out the rest.

However, every once in awhile, I’ll come across a function that is less intuitive than I thought. In this case, I’m talking about Java’s substring method. As you can probably imagine, it grabs a substring from a string and returns it. So, what’s the catch?

Well for starters, the substring method is actually an overloaded method. As a result, there are two different forms of the same method in the documentation. Take a look:

public StringOpens in a new tab. substring(int beginIndex)

~

Returns a new string that is a substring of this string. The substring begins with the character at the specified index and extends to the end of this string.

Java API, 2019

public StringOpens in a new tab. substring(int beginIndex, int endIndex)

~

Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.

Java API, 2019

At this point, don’t fixate too much on their descriptions as we’ll get to those. Just be aware that there are two different versions of the same method.

Usage

At this point, I’d like to take a moment to show how to use the substring method. If this is your first time poking around the Java API, this would be a good time to follow along.

First, notice that the method header does not contain the static keyword. In other words, subtring is an instance method which makes sense. We need an instance of a string in order to get a substring:

String str = "Hello, World!";
String subOne = str.substring(7);
String subTwo = str.substring(0, 5);

In this example, we’ve created two new substrings: one from position 7 to the end and the other from position 0 to position 5. Without looking at the documentation, can you figure out what the resulting strings will be?

Interval Notation

Before I give away the answer, I think it’s important to discuss some terminology from mathematics. In particular, I’d like to talk a bit about interval notation.

In interval notation, the goal is to explicitly state the range of some subset. For instance, we may be interested in all integers greater than 0. In interval notation, that would look something like:

(0, +∞)

In this example, we’ve chosen to exclude the value of 0 from the range using parentheses. We could have just as easily defined the interval starting with 1—pay attention to the brackets:

[1, +∞)

In either case, we’re describing the same set: all integers greater than 0.

So, how does this tie into the substring method? As it turns out, a substring is a subset of a string, so we can use interval notation to define our substring. Why don’t we try a couple examples? Given “Hello, World!”, determine the substring using the following intervals:

  • [0, 2]
  • (0, 5]
  • (1, 3)
  • (-1, 7]

Once you’re done, check out the answers below:

  • “Hel”
  • “ello,”
  • “l”
  • “Hello, W”

We’ll need to keep this idea in the back of our mind moving forward.

The Truth

The truth of the matter is the substring method is a bit weird. On one hand, we can use a single index to specify the starting point of our new substring. On the other hand, we can use two indices to grab an arbitrary subset of a string.

However, in practice, I find that the second option gives a lot of students trouble, and I don’t blame them. After all, the bounds are deceptive. For example, let’s revisit some code from above:

String str = "Hello, World!";
String subOne = str.substring(7);
String subTwo = str.substring(0, 5);

Here, we can confidently predict that subOne has a value of “World!”, and we’d be right. After all, index 7 is ‘W’, the method automatically grabs the rest of the string.

As for subTwo, we’d probably guess “Hello,”, and we’d be incorrect. It’s actually “Hello” because the end index is exclusive (i.e. [0, 5) ). In the next section, we’ll take a look at why that is and how I feel about it.

My Take

From what I understand, the inclusive/exclusive model is the standard for ranges in the Java API. That said, I do occasionally question the design choice.

On one hand, there’s the advantage of being able to use the length of the string as the end point of the substring:

String jokerQuote = "Madness, as you know, is like gravity, all it takes is a little push.";
String newtonTheory = jokerQuote.substring(30, jokerQuote.length());

But, is this really necessary? Java already provides an overload to the substring method which captures exactly this behavior.

That said, there is a nice mathematical explanation for this notation, and part of it has to do with the difference between the starting and ending points. In particular, we get the length of the new substring:

int length = endIndex - startIndex;

In addition, this particular notation allows adjacent substrings to share a midpoint:

String s = "Luck is great, but most of life is hard work.";
String whole = s.substring(0, s.length()/2) + s.substring(s.length()/2, s.length());

Both of these properties are nice, but I think they’re likely a byproduct of indexing by zero (perpetuated by DijkstraOpens in a new tab.) which isn’t all that intuitive either. And for those of you who are going to take exception to that comment, be aware that I’m all for indexing by zero and and this inclusive/exclusive subset convention.

All I’m trying to say is that I’ve seen my own students get tripped up over both conventions, so I feel for them in a way. That’s why I went through such lengths to write this article in the first place.

Let me know if you feel the same or if I’m totally off base. Otherwise, thanks for taking some time to read my work. I hope you enjoyed it!

Coding Tangents (43 Articles)—Series Navigation

As a lifelong learner and aspiring teacher, I find that not all subjects carry the same weight. As a result, some topics can fall through the cracks due to time constraints or other commitments. Personally, I find these lost artifacts to be quite fun to discuss. That’s why I’ve decided to launch a whole series to do just that. Welcome to Coding Tangents, a collection of articles that tackle the edge case topics of software development.

In this series, I’ll be tackling topics that I feel many of my own students have been curious about but never really got the chance to explore. In many cases, these are subjects that I think deserve more exposure in the classroom. For instance, did you ever receive a formal explanation of access modifiers? How about package management? Version control?

In some cases, students are forced to learn these subjects on their own. Naturally, this forms a breeding ground for misconceptions which are made popular in online forums like Stack Overflow and Reddit. With this series, I’m hoping to get back to the basics where these subjects can be tackled in their entirety.

Jeremy Grifski

Jeremy grew up in a small town where he enjoyed playing soccer and video games, practicing taekwondo, and trading Pokémon cards. Once out of the nest, he pursued a Bachelors in Computer Engineering with a minor in Game Design. After college, he spent about two years writing software for a major engineering company. Then, he earned a master's in Computer Science and Engineering. Today, he pursues a PhD in Engineering Education in order to ultimately land a teaching gig. In his spare time, Jeremy enjoys spending time with his wife, playing Overwatch and Phantasy Star Online 2, practicing trombone, watching Penguins hockey, and traveling the world.

Recent Posts