Indexing in computer science is often a touchy subject, especially among beginners. Since counting usually starts at one, students tend to find indexing from zero confusing. Perhaps this is why some programming languages like MATLAB and Lua have adopted an indexing from one convention.
That said, indexing from zero is far more common among the mainstream programming languages like C, Python, and Java. As a result, students will inevitably have to come to terms with this convention.
Unfortunately, what often happens is that students have to learn this convention without understanding why it exists. It quickly becomes sort of like a “because I said so,” which students hate to hear. To help students deal with this reality, I figured I’d finally put together an article on why we don’t index from one in computer science.
Table of Contents
The Theory Behind Indexing By Zero
When we index a data structure like an array, we have a few of choices:
- Index from 0
- Index from 1
- Index from any other number
When we say index from “x”, what we’re really saying is that we begin counting the first element as “x”. In an indexing by 0 system, the first element is 0 followed by 1, 2, 3, etc. Alternatively, we could start from 1 where the first element is 1 followed by 2, 3, 4, etc.
Now, there are a couple reasons why we might choose one over the other, so let me take a moment to map out a few of those arguments.
The Math Argument
From a math perspective, starting from zero tends to make a lot more sense than one. After all, making the first element zero gives us many advantages during various array related tasks. For example, if we want to know the length of a subsequence, we can get it by subtracting the lower bound from the upper bound. Here’s what that might look like in Python:
example = [2, 5, 3, 1, 9] len(example[1:3]) # returns 2
For those not familiar with Python, this code makes a list containing five items. In the next line, we take a slice (i.e., a subsequence) of the list from index 1 to index 3. The convention here (this could be another article entirely) is to include the first value and exclude the second one. As a result, we end up with a list containing just elements at positions 1 and 2.
Because Python has zero-based indexing, we can subtract the upper bound from the lower bound to get the exact length of the subsequence. In this case, we can subtract 1 from 3 to get 2, which aligns with what the `len()` function returns.
Funnily enough, zero-based indexing doesn’t just benefit subsequences. It also has benefits with modular arithmetic. See, `mod`, if implemented correctly, always returns a number in the range of 0 <= x < N, where N is the number we are “modding” by. In a zero-based indexing language, hashing is really easy because we can set N to be the length of our sequence. Like this:
example = [3, 2, 5, 1] example[17 % len(example)] = 4 # [3, 4, 5, 1]
In this example, we create another list. Then, we take an arbitrary number, in this case 17, and mod it by the length of the list. No matter what, we are guaranteed to get a return value that is within the range of the list. In this case, `17 % 4` returns 1, which we set to 4. Now that’s convenient!
The Hardware Argument
Back in the day, a lot of programming had to happen at the hardware level. In an effort to abstract the hardware a bit, we had to invent structures that were nice to use. One of those structures was the invention of the array—or rather pointer arithmetic for arrays.
See, these days arrays are pretty abstract. We might declare their size ahead of time, or we might use some structure that handles that allocation for us. In either case, what’s happening under the hood is pointer arithmetic. Let me explain!
When we declare an array, we know in our minds that the structure itself is contiguous. And this is all possible because we ask for the block of space ahead of time.
At the start of that block is a memory address (virtual or otherwise). Depending on what we need to store in that memory, we break it up into equally sized elements. For example, an array of integers would be broken up into 32-bit sections.
One of the perks of knowing the size of the elements is that we can use arithmetic to access any element at any time: `element = index * size + offset`. In this case, the offset is the memory address of the first element.
To be sure the previous expression works, imagine plugging in 0. What value does expression return? It should be offset (i.e., the address of the 0th element). Using a one-based system would result in a slightly less elegant expression: `element = (index – 1) * size + offset`. Needless to say, this sort of “fixing” using +/- 1 is the type of stuff you often see in one-based languages.
Personally, I find the math argument more compelling than the historical hardware argument. That said, I know that many folks are going to disagree regardless. As a result, here’s my advice: find some way to conceptualize zero-based indexing in your hard. I’ll give you a couple real-world examples to help you get there!
- To measure the length of an object, you probably use a ruler or tape measure. These devices all start from zero, but I’m not sure any objects can have a length of zero.
- To track time, clocks (at least the 24-hour variety) start from 0. More broadly, seconds and minutes start from 0. Though again, a time of 0 seconds is sort of meaningless.
Also, if you’re particularly frustrated by zero-based indexing, check out this article by The Craft of Coding. I found it pretty amusing, and it holds a lot of the same values that I do about programming (i.e., who really cares one way or the other). Here’s one of my favorite quotes:
There are many people who vehemently opposed to the idea of there being anything but zero-based indexing, and go to great lengths to justify their cause. They are often the same people who believe there is only one true language (whatever that may be). The reality is, who really cares what type of array indexing is used? Either that or go with an index-nonspecific language.
Funnily enough, I’ve written pretty similar articles myself about the concept of “programming languages”:
- There Is No Value in Ranking Programming Languages: The Cost of Gatekeeping in Tech
- Who Gets to Decide What Is and Isn’t a Programming Language?
- What is a Programming Language?
With that said, that’s about all the time I care to spend on the subject. Hopefully, it helped! If so, I’d love it if you showed your support by heading over to my list of ways to grow the site. Otherwise, take care!
Life has given me a bit of a beating, so I'm taking some time to recover. See y'all again soon.
Why Is Adding Two Random Numbers Not the Same as Generating One in the Same Range?
Generating random numbers might seem easy at first, but there are definitely some pitfalls.