In this article, I try to bring some clarity around the difference between arrays and lists in Python. In short, Python’s default array-like data structure is the list. Keep reading if you want to learn a bit more about the array module or NumPy arrays.
Too Long; Didn’t Read (TL;DR)
As I was putting this article together, I found myself a bit frustrated with the way different people talked about lists and arrays. As a result, the beginning of this article briefly discusses terminology. In other words, what is a list—both in terms of theory and Python implementation? Likewise, what is an array? And, does Python even support arrays?
In short, lists tend to be defined as an abstract data type which Python confuses a bit by creating a concrete data structure called “list.” In Python, lists are the default list-like data structure which happens to be mutable, dynamically-sized, and heterogeneous (sort of).
In contrast, Python has support for arrays through the array module, but these arrays aren’t “true” arrays in the theoretical sense. Instead, they’re mutable, dynamically-sized, and homogeneous. It seems like this data structure only exists for dealing with low-level data, at least based on its methods.
To further confuse everyone, a very popular 3rd party library called NumPy also uses the term array to describe its list-like data structure. These arrays are basically Python lists with support for computation.
Moral of the story is that Python doesn’t have arrays—at least not the same kinds of arrays you might see in Java or C. As a result, if you’re just looking for the Python equivalent, you can’t get much closer than the list.
What Is a List?
To kick off this discussion, we should start by defining each term.
Broadly, a list is an abstract data structure akin to a sequence. Specifically, a sequence is any data structure which organizes data in a line. In other words, there are no hierarchies or connections between elements. Instead, elements are organized left-to-right—or top-to-bottom depending on how you want to think about it—and can be accessed by index (i.e. a number typically starting from 0 or 1 and counting up by 1).
Confusingly, Python uses the term list to describe a special kind of sequence that happens to be mutable. In other words, you can add and remove items from a list. This is in direct contrast to the tuple which is another sequence data structure that’s immutable (i.e. once the structure is defined, items cannot be added or removed).
In addition, Python lists are dynamically-sized. Initially, a list might be empty, but we can easily change that by adding items to it. This is in direct contrast to the array which typically has a fixed size (i.e. once the structure is defined, the number of elements cannot change).
Another cool thing about Python lists is that their contents are heterogeneous. In other words, there’s no requirement specifying what type of data has to be stored in them. Naturally, that means we can store strings alongside numbers or other objects. Again, this tends to contrast with arrays that depend on a consistent data type for performance purposes.
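To make these traits a bit more concrete, here's a small sketch. The variable names are just for illustration. It shows a list growing, holding mixed types, and changing in place, while a tuple rejects the same kind of change:

```python
# A list starts empty and grows as items are added (dynamically sized).
items = []
items.append(42)
items.append("hello")      # a string next to a number (heterogeneous)
items.append([1.5, None])  # even another list is fine

# Lists are mutable: elements can be replaced or removed in place.
items[0] = 43
items.remove("hello")

# Tuples are sequences too, but they reject in-place changes.
point = (1, 2)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(items)          # [43, [1.5, None]]
print(tuple_changed)  # False
```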
In general, the motivation behind this sort of design is convenience. One of the challenges of using an array is knowing exactly how many items you need to store before you create one. Meanwhile, a list can be created before we know anything about our data. Also, as we’ll talk about later, Python lists tend to steal many of the performance benefits of arrays as well (spoiler alert: they’re almost the same thing).
What Is an Array?
In contrast, an array is typically defined as a fixed-size homogeneous mutable sequence.
Like lists, arrays tend to be mutable. As a result, once an array is defined, we can change its elements as needed. For example, we might want to sort the items in an array. Mutability can sometimes be a desirable trait while sorting because we can move the items around in place. If arrays were immutable, we would have to store the sorted array in a new array.
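As a rough illustration of the sorting point, here's a sketch that uses a Python list as a stand-in for a mutable array and a tuple as a stand-in for an immutable one. That pairing is my own choice for the example, not a claim about how arrays work in other languages:

```python
scores = [7, 2, 9, 4]
same_object = scores          # a second name for the same list
scores.sort()                 # rearranges the items in place
print(same_object)            # [2, 4, 7, 9]

frozen = (7, 2, 9, 4)         # tuples cannot be rearranged in place
sorted_copy = sorted(frozen)  # so sorting has to build a new sequence
print(frozen)                 # (7, 2, 9, 4)
print(sorted_copy)            # [2, 4, 7, 9]
```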
Unlike lists, arrays tend to be fixed-size. Basically, this means that once we’ve created an array, we cannot change the number of elements it contains. If we create an array with 5 elements, we’ll have 5 elements to work with for its lifetime. If you’re familiar with languages like Java or C, this is the default behavior of arrays.
Likewise, arrays tend to be homogeneous. In other words, arrays tend to restrict the type of elements they can store to a consistent type (e.g. only integers). Of course, in the world of objects, arrays store references which have a consistent size. Again, if you’re familiar with languages like Java or C, this is the default behavior of arrays.
In general, the motivation behind this design is performance. In other words, if we know what type of value we’re going to store in our sequence, the size of the sequence becomes predictable. For example, if we know we’re going to store seven 32-bit integers, then we can ask the operating system for roughly 224 bits of memory. From there, we can access any of those values with the following formula:
num = address + 32 * index
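If you want to see that arithmetic in action, here's a minimal sketch using the standard struct module. It works in bytes rather than bits (so 4 per item instead of 32), and the names buffer, ITEM_SIZE, and read_item are just ones I made up for the example:

```python
import struct

values = [10, 20, 30, 40, 50, 60, 70]

# Pack seven 32-bit signed integers back to back ("<" = little-endian, standard sizes).
buffer = struct.pack("<7i", *values)
ITEM_SIZE = 4  # bytes per 32-bit integer


def read_item(index):
    """Read the integer at `index` by computing its byte offset."""
    offset = ITEM_SIZE * index  # same idea as address + 32 * index, but in bytes
    return struct.unpack_from("<i", buffer, offset)[0]


print(len(buffer) * 8)  # 224
print(read_item(3))     # 40
```

Because every item has the same size, finding item number 3 takes one multiplication rather than a walk through the items before it.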
All that said, Python arrays don’t actually adhere to some of these requirements—namely sizing. As we’ll see later, Python arrays seem to be more of a list wrapper for C arrays.
What’s the Difference?
As it turns out, lists and arrays are quite similar. For example, both allow us to access elements by index. Likewise, both are organized in a sequence structure, and both are mutable. Beyond that, the only differences are how their size is managed and what types of data they can store.
Ironically, Python lists are (dynamic) arrays. The only reason that they’re able to change size is that they have built-in capacity detection. In other words, whenever an item is added to the list that hits the underlying array’s max capacity, a new array is created and the elements are copied over. Likewise, the only reason they can store multiple types of data is that everything in Python is an object. As a result, lists only have to worry about the size of references—not the objects themselves. Otherwise, they work just like arrays.
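To show what that capacity check might look like, here's a toy dynamic array. Treat it as a sketch under assumptions: the DynamicArray class is hypothetical, the starting capacity of 2 and the doubling strategy are picked for simplicity, and CPython's real list uses its own growth policy in C.

```python
class DynamicArray:
    """Toy dynamic array; doubling is an assumption, not CPython's exact policy."""

    def __init__(self):
        self._capacity = 2
        self._size = 0
        self._slots = [None] * self._capacity  # stands in for a fixed-size block

    def append(self, item):
        if self._size == self._capacity:  # the block is full
            self._grow()
        self._slots[self._size] = item
        self._size += 1

    def _grow(self):
        # Create a bigger block and copy the existing references over.
        self._capacity *= 2
        new_slots = [None] * self._capacity
        for i in range(self._size):
            new_slots[i] = self._slots[i]
        self._slots = new_slots

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._slots[index]

    def __len__(self):
        return self._size


numbers = DynamicArray()
for value in [1, "two", 3.0, None, 5]:
    numbers.append(value)

print(len(numbers), numbers._capacity)  # 5 8
```

Notice that the slots only ever hold references, which is why mixing an integer, a string, and a float causes no trouble.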
That said, everything we’ve talked about up to this point is theory. In reality, Python lists and arrays have a lot of practical differences. In the remainder of this section, we’ll take a look at a few.
One of the biggest differences between Python lists and arrays is their syntax. Since lists are built into the language, they can be defined directly:
empty_list = []
empty_list = list()
numbers = [1, 4, 3]
In contrast, if we want to create an array, we have to import the array module:
import array
empty_array = array.array("i")  # Specifies that array will store integers
numbers = array.array("i", [1, 5, 4])
Naturally, this is quite a bit more clunky because we have to import a library and leverage the array constructor. In addition, we have to specify a type—something we will talk about more in the next section.
Since Python arrays are closer to traditional arrays than lists, they’re stuck adhering to this idea of homogeneity. Again, lists also adhere to this principle as everything in Python is an object. However, the difference is that Python arrays seem to behave like thin list wrappers for C arrays. As a result, they can only store integers, floats, and characters.
This restriction is addressed through the typecode parameter of the constructor. For example, here are a few of the options (not an exhaustive list):
- ‘b’ for 1-byte signed char: -128 to 127
- ‘B’ for 1-byte unsigned char: 0 to 255
- ‘u’ for a Unicode character (wchar_t): 2 or 4 bytes depending on the platform
- ‘h’ for 2-byte signed short: -32,768 to 32,767
- ‘H’ for 2-byte unsigned short: 0 to 65,535
Naturally, this means that arrays cannot store data like strings, objects, or even other arrays.
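Here's a quick sketch of how that restriction plays out in practice. The variable names are mine, and the flags exist only so the result is easy to print:

```python
import array

small = array.array("b", [-128, 0, 127])  # fits in a signed 1-byte char
print(small.itemsize)  # 1

# Values outside the typecode's range are rejected.
try:
    small.append(128)
    overflowed = False
except OverflowError:
    overflowed = True

# So are values of the wrong type, such as strings.
try:
    array.array("h", ["not a number"])
    rejected_string = False
except TypeError:
    rejected_string = True

print(overflowed, rejected_string)  # True True
```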
Since both arrays and lists are objects in Python, they come with their own sets of methods. Naturally, these methods give us some insight into how both data structures are meant to be used and how well they adhere to their theoretical structure.
First, let’s talk about Python arrays. Previously in this article, I had mentioned that arrays are typically fixed-size. That’s not actually true with the Python array. After all, both lists and arrays have support for the append() method:
numbers_list = [1, 2, 7]
numbers_list.append(9)  # Stores [1, 2, 7, 9]
import array
numbers_array = array.array("i", [1, 2, 7])
numbers_array.append(9)  # Stores array('i', [1, 2, 7, 9])
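Beyond that, the similarities tend to fall off. For example, while both lists and arrays support methods like insert(), arrays have several additional methods including:
- buffer_info()
- byteswap()
- frombytes() and tobytes()
- fromfile() and tofile()
- fromlist() and tolist()
- fromunicode() and tounicode()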
In short, there seem to be a lot of conversion-related methods for arrays that just don’t exist for lists. For example, arrays support reading from and writing to files, lists, and strings. Also, arrays have no sort() method, something that lists provide.
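For a feel of what those conversions look like, here's a short sketch using a handful of the array module's methods. It also shows one way around the missing sort(): the built-in sorted() accepts any iterable, so we can sort into a new array. The variable names are just for illustration:

```python
import array

numbers = array.array("i", [7, 1, 2])

as_list = numbers.tolist()   # array -> plain list
raw = numbers.tobytes()      # array -> raw bytes (machine representation)

restored = array.array("i")
restored.frombytes(raw)      # raw bytes -> array

extended = array.array("i", [0])
extended.fromlist(as_list)   # append the items from a list

# No sort() method, but sorted() can feed a new array.
in_order = array.array("i", sorted(numbers))

print(as_list)                   # [7, 1, 2]
print(restored == numbers)       # True
print(extended.tolist())         # [0, 7, 1, 2]
print(in_order.tolist())         # [1, 2, 7]
print(hasattr(numbers, "sort"))  # False
```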
What’s the Big Deal?
Unfortunately, unless you found this article first, you probably read a lot of misleading information about Python lists. For example, if you search Python arrays, you’re likely to be greeted by this article by W3Schools that states the following notes:
Note: Python does not have built-in support for Arrays, but Python Lists can be used instead.
Note: This page shows you how to use LISTS as ARRAYS, however, to work with arrays in Python you will have to import a library, like the NumPy library.
Of course, that first note is misleading at best. Even if Python doesn’t support the theoretical array we discussed at the beginning of this article, the array module is built right into the language.
Meanwhile, the second note is even more problematic because it redirects you to a 3rd party module, NumPy, before ever mentioning the array module. Ironically, NumPy arrays aren’t true arrays either, and I doubt the person looking to learn about arrays is trying to go down the rabbit hole of data science.
To make matters worse, the link they provide takes you to their own internal documentation on W3Schools which states this garbage:
NumPy arrays are stored at one continuous place in memory unlike lists, so processes can access and manipulate them very efficiently.
This is just objectively false. In Python, a list keeps its references in one contiguous block of memory, just like an array keeps its elements. Don’t believe me? Check out this in-depth look at Python’s list implementation. Ugh, why does Google rank these websites so highly?
What About NumPy?
On the off chance that you’re actually interested in the difference between Python lists and NumPy arrays, I figure it’s worth at least chatting about it.
First, NumPy is a 3rd party library. As a result, you will have to install it using a package manager like pip. Naturally, the scope of this article doesn’t really allow for a deep explanation of package management best practices, so I won’t get into it. Instead, I’ll defer to their installation instructions.
Assuming you already have NumPy installed, then the differences really boil down to the following: NumPy arrays are built for computation. For example, if you had a normal list full of integers, you would have to do something like the following to scale all the values:
nums = [2, 6, -4]
scaled_nums = [2 * num for num in nums]  # stores [4, 12, -8]
Meanwhile, in NumPy, scaling an array is as easy as the following:
import numpy as np
nums = np.array([2, 6, -4])
scaled_nums = nums * 2  # stores array([ 4, 12, -8])
Naturally, there are increasingly more complex ways to work with NumPy arrays that just don’t scale as well with Python lists. Being able to use the math operators directly is a huge bonus.
That said, NumPy arrays should really only be used in the context of data science. In general, I don’t recommend adding dependencies to a project unless they’re necessary.
Ultimately, NumPy arrays are similar to the array module in the sense that they aren’t arrays in the traditional sense. At the very least, Python has no support for fixed-size arrays.
To Hell With Terminology
One thing I find very frustrating in our community is how often terms get defined and mixed until they don’t make any sense anymore. It appears that “array” is one of those terms. That said, if you’re familiar with the term “array” and just looking for the Python equivalent, use lists.
With all that said, thanks for taking the time to check this article out. If you’re interested in learning more about Python with my hot takes mixed in, feel free to check out some of my related articles.
Likewise, I’m always happy when folks want to support the site. If you’d like to do that, check out this list.
Otherwise, thanks for your time! I appreciate it.