Recently, I was working on an article about list comprehensions in Python when I thought it would be helpful to talk a little bit about making copies of variables. In particular, I want to take a moment address some of the risks when copying mutable data types.
Table of Contents
Immutability
Before we talk about copying variables, it’s important discuss an important programming language feature called immutability. Immutability describes a variable which cannot be changed. In other words, immutable variables are constants.
More specifically, immutability implies that a variable cannot be mutated. For example, an immutable string cannot have any characters changed or removed without creating a completely new string in the process. We often see this when working with numbers in a language like Java or Python:
num = 5 copy = num
Naturally, we would expect that anything that happens to copy
has no effect on num
. That’s because numbers are typically immutable. In other words, the 5 that is stored in num
has an identity that is unique from the 5 that is stored in copy
.
Unfortunately, in most programming languages, immutability has very limited support. As a result, variables beyond numbers and strings are typically mutable which means the code snippet above will fail to create a copy. Instead, you’ll get what’s called “spooky action at a distance” in quantum entanglement. In other words, whatever you do to one variable will happen to the other variable.
The Basics of Copying
Since most languages lack support for immutability, we’re stuck dealing with the consequences when creating copies. In particular, we have to create new variables with all the same properties of the variable we’d like to copy by hand. In the following subsections, we’ll look at how this plays out.
Copying a List in Python
If we wanted to copy a list in Python, we might try the following:
my_list = [1, 2, 3] my_copy = my_list
If we poke around, we’ll notice that both lists are in fact the same. What a big success, right? Maybe we should take another look:
my_copy[1] = 7 print(my_list) # Prints [1, 7, 3]... uh oh!
As we can see, lists in Python are mutable. When we created a “copy,” we actually copied the reference—not the contents of the list. In programming, we call this aliasing. As the name implies, we now have two names pointing to the same object.
To create a proper copy, we need to iterate over the list and add each element to a new list:
my_copy = [item for item in my_list]
Here, we used a list comprehension to create a copy of the original list. Now when we manipulate the new list, we don’t have to worry about corrupting the old list. But, is this enough?
Copying Nested Lists in Python
As it turns out, a list comprehension isn’t guaranteed to perform a proper copy. For example:
my_list = [[1, 2], [2, 7]] my_shallow_copy = [item for item in my_list]
Here, we’ve created a shallow copy of my_list
. While the new list has a unique identity from the original list, the contents of both lists are the same. In other words, the following is safe:
my_shallow_copy.append([5, -4]) print(my_list) # Prints [[1, 2], [2, 7]]
However, changing any of the nested elements will result in corruption of both lists:
my_shallow_copy[0][1] = -4 print(my_list) # prints [[1, -4], [2, 7]]... uh oh!
If we want perform a deep copy in this case, we have to copy the nested lists as well:
my_deep_copy = [[item for item in sub_list] for sub_list in my_list]
Naturally, this leads us to writing a recursive function that can handle an n-dimensional matrix:
def deep_copy(item): if type(item) is list: return [deep_copy(sub_list) for sub_list in item] else: return item
Of course, even this deep copy function can only go so far. What if our lists contain mutable objects?
Copying Mutable Objects in Python
At this point, we’re fairly comfortable with copying immutable data types like numbers and strings as well as mutable data types like lists, but what if the data types we’re dealing with are something else? For example, what if we make our own class as follows:
class Votes: def __init__(self): self.pro = list() self.anti = list()
Here, we’ve created some class that represents a set of votes which maintains two lists: pro (for) and anti (against). We can populate those lists with unique IDs that represent voters:
town_votes = Votes() town_votes.pro.append("109437139") town_votes.pro.append("476524275") town_votes.pro.append("794314532") town_votes.anti.append("420901790")
Great, now we can do fun things like count the votes for and against:
len(town_votes.pro) # 3 len(town_votes.anti) # 1
Now, let’s say we have several people counting the votes, so we can make sure we got it right. For security purposes, we want to create a deep copy of the town_votes
objects, so corrupt individuals don’t ruin the counts for everyone. If they try, they should fail during the final check.
Of course, how do we copy our town_votes
object? For example, would something like this work:
duplicate = town_votes
Of course not. We’ve only copied the reference which results in the same problem we had with lists. But, what if we make a new Votes
object and duplicate its references:
duplicate = Votes() duplicate.pro = town_votes.pro duplicate.anti = town_votes.anti
Sure, we now have a new Votes
object, but there’s still a problem: the pro and anti lists are the same. In other words, we’ve only created a shallow copy of our Votes
object. Luckily, we know a thing or two about cloning lists:
duplicates.pro = [id for id in town_votes.pro] duplicates.anti = [id for id in town_votes.anti]
Now, we have a deep copy of our town_votes
object. If someone were to come along and tamper with the copy, we’d still be okay.
Copying Constructors
What we just managed to accomplish with the Votes
object is known as a deep copy. Naturally, the process scales up quickly depending on how many references our object is storing. What can make matters worse is if those references store references. To deal with this, it’s not uncommon for libraries to implement what is known as a copy constructor:
def __init__(self, to_copy=None): if to_copy: self.pro = [id for id in to_copy.pro] self.anti = [id for id in to_copy.anti] else: self.pro = list() self.anti = list()
Then, if we ever want a deep copy of our Votes
object, we’ll provide it as the input to the constructor. And, if our votes lists contained references (like hypothetical Voter
objects), we could call their copy constructor directly from the list comprehension:
def __init__(self, to_copy=None): if to_copy: self.pro = [Voter(id) for id in to_copy.pro] self.anti = [Voter(id) for id in to_copy.anti] else: self.pro = list() self.anti = list()
Of course, there are challenges when performing a deep copy. Perhaps the most dangerous are circular references where one object points to another and the other points back. During copying, both objects would need to construct each other in an infinite loop. To deal with that, you usually need to maintain some sort of reference lookup table to see if you’ve ever duplicated that object in the past.
In any case, Python does provide copying libraries that can handle all this fun stuff for you within reason. I won’t go into that here because I wasn’t planning to write a Python article, but you’re welcome to dig into the documentation yourself.
Attack of the Clones
At this point, I hope you’re more comfortable with concepts like immutability and cloning. These concepts apply to just about every popular language used today like C, C++, JavaScript, and Java. You’d be hard pressed to find a language which implements total immutability, but there are a few that exist. I believe most functional languages try to avoid the notion of state, so you might be able to avoid this cloning issue using languages like Haskell.
While you’re here, I recommend checking some of the following articles:
- Rock, Paper, Scissors Using Modular Arithmetic
- How to Sort List of Strings in Python
- How to Clone a List in Python
And, if you’re feeling extra generous, check out the membership page for subscription information. Every little bit helps!
Recent Posts
Recently, I was thinking about the old Pavlov's dog story and how we hardly treat our students any different. While reflecting on this idea, I decided to write the whole thing up for others to read....
In the world of programming languages, expressions are an interesting concept that folks tend to implicitly understand but might not be able to define. As a result, I figured I'd take a crack at...