Recently, I was working on an article about list comprehensions in Python when I thought it would be helpful to talk a little bit about making copies of variables. In particular, I want to take a moment address some of the risks when copying mutable data types.
Table of Contents
Before we talk about copying variables, it’s important discuss an important programming language feature called immutability. Immutability describes a variable which cannot be changed. In other words, immutable variables are constants.
More specifically, immutability implies that a variable cannot be mutated. For example, an immutable string cannot have any characters changed or removed without creating a completely new string in the process. We often see this when working with numbers in a language like Java or Python:
num = 5 copy = num
Naturally, we would expect that anything that happens to
copy has no effect on
num. That’s because numbers are typically immutable. In other words, the 5 that is stored in
num has an identity that is unique from the 5 that is stored in
Unfortunately, in most programming languages, immutability has very limited support. As a result, variables beyond numbers and strings are typically mutable which means the code snippet above will fail to create a copy. Instead, you’ll get what’s called “spooky action at a distance” in quantum entanglement. In other words, whatever you do to one variable will happen to the other variable.
The Basics of Copying
Since most languages lack support for immutability, we’re stuck dealing with the consequences when creating copies. In particular, we have to create new variables with all the same properties of the variable we’d like to copy by hand. In the following subsections, we’ll look at how this plays out.
Copying a List in Python
If we wanted to copy a list in Python, we might try the following:
my_list = [1, 2, 3] my_copy = my_list
If we poke around, we’ll notice that both lists are in fact the same. What a big success, right? Maybe we should take another look:
my_copy = 7 print(my_list) # Prints [1, 7, 3]... uh oh!
As we can see, lists in Python are mutable. When we created a “copy,” we actually copied the reference—not the contents of the list. In programming, we call this aliasing. As the name implies, we now have two names pointing to the same object.
To create a proper copy, we need to iterate over the list and add each element to a new list:
my_copy = [item for item in my_list]
Here, we used a list comprehension to create a copy of the original list. Now when we manipulate the new list, we don’t have to worry about corrupting the old list. But, is this enough?
Copying Nested Lists in Python
As it turns out, a list comprehension isn’t guaranteed to perform a proper copy. For example:
my_list = [[1, 2], [2, 7]] my_shallow_copy = [item for item in my_list]
Here, we’ve created a shallow copy of
my_list. While the new list has a unique identity from the original list, the contents of both lists are the same. In other words, the following is safe:
my_shallow_copy.append([5, -4]) print(my_list) # Prints [[1, 2], [2, 7]]
However, changing any of the nested elements will result in corruption of both lists:
my_shallow_copy = -4 print(my_list) # prints [[1, -4], [2, 7]]... uh oh!
If we want perform a deep copy in this case, we have to copy the nested lists as well:
my_deep_copy = [[item for item in sub_list] for sub_list in my_list]
Naturally, this leads us to writing a recursive function that can handle an n-dimensional matrix:
def deep_copy(item): if type(item) is list: return [deep_copy(sub_list) for sub_list in item] else: return item
Of course, even this deep copy function can only go so far. What if our lists contain mutable objects?
Copying Mutable Objects in Python
At this point, we’re fairly comfortable with copying immutable data types like numbers and strings as well as mutable data types like lists, but what if the data types we’re dealing with are something else? For example, what if we make our own class as follows:
class Votes: def __init__(self): self.pro = list() self.anti = list()
Here, we’ve created some class that represents a set of votes which maintains two lists: pro (for) and anti (against). We can populate those lists with unique IDs that represent voters:
town_votes = Votes() town_votes.pro.append("109437139") town_votes.pro.append("476524275") town_votes.pro.append("794314532") town_votes.anti.append("420901790")
Great, now we can do fun things like count the votes for and against:
len(town_votes.pro) # 3 len(town_votes.anti) # 1
Now, let’s say we have several people counting the votes, so we can make sure we got it right. For security purposes, we want to create a deep copy of the
town_votes objects, so corrupt individuals don’t ruin the counts for everyone. If they try, they should fail during the final check.
Of course, how do we copy our
town_votes object? For example, would something like this work:
duplicate = town_votes
Of course not. We’ve only copied the reference which results in the same problem we had with lists. But, what if we make a new
Votes object and duplicate its references:
duplicate = Votes() duplicate.pro = town_votes.pro duplicate.anti = town_votes.anti
Sure, we now have a new
Votes object, but there’s still a problem: the pro and anti lists are the same. In other words, we’ve only created a shallow copy of our
Votes object. Luckily, we know a thing or two about cloning lists:
duplicates.pro = [id for id in town_votes.pro] duplicates.anti = [id for id in town_votes.anti]
Now, we have a deep copy of our
town_votes object. If someone were to come along and tamper with the copy, we’d still be okay.
What we just managed to accomplish with the
Votes object is known as a deep copy. Naturally, the process scales up quickly depending on how many references our object is storing. What can make matters worse is if those references store references. To deal with this, it’s not uncommon for libraries to implement what is known as a copy constructor:
def __init__(self, to_copy=None): if to_copy: self.pro = [id for id in to_copy.pro] self.anti = [id for id in to_copy.anti] else: self.pro = list() self.anti = list()
Then, if we ever want a deep copy of our
Votes object, we’ll provide it as the input to the constructor. And, if our votes lists contained references (like hypothetical
Voter objects), we could call their copy constructor directly from the list comprehension:
def __init__(self, to_copy=None): if to_copy: self.pro = [Voter(id) for id in to_copy.pro] self.anti = [Voter(id) for id in to_copy.anti] else: self.pro = list() self.anti = list()
Of course, there are challenges when performing a deep copy. Perhaps the most dangerous are circular references where one object points to another and the other points back. During copying, both objects would need to construct each other in an infinite loop. To deal with that, you usually need to maintain some sort of reference lookup table to see if you’ve ever duplicated that object in the past.
In any case, Python does provide copying libraries that can handle all this fun stuff for you within reason. I won’t go into that here because I wasn’t planning to write a Python article, but you’re welcome to dig into the documentation yourself.
Attack of the Clones
While you’re here, I recommend checking some of the following articles:
- Rock, Paper, Scissors Using Modular Arithmetic
- How to Sort List of Strings in Python
- How to Clone a List in Python
And, if you’re feeling extra generous, check out the membership page for subscription information. Every little bit helps!
Kicking off a new series of reverse engineering content inspired by VirtualFlatCAD. Today, we're trying to roll our own uppercase function.
When it comes to capitalizing strings in Python, you have a few options. Use the tools Python provides or roll your own.