Be Careful When Copying Mutable Data Types

Be Careful When Copying Mutable Data Types Featured Image

Recently, I was working on an article about list comprehensions in Python when I thought it would be helpful to talk a little bit about making copies of variables. In particular, I want to take a moment address some of the risks when copying mutable data types.

Table of Contents

Immutability

Before we talk about copying variables, it’s important discuss an important programming language feature called immutability. Immutability describes a variable which cannot be changed. In other words, immutable variables are constants.

More specifically, immutability implies that a variable cannot be mutated. For example, an immutable string cannot have any characters changed or removed without creating a completely new string in the process. We often see this when working with numbers in a language like Java or Python:

num = 5
copy = num

Naturally, we would expect that anything that happens to copy has no effect on num. That’s because numbers are typically immutable. In other words, the 5 that is stored in num has an identity that is unique from the 5 that is stored in copy.

Unfortunately, in most programming languages, immutability has very limited support. As a result, variables beyond numbers and strings are typically mutable which means the code snippet above will fail to create a copy. Instead, you’ll get what’s called “spooky action at a distance” in quantum entanglement. In other words, whatever you do to one variable will happen to the other variable.

The Basics of Copying

Since most languages lack support for immutability, we’re stuck dealing with the consequences when creating copies. In particular, we have to create new variables with all the same properties of the variable we’d like to copy by hand. In the following subsections, we’ll look at how this plays out.

Copying a List in Python

If we wanted to copy a list in Python, we might try the following:

my_list = [1, 2, 3]
my_copy = my_list

If we poke around, we’ll notice that both lists are in fact the same. What a big success, right? Maybe we should take another look:

my_copy[1] = 7
print(my_list)  # Prints [1, 7, 3]... uh oh!

As we can see, lists in Python are mutable. When we created a “copy,” we actually copied the reference—not the contents of the list. In programming, we call this aliasing. As the name implies, we now have two names pointing to the same object.

To create a proper copy, we need to iterate over the list and add each element to a new list:

my_copy = [item for item in my_list]

Here, we used a list comprehension to create a copy of the original list. Now when we manipulate the new list, we don’t have to worry about corrupting the old list. But, is this enough?

Copying Nested Lists in Python

As it turns out, a list comprehension isn’t guaranteed to perform a proper copy. For example:

my_list = [[1, 2], [2, 7]]
my_shallow_copy = [item for item in my_list]

Here, we’ve created a shallow copy of my_list. While the new list has a unique identity from the original list, the contents of both lists are the same. In other words, the following is safe:

my_shallow_copy.append([5, -4])
print(my_list)  # Prints [[1, 2], [2, 7]]

However, changing any of the nested elements will result in corruption of both lists:

my_shallow_copy[0][1] = -4
print(my_list) # prints [[1, -4], [2, 7]]... uh oh!

If we want perform a deep copy in this case, we have to copy the nested lists as well:

my_deep_copy = [[item for item in sub_list] for sub_list in my_list]

Naturally, this leads us to writing a recursive function that can handle an n-dimensional matrix:

def deep_copy(item):
  if type(item) is list:
    return [deep_copy(sub_list) for sub_list in item]
  else:
    return item

Of course, even this deep copy function can only go so far. What if our lists contain mutable objects?

Copying Mutable Objects in Python

At this point, we’re fairly comfortable with copying immutable data types like numbers and strings as well as mutable data types like lists, but what if the data types we’re dealing with are something else? For example, what if we make our own class as follows:

class Votes:
  def __init__(self):
    self.pro = list()
    self.anti = list()

Here, we’ve created some class that represents a set of votes which maintains two lists: pro (for) and anti (against). We can populate those lists with unique IDs that represent voters:

town_votes = Votes()
town_votes.pro.append("109437139")
town_votes.pro.append("476524275")
town_votes.pro.append("794314532")
town_votes.anti.append("420901790")

Great, now we can do fun things like count the votes for and against:

len(town_votes.pro)  # 3
len(town_votes.anti)  # 1

Now, let’s say we have several people counting the votes, so we can make sure we got it right. For security purposes, we want to create a deep copy of the town_votes objects, so corrupt individuals don’t ruin the counts for everyone. If they try, they should fail during the final check.

Of course, how do we copy our town_votes object? For example, would something like this work:

duplicate = town_votes

Of course not. We’ve only copied the reference which results in the same problem we had with lists. But, what if we make a new Votes object and duplicate its references:

duplicate = Votes()
duplicate.pro = town_votes.pro
duplicate.anti = town_votes.anti

Sure, we now have a new Votes object, but there’s still a problem: the pro and anti lists are the same. In other words, we’ve only created a shallow copy of our Votes object. Luckily, we know a thing or two about cloning lists:

duplicates.pro = [id for id in town_votes.pro]
duplicates.anti = [id for id in town_votes.anti]

Now, we have a deep copy of our town_votes object. If someone were to come along and tamper with the copy, we’d still be okay.

Copying Constructors

What we just managed to accomplish with the Votes object is known as a deep copy. Naturally, the process scales up quickly depending on how many references our object is storing. What can make matters worse is if those references store references. To deal with this, it’s not uncommon for libraries to implement what is known as a copy constructor:

def __init__(self, to_copy=None):
  if to_copy:
    self.pro = [id for id in to_copy.pro]
    self.anti = [id for id in to_copy.anti]
  else:
    self.pro = list()
    self.anti = list()

Then, if we ever want a deep copy of our Votes object, we’ll provide it as the input to the constructor. And, if our votes lists contained references (like hypothetical Voter objects), we could call their copy constructor directly from the list comprehension:

def __init__(self, to_copy=None):
  if to_copy:
    self.pro = [Voter(id) for id in to_copy.pro]
    self.anti = [Voter(id) for id in to_copy.anti]
  else:
    self.pro = list()
    self.anti = list()

Of course, there are challenges when performing a deep copy. Perhaps the most dangerous are circular references where one object points to another and the other points back. During copying, both objects would need to construct each other in an infinite loop. To deal with that, you usually need to maintain some sort of reference lookup table to see if you’ve ever duplicated that object in the past.

In any case, Python does provide copying librariesOpens in a new tab. that can handle all this fun stuff for you within reason. I won’t go into that here because I wasn’t planning to write a Python article, but you’re welcome to dig into the documentation yourself.

Attack of the Clones

At this point, I hope you’re more comfortable with concepts like immutability and cloning. These concepts apply to just about every popular language used today like C, C++, JavaScript, and Java. You’d be hard pressed to find a language which implements total immutability, but there are a few that exist. I believe most functional languages try to avoid the notion of state, so you might be able to avoid this cloning issue using languages like Haskell.

While you’re here, I recommend checking some of the following articles:

And, if you’re feeling extra generous, check out the membership pageOpens in a new tab. for subscription information. Every little bit helps!

Coding Tangents (43 Articles)—Series Navigation

As a lifelong learner and aspiring teacher, I find that not all subjects carry the same weight. As a result, some topics can fall through the cracks due to time constraints or other commitments. Personally, I find these lost artifacts to be quite fun to discuss. That’s why I’ve decided to launch a whole series to do just that. Welcome to Coding Tangents, a collection of articles that tackle the edge case topics of software development.

In this series, I’ll be tackling topics that I feel many of my own students have been curious about but never really got the chance to explore. In many cases, these are subjects that I think deserve more exposure in the classroom. For instance, did you ever receive a formal explanation of access modifiers? How about package management? Version control?

In some cases, students are forced to learn these subjects on their own. Naturally, this forms a breeding ground for misconceptions which are made popular in online forums like Stack Overflow and Reddit. With this series, I’m hoping to get back to the basics where these subjects can be tackled in their entirety.

Jeremy Grifski

Jeremy grew up in a small town where he enjoyed playing soccer and video games, practicing taekwondo, and trading Pokémon cards. Once out of the nest, he pursued a Bachelors in Computer Engineering with a minor in Game Design. After college, he spent about two years writing software for a major engineering company. Then, he earned a master's in Computer Science and Engineering. Today, he pursues a PhD in Engineering Education in order to ultimately land a teaching gig. In his spare time, Jeremy enjoys spending time with his wife, playing Overwatch and Phantasy Star Online 2, practicing trombone, watching Penguins hockey, and traveling the world.

Recent Posts