As with most articles in this series, I was doing some browsing on Google, and I found that some folks had an interest in learning how to obfuscate code in Python. Naturally, I thought that would be a fun topic. By no means am I an expert, but I’m familiar with the idea. As a result, treat this like a fun thought experiment.
Table of Contents
- Problem Description
- Solutions
- Obfuscate Code by Removing Comments
- Obfuscate Code by Removing Type Hints
- Obfuscate Code by Removing Whitespace
- Obfuscate Code by Abandoning Naming Conventions
- Obfuscate Code by Manipulating Strings
- Obfuscate Code by Manipulating Numbers
- Obfuscate Code by Introducing Dead Code
- Obfuscate Code by Adding Dead Parameters
- Other Ways to Obfuscate Code
- Challenge
- A Little Recap
Problem Description
Unlike most articles in this series, I’m not looking for a quick answer to code obfuscation—the process of making code unreadable. Instead, I want to look at various obfuscation methods. To do that, we’ll need some piece of nicely formatted source code:
def read_solution(solution_path: str) -> list:
    """
    Reads the solution and returns it as a list of lines.
    :param solution_path: path to the solution
    :return: the solution as a list of lines
    """
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
Cool! Here’s a standalone function that I pulled from my auto-grader project. It’s not the best code in the world, but I figured it would serve as a nice example. After all, it’s a short snippet that performs a simple function: reads a file and dumps the results as a list of lines.
In this article, we’ll take a look at a few ways of making this code snippet as unintelligible as possible. Keep in mind that I’m not an expert at this. Rather, I thought this would be a fun exercise where we could all learn something.
Solutions
In this section, we’ll take a look at several ways to obfuscate code. In particular, we’ll be taking the original solution and gradually manipulating it throughout this article. As a result, each solution will not be a standalone solution. Instead, it will be an addition to all previous solutions.
Obfuscate Code by Removing Comments
One surefire way to make code hard to read is to begin by avoiding best practices. For instance, we could start by removing any comments and docstrings:
def read_solution(solution_path: str) -> list:
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
In this case, the solution is self-documenting, so it’s fairly easy to read. That said, the removal of the comment does make it slightly harder to see exactly what this method accomplishes.
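For anything longer than a toy function, this step could be automated. Here’s a minimal sketch, assuming Python 3.9 or newer (for ast.unparse). The helper name strip_docs is hypothetical and just for illustration. Parsing the source into a syntax tree throws away comments for free, so only the docstrings need to be removed by hand:

```python
import ast

SOURCE = '''
def read_solution(solution_path: str) -> list:
    """
    Reads the solution and returns it as a list of lines.
    """
    # open the file and read every line
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
'''


def strip_docs(source: str) -> str:
    """Hypothetical helper: drop comments and docstrings from source code."""
    tree = ast.parse(source)  # comments never make it into the syntax tree
    owners = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    for node in ast.walk(tree):
        if isinstance(node, owners) and ast.get_docstring(node, clean=False) is not None:
            # a docstring is always the first statement of the body;
            # fall back to "pass" so the body is never left empty
            node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)  # requires Python 3.9+


stripped = strip_docs(SOURCE)
print(stripped)
```

Keep in mind that ast.unparse also normalizes formatting (quote style, spacing), so the output won’t match the input character for character.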
Obfuscate Code by Removing Type Hints
With the comments out of the way, we can begin removing other helpful pieces of syntax. For example, we have a few bits of syntax which help people track variable types throughout the code. In particular, we indicated that the input parameter solution_path should be a string. Likewise, we also indicated that the function returns a list. Why not remove those type hints?
def read_solution(solution_path):
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
Again, this function is still fairly manageable, so it wouldn’t be too hard to figure out what it does. In fact, almost all Python code looked like this at one point, so I wouldn’t say we’ve reached any level of obfuscation yet.
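Type hints can be stripped mechanically as well. The following is a rough sketch under the same assumption of Python 3.9+ for ast.unparse. The class name RemoveAnnotations is my own invention, and it only covers parameter and return annotations (annotated assignments like x: int = 0 are left alone):

```python
import ast


class RemoveAnnotations(ast.NodeTransformer):
    """Hypothetical transformer: erase parameter and return annotations."""

    def visit_FunctionDef(self, node):
        node.returns = None          # drops the "-> list" part
        self.generic_visit(node)     # keep walking so the arguments are visited
        return node

    def visit_arg(self, node):
        node.annotation = None       # drops the ": str" part
        return node


source = "def read_solution(solution_path: str) -> list:\n    return []\n"
tree = RemoveAnnotations().visit(ast.parse(source))
untyped = ast.unparse(tree)  # requires Python 3.9+
print(untyped)
```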
Obfuscate Code by Removing Whitespace
Another option for visual obfuscation is removing all extraneous whitespace. Unfortunately, in Python, whitespace has value. In fact, we use it to indicate scope. That said, there’s still some work we can do:
def read_solution(solution_path):
    with open(solution_path,encoding="utf8") as solution:
        data=solution.readlines()
    return data
Here, we were only able to remove three spaces: one between solution_path and encoding, one between data and =, and one between = and solution.readlines(). As a result, the code is still fairly readable. That said, as we begin to obfuscate our code a bit more, we’ll see this solution pay dividends.
Obfuscate Code by Abandoning Naming Conventions
One thing we have full control over in code is naming conventions. In other words, we decide what we name our functions and variables. As a result, it’s possible to come up with names that completely obfuscate the intent of a variable or function:
def x(a):
    with open(a,encoding="utf8") as z:
        p=z.readlines()
    return p
Here, we’ve lost all semantic value that we typically get from variable and function names. As a result, it’s even hard to figure out what this program does.
Personally, I don’t think this goes far enough. If we were particularly sinister, we’d generate long sequences of text for each name, so it’s even more difficult to understand:
def IdDG0v5lX42t(hjqk4WN0WwxM):
    with open(hjqk4WN0WwxM,encoding="utf8") as ltZH4QOxmGy8:
        QVsxkg07bMCs=ltZH4QOxmGy8.readlines()
    return QVsxkg07bMCs
Hell, I might even use a single random string of characters and only modify bits of it. For example, we could try using the function name repeatedly with slight alterations (e.g. 1 for l, O for 0, etc.):
def IdDG0v5lX42t(IdDG0v51X42t):
    with open(IdDG0v51X42t,encoding="utf8") as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt
Of course, while this looks harder to read, nothing is really stopping the user from using an IDE to follow each reference. Likewise, compiling and decompiling this function (i.e. .py -> .pyc -> .py) would probably undo the whitespace changes, though the mangled names would survive because bytecode stores them as written. As a result, we’ll have to go deeper.
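If you’d rather not invent look-alike names by hand, a small generator could do the job. The sketch below is only my guess at how that might look: the helper name lookalike_names and the choice of an O/0 alphabet are hypothetical, not taken from any real tool.

```python
import random


def lookalike_names(count, length=12, seed=0):
    """Hypothetical generator of hard-to-distinguish identifiers."""
    rng = random.Random(seed)  # seeded so repeated runs give the same names
    names = set()
    while len(names) < count:
        # start with a letter so the result is always a valid identifier
        name = "O" + "".join(rng.choice("O0") for _ in range(length - 1))
        names.add(name)  # the set silently discards accidental duplicates
    return sorted(names)


names = lookalike_names(4)
print(names)
```

Because every name is drawn from the same two characters, a reader has to compare them position by position to tell them apart.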
Obfuscate Code by Manipulating Strings
Another way to make code unintelligible is to find hardcoded strings like “utf8” in our example and add an unnecessary layer of abstraction to them:
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt=chr(117)+chr(116)+chr(102)+chr(56)
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt
Here, we’ve constructed the string “utf8” from its ordinal values. In other words, ‘u’ corresponds to 117, ‘t’ corresponds to 116, ‘f’ corresponds to 102, and ‘8’ corresponds to 56. This additional complexity is still pretty easy to map. As a result, it might be worthwhile to introduce even more complexity:
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join([chr(117),chr(116),chr(102),chr(56)])
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt
Instead of direct concatenation, we’ve introduced the join method. Now, we have a list of characters as numbers. Let’s reverse the list just to add a bit of entropy to the system:
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(56),chr(102),chr(116),chr(117)]))
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt
How about that? Now, we have even more code we can begin modifying.
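Working out ordinal values by hand gets old quickly, so here’s a small sketch of a helper that writes the reversed chr expression for any string. The function name to_chr_expression is hypothetical, and eval is only used to confirm the round trip on a string we built ourselves:

```python
def to_chr_expression(text):
    """Hypothetical helper: build a reversed-chr expression that evaluates to text."""
    # list the characters back to front so that reversed() restores the order
    parts = ",".join(f"chr({ord(ch)})" for ch in reversed(text))
    return f'"".join(reversed([{parts}]))'


expression = to_chr_expression("utf8")
print(expression)

# sanity check: the generated expression really rebuilds the string
decoded = eval(expression)
print(decoded)
```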
Obfuscate Code by Manipulating Numbers
With our “utf8” string represented as a reversed list of numbers, we can begin changing their numeric representation. For example, 56 is really 28 * 2 or 14 * 2 * 2 or 7 * 2 * 2 * 2. Likewise, Python supports various bases, so why not introduce hexadecimal, octal, and binary to the mix?
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101)]))
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt
Suddenly, it’s unclear what numbers we’re even working with. To add a bit of chaos, I thought it would be fun to insert a whitespace character:
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
        IdDGOv51X4Rt=IdDGOv51X42t.readlines()
    return IdDGOv51X4Rt
Then, we can call the strip method to remove that extra space.
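If the number juggling is hard to follow, the snippet below spells out the arithmetic. It also includes a sketch of a helper that rotates through hexadecimal, octal, and binary spellings. The name disguise is hypothetical and just one way this could be done:

```python
# each spelling below is the same integer written a different way
assert 2 * 2 * 7 * 2 == 56     # "8"
assert 0x66 == 102             # "f" in hexadecimal
assert 0o164 == 116            # "t" in octal
assert 0b1110101 == 117        # "u" in binary
assert chr(0x20) == " "        # the padding space that strip() removes

FORMATTERS = [hex, oct, bin]


def disguise(numbers):
    """Hypothetical helper: rotate through hex, octal, and binary spellings."""
    return [FORMATTERS[i % len(FORMATTERS)](n) for i, n in enumerate(numbers)]


literals = disguise([56, 102, 116, 117])
print(literals)
```

Each of the returned strings is a valid Python integer literal, so they could be pasted straight into the chr calls.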
Obfuscate Code by Introducing Dead Code
In the previous example, we added a whitespace character to our string to make it slightly more difficult to decode. We can now take that idea and begin to add code that doesn’t really do anything:
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        return list()
Here, I’ve introduced a dead branch. In other words, we’re operating under the assumption that the input is a valid string. As a result, we can add a silly case where we check if the string has a length greater than -1, which is always true. Then, on the dead branch, we return some generic value.
At this point, what is stopping us from writing a completely ridiculous dead block? In other words, instead of returning a simple junk value, we could construct a complex junk value:
def IdDG0v5lX42t(IdDG0v51X42t):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
Honestly, I could have put anything in the dead block. For fun, I decided to play with the input string. For instance, I constructed a substring and repeated it. Then, I constructed a list from the characters in that new string.
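A condition that always evaluates the same way but doesn’t look like it is sometimes called an opaque predicate. Here’s a quick sketch that checks the guard from this section alongside a second one. The parity guard is my own assumed example rather than something used in the function above, and both helper names are hypothetical:

```python
def length_guard(text):
    """The guard used in this section: a length can never be negative."""
    return len(text * 3) > -1


def parity_guard(number):
    """Assumed extra example: n * (n + 1) is always even for integers."""
    return (number * (number + 1)) % 2 == 0


length_checks = [length_guard(s) for s in ("", "a", "solution.txt")]
parity_checks = [parity_guard(n) for n in range(-100, 101)]
print(all(length_checks), all(parity_checks))
```

The less obvious the math, the longer it takes a reader to realize the else branch can never run.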
Obfuscate Code by Adding Dead Parameters
If we can introduce dead branches, we can absolutely introduce dead parameters. However, we don’t want to alter the behavior of the underlying function, so we’ll want to introduce default parameters:
def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
Of course, this parameter is of no use currently. So, let’s try doing something with it:
def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
Now, there is something beautiful about the expression LdDG0v51X42t%2!=0. To me, it looks like a password, not a test for odd numbers.
Of course, why stop there? Another cool thing we can do with parameters is take advantage of variable length arguments:
def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
Now, we’ve opened the door to an unlimited number of arguments. Let’s add some code to make this interesting:
def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    elif LdDG0v51X42tf:
        return list()
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
Again, we’ll never hit this branch because the first condition is always true. Of course, the casual reader doesn’t know that. At any rate, let’s have some fun with it:
def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    elif LdDG0v51X42tf:
        LdDG0v51X42tf=list(LdDG0v51X42tf)
        while LdDG0v51X42tf:
            LdDG0v51X42tx=LdDG0v51X42tf.pop()
            LdDG0v51X42tf.append(LdDG0v51X42tx)
        return LdDG0v51X42tf
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
Yep, that’s an infinite loop! Unfortunately, it’s sort of obvious. That said, I suspect that the variable names will obscure the intent for a little while.
Other Ways to Obfuscate Code
Once again, I’ll mention that this article was more of a thought experiment for me. I had seen obfuscated code in the past, and I thought it would be fun to give it a try myself. As a result, here’s the original snippet and the final snippet for comparison:
def read_solution(solution_path: str) -> list:
    """
    Reads the solution and returns it as a list of lines.
    :param solution_path: path to the solution
    :return: the solution as a list of lines
    """
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
def IdDG0v5lX42t(IdDG0v51X42t,LdDG0v51X42t=0x173,*LdDG0v51X42tf):
    I6DGOv51X4Rt="".join(reversed([chr(2*2*7*2),chr(0x66),chr(0o164),chr(0b1110101),chr(0x20)])).strip()
    if LdDG0v51X42t%2!=0 or len(IdDG0v51X42t*3)>-1:
        with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt=IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    elif LdDG0v51X42tf:
        LdDG0v51X42tf=list(LdDG0v51X42tf)
        while LdDG0v51X42tf:
            LdDG0v51X42tx=LdDG0v51X42tf.pop()
            LdDG0v51X42tf.append(LdDG0v51X42tx)
        return LdDG0v51X42tf
    else:
        IdDG0v51X42t=IdDG0v51X42t[len(IdDG0v51X42t)//2::3]*6
        return [I6DG0v51X42t for I6DG0v51X42t in IdDG0v51X42t]
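Since the whole point is to change how the code looks without changing what it does, it’s worth checking that the two versions still agree. Here’s a rough sketch of such a check. The helper same_behavior is hypothetical, and the obfuscated function is abbreviated to its live branch (with spacing restored) to keep the example short:

```python
import os
import tempfile


def read_solution(solution_path: str) -> list:
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data


def IdDG0v5lX42t(IdDG0v51X42t, LdDG0v51X42t=0x173, *LdDG0v51X42tf):
    # abbreviated: only the live branch of the obfuscated version is kept
    I6DGOv51X4Rt = "".join(reversed(
        [chr(2*2*7*2), chr(0x66), chr(0o164), chr(0b1110101), chr(0x20)]
    )).strip()
    if LdDG0v51X42t % 2 != 0 or len(IdDG0v51X42t * 3) > -1:
        with open(IdDG0v51X42t, encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
            IdDGOv51X4Rt = IdDGOv51X42t.readlines()
        return IdDGOv51X4Rt
    return list()


def same_behavior(text):
    """Hypothetical check: write text to a temp file and compare both readers."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "solution.txt")
        with open(path, "w", encoding="utf8") as handle:
            handle.write(text)
        return read_solution(path) == IdDG0v5lX42t(path)


matches = same_behavior("line one\nline two\n")
print(matches)
```

A single comparison like this is far from proof, but it’s a cheap way to catch a typo in one of those nearly identical names.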
At this point, I suppose we could continue to iterate, but I’m not sure that would be the best use of my time. That said, there were a few things I considered trying. For instance, I thought about compressing lines of code such as:
with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
    IdDGOv51X4Rt=IdDGOv51X42t.readlines()
return IdDGOv51X4Rt
Into something like:
with open(IdDG0v51X42t,encoding=I6DGOv51X4Rt) as IdDGOv51X42t:
    return IdDGOv51X42t.readlines()
However, part of me felt like this would actually make the code easier to read since we wouldn’t have to map variable names.
In addition, I thought about making some methods just to pollute the namespace a little bit. For example, we could create functions that overwrite some of the standard library. Then, give them totally different behavior. In our case, we might redefine reversed to confuse the reader into thinking it has its typical behavior:
def reversed(x):
    return "utf8"
Then, we could pass whatever we wanted into it as bait. Wouldn’t that be sinister?
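To see the shadowing in action, here’s a small sketch. The bait list is made up for illustration. Note that the genuine built-in never goes away, since it can still be reached through the builtins module:

```python
import builtins


def reversed(x):
    """Shadows the built-in at module level and ignores its argument."""
    return "utf8"


bait = reversed([chr(1), chr(2), chr(3)])   # looks meaningful, is not
real = "".join(builtins.reversed("8ftu"))   # the genuine built-in is still reachable
print(bait, real)
```

In other words, a determined reader only has to find the redefinition to see through the trick.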
Beyond that, I’m aware that there are obfuscation tools out there, but I’m not sure how widely used they are. Here are a few examples:
- pyarmor: “A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.”
- pyminifier: “Minify, obfuscate, and compress Python code”
- Opy: “Obfuscator for Python”
- Oxyry: “the power to protect your python source code”
I haven’t tried many of these tools, but Oxyry is definitely the most convenient. When I plug our function into it, it generates the following code:
def read_solution (OOOO0OO0OO00OOOOO :str )->list :#line:1
    ""#line:6
    with open (OOOO0OO0OO00OOOOO ,encoding ="utf8")as OO0O00OO0O0O0OO0O :#line:7
        OO0000O00O0OO0O0O =OO0O00OO0O0O0OO0O .readlines ()#line:8
    return OO0000O00O0OO0O0O
Clearly, that’s not great, but I suppose it’s effective. If you know of any other tools or cool techniques, feel free to share them in the comments.
Challenge
For today’s challenge, pick a piece of code and try to obfuscate it. Feel free to use all of the ideas leveraged in this article. However, the challenge will be to come up with your own ideas. What other ways can we obfuscate Python code?
If you’re looking for some ideas, I mentioned a couple in the previous section. Of course, there are other things you could try. For instance, you could always add a logger which prints erroneous messages to the console. Something like this would have no effect on your program’s behavior, but it could confuse a reader.
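As a rough idea of what the misleading logger could look like, here’s a sketch built on the standard logging module. The logger name and both messages are invented for illustration:

```python
import logging
import os
import tempfile

logger = logging.getLogger("decoy")  # hypothetical logger name


def read_solution(solution_path):
    # neither message describes what the function really does
    logger.warning("Retrying database connection (attempt 3 of 5)")
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    logger.error("Checksum mismatch, falling back to cached copy")
    return data


# quick demonstration with a throwaway file
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "solution.txt")
    with open(path, "w", encoding="utf8") as handle:
        handle.write("first\nsecond\n")
    lines = read_solution(path)

print(lines)
```

The function still returns the lines of the file, but anyone watching the console would be chasing a database and a checksum that don’t exist.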
If you want to go the extra mile, try writing a program which performs your favorite obfuscation technique. For instance, could you write a program which could identify Python variables? If so, you could generate your own symbol table which would track all variables. Then, you could generate new names without any worries about clashes.
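To get you started, here’s a minimal sketch of the symbol table idea using the ast module. The helper names collect_names and build_symbol_table are hypothetical, and this version only looks at function names, parameters, and assigned variables. It also doesn’t rewrite the code; it just produces the mapping:

```python
import ast
import builtins


def collect_names(source):
    """Hypothetical helper: gather function, parameter, and variable names."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)  # assignment targets, including "with ... as x"
    return names


def build_symbol_table(source):
    """Map every collected name to a fresh one that cannot clash."""
    names = collect_names(source)
    taken = names | set(dir(builtins))  # never reuse an existing or built-in name
    table = {}
    counter = 0
    for name in sorted(names):
        candidate = f"_{counter:04x}"
        while candidate in taken:
            counter += 1
            candidate = f"_{counter:04x}"
        table[name] = candidate
        taken.add(candidate)
        counter += 1
    return table


SOURCE = '''
def read_solution(solution_path):
    with open(solution_path, encoding="utf8") as solution:
        data = solution.readlines()
    return data
'''

table = build_symbol_table(SOURCE)
print(table)
```

From here, an ast.NodeTransformer could apply the table to every matching node, and the boring placeholder names could be swapped for something nastier.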
At the end of the day, however, treat this challenge like a fun thought experiment. I don’t expect any of these methods to be all that practical. After all, if a machine can run the code even in an obfuscated state, so can a human (eventually).
A Little Recap
Typically, in this section, I would list off all the solutions. However, the code snippets are quite long, and I don’t think it makes a lot of sense for me to dump them here. As a result, I’ll just share the options as a list:
- Remove comments, type hints, and whitespace
- Abandon naming conventions
- Manipulate strings and numbers
- Introduce dead code and parameters
- Try something else
With that, I think we’re done for the day. If you like this sort of content, I’d appreciate it if you checked out an article on the different ways you can support the site. Otherwise, here are a few security related books on Amazon (ad):
- Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers
- Black Hat Python: Python Programming for Hackers and Pentesters
Finally, here are some related articles:
- How to Compare Strings in Python: Equality and Identity
- How to Perform a Reverse Dictionary Lookup in Python: Generator Expressions and More
Once again, thanks for stopping by. See you next time!