One of the nastier tricks in our obfuscation toolbelt is the fact that text has a lot of visually similar characters. If we sprinkle enough of them around, we can end up with some tough-to-read code!
Similar Looking Characters
Previously, I talked about the benefit of naming conventions and how abandoning them could allow for fun obfuscation techniques like shadowing built-in functions. Well, as it turns out, there are potentially far more effective techniques for rendering code practically unreadable. And you’ll be surprised at just how easy they are to implement.
One such technique is to take characters that look alike and use the hell out of them in your names. For example, capital ‘O’ and the digit ‘0’ look very similar, so you can litter them throughout your variable and function names for maximum chaos.
In fact, there are a lot of characters that look similar. A common nasty pair is lowercase ‘l’ and the digit ‘1’. Depending on the font, these can look identical. For example, you might have tried to learn the Unix/Linux command line, where ‘ls’ is a common command. When I’ve shown this command to students, it’s common for them to think it says “1s” and not “ls”, leading to hilarious bugs.
As it turns out, there are also visually similar character combinations. For example, ‘d’ kind of looks like ‘cl’ at a glance. Meanwhile, ‘m’ looks like ‘rn’. And of course, any form of 1337 (i.e., leet) speak can work as lookalikes.
Here’s a quick table I put together of character lookalikes that you can abuse in your own code. For a more complete reference, I might recommend the following source on homoglyph detection.
Some of these might look immediately different, but you’ll never notice them when they’re buried in code. Or worse, as I’ll describe shortly, spammed!
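As a rough sketch, the lookalikes mentioned so far could be captured in a small Python mapping. This is not a complete homoglyph table: it only covers the pairs named in this article, and the names `LOOKALIKES`, `collapse`, and `look_alike` are made up for illustration.

```python
# Lookalike pairs taken from the examples in this article only.
# Keys are the "real" characters; values are what they can be mistaken for.
LOOKALIKES = {
    "O": "0",   # capital O vs. the digit zero
    "l": "1",   # lowercase L vs. the digit one
    "d": "cl",  # 'd' vs. the pair 'cl'
    "m": "rn",  # 'm' vs. the pair 'rn'
}


def collapse(name: str) -> str:
    """Replace every lookalike with its canonical character (hypothetical helper)."""
    for original, lookalike in LOOKALIKES.items():
        name = name.replace(lookalike, original)
    return name


def look_alike(first: str, second: str) -> bool:
    """Return True when two names collapse to the same string."""
    return collapse(first) == collapse(second)


if __name__ == "__main__":
    print(look_alike("modern", "modem"))
    print(look_alike("O0O0", "OOOO"))
```

Collapsing in one direction is a design shortcut: it treats every ‘cl’ as a possible ‘d’, so it will flag some pairs a human would never confuse. For spotting trouble in a code review, that kind of over-reporting is probably acceptable.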
Spamming Similar Looking Characters
Whenever we talk about naming conventions, we usually look for ways to make our code more readable. As I stated before, by abandoning naming conventions, we can make our code much harder to read. To demonstrate this, let’s look at a function that prints “Hello, World!”:
def hello_world():
    print("Hello, World!")
Any time someone wants to use this function, they would call it by name:
if is_first_program:
    hello_world()
This is, of course, very easy to read. It’s practically in plain English. To make things much harder, we can rename our function using our similar looking characters:
def O00O00OO0O00OOOO00O0O():
    print("Hello, World!")
Now, the function is still somewhat obvious in what it does, but it’s absolutely not obvious when it’s called:
if is_first_program:
    O00O00OO0O00OOOO00O0O()
And guess what! We can do this to every single thing we can name, such as variables, functions, and classes.
if OO000OOOOO00O00O:
    O00O00OO0O00OOOO00O0O()
Good luck figuring out what this chunk of code does! Here’s the same code block with different similar looking characters:
if ll1l11l111lll1l1ll1l:
    lllll1l1llll1l1l1ll()
It’s an absolute masterpiece, isn’t it?
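Typing these names by hand gets old quickly, so here is a minimal sketch of a generator for them. The function `confusable_name` and its parameters are hypothetical and not part of any library. The one real constraint is that a Python identifier cannot start with a digit, so the first character is always drawn from the non-digit part of the alphabet.

```python
import random


def confusable_name(length=20, alphabet="O0", rng=None):
    """Build a random identifier out of lookalike characters (illustrative helper)."""
    rng = rng or random.Random()
    # Identifiers may not begin with a digit, so pick the first character
    # from the non-digit characters only.
    starters = [char for char in alphabet if not char.isdigit()]
    if length < 1 or not starters:
        raise ValueError("need a positive length and at least one non-digit character")
    first = rng.choice(starters)
    rest = "".join(rng.choice(alphabet) for _ in range(length - 1))
    return first + rest


if __name__ == "__main__":
    print(confusable_name(alphabet="O0"))
    print(confusable_name(alphabet="l1"))
```

Passing in a seeded `random.Random` makes the output repeatable, which helps if the same name has to be regenerated later.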
One of the downsides of rewriting all your symbols with visually similar characters is that most IDEs will have no problem parsing these for their users. Therefore, a simple CTRL+CLICK on a function call will take them straight to the function declaration.
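To illustrate how little resistance this offers to tooling, here is a sketch that undoes the renaming with Python’s standard `ast` module. It assumes Python 3.9 or later for `ast.unparse`, and it only handles function definitions, plain names, and function arguments. The class `Renamer`, the function `deobfuscate`, and the `name_N` placeholder scheme are all invented for this example.

```python
import ast

# Characters used for the obfuscated names in this article.
CONFUSABLE = set("O0l1")


class Renamer(ast.NodeTransformer):
    """Swap names made purely of lookalike characters for numbered placeholders."""

    def __init__(self):
        self.names = {}

    def _rename(self, name):
        if set(name) <= CONFUSABLE:
            # Reuse the placeholder if this name was seen before.
            return self.names.setdefault(name, f"name_{len(self.names)}")
        return name

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node


def deobfuscate(source):
    """Return the source with lookalike-only names replaced (partial sketch)."""
    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)


if __name__ == "__main__":
    code = "def O00O():\n    print('Hello, World!')\nif OO0:\n    O00O()\n"
    print(deobfuscate(code))
```

The placeholders carry no meaning, of course, but `name_0` and `name_1` are at least easy to tell apart, which is all the lookalike trick was taking away.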
That said, this technique combined with a variety of other obfuscation techniques discussed in this series could easily push any source code into unreadable territory. Of course, as I always say, these techniques are more for fun, and I wouldn’t recommend actually employing them as a form of security.
With that, let’s go ahead and call it for the day. If you found this article interesting, I’d encourage you to keep browsing the following articles:
- Obfuscation Techniques: Writing Malicious Comments
- Obfuscation Techniques: The Yoda Conditional
- How to Obfuscate Code in Python: A Thought Experiment
And if you really want to take your support to a new level, we have a variety of ways to support the site. Otherwise, take care!