Lua Virtualization Part 2: Obfuscation techniques
This is part 2 of a 5 part series about Lua Virtualization:
- Part 1: The Internals of the Lua VM
- Part 2: Obfuscation techniques
- Part 3: Taking a Look at Luraph
- Part 4: Devirtualizing Luraph
- Part 5: Actually Devirtualizing Luraph
The misconception that good obfuscation makes code undecompilable is outdated. True obfuscation doesn’t need to be unbreakable. It needs to withstand automation. If your obfuscated code can’t be easily automated and actively resists tooling, it’s a good obfuscation.
In fact, manual reversibility is not necessarily a failure. If it takes an expert a week to reverse and modify what would otherwise take a few minutes, that’s a win. Good obfuscation makes your code irregular. It replaces predictable structures with dynamically generated nonsense.
In this section, we’ll discuss obfuscation methods I’ve encountered in the wild and how I overcame them.
Obfuscating Constants via Mixed Boolean-Arithmetic (MBA)
A common and simple obfuscation technique is encoding constants through complex-looking arithmetic. Lua 5.1 has no bitwise operators, so these expressions rely on combinations of addition, subtraction, multiplication, division, and modulo.
For example, writing 122 can be encoded like this:
print((((7 * 10) - (102 % 23) + 9) * 2 - (3 * 7 + 14)) + ((144 / 2) - (8 * 4)) - (((99 - 30) / 3) - 2))
Now you have a value that can’t easily be read. However, you can simplify it again with luac, the Lua compiler, because it folds constant expressions while generating bytecode. If the entire expression consists of inline constants, the bytecode will be the following:
main (4 instructions, 16 bytes at 00780520)
0+ params, 2 slots, 0 upvalues, 0 locals, 2 constants, 0 functions
1 [1] GETGLOBAL 0 -1 ; print
2 [1] LOADK 1 -2 ; 122
3 [1] CALL 0 2 1
4 [1] RETURN 0 1
constants (2) for 00780520:
1 "print"
2 122
In this listing, the compiler has folded the long MBA expression back into the single constant 122. To prevent this, you must write some of the constants as variables:
local a = 7
print((((a * 10) - (102 % 23) + 9) * 2 - (3 * 7 + 14)) + ((144 / 2) - (8 * 4)) - (((99 - 30) / 3) - 2))
Now the compiler is forced to generate the full bytecode for each arithmetic step:
main (11 instructions, 44 bytes at 008E0520)
0+ params, 3 slots, 0 upvalues, 1 local, 8 constants, 0 functions
1 [1] LOADK 0 -1 ; 7
2 [2] GETGLOBAL 1 -2 ; print
3 [2] MUL 2 0 -3 ; - 10
4 [2] SUB 2 2 -3 ; - 10
5 [2] ADD 2 2 -4 ; - 9
6 [2] MUL 2 2 -5 ; - 2
7 [2] SUB 2 2 -6 ; - 35
8 [2] ADD 2 2 -7 ; - 40
9 [2] SUB 2 2 -8 ; - 21
10 [2] CALL 1 2 1
11 [2] RETURN 0 1
constants (8) for 008E0520:
1 7
2 "print"
3 10
4 9
5 2
6 35
7 40
8 21
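The idea above can be automated. The following is a minimal sketch, not taken from any particular obfuscator: `encode_constant` is a hypothetical helper that hides an integer behind arithmetic depending on a local variable, so the compiler cannot fold it. The random ranges are arbitrary assumptions.

```lua
-- Hypothetical helper: returns Lua source for an expression that evaluates to n.
-- The expression reads a local (x), so luac cannot fold it into one constant.
local function encode_constant(n)
  local k  = math.random(2, 50)   -- value stored in the local
  local m  = math.random(2, 20)   -- multiplier
  local c1 = math.random(1, 100)  -- subtracted before the modulo
  local p  = math.random(7, 31)   -- modulus
  local r  = (k * m - c1) % p     -- what the inner expression yields at runtime
  local c2 = n - r                -- correction that brings the result back to n
  return ("(function() local x = %d return ((x * %d - %d) %% %d) + %d end)()")
    :format(k, m, c1, p, c2)
end

-- Usage: the generated source changes on every call, the value does not.
local compile = loadstring or load  -- loadstring on 5.1, load on 5.2+
local value = compile("return " .. encode_constant(122))()
```

Here `value` is 122 regardless of the random choices, because `c2` is computed from the same arithmetic the generated code performs at runtime.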
Obscuring Numbers with Table Lengths
Another clever trick is to encode constants using the length of Lua tables. Because #{...} is calculated at runtime, it allows you to express numbers without writing them:
#{"a", 12, {}, "foo", 42, true, "x", 99} -- evaluates to 8
While this can be defeated by just running the code, it trips up static analyzers that only look for numeric literals.
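A generator for this trick can be sketched in a few lines. `encode_as_length` and its filler list are hypothetical names chosen for illustration. The fillers deliberately exclude `nil`, since the length of a table with holes is not well defined.

```lua
-- Hypothetical helper: returns Lua source of the form #{...} with n entries.
local function encode_as_length(n)
  -- Filler values; nil is avoided because it would create holes in the table.
  local fillers = { '"a"', "12", "{}", '"foo"', "true", "99" }
  local parts = {}
  for i = 1, n do
    parts[i] = fillers[math.random(#fillers)]
  end
  return "#{" .. table.concat(parts, ", ") .. "}"
end

-- Usage: evaluate the generated expression to get the number back.
local compile = loadstring or load  -- loadstring on 5.1, load on 5.2+
local eight = compile("return " .. encode_as_length(8))()
```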
load()
Lua’s load (or loadstring in 5.1) is one of the most abused obfuscation methods. Many simply encode their source as a string of decimal escape sequences and call it a day. For this, they use an obfuscator like so:
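local function obfuscate(str)
  -- Escape every byte as a decimal sequence (\ddd). The extra parentheses
  -- drop gsub's second return value (the match count).
  local escaped = (str:gsub(".", function(ch)
    return "\\" .. ch:byte()
  end))
  return ('load("%s")()'):format(escaped)
end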
print(obfuscate([[print("Hello World!")]]))
This will spit out the obfuscated code:
load("\112\114\105\110\116\40\34\72\101\108\108\111\32\87\111\114\108\100\33\34\41")() -- print("Hello World!")
This is the weakest kind of obfuscation and is easily broken, because the string decodes straight back to the original source.
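One way to break it is to replace `load` with a version that records what it is asked to compile. This is a minimal sketch under the assumption that the script reaches `load` or `loadstring` through the global environment. `logging_load`, `dump_loaded_chunks`, and `captured` are names made up for this example.

```lua
local real_load = load
local compile = loadstring or load  -- string compiler: loadstring on 5.1, load on 5.2+
local captured = {}                 -- every source string the script tries to compile

local function logging_load(chunk, ...)
  if type(chunk) == "string" then
    captured[#captured + 1] = chunk
    return compile(chunk, ...)
  end
  return real_load(chunk, ...)      -- function chunks pass through untouched
end

-- Runs `script` with the global load/loadstring swapped for the logging version.
local function dump_loaded_chunks(script)
  local saved_load, saved_loadstring = load, loadstring
  load, loadstring = logging_load, logging_load
  local ok, err = pcall(compile(script))
  load, loadstring = saved_load, saved_loadstring
  return ok, err
end

-- The escaped string below decodes to "return 42".
local ok = dump_loaded_chunks([[load("\114\101\116\117\114\110\32\52\50")()]])
print(captured[1])  --> return 42
```

Nested layers are captured as well, since every inner `load` call goes through the same hook.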
Junk Code
Junk code is perhaps one of the best-known obfuscations. Its power lies not in complexity, but in polluting the code. Common forms include:
Dead branches:
if 1 == 2 then
error("this will never happen")
end
Pcall nonsense:
pcall(function()
local x = "foo" + {} -- raises a runtime error, which pcall silently swallows
end)
Load nonsense:
load("\114\101\116\117\114\110\32\40\52\53\32\37\32\55\32\43\32\49\51\41\32\42\32\50\32\45\32\56\55\50\32\45\32\40\49\52\32\43\32\54\41\32\42\32\50\32\37\32\53\32\42\32\40\40\57\32\43\32\51\41\32\42\32\40\53\32\45\32\50\41\41\32\37\32\55\32\42\32\40\53\54\32\37\32\57\32\42\32\52\41\32\43\32\40\49\50\32\45\32\51\41\32\42\32\40\56\32\43\32\52\32\42\32\51\41\32\37\32\40\53\32\43\32\49\41")() -- returns the value of an arithmetic expression, which is discarded
This junk serves no functional purpose, but to a parser it is indistinguishable from legitimate code. This forces the reverse engineer to manually figure out which code is junk and which isn’t.
Return a lambda function
This is when an anonymous function is returned and executed as the main chunk. Here, every function the script uses is looked up in a table at runtime:
return (function(tbl)
return tbl[-23232]()("Hello World!")
end)({
[-23232] = function() return print end
})
This adds a layer of indirection that makes the code harder to read, especially when combined with junk code and MBA.
Control Flow
Control flow obfuscation (often called control flow flattening) replaces straight-line code with a dispatch loop, typically a while loop driven by a variable that tracks the current state. Instead of:
print("A")
print("B")
You get:
local state = 0
while true do
if state == 0 then
print("A")
state = 1
elseif state == 1 then
print("B")
break
end
end
The more branches and conditions you add, the harder it is to reverse. This strategy combines well with virtualized execution.
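To illustrate how quickly readability degrades, here is a sketch of a simple counting loop flattened the same way. The function name and the state numbers (3, 7, 5) are arbitrary choices for this example. A real obfuscator would typically also shuffle the branch order and compute the next state arithmetically.

```lua
-- Flattened equivalent of:
--   local total = 0
--   for i = 1, n do total = total + i end
--   return total
local function sum_flattened(n)
  local state, i, total = 3, 1, 0
  while state ~= 0 do
    if state == 5 then        -- exit block
      state = 0
    elseif state == 7 then    -- loop body
      total = total + i
      i = i + 1
      state = 3
    elseif state == 3 then    -- loop condition
      state = (i <= n) and 7 or 5
    end
  end
  return total
end

print(sum_flattened(10))  --> 55
```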
Virtual Machine Obfuscation
VM-based obfuscation is arguably the final boss of Lua obfuscation. The core idea is to reimplement the Lua interpreter from scratch and have it execute the protected script as bytecode. That bytecode can be either a custom format or the original Lua one.
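The shape of such an interpreter can be shown with a toy example. This is a deliberately tiny sketch with an invented four-instruction set. It does not follow the real Lua 5.1 instruction encoding or the design of any existing obfuscator, and the opcode names and the `run` function are assumptions made for illustration.

```lua
-- Invented opcodes for a toy register machine.
local OP_LOADK, OP_ADD, OP_MUL, OP_RETURN = 1, 2, 3, 4

-- Executes a list of {op, a, b, c} instructions against a constant table.
local function run(program, constants)
  local regs, pc = {}, 1
  while true do
    local ins = program[pc]
    local op, a, b, c = ins[1], ins[2], ins[3], ins[4]
    pc = pc + 1
    if op == OP_LOADK then        -- regs[a] = constants[b]
      regs[a] = constants[b]
    elseif op == OP_ADD then      -- regs[a] = regs[b] + regs[c]
      regs[a] = regs[b] + regs[c]
    elseif op == OP_MUL then      -- regs[a] = regs[b] * regs[c]
      regs[a] = regs[b] * regs[c]
    elseif op == OP_RETURN then   -- return regs[a]
      return regs[a]
    else
      error("unknown opcode " .. tostring(op))
    end
  end
end

-- Computes (20 + 2) * 2 without that expression appearing anywhere in source.
local result = run({
  { OP_LOADK, 0, 1 },
  { OP_LOADK, 1, 2 },
  { OP_ADD, 2, 0, 1 },
  { OP_MUL, 2, 2, 1 },
  { OP_RETURN, 2 },
}, { 20, 2 })
print(result)  --> 44
```

The protected program exists only as data for the loop, which is why recovering it means understanding the interpreter first.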
Father of Lua VMs
The first public Lua VM used for obfuscation appeared over 12 years ago: LBI. This was groundbreaking at the time. There was almost no public documentation on Lua internals. Since then, Rerumu has released three Lua VMs likely inspired by LBI: Rerubi, FiOne, and FiThree. Two target Lua 5.1, and one targets Lua 5.3. These projects are valuable for understanding how Lua bytecode can be emulated.
That said, most obfuscators you’ll see in the wild today are based on IronBrew. We know this because of a signature: IronBrew’s unique OP_JMP design, which includes one optimization that the other VMs lack. This pattern has been spotted repeatedly in commercial and skidded obfuscators.
[Figure: IronBrew’s OP_JMP implementation]
IronBrew has gained popularity among skidders because it’s a ready-to-use obfuscator with okay anti-decompiler tricks. That makes it a prime candidate for skids, who sell it with minimal or no modification.