Lua Virtualization Part 5: Actually Devirtualizing Luraph

This is part 5 of a 5-part series about Lua Virtualization.

It’s been a while since the last update. Time constraints pushed this project aside for a bit. But we’re back, and now we’ll actually devirtualize Luraph.

Picking up from where we left off: we fully understood the pre-VM deserialization process and identified Luraph’s dual VM architecture: the Deserialization VM and the Real VM.

The logical next step seemed to be analyzing the IR executed by the Deserialization VM. However, this quickly turned into a tedious task. After stepping away and revisiting the problem with fresh eyes, we noticed that this step is unnecessary.

The Real VM is executed through the same handler as the Deserialization VM, h_funcs["VM"]:

local deserialized_execution_data = h_funcs["VM"](
    Deserialize(),
    h_funcs["upvalues"]
)(
    Deserialize,
    Self.nil_self_index,
    h_funcs["get_self_index"],
    execute_first_arg,
    h_funcs["gFloat"],
    h_funcs["gBits8"],
    h_funcs["gBits32"],
    Self.control_flow_tbl_for_first_vm,
    nil,
    h_funcs["VM"]
)

local ret =  unpack({
    h_funcs["VM"](deserialized_execution_data, h_funcs["upvalues"])
})

return ret

Intercepting the Real Payload

This observation changes everything. Instead of fully reverse engineering the Deserialization VM, we can treat it as a black box and simply intercept its output.

The key insight is that deserialized_execution_data is the fully unpacked IR, the real program, ready to execute. It has already passed through every layer of decryption and transformation. All we need to do is capture it right before it enters the Real VM:

local deserialized_execution_data = h_funcs["VM"](
    Deserialize(),
    h_funcs["upvalues"]
)(
    Deserialize,
    Self.nil_self_index,
    h_funcs["get_self_index"],
    execute_first_arg,
    h_funcs["gFloat"],
    h_funcs["gBits8"],
    h_funcs["gBits32"],
    Self.control_flow_tbl_for_first_vm,
    nil,
    h_funcs["VM"]
)

-- Intercept here

local ret =  unpack({
    h_funcs["VM"](deserialized_execution_data, h_funcs["upvalues"])
})

return ret

By inserting our own code at this point, we can dump the entire IR to disk and analyze it without needing to understand how the Deserialization VM produced it.

Dumping the IR

The following script performs the interception and serializes the IR into a JSON file:

do
    local Insts_magic = 3
    local REG_B_magic = 4
    local REG_A_magic = 5
    local REG_C_magic = 10
    local decrypted_constants_magic = 8
    local function_prototypes_magic = 7
    local constants_magic = 9

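    -- Index t[k] defensively (t may be nil or have a throwing __index) and
    -- return a string that is safe to embed in the JSON written below.
    local function safe_index(t,k)
        local ok,r=pcall(function() return t[k] end)
        if not ok or r==nil then return "nil" end
        -- tostring() keeps false and table values from breaking string.format("%s")
        -- on Lua 5.1; the gsub calls escape backslashes, quotes, and control bytes.
        return (tostring(r):gsub('[\\"]','\\%0'):gsub('%c',function(c)
            return string.format('\\u%04x',c:byte())
        end))
    end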

    local Insts=deserialized_execution_data[Insts_magic]
    local REG_B=deserialized_execution_data[REG_B_magic]
    local REG_A=deserialized_execution_data[REG_A_magic]
    local REG_C=deserialized_execution_data[REG_C_magic]
    local decrypted_constants=deserialized_execution_data[decrypted_constants_magic]
    local function_prototypes=deserialized_execution_data[function_prototypes_magic]
    local constants=deserialized_execution_data[constants_magic]

    local f=assert(io.open("./real_vm_ir.json","w"))

    f:write('{"stk_size":'..tostring(deserialized_execution_data[6] or 0)..',"n_instructions":'..#Insts..',"instrs":[')
    for i = 1, #Insts do
        if i > 1 then f:write(',') end

        local dc=safe_index(decrypted_constants, i)
        local fp=safe_index(function_prototypes, i)
        local cn=safe_index(constants, i)

        f:write(string.format('{"pc":"%d","op":"%s","A":"%s","B":"%s","C":"%s","dec_const":"%s","func_proto":"%s","const":"%s"}',
        i, tostring(Insts[i]), tostring(REG_A[i]), tostring(REG_B[i]), tostring(REG_C[i]), dc, fp, cn))
    end

    f:write(']}')
    f:close()
    io.write("Dumped IR\n")
    os.exit(0)
end

Note that the magic offsets used to access deserialized_execution_data are unique to each obfuscated file. They must be resolved manually per sample.

The output looks like this:

{
  "stk_size": 4,
  "n_instructions": 19,
  "instrs": [
    { "pc": "1",  "op": "29",  "A": "0", "B": "5",   "C": "0",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "2",  "op": "29",  "A": "0", "B": "5",   "C": "0",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "3",  "op": "29",  "A": "0", "B": "5",   "C": "0",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "4",  "op": "17",  "A": "97","B": "468", "C": "417", "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "5",  "op": "21",  "A": "0", "B": "0",   "C": "0",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "6",  "op": "29",  "A": "0", "B": "6",   "C": "0",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "7",  "op": "15",  "A": "0", "B": "0",   "C": "3",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" },
    { "pc": "8",  "op": "59",  "A": "nil","B": "3",  "C": "nil", "dec_const": "nil",          "func_proto": "16",    "const": "2"   },
    { "pc": "15", "op": "84",  "A": "1", "B": "0",   "C": "nil", "dec_const": "nil",          "func_proto": "print", "const": "nil" },
    { "pc": "16", "op": "196", "A": "0", "B": "nil", "C": "148", "dec_const": "Hello World!", "func_proto": "nil",   "const": "nil" },
    { "pc": "17", "op": "49",  "A": "0", "B": "1",   "C": "0",   "dec_const": "nil",          "func_proto": "nil",   "const": "nil" }
  ]
}

Even a quick glance reveals useful information: the strings "print" and "Hello World!" are plainly visible. From these alone, we can already guess the original program:

print("Hello World!")

Lifting the Opcodes

Strings are helpful, but relying on them alone is not a robust strategy. To properly reconstruct the program, we need to map each numeric op value to its corresponding VM opcode.
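As a rough illustration, the lookup itself can be a plain table keyed by the numeric op value. The sketch below rests on a few assumptions: OPCODE_NAMES and lift are hypothetical names, and the entries are only the pairs that can be read off this one sample. A real table is larger and has to be rebuilt for every obfuscated file.

```lua
-- Hypothetical lookup table: only the op values visible in this sample.
-- Note that several numeric values can map to the same opcode (17 and 49).
local OPCODE_NAMES = {
    [29]  = "OP_JMP",
    [17]  = "OP_CALL",
    [49]  = "OP_CALL",
    [21]  = "OP_RETURN",
    [15]  = "LOAD_REG_C",
    [59]  = "LOAD_DESERIALIZE_K",
    [84]  = "OP_GETGLOBAL",
    [196] = "NOP",
    [45]  = "LOAD_DECRYPTED_STRING",
}

-- Format one dumped instruction (fields are strings, as in the JSON dump).
-- Unknown op values stay visible instead of being silently dropped.
local function lift(instr)
    local name = OPCODE_NAMES[tonumber(instr.op)]
        or ("UNKNOWN_" .. tostring(instr.op))
    return string.format("[%s] %-22s %s %s %s",
        tostring(instr.pc), name,
        tostring(instr.A), tostring(instr.B), tostring(instr.C))
end

print(lift({ pc = "15", op = "84", A = "1", B = "0", C = "nil" }))
```

Keeping unknown values as UNKNOWN_<n> makes gaps in the table easy to spot while it is still being filled in.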

This mapping was established earlier in the series. Applying it to the full instruction stream yields:

[1] OP_JMP                  0 5 0
[2] OP_JMP                  0 5 0
[3] OP_JMP                  0 5 0
[4] OP_CALL                 97 468 417
[5] OP_RETURN               0 0 0
[6] OP_JMP                  0 6 0
[7] LOAD_REG_C              0 0 3
[8] LOAD_DESERIALIZE_K      0 3 0; func_proto=16, const=2
[9] LOAD_INSTS              0 0 3
[10] LOAD_DESERIALIZE_K     0 3 0; func_proto=16, const=45
[11] NOP                    3 0 0
[12] LOAD_DESERIALIZE_K     0 3 0; func_proto=6, const=13
[13] NOP                    0 3 0
[14] NOP                    0 0 0
[15] OP_GETGLOBAL           1 0 0; func_proto="print"
[16] NOP                    0 0 148; dec_const="Hello World!"
[17] OP_CALL                0 1 0
[18] OP_JMP                 0 4 0
[19] OP_JMP                 0 5 0

At first glance, this doesn’t make much sense. There are NOPs in places where real instructions should be, familiar-looking opcodes like OP_GETGLOBAL sit alongside mysterious LOAD variants, and control flow seems to jump around. The IR looks incoherent, almost as if it were intentionally scrambled.

That’s because it was.

Polymorphic IR

The key to understanding this instruction stream is recognizing that it is not meant to be read statically. The VM rewrites its own instructions at runtime before executing them. This behavior is polymorphic: the instruction stream mutates during execution, so its static form is intentionally misleading.

Let’s step through execution to see how this works:

[1]  Jump to VIP=6
[6]  Jump to VIP=7
[7]  Stk[3] = REG_C              -- Load the REG_C table onto the stack
[8]  REG_C[16] = 2               -- Patch instruction 16's C register
[9]  Stk[3] = Insts              -- Load the instruction table onto the stack
[10] Insts[16] = 45              -- Patch instruction 16's opcode to LOAD_DECRYPTED_STRING
[11] NOP
[12] Insts[6] = 13               -- Patch instruction 6's opcode
[13] NOP
[14] NOP
[15] Stk[1] = vm_env["print"]    -- Look up "print" in the environment
[16] Stk[2] = "Hello World!"     -- Load the string (after being patched)
[17] Stk[1](Stk[1 + 1])          -- Call print("Hello World!")
[18] Jump to VIP=5
[5]  return

The interesting behavior begins at VIP=7. The VM pushes REG_C onto the stack. Then at VIP=8, it writes the value 2 into position 16 of that table, effectively patching instruction 16’s C operand.

The same pattern repeats for the instruction table itself: at VIP=9, the VM loads Insts (the opcode array), and at VIP=10, it overwrites position 16 with the value 45 — which maps to LOAD_DECRYPTED_STRING. In other words, instruction 16 starts its life as a NOP and is transformed into a string load operation before execution reaches it.

This is the essence of Polymorphic IR. The raw instruction dump is not the real program. It’s a trap: a series of jumps, loads, and patches that assemble the actual program during execution. Only after these runtime mutations are applied does the true control flow emerge.

Once we account for the mutations, the program resolves cleanly:

print("Hello World!")

From an automation standpoint, the correct approach is to implement handlers for these mutation opcodes and simulate execution of the dumped IR. The goal is to replay only the patching behavior, not the actual program logic, so that we recover the fully resolved instruction stream without running untrusted code.
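A minimal sketch of such a replay is shown below. It is based only on our reading of the trace above and makes several assumptions: resolve_mutations and the ir/ids table layouts are hypothetical, LOAD_DESERIALIZE_K is treated as Stk[B][func_proto] = const, OP_JMP is treated as an unconditional jump to B + 1, and the numeric id used for LOAD_INSTS is a placeholder because its real value does not appear in the dump. Conditional branches are not handled, so real samples would need more care.

```lua
-- Replay only the self-patching opcodes of a dumped IR; skip everything else.
-- ir  = { ops = {...}, A = {...}, B = {...}, C = {...}, proto = {...}, const = {...} }
-- ids = symbolic name -> numeric opcode value for this particular sample.
local function resolve_mutations(ir, ids, max_steps)
    local stk, pc, steps = {}, 1, 0
    max_steps = max_steps or 100000          -- guard against jump loops
    while ir.ops[pc] ~= nil and steps < max_steps do
        steps = steps + 1
        local op = ir.ops[pc]                -- current opcode; may already be patched
        if op == ids.RETURN then
            break
        elseif op == ids.JMP then
            pc = ir.B[pc] + 1                -- "OP_JMP 0 5 0" lands on VIP=6
        else
            if op == ids.LOAD_REG_C then
                stk[ir.C[pc]] = ir.C         -- Stk[C] = REG_C table
            elseif op == ids.LOAD_INSTS then
                stk[ir.C[pc]] = ir.ops       -- Stk[C] = opcode table
            elseif op == ids.LOAD_DESERIALIZE_K then
                local target = stk[ir.B[pc]]
                if type(target) == "table" then
                    target[ir.proto[pc]] = ir.const[pc]  -- e.g. REG_C[16] = 2
                end
            end
            -- Calls, global lookups, etc. are deliberately not executed.
            pc = pc + 1
        end
    end
    return ir
end

-- Toy IR mirroring the pattern in the trace (not the real sample).
-- 900 is a made-up placeholder id for LOAD_INSTS.
local ids = { JMP = 29, RETURN = 21, LOAD_REG_C = 15,
              LOAD_DESERIALIZE_K = 59, LOAD_INSTS = 900 }
local ir = {
    ops   = { 29, 21, 15, 59, 900, 59, 196, 29 },
    A     = { 0, 0, 0, 0, 0, 0, 0, 0 },
    B     = { 2, 0, 0, 3, 0, 3, 0, 1 },
    C     = { 0, 0, 3, 0, 3, 0, 148, 0 },
    proto = { [4] = 7, [6] = 7 },
    const = { [4] = 2, [6] = 45 },
}
resolve_mutations(ir, ids)
print(ir.ops[7], ir.C[7])   -- instruction 7: NOP (196) patched to 45, C patched to 2
```

Because the opcode is re-read from ir.ops on every step, an instruction that was patched earlier in the replay is interpreted in its patched form, which is the property the real VM relies on.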

Remaining Protections

Successfully devirtualizing a single sample does not mean Luraph is fully defeated. The platform includes numerous additional protections, including configurable macros and alternative obfuscation settings that were not covered in this series.

Thoughts on Automation

In theory, this entire process can be automated. In practice, it’s non-trivial.

A fully automated pipeline would require:

  • Identifying deserialized_execution_data
  • Identifying the correct interception point
  • Resolving the magic offsets
  • Mapping opcode identifiers
  • Resolving the self-modifying behavior

Opcode recovery can be partially automated using pattern matching against compiled luac output, a technique that has worked well for me in the past. However, reliably identifying deserialized_execution_data, the magic offsets, and the interception point remains challenging due to the variability introduced by Luraph.

For this reason, a semi-automated workflow is likely the most practical approach: manually locate the interception point and offsets, then automate the dumping, opcode mapping, and mutation resolution. This strikes a balance between scalability and reliability without over-engineering the process.
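For the manual part, a small helper that summarizes each slot of the intercepted table can make the offsets easier to spot. The sketch below is only an aid, not a detector: describe_slots is a hypothetical name, the 16-slot default is an arbitrary assumption, and deciding which slot is which still requires reading the output by hand (for example, the instruction and operand tables would be expected to have matching lengths).

```lua
-- Hypothetical helper: describe every slot of the intercepted table.
local function describe_slots(data, max_slot)
    local lines = {}
    for slot = 1, max_slot or 16 do
        -- pcall guards against tables with a throwing __index metamethod.
        local ok, v = pcall(function() return data[slot] end)
        local kind = ok and type(v) or "error"
        local detail = ""
        if kind == "table" then
            local n, value_type = 0, nil
            for _, item in pairs(v) do
                n = n + 1
                value_type = value_type or type(item)  -- type of some entry
            end
            detail = string.format(" entries=%d value_type=%s", n, tostring(value_type))
        elseif kind == "number" or kind == "string" or kind == "boolean" then
            detail = " value=" .. tostring(v)
        end
        lines[#lines + 1] = string.format("[%d] %s%s", slot, kind, detail)
    end
    return lines
end

-- Stand-in data for illustration; in practice pass deserialized_execution_data.
local fake = { [3] = { 29, 21, 15 }, [6] = 4, [8] = { [3] = "Hello" } }
for _, line in ipairs(describe_slots(fake, 8)) do
    print(line)
end
```

Running this at the interception point, before the dump script, gives a quick overview of which slots hold tables and which hold scalars such as the stack size.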

Luraph v14.7

After looking at a provided v14.7 sample, it appears that the dual VM architecture is still intact. The overall structure (the Deserialization VM feeding into the Real VM, the interception point between them, and the Polymorphic IR) looks functionally identical to what we’ve analyzed throughout this series.

The one notable change is how constants are handled. In earlier versions, constants were embedded directly in the deserialized output and could be read straight from the intercepted IR. In v14.7, they are no longer present in deserialized_execution_data. Instead, they appear to be resolved through a separate mechanism, likely an additional protection layer.

This means the interception technique still works for recovering the instruction stream, opcodes, and register operands, but the constant pool comes up empty. Solving this final piece, figuring out where and how v14.7 resolves its constants, would make it possible to fully devirtualize samples again.