Saturday, February 26, 2022

Lua Devirtualization

Lua Devirtualization Part 1: Introduction 21/03/2021

In this series of articles, I will take you on a journey to show you a darker side of the programming world. A place that is fueled by money, script kiddies, and even more money. To prevent code from being cracked and resold, we must outsmart each other and develop security mechanisms that are either too hard to solve or take too much time to solve. One of those security mechanisms is obfuscation, and today's article is all about why obfuscators are needed and how they work.

All articles will target Lua, and for those who don't know it yet, Lua is a very minimalistic scripting language with only a handful of bytecodes. Our target uses Lua 5.1, so to keep things simple we will target Lua 5.1 as well: every time I refer to 'Lua', I am referring to Lua 5.1 (unless I explicitly state a version).

This article is part 1 of a four-part series.


Lua Crash Course

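Before we get started, there are a few things we need to know about Lua. Lua is a very basic scripting language that comes with exactly 38 bytecodes (opcodes) and a total of 5 registers. Strictly speaking, these 'registers' are the operand fields of a 32-bit instruction, but I will keep calling them registers. They can't all be used at the same time, because some of them share the same bits.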

The registers are named A, B, C, Ax, Bx and sBx. The first one, A, is an 8-bit register, while the next two, B and C, are both 9-bit registers; together with the 6-bit opcode they fill the 32-bit instruction. The Ax register is a 26-bit register, which is just a combination of registers A, B and C. Our last two registers, Bx and sBx, are also combinations: Bx is 18 bits wide and is combined from registers B and C. Lastly, the sBx register is a signed version of Bx (stored with a bias of 0x1FFFF), and it is often used for all kinds of jumps.

NOTE: the Ax register only exists in more modern versions of Lua (5.2 and later) and is out of scope for this article.
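For reference, here is a minimal sketch of how these fields are packed into a standard (non-Luraph) Lua 5.1 instruction, following the layout in Lua 5.1's lopcodes.h: the 6-bit opcode sits in the lowest bits, followed by A, C and B. Since Lua 5.1 has no bitwise operators, the fields are extracted with division and modulo. `bits` and `decode` are hypothetical helper names, not part of Lua or Luraph:

```lua
-- Lua 5.1 instruction layout (lopcodes.h): bits 0-5 opcode, 6-13 A,
-- 14-22 C, 23-31 B; Bx overlaps C and B (bits 14-31).
local MAXARG_sBx = 0x1FFFF -- 131071, the bias subtracted from Bx to get sBx

-- Extract `size` bits starting at bit `pos` using plain arithmetic,
-- because Lua 5.1 has no bitwise operators.
local function bits(word, pos, size)
  return math.floor(word / 2 ^ pos) % 2 ^ size
end

-- Split one 32-bit instruction word into its named fields.
local function decode(word)
  local bx = bits(word, 14, 18)
  return {
    opcode = bits(word, 0, 6),
    A = bits(word, 6, 8),
    C = bits(word, 14, 9),
    B = bits(word, 23, 9),
    Bx = bx,
    sBx = bx - MAXARG_sBx,
  }
end
```

For example, decode(12 + 1 * 2^6 + 3 * 2^14 + 2 * 2^23) returns opcode 12 (ADD in Lua 5.1) with A = 1, B = 2 and C = 3. Which fields are meaningful depends on the opcode's format (iABC, iABx or iAsBx).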

But why Lua?

Decompiling and reverse engineering Lua is pretty easy thanks to the nature of the language; it's not a flaw, it was a design choice. But that design choice can be a huge disadvantage for developers who want to make money with their Lua scripts. And trust me, I know plenty of people who do make money by selling such scripts.

The Lua language has become very popular thanks to video games like World of Warcraft, League of Legends, Roblox, Garry's Mod, and probably a lot more. Not all games allow you to execute Lua from the user interface, but World of Warcraft, for example, allows you to use third-party Lua-based AddOns, although such AddOns can only modify the user interface. But spoiler alert: people often modify the game to extend the Lua API so that the Lua interface is capable of (for example) automating gameplay.

Now that people have figured out how to put Lua on steroids, they can develop more versatile scripts using just Lua combined with a tool that extends the Lua API. These tools are often called 'Lua Unlockers', because they 'unlock' Lua APIs that were not originally in the game. Most of those Lua Unlockers are sold on game hacking forums, and they are often well documented so anyone can use them right away, which makes everything just a little more interesting.

Lua Obfuscation

When people create their versatile Lua scripts, they often put a lot of time into solving a given problem. Solving that problem often requires a lot of research and only a small amount of code, meaning that most of your valuable time went into asking "how do I fix problem X?" while very little time was spent smashing keys. So after you have found your 'magic' solution, the last thing you want is for your first customer to steal it.

And this is where Luraph comes in. Luraph is an obfuscation tool for Lua that, you guessed it, obfuscates Lua. Below you will find a snippet from a Luraph-obfuscated Lua file, so you can get an idea of what a Luraph-obfuscated file looks like.

local lIll1il1I11i111l1iii1 = assert
local lIllIl1IIi1iII1Iii1 = select
local lii1iii11ilIIIl11Il = tonumber
local iI1lili1I1Iiii11i1l = unpack
local i11iIIII1lilIl1i1Il = pcall
local I1lIII1ii111IIIlii1 = setfenv
<...>
    -- table loops here
<...>
local function lIll11ili1IiiIilill()
    while true do
        local IIliIIiiI11111iI1l1 = il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il]
        local liiiil1lii1llll1i11 = IIliIIiiI11111iI1l1[26353]
        IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il + 1
        local I1ilIiIil1ii1iI1i1I = IIliIIiiI11111iI1l1[26628]
        local lilli1II1il1lliiIii = IIliIIiiI11111iI1l1[19330]
        local iIli1iiiIl1li1il1I1 = IIliIIiiI11111iI1l1[63082]
        local iII111IiiIi11liliiI = IIliIIiiI11111iI1l1[19330] - lIl11I1lIliIl1iIilil1
        local lIliiI1iIIIIiill1iI = IIliIIiiI11111iI1l1[22182]
        if liiiil1lii1llll1i11 >= 17 then
            if liiiil1lii1llll1i11 < 25 then
                if liiiil1lii1llll1i11 < 21 then
                    if liiiil1lii1llll1i11 >= 19 then
                        if liiiil1lii1llll1i11 ~= 20 then
                            if I1ilIiIil1ii1iI1i1I == 4 then
                                IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il - 1
                                il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il] = {
                                    [26353] = 31,
                                    [63082] = (iIli1iiiIl1li1il1I1 - 25) % 256,
                                    [22182] = (lIliiI1iIIIIiill1iI - 25) % 256,
                                    [19330] = 0
                                }
                            elseif I1ilIiIil1ii1iI1i1I == 121 then
                                IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il - 1
                                il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il] = {
                                    [26353] = 9,
                                    [63082] = (iIli1iiiIl1li1il1I1 - 233) % 256,
                                    [26628] = (lIliiI1iIIIIiill1iI - 233) % 256,
                                    [19330] = 0
                                }
                            elseif I1ilIiIil1ii1iI1i1I == 75 then
                                IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il - 1
                                il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il] = {
                                    [26353] = 6,
                                    [63082] = (iIli1iiiIl1li1il1I1 - 75) % 256,
                                    [26628] = (lIliiI1iIIIIiill1iI - 75) % 256,
                                    [19330] = 0
                                }
                            else
<...>
        return I11i1IIiil1l11Il11l
    end
    local liIli1lil11ll1Iilli = lIlIilii1illI111i1iii()
    return ll1I1lliii1iiIii1ii(liIli1lil11ll1Iilli, Iii1i11ii1IlIillli1)()
end
lIllll1i1iilIIIi11lIi(
    "LPH!F03BAE013H00D7043H00164H00710A0200393B393BC84FFF4E2H393F436B0F0<...>9C9F5B59001961A7E0E",
    i1llilll1iliI1ii1l1()
)

NOTE 1: I have removed content wherever the <...> signs are located; the full file was about 72KB.
NOTE 2: The Lua script has been run through a Lua beautifier to keep things pretty.

The Luraphed file can be divided into four sections. The first section is responsible for setting up the virtual environment for the Lua VM (spoiler alert: we are looking at a virtual machine). Section two seems to define a lot of helper functions, which we will discuss in detail later on, and also seems to set up some kind of local environment. Section three is the big IF section that seems to be responsible for interpreting instructions, and finally, the last section contains a big string, starting with "LPH!", which seems to hold hexadecimal values.

Cleaning it up

Before actually doing anything, I had a look at all those variables and started renaming them using just Notepad++. Notepad++ comes with a 'search and replace' feature, which I used to rename the first few variables, as seen below.

local lassert = assert
local lselect = select
local tonumberf = tonumber
local lunpack = unpack
local pCallF = pcall
local setfenvf = setfenv
local setmettabll = setmetatable
local typef = type
local getfenvv = getfenv
local ToStr = tostring
local err = error
local StrSub = string.sub
local StrByte = string.byte
local StrChar = string.char
local StrRep = string.rep
local StrGsub = string.gsub
local StrMatch = string.match
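Plain search-and-replace works, but with names this similar it is easy to also hit a longer identifier that merely contains the one you are renaming. A minimal sketch of a whole-identifier renamer, assuming the whole script fits in one string; `renameIdentifiers` is a hypothetical helper, not something Luraph or Notepad++ provides. It relies on `string.gsub` keeping the original match when the callback returns nil:

```lua
-- Rename whole identifiers only. `map` holds obfuscated -> readable names.
local function renameIdentifiers(src, map)
  -- Every maximal run of identifier characters is looked up as a unit,
  -- so "abc" never matches inside "abcd". Unknown names are kept as-is.
  return (src:gsub("[%a_][%w_]*", function(id)
    return map[id]
  end))
end
```

This sketch also rewrites matching words inside string literals and comments, which is harmless for Luraph's random names but would need a real tokenizer in general.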

Not only did I rename those, but I also renamed a few more obvious things, such as the variable holding the "LPH!" string, and that's when I took another look at the whole script. After a quick look, I realized that some functions from section two get referenced a lot, so I attempted to reverse engineer those first. Have a look at a few of these functions below.

Original:

local function IiIil1I1111ll1i1ii1()
    local lIlliI1i1IIiiI1i1llil = Iiii11iiiIl1lllil1l(IIi1liiiiil1iIiil1l, iiIIIl11IllI1lIl111, iiIIIl11IllI1lIl111)
    iiIIIl11IllI1lIl111 = iiIIIl11IllI1lIl111 + 1
    return lIlliI1i1IIiiI1i1llil
end
Renamed:
local function LPH_GetByte()
  local var1 = StrByte(LPHSTRING, LPH_IP, LPH_IP)
  LPH_IP = LPH_IP + 1
  return var1
end

Original:

local function Ii111I1II1lIl1ll1ll()
    local lIlliI1i1IIiiI1i1llil, lIliIl1il1Ill1l1Iiill, lIll111IlilIlIIi1i1lI, lili1l11lIiIIlIl1i1 =
        Iiii11iiiIl1lllil1l(IIi1liiiiil1iIiil1l, iiIIIl11IllI1lIl111, iiIIIl11IllI1lIl111 + 3)
    iiIIIl11IllI1lIl111 = iiIIIl11IllI1lIl111 + 4
    return lili1l11lIiIIlIl1i1 * 16777216 + lIll111IlilIlIIi1i1lI * 65536 + lIliIl1il1Ill1l1Iiill * 256 +
        lIlliI1i1IIiiI1i1llil
end
Renamed:
local function LPH_GetDWORD()
  local b1, b2, b3, b4 = StrByte(LPHSTRING, LPH_IP, LPH_IP + 3)
  LPH_IP = LPH_IP + 4
  local result = b4 * 0x1000000 + b3 * 0x10000 + b2 * 0x100 + b1
  return result
end
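To experiment with these helpers outside the obfuscated file, the same logic can be wrapped in a closure. This is a sketch, assuming `data` already holds the raw bytes to decode (how the "LPH!" string is turned into bytes is not covered here); `newReader` and its methods are hypothetical names. Note that LPH_GetDWORD assembles the four bytes little-endian, least significant byte first:

```lua
-- Hypothetical stand-alone version of LPH_GetByte / LPH_GetDWORD.
-- `data` is assumed to already hold the raw bytes to decode.
local function newReader(data)
  local ip = 1 -- read cursor, playing the role of LPH_IP
  local reader = {}

  function reader.byte()
    local b = string.byte(data, ip, ip)
    ip = ip + 1
    return b
  end

  -- Little-endian: the first byte is the least significant one.
  function reader.dword()
    local b1, b2, b3, b4 = string.byte(data, ip, ip + 3)
    ip = ip + 4
    return b4 * 0x1000000 + b3 * 0x10000 + b2 * 0x100 + b1
  end

  function reader.pos()
    return ip
  end

  return reader
end
```

For example, reading "\1\120\86\52\18" yields the byte 1 followed by the DWORD 0x12345678, leaving the cursor at position 6.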

I hope those two are enough to show you how effective the variable renaming is, and how much content it revealed. Another thing that caught my attention was a global variable that kept getting incremented; I renamed it to LPH_IP. Such a variable is often called a Virtual Instruction Pointer: it's what a VM uses to keep track of its current instruction. But that didn't really turn out to be the case here, since those helper functions are responsible for decoding the LPH content and are thus only used to initialize the contents of the Lua VM. In other words, LPH_IP behaves like a read cursor into the LPH string rather than an instruction pointer.

For the record, there are more than just those two helper functions; I only showed these two because they are enough to prove my point. Below is a summary of all the helper functions I found. The function names are guessed based on their bodies, and I will keep using them throughout the whole article.

  • LPH_GetByte: Decodes a byte from the LPH string.
  • LPH_GetDWORD: Decodes an int from the LPH string.
  • LPH_GetBits: Performs weird bitwise logic (possibly an instruction decoder).
  • LPH_GetFloat: Decodes a float (or double, not sure) from the LPH string.
  • LPH_GetDWORD_2: Decodes 4 unknown bytes from the LPH string.
  • LPH_GetString: Decodes 4 unknown bytes from the LPH string.
All of them except LPH_GetBits() increase the LPH_IP variable based on the number of bytes they take from the LPH string. LPH_GetDWORD_2() seems to be very similar to LPH_GetString(), so I assume that LPH_GetDWORD_2() may be used for some kind of encrypted string handling.

Unpacking

Section one was basically reversed by simply cleaning up and renaming those variables. Unfortunately, section two won't be as easy. It seems like someone spent some actual time here, using tables with random-looking numbers to throw me off track. Below is the main function for section two:

local function FourLoopFunc()
    local table_result = {[69434] = {}, [58352] = {}, [92302] = {}, [122901] = {}} -- random numbers as obfuscation
    LPH_GetByte()
    local endd = LPH_GetDWORD()
    
    -- do: table_result[#4]
    for index = unk_var_1, endd do
    <...>
    
    -- do: table_result[#2]
    local endd = LPH_GetDWORD() -
    (#{<...>
    for index = unk_var_1, endd do
    <...>
    
    -- do: table_result[9173]
    LPH_GetDWORD()
    LPH_GetByte()
    LPH_GetByte() -- IP += 6
    table_result[9173] = LPH_GetByte()

    -- do: table_result[#3]
    local endd = LPH_GetDWORD() -
    (#{<...>
    for index = unk_var_1, endd do
    <...>

    -- do: table_result[#1]
    LPH_GetDWORD()
    LPH_GetByte()
    LPH_GetByte() -- IP += 6
    local endd = LPH_GetDWORD()
    for index = unk_var_1, endd do
        table_result[69434][index] = LPH_GetDWORD()
    end

    -- do: table_result[81381]
    LPH_GetByte()
    LPH_GetDWORD() -- IP += 5
    table_result[81381] = LPH_GetByte()

    -- do: table_result[109654]
    LPH_GetDWORD()
    LPH_GetDWORD()
    LPH_GetByte()
    LPH_GetByte() -- IP += 10
    table_result[109654] = LPH_GetByte()

    LPH_GetDWORD()
    LPH_GetDWORD()
    LPH_GetByte()
    LPH_GetDWORD()
    LPH_GetDWORD() -- IP += 17
    return table_result
end

Have a good look and you will see that it all comes down to table_result: the table is initialized with 4 entries that have weird numeric keys. I have added comments such as do: table_result[#1] to indicate which of those keys (in order of declaration) each block fills. Other than that, I snipped out all the nasty loops, since I don't feel like spending a night or two on them. So well played, Luraph, you win this round.

Alright then, keep your secrets. (meme)

Just kidding, we can simply continue to the next section, because these tables are only responsible for the Lua VM constants, registers, upvalues, and some other things that will be explained further in Part 2. So bear with me while I explain the third section of the Lua VM.

Interpreting the interpreter

The second-to-last section, section three, is where it's at. Do you remember that one function with all the ugly IF statements? Well, this is him now:

local function UnpackFunctionidk()
    while true do
        local inst_table = flp_ret_58352[index]
        local loop_opcode = inst_table[26353] -- OPCODE
        index = index + 1 -- likely the VM instruction pointer (LPH_IP only walks the LPH string)
        local loop_v1 = inst_table[26628] -- A or B
        local loop_v2 = inst_table[19330] -- Bx
        local loop_v3 = inst_table[63082] -- A or B
        local loop_v4 = inst_table[19330] - unk_var_2 -- sBx = Bx - 0x1FFFF (2^17 - 1)
        local loop_v5 = inst_table[22182] -- C
        if loop_opcode >= 17 then
          if loop_opcode < 25 then
             if loop_opcode < 21 then
                if loop_opcode >= 19 then
                   if loop_opcode ~= 20 then
                      if loop_v1 == 4 then
                         index = index - 1
                         flp_ret_58352[index] = {
                            [26353] = 31,
                            [63082] = (loop_v3 - 25) % 256,
                            [22182] = (loop_v5 - 25) % 256,
                            [19330] = 0
                         }
                      elseif loop_v1 == 121 then
                         index = index - 1
                         flp_ret_58352[index] = {
                            [26353] = 9,
                            [63082] = (loop_v3 - 233) % 256,
                            [26628] = (loop_v5 - 233) % 256,
                            [19330] = 0
                         }
                      elseif loop_v1 == 75 then
                         index = index - 1
                         flp_ret_58352[index] = {
                            [26353] = 6,
                            [63082] = (loop_v3 - 75) % 256,
                            [26628] = (loop_v5 - 75) % 256,
                            [19330] = 0
                         }
                      else
                         if loop_v5 == 1 then
                            return true
                         end
                         local IiIiil11IIll1ii1li1 = loop_v3 + loop_v5 - 2
                         if loop_v5 == 0 then
                            IiIiil11IIll1ii1li1 = lIlIIIli1iIlll1IlliiI
                         end
                         return true, loop_v3, IiIiil11IIll1ii1li1
                      end
                   else -- Luraph opcode 20: RK ^ RK, behaves like Lua's POW (Lua's own opcode 20 is LEN)
                      if loop_v5 > 255 then
                         loop_v5 = table_set_1[loop_v5 - 256][int_44827]
                      else
                         loop_v5 = result_packed[loop_v5]
                      end
                      if loop_v1 > 255 then
                         loop_v1 = table_set_1[loop_v1 - 256][int_44827]
                      else
                         loop_v1 = result_packed[loop_v1]
                      end
                      result_packed[loop_v3] = loop_v5 ^ loop_v1
                   end
                elseif loop_opcode ~= 18 then -- opcode 17
                <...>

Remember those nasty tables we just talked about? Here they are again. Pay close attention to the start of the IF chain: loop_v1 to loop_v5 are temporary storage variables for the registers, which are obtained from the table inst_table. The following fields of inst_table can be mapped to instruction info:

  • inst_table[26353]: Luraph bytecode (a custom opcode number).
  • inst_table[26628]: Register B.
  • inst_table[19330]: Register Bx.
  • inst_table[63082]: Register A.
  • inst_table[19330] - unk_var_2: Register sBx.
  • inst_table[22182]: Register C.
We can see that inst_table index 26353 is used to obtain a variable that is constantly compared against numbers within byte range (0 ~ 255). I assume those bytes represent custom Luraph bytecodes, which can be mapped to normal Lua bytecodes, making it almost look like the Lua bytecodes were 'renamed'. Another thing I noticed is that key 19330 is used twice, which makes sense because the register sBx is derived from Bx by subtracting unk_var_2. Lua 5.1 stores sBx as an unsigned 18-bit value with a bias of 0x1FFFF (2^17 - 1), so the value of unk_var_2 should be exactly 0x1FFFF in order to recover the signed value of sBx from Bx.
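Once the keys are known, every obfuscated instruction record can be translated into named fields before any further analysis. A minimal sketch, assuming the key mapping listed above (which only holds for this particular file) and that unk_var_2 is 0x1FFFF; `FIELD_KEYS`, `SBX_BIAS` and `normalize` are hypothetical names:

```lua
-- Keys observed in this particular file; Luraph may pick new ones per build.
local FIELD_KEYS = { op = 26353, A = 63082, B = 26628, C = 22182, Bx = 19330 }
local SBX_BIAS = 0x1FFFF -- assumed value of unk_var_2

-- Turn an obfuscated instruction record into a table with named fields.
local function normalize(inst)
  local out = {}
  for name, key in pairs(FIELD_KEYS) do
    out[name] = inst[key]
  end
  if out.Bx then
    out.sBx = out.Bx - SBX_BIAS
  end
  return out
end
```

Fields missing from a record (for example B in the rewritten `[26353] = 31` records of the interpreter excerpt) simply come out as nil.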

We can verify the value of unk_var_2 by looking right below section one, where you will find the following code.

local getbyte_7E = StrByte("~", 1) -- 7E
local unk_var_1, unk_var_2 = #{273},
    #{ 5703, 3015, 5331, 6890, 5857, 5221, 219, 1250, 2422, 4066, 2329, 3462, 2189, 6944, 4479, 2107, 6710, 5803, 4390, 5185, 806, 3642, 5866} + getbyte_7E + 130922
The variables have already been renamed to make them easier to read. Variable getbyte_7E converts the ASCII character ~ to a byte, which, according to the ASCII table, has the value 0x7E. The next line defines both unk_var_1 and unk_var_2. The first one gets the value #{273}: the curly brackets create a table, while the # operator takes the length of that table, meaning that unk_var_1 will receive the value 1. The next variable is a little more complex, but again, it comes down to the length of a table with random values, plus our getbyte_7E variable holding 0x7E (126), plus the number 130922, making it look very complex. Quick recap: the table contains 23 entries, and 23 + 126 + 130922 = 131071, or 0x1FFFF in hexadecimal. Now look at that: 0x1FFFF is exactly the value that needs to be subtracted from Bx to obtain sBx.
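The constant folding can be double-checked by simply running the same expressions in a Lua interpreter. The sketch below repeats the lines from the obfuscated file (with string.byte spelled out instead of the StrByte alias) and derives the range of offsets an 18-bit Bx can express once the bias is removed; `minSBx` and `maxSBx` are hypothetical names:

```lua
-- The statements from the obfuscated file, with StrByte spelled out.
local getbyte_7E = string.byte("~", 1) -- 0x7E (126)
local unk_var_1, unk_var_2 = #{273},
    #{ 5703, 3015, 5331, 6890, 5857, 5221, 219, 1250, 2422, 4066, 2329, 3462,
       2189, 6944, 4479, 2107, 6710, 5803, 4390, 5185, 806, 3642, 5866 } + getbyte_7E + 130922

-- Bx is an unsigned 18-bit field (0 .. 2^18 - 1), so after removing the
-- bias, sBx = Bx - unk_var_2 ranges from -131071 to 131072.
local minSBx = 0 - unk_var_2
local maxSBx = (2 ^ 18 - 1) - unk_var_2
```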

Now that we know which register is which, we can have a look at the IF statements. One of the first things I noticed is that those IF statements almost never use an equality check; instead, they use other comparison operators such as 'less than' or 'greater than'. Basically, any operator other than equality will be used (where possible) to make reversing a bit harder, as we will see in the next part.
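Because the handlers sit behind a tree of range checks rather than equality checks, the opcode a handler belongs to only becomes known by intersecting every comparison on the path leading to it. A minimal sketch of that intersection, assuming opcodes fit in a byte and that each path has already been written down as a list of {operator, number} pairs (an `else` branch contributes the negated comparison, e.g. the else of `>= 19` becomes `< 19`); `solvePath` is a hypothetical helper:

```lua
-- Narrow the set of opcode values that can reach one branch of the
-- dispatch tree. `path` is a list of {operator, number} pairs.
local function solvePath(path)
  local lo, hi = 0, 255 -- assumption: opcodes fit in a byte
  local excluded = {}
  for _, cond in ipairs(path) do
    local op, n = cond[1], cond[2]
    if op == ">=" then
      lo = math.max(lo, n)
    elseif op == ">" then
      lo = math.max(lo, n + 1)
    elseif op == "<" then
      hi = math.min(hi, n - 1)
    elseif op == "<=" then
      hi = math.min(hi, n)
    elseif op == "==" then
      lo, hi = math.max(lo, n), math.min(hi, n)
    elseif op == "~=" then
      excluded[n] = true
    else
      error("unknown operator " .. tostring(op))
    end
  end
  local values = {}
  for v = lo, hi do
    if not excluded[v] then
      values[#values + 1] = v
    end
  end
  return values
end
```

For example, the path >= 17, < 25, < 21, >= 19, ~= 20 from the interpreter excerpt leaves only opcode 19, while >= 17, < 25, < 21, < 19, ~= 18 leaves only 17, matching the "-- opcode 17" comment in the excerpt.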

Lifting to Lua

This is it, this is what y'all have been waiting for. For those who don't know yet, lifting is basically the process of mapping one instruction set to another; in our case, we will be lifting the Luraph instructions to the original Lua instructions. Before we can do this, we must identify the Luraph instructions, and this explains why the IF statements are obfuscated in the first place: using a simple RegEx to grab the Luraph bytecode and then reading the body of that IF statement to identify the corresponding Lua bytecode becomes much harder. Not only do we have to figure out a way to identify the Luraph bytecode, but we also need to understand the actual functionality of the original Lua bytecodes before we can identify them.
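Once a Luraph opcode has been identified, lifting a single instruction boils down to re-encoding its fields in the standard Lua 5.1 layout. A sketch, assuming instructions have already been normalized into named fields (op, A, B, C, Bx, sBx) and that `mapping` (Luraph opcode number to Lua 5.1 opcode name) has been built for the specific file, by hand or by a parser; `lift` and `mapping` are hypothetical. The opcode order and the iABx/iAsBx format assignments follow Lua 5.1's lopcodes.h:

```lua
-- Lua 5.1 opcode names in numeric order (OP_MOVE = 0 ... OP_VARARG = 37).
local OPNAMES = {
  "MOVE", "LOADK", "LOADBOOL", "LOADNIL", "GETUPVAL", "GETGLOBAL", "GETTABLE",
  "SETGLOBAL", "SETUPVAL", "SETTABLE", "NEWTABLE", "SELF", "ADD", "SUB", "MUL",
  "DIV", "MOD", "POW", "UNM", "NOT", "LEN", "CONCAT", "JMP", "EQ", "LT", "LE",
  "TEST", "TESTSET", "CALL", "TAILCALL", "RETURN", "FORLOOP", "FORPREP",
  "TFORLOOP", "SETLIST", "CLOSE", "CLOSURE", "VARARG",
}
local OPCODE = {}
for i, name in ipairs(OPNAMES) do
  OPCODE[name] = i - 1
end

-- Opcodes using the iABx / iAsBx formats; every other opcode is iABC.
local ABX = { LOADK = true, GETGLOBAL = true, SETGLOBAL = true, CLOSURE = true }
local ASBX = { JMP = true, FORLOOP = true, FORPREP = true }
local MAXARG_sBx = 0x1FFFF

-- Re-encode one normalized Luraph instruction as a Lua 5.1 instruction word.
local function lift(inst, mapping)
  local name = assert(mapping[inst.op], "unidentified Luraph opcode")
  local opcode = assert(OPCODE[name], "not a Lua 5.1 opcode: " .. tostring(name))
  local word = opcode + (inst.A or 0) * 2 ^ 6
  if ABX[name] then
    return word + inst.Bx * 2 ^ 14
  elseif ASBX[name] then
    return word + (inst.sBx + MAXARG_sBx) * 2 ^ 14 -- re-apply the bias
  end
  return word + (inst.C or 0) * 2 ^ 14 + (inst.B or 0) * 2 ^ 23
end
```

With an illustrative mapping such as { [20] = "POW" }, lift({ op = 20, A = 1, B = 2, C = 3 }, mapping) produces the same word a native Lua 5.1 POW with A = 1, B = 2, C = 3 would have.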

This Lua 5.3 Bytecode Reference (which is for Lua 5.3, not 5.1, I know) can be used to get a better understanding of how each Lua bytecode works. Let's not forget that Lua 5.1 was released on 21 Feb 2006, so finding fancy documentation isn't easy. Luckily for you, there is in fact a Lua Bytecode Interpreter project on GitHub, which was written for Lua 5.1, in Lua 5.1. The file src/lbi.lua contains the Lua 5.1 bytecode interpreter at line 268, which we can use to manually see how Luraph interprets each instruction. Of course, once we have done a few instructions manually, we should build a parser that automatically identifies the Luraph bytecode from each IF statement and compares the body of that IF statement, in order to lift the Luraph bytecode to a Lua bytecode.
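As an example of what such a reference handler looks like, here is a sketch of Lua 5.1's POW semantics, R(A) := RK(B) ^ RK(C), where an RK operand of 256 or more indexes the constant table instead of a register. It is written from the Lua 5.1 semantics rather than copied from lbi.lua, and `RK` and `opPow` are hypothetical names. Compare it with the handler for Luraph opcode 20 in the interpreter excerpt, which performs the same `> 255` check but reads constants through an extra table lookup (table_set_1[k][int_44827]):

```lua
local BITRK = 256 -- RK operands >= 256 refer to the constant table (Lua 5.1)

-- Resolve an RK operand to either a constant or a register value.
local function RK(x, stack, constants)
  if x >= BITRK then
    return constants[x - BITRK]
  end
  return stack[x]
end

-- OP_POW: R(A) := RK(B) ^ RK(C)
local function opPow(inst, stack, constants)
  stack[inst.A] = RK(inst.B, stack, constants) ^ RK(inst.C, stack, constants)
end
```

For example, with constant 0 equal to 2 and register 1 holding 3, opPow({ A = 0, B = 256, C = 1 }, ...) stores 8 in register 0.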

Automated Lifting

Unfortunately, the Luraph bytecodes change for every file it generates, meaning that we have to identify all Luraph bytecodes again for every new Luraph script we want to reverse. Therefore, I would like to automate the process. Not only do the bytecodes change, but the table indexes (used for registers, constants, upvalues, etc.) may also change, which makes doing it manually more difficult and very time-consuming.
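One small piece of that automation is recovering the per-file field keys. A rough sketch, assuming the interpreter loop keeps the shape shown in the excerpts (one `local x = inst[NUMBER]` line per field, followed by an IF chain on the opcode variable); `extractFieldKeys` and `findOpcodeKey` are hypothetical helpers built on Lua string patterns, not a robust Lua parser:

```lua
-- Collect `local <name> = <table>[<number>]` assignments: variable -> key.
local function extractFieldKeys(src)
  local keys = {}
  for var, key in src:gmatch("local%s+([%w_]+)%s*=%s*[%w_]+%[(%d+)%]") do
    keys[var] = tonumber(key)
  end
  return keys
end

-- Assumption: the first variable tested by a standalone `if` is the opcode.
-- Requiring whitespace before `if` skips `elseif` and identifiers ending in "if".
local function findOpcodeKey(src)
  local keys = extractFieldKeys(src)
  local dispatchVar = ("\n" .. src):match("%sif%s+([%w_]+)%s*[<>~=]")
  return dispatchVar and keys[dispatchVar]
end
```

Run on the first lines of the obfuscated interpreter loop, this should report 26353 as the opcode key; the remaining keys still have to be matched to A, B, C and Bx by looking at how each variable is used.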

Conclusion

We have analyzed a Luraph-obfuscated script that virtualizes custom/unknown bytecodes. Presumably, those bytecodes have different identifiers, but their functionality will be equal to that of the original Lua bytecodes. From this we concluded that we need to lift Luraph bytecode to Lua bytecode and then decompile it to get a somewhat human-readable Lua script.

In Part 2: Decompiling Lua, we will create a Lua decoder and decompiler that we will use to, first of all, get a better understanding of how compiled Lua works, and then continue to develop tools that we will use to devirtualize Luraph.

Next article: Part 2: Decompiling Lua


Have something to say?

Contact me at admin@ferib.be


from Hacker News https://ift.tt/i9bjsnC
