Lua Devirtualization Part 1: Introduction 21/03/2021
In this series of articles, I will take you on a journey to show you a darker side of the programming world. A place that is fueled by money, script kiddies, and even more money. To prevent code from being cracked and resold, we must outsmart each other and develop security mechanisms that are either too hard to solve or take too much time to solve. One of those security mechanisms is the obfuscator, and today will be all about why obfuscators are needed and how they work.
All articles will target Lua, and for those who didn't know yet, Lua is a very minimalistic scripting language with only a handful of bytecodes. Our target is using Lua version 5.1, so to keep things simple we will stick to version 5.1 as well, meaning that every time I refer to 'Lua', I am referring to Lua version 5.1 (unless I explicitly state a version).
This article is part 1 of 4, you can find a complete overview of all the articles below:
Lua Crash Course
Before we get started, there are a few things we need to know about Lua. Lua is a very basic scripting language that comes with exactly 38 bytecodes (opcodes) and a total of 5 registers. Strictly speaking, these 'registers' are the operand fields of a 32-bit instruction, but I will keep calling them registers. They can't all be used at the same time, because some of them share the same bits.
The registers are named A, B, C, Bx and sBx. The first one, A, is an 8-bit register, while the next two, B and C, are both 9-bit registers. The last two are combinations: Bx is 18 bits wide and is made up of registers B and C, and sBx is a signed Bx register that is often used for all kinds of jumps.
NOTE: more modern versions of Lua also have a 26-bit Ax register, which is just a combination of registers A, B and C. It is out of scope for this article.
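To make the register layout a bit more tangible, here is a minimal sketch that splits a 32-bit instruction word into its fields. It assumes the stock Lua 5.1 layout (opcode in the lowest 6 bits, then A, then C, then B); the function name `decode_instruction` and the sample word are made up for illustration, and plain arithmetic is used instead of bit operators so it also runs on Lua 5.1.

```lua
-- Split a 32-bit Lua 5.1 instruction word into its operand fields.
-- Assumes the stock layout: opcode (6 bits), A (8), C (9), B (9).
local function decode_instruction(word)
    -- Extract `width` bits starting at bit position `shift`.
    local function field(shift, width)
        return math.floor(word / 2 ^ shift) % 2 ^ width
    end
    local inst = {
        opcode = field(0, 6),   -- lowest 6 bits
        A      = field(6, 8),   -- 8 bits
        C      = field(14, 9),  -- 9 bits
        B      = field(23, 9),  -- 9 bits
        Bx     = field(14, 18), -- B and C combined, 18 bits
    }
    inst.sBx = inst.Bx - 131071 -- signed view of Bx (bias 0x1FFFF)
    return inst
end

-- Hypothetical instruction word: opcode 22 with sBx = -3.
local word = 22 + (131071 - 3) * 2 ^ 14
local inst = decode_instruction(word)
print(inst.opcode, inst.A, inst.Bx, inst.sBx)
```

Note how Bx overlaps B and C: an instruction uses either the B/C pair or Bx/sBx, never both, which is why the registers can't all be used at the same time.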
But why Lua?
Decompiling and reverse engineering Lua is pretty easy thanks to the nature of the language. It's not a flaw, it was a design choice. But that design choice may be a huge disadvantage for developers who want to make money with their Lua scripts. And trust me, I know plenty of people that do make money by selling such scripts.
The Lua language has become very popular thanks to video games like World of Warcraft, League of Legends, Roblox, Garry's Mod, and probably a lot more. Not all games allow you to execute Lua from the user interface, but World of Warcraft, for example, allows you to use third-party Lua-based AddOns. Such AddOns are limited to modifying the user interface. But spoiler alert, people often modify the game to extend the Lua APIs so that the Lua interface is capable of (for example) automating gameplay.
Now that people have figured out how to put Lua on steroids, they can start developing more versatile scripts using just Lua combined with a tool that extends the Lua API. These tools are often called 'Lua Unlockers', because they 'unlock' Lua APIs that were not originally in the game. Most of those Lua Unlockers are sold on game hacking forums, and they are often well documented so anyone can use them right away, which makes everything just a little more interesting.
Lua Obfuscation
When people create their versatile Lua script, they often put a lot of time into solving a given problem. Solving that problem often requires a lot of research and only a small amount of code, meaning that most of your valuable time was put into questioning "how to fix problem X" while very little time was spent on smashing the keys. So after you found your 'magic' solution, the last thing you want is to have the first guy you sell the Lua script to steal your magic solution.
And this is where Luraph comes into play. Luraph is an obfuscation tool for Lua that, you guessed it, obfuscates Lua. Below you will find a snippet from a Luraph obfuscated Lua file so you can get an idea of what a Luraph obfuscated file looks like.
local lIll1il1I11i111l1iii1 = assert
local lIllIl1IIi1iII1Iii1 = select
local lii1iii11ilIIIl11Il = tonumber
local iI1lili1I1Iiii11i1l = unpack
local i11iIIII1lilIl1i1Il = pcall
local I1lIII1ii111IIIlii1 = setfenv
<...>
-- table loops here
<...>
local function lIll11ili1IiiIilill()
    while true do
        local IIliIIiiI11111iI1l1 = il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il]
        local liiiil1lii1llll1i11 = IIliIIiiI11111iI1l1[26353]
        IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il + 1
        local I1ilIiIil1ii1iI1i1I = IIliIIiiI11111iI1l1[26628]
        local lilli1II1il1lliiIii = IIliIIiiI11111iI1l1[19330]
        local iIli1iiiIl1li1il1I1 = IIliIIiiI11111iI1l1[63082]
        local iII111IiiIi11liliiI = IIliIIiiI11111iI1l1[19330] - lIl11I1lIliIl1iIilil1
        local lIliiI1iIIIIiill1iI = IIliIIiiI11111iI1l1[22182]
        if liiiil1lii1llll1i11 >= 17 then
            if liiiil1lii1llll1i11 < 25 then
                if liiiil1lii1llll1i11 < 21 then
                    if liiiil1lii1llll1i11 >= 19 then
                        if liiiil1lii1llll1i11 ~= 20 then
                            if I1ilIiIil1ii1iI1i1I == 4 then
                                IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il - 1
                                il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il] = {
                                    [26353] = 31,
                                    [63082] = (iIli1iiiIl1li1il1I1 - 25) % 256,
                                    [22182] = (lIliiI1iIIIIiill1iI - 25) % 256,
                                    [19330] = 0
                                }
                            elseif I1ilIiIil1ii1iI1i1I == 121 then
                                IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il - 1
                                il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il] = {
                                    [26353] = 9,
                                    [63082] = (iIli1iiiIl1li1il1I1 - 233) % 256,
                                    [26628] = (lIliiI1iIIIIiill1iI - 233) % 256,
                                    [19330] = 0
                                }
                            elseif I1ilIiIil1ii1iI1i1I == 75 then
                                IiI1lIii1iliiI1l1il = IiI1lIii1iliiI1l1il - 1
                                il1lIllli1illIiiliI[IiI1lIii1iliiI1l1il] = {
                                    [26353] = 6,
                                    [63082] = (iIli1iiiIl1li1il1I1 - 75) % 256,
                                    [26628] = (lIliiI1iIIIIiill1iI - 75) % 256,
                                    [19330] = 0
                                }
                            else
                                <...>
    return I11i1IIiil1l11Il11l
end
local liIli1lil11ll1Iilli = lIlIilii1illI111i1iii()
return ll1I1lliii1iiIii1ii(liIli1lil11ll1Iilli, Iii1i11ii1IlIillli1)()
end
lIllll1i1iilIIIi11lIi(
    "LPH!F03BAE013H00D7043H00164H00710A0200393B393BC84FFF4E2H393F436B0F0<...>9C9F5B59001961A7E0E",
    i1llilll1iliI1ii1l1()
)
NOTE 1: I have removed content wherever the <...> signs are located; the file was about 72KB in total.
NOTE 2: The Lua script has been parsed through a Lua beautifier to keep things pretty.
The Luraphed file can be divided into four sections. The first section is responsible for setting up the virtual environment for the Lua VM (spoiler alert, we are looking at a Virtual Machine). Section two seems to define a lot of helper functions, which we will discuss in detail later on, and also seems to set up some kind of local environment. Section three is this big IF section that seems to be responsible for interpreting instructions. Finally, the last section contains a big string, starting with "LPH!, which seems to be holding hexadecimal values.
Cleaning it up
Before actually doing anything, I had a look at all those variables and started renaming them using just Notepad++. Notepad++ comes with a 'search and replace' feature, which I used to name the first few variables as seen below.
local lassert = assert
local lselect = select
local tonumberf = tonumber
local lunpack = unpack
local pCallF = pcall
local setfenvf = setfenv
local setmettabll = setmetatable
local typef = type
local getfenvv = getfenv
local ToStr = tostring
local err = error
local StrSub = string.sub
local StrByte = string.byte
local StrChar = string.char
local StrRep = string.rep
local StrGsub = string.gsub
local StrMatch = string.match
Not only did I rename those, but I have also renamed a few more obvious things such as the variable holding the "LPH! string, and that's when I took another look at the whole script. After a quick look I realized there are functions from section two that get referenced a lot, so I attempted to reverse engineer those first. Have a look at a few of the functions below.
Original:
local function IiIil1I1111ll1i1ii1()
    local lIlliI1i1IIiiI1i1llil = Iiii11iiiIl1lllil1l(IIi1liiiiil1iIiil1l, iiIIIl11IllI1lIl111, iiIIIl11IllI1lIl111)
    iiIIIl11IllI1lIl111 = iiIIIl11IllI1lIl111 + 1
    return lIlliI1i1IIiiI1i1llil
end
Renamed:
local function LPH_GetByte()
    local var1 = StrByte(LPHSTRING, LPH_IP, LPH_IP)
    LPH_IP = LPH_IP + 1
    return var1
end
Original:
local function Ii111I1II1lIl1ll1ll()
    local lIlliI1i1IIiiI1i1llil, lIliIl1il1Ill1l1Iiill, lIll111IlilIlIIi1i1lI, lili1l11lIiIIlIl1i1 = Iiii11iiiIl1lllil1l(IIi1liiiiil1iIiil1l, iiIIIl11IllI1lIl111, iiIIIl11IllI1lIl111 + 3)
    iiIIIl11IllI1lIl111 = iiIIIl11IllI1lIl111 + 4
    return lili1l11lIiIIlIl1i1 * 16777216 + lIll111IlilIlIIi1i1lI * 65536 + lIliIl1il1Ill1l1Iiill * 256 + lIlliI1i1IIiiI1i1llil
end
Renamed:
local function LPH_GetDWORD()
    local b1, b2, b3, b4 = StrByte(LPHSTRING, LPH_IP, LPH_IP + 3)
    LPH_IP = LPH_IP + 4
    local result = b4 * 0x1000000 + b3 * 0x10000 + b2 * 0x100 + b1
    return result
end
I hope those two are enough to show you how effective the variable renaming is, and how much content got revealed by doing so. Another thing that caught my attention was this global variable that always got increased, which I renamed to LPH_IP. Such a variable is often referred to as a Virtual Instruction Pointer: it's what the VM uses to keep track of its current instruction. But that didn't really turn out to be the case, since those helper functions are responsible for decoding the LPH content and thus are only used to initialize the contents for the Lua VM.
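To see the cursor idea in isolation, here is a small self-contained sketch of the same two helpers reading from a sample string. The names `new_reader`, `byte`, `dword` and `position` are made up for this example, and the sample data is a raw byte string I invented, not real LPH content.

```lua
-- Minimal cursor-based reader in the spirit of LPH_GetByte / LPH_GetDWORD.
local function new_reader(data)
    local pos = 1 -- plays the role of LPH_IP
    local reader = {}

    -- Read one byte and advance the cursor by 1.
    function reader.byte()
        local b = string.byte(data, pos, pos)
        pos = pos + 1
        return b
    end

    -- Read four bytes as a little-endian number and advance the cursor by 4.
    function reader.dword()
        local b1, b2, b3, b4 = string.byte(data, pos, pos + 3)
        pos = pos + 4
        return b4 * 16777216 + b3 * 65536 + b2 * 256 + b1
    end

    function reader.position()
        return pos
    end

    return reader
end

-- Sample data: one byte (0x2A) followed by the dword 0x12345678.
local r = new_reader("\42\120\86\52\18")
local first = r.byte()
local value = r.dword()
print(first, string.format("0x%08X", value), r.position())
```

The cursor only ever walks over the encoded data, which is why it behaves like a read offset rather than a true instruction pointer.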
For the record, there are more than just those two helper functions. The reason you only got to see these two is that I only need two to prove my point. Below is a summary of all the helper functions I found. The function names are guessed based on their body, and I will keep using these names throughout the whole article.
LPH_GetByte: Decodes a byte from the LPH string.
LPH_GetDWORD: Decodes an int from the LPH string.
LPH_GetBits: Performs weird bitwise logic (possible instruction decoder).
LPH_GetFloat: Decodes a Float (or Double, not sure) from the LPH string.
LPH_GetDWORD_2: Decodes Unknown 4bytes from the LPH string.
LPH_GetString: Decodes Unknown 4bytes from the LPH string.
All of them except LPH_GetBits() increase the LPH_IP variable based on the number of bytes they take from the LPH string. LPH_GetDWORD_2() seems to be very similar to LPH_GetString(), so I assume that LPH_GetDWORD_2() may be used for some kind of encrypted string handling.
Unpacking
Section one was basically reversed by simply cleaning up and renaming those variables. Unfortunately, section two won't be as easy as that. It seems like someone spent some actual time in here, using tables with random-looking numbers to throw me off track. Below is the main function for section two:
local function FourLoopFunc()
    local table_result = {[69434] = {}, [58352] = {}, [92302] = {}, [122901] = {}} -- random numbers as obfuscation
    LPH_GetByte()
    local endd = LPH_GetDWORD()
    -- do: table_result[#4]
    for index = unk_var_1, endd do
        <...>
    -- do: table_result[#2]
    local endd = LPH_GetDWORD() - (#{<...>
    for index = unk_var_1, endd do
        <...>
    -- do: table_result[9173]
    LPH_GetDWORD() LPH_GetByte() LPH_GetByte() -- IP += 6
    table_result[9173] = LPH_GetByte()
    -- do: table_result[#3]
    local endd = LPH_GetDWORD() - (#{<...>
    for index = unk_var_1, endd do
        <...>
    -- do: table_result[#1]
    LPH_GetDWORD() LPH_GetByte() LPH_GetByte() -- IP += 6
    local endd = LPH_GetDWORD()
    for index = unk_var_1, endd do
        table_result[69434][index] = LPH_GetDWORD()
    end
    -- do: table_result[81381]
    LPH_GetByte() LPH_GetDWORD() -- IP += 5
    table_result[81381] = LPH_GetByte()
    -- do: table_result[109654]
    LPH_GetDWORD() LPH_GetDWORD() LPH_GetByte() LPH_GetByte() -- IP += 10
    table_result[109654] = LPH_GetByte()
    LPH_GetDWORD() LPH_GetDWORD() LPH_GetByte() LPH_GetDWORD() LPH_GetDWORD() -- IP += 17
    return table_result
end
Have a good look and you will see that it all comes down to table_result. The table is created with 4 entries that have weird numbers as keys. You can see I have added comments such as do: table_result[#1] to indicate which number belongs to which index of the table. But other than that, I snipped out all the nasty loops since I don't feel like spending a night or two on this, so well played Luraph, you win this round.
Just kidding, we can just continue to the next section because these tables are only responsible for the Lua VM constants, registers, upvalues, and some other things that will be explained further in Part 2. So bear with me while I explain the third section of the Lua VM.
Interpreting the interpreter
The second-to-last section, section three, is where it's at. Do you remember that one function with all the ugly IF statements? Well, this is him now:
local function UnpackFunctionidk()
    while true do
        local inst_table = flp_ret_58352[index]
        local loop_opcode = inst_table[26353] -- OPCODE
        index = index + 1 -- VM instruction pointer? (LPH_IP is just stack data?)
        local loop_v1 = inst_table[26628] -- A or B
        local loop_v2 = inst_table[19330] -- Bx
        local loop_v3 = inst_table[63082] -- A or B
        local loop_v4 = inst_table[19330] - unk_var_2 -- sBx (- 2^18/2, 17bit)
        local loop_v5 = inst_table[22182] -- C
        if loop_opcode >= 17 then
            if loop_opcode < 25 then
                if loop_opcode < 21 then
                    if loop_opcode >= 19 then
                        if loop_opcode ~= 20 then
                            if loop_v1 == 4 then
                                index = index - 1
                                flp_ret_58352[index] = {
                                    [26353] = 31,
                                    [63082] = (loop_v3 - 25) % 256,
                                    [22182] = (loop_v5 - 25) % 256,
                                    [19330] = 0
                                }
                            elseif loop_v1 == 121 then
                                index = index - 1
                                flp_ret_58352[index] = {
                                    [26353] = 9,
                                    [63082] = (loop_v3 - 233) % 256,
                                    [26628] = (loop_v5 - 233) % 256,
                                    [19330] = 0
                                }
                            elseif loop_v1 == 75 then
                                index = index - 1
                                flp_ret_58352[index] = {
                                    [26353] = 6,
                                    [63082] = (loop_v3 - 75) % 256,
                                    [26628] = (loop_v5 - 75) % 256,
                                    [19330] = 0
                                }
                            else
                                if loop_v5 == 1 then
                                    return true
                                end
                                local IiIiil11IIll1ii1li1 = loop_v3 + loop_v5 - 2
                                if loop_v5 == 0 then
                                    IiIiil11IIll1ii1li1 = lIlIIIli1iIlll1IlliiI
                                end
                                return true, loop_v3, IiIiil11IIll1ii1li1
                            end
                        else -- opcode 20 (LEN in stock Lua 5.1, but this body computes a power)
                            if loop_v5 > 255 then
                                loop_v5 = table_set_1[loop_v5 - 256][int_44827]
                            else
                                loop_v5 = result_packed[loop_v5]
                            end
                            if loop_v1 > 255 then
                                loop_v1 = table_set_1[loop_v1 - 256][int_44827]
                            else
                                loop_v1 = result_packed[loop_v1]
                            end
                            result_packed[loop_v3] = loop_v5 ^ loop_v1
                        end
                    elseif loop_opcode ~= 18 then -- opcode 17
                        <...>
Remember those nasty tables we just talked about? Here they are again. Pay close attention to the start of the IF chain: loop_v1 to loop_v5 are temporary storage variables for the registers, which are obtained from the table inst_table. The following data from inst_table can be mapped to instruction info:
inst_table[26353]: Lua Bytecode (custom).
inst_table[26628]: Register B.
inst_table[19330]: Register Bx.
inst_table[63082]: Register A.
inst_table[19330] - unk_var_2: Register sBx.
inst_table[22182]: Register C.
inst_table with index 26353 is used to obtain a variable that is constantly checked against a number within byte range (0 ~ 255). I assume those bytes represent custom Luraph bytecode, which can be mapped against normal Lua bytecodes, making it almost look like the Lua bytecodes are 'renamed'. Another thing I noticed is that 19330 was used twice, which makes sense because the register sBx is derived from Bx; this seems to be done by subtracting unk_var_2 from Bx. The value of unk_var_2 should be exactly 0x1FFFF, because that is the bias Lua adds to store a signed number in the unsigned 18-bit Bx. Subtracting it again is what turns Bx into sBx.
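The mapping above can be sketched as a small normalizer that copies an obfuscated instruction table into one with readable field names. The key numbers are the ones from this particular sample (other Luraph files will use different ones), while `KEYS`, `normalize` and the sample entry are names and data made up for illustration.

```lua
-- Key numbers taken from this sample only; they change per obfuscated file.
local KEYS = { opcode = 26353, A = 63082, B = 26628, C = 22182, Bx = 19330 }
local SBX_BIAS = 131071 -- 0x1FFFF, the expected value of unk_var_2

-- Copy an obfuscated instruction table into one with readable field names.
local function normalize(inst_table)
    local inst = {}
    for name, key in pairs(KEYS) do
        inst[name] = inst_table[key]
    end
    if inst.Bx then
        inst.sBx = inst.Bx - SBX_BIAS -- same subtraction as loop_v4
    end
    return inst
end

-- Hypothetical instruction entry, shaped like the tables the VM loop reads.
local sample = { [26353] = 20, [63082] = 2, [26628] = 0, [22182] = 1, [19330] = 131073 }
local inst = normalize(sample)
print(inst.opcode, inst.A, inst.B, inst.C, inst.sBx)
```

Keeping the key numbers in one table means only `KEYS` has to be updated when a different Luraph file shuffles them.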
We can verify the value of unk_var_2 by looking right below section one, where you will find the following code.
local getbyte_7E = StrByte("~", 1) -- 7E
local unk_var_1, unk_var_2 = #{273}, #{
    5703, 3015, 5331, 6890, 5857, 5221, 219, 1250, 2422, 4066, 2329, 3462,
    2189, 6944, 4479, 2107, 6710, 5803, 4390, 5185, 806, 3642, 5866
} + getbyte_7E + 130922
The variables have already been renamed to make it easy to read. Variable getbyte_7E converts the ASCII character ~ to a byte, which, according to the ASCII table, has the value 0x7E. The next line defines both unk_var_1 and unk_var_2. The first one gets the value #{273}: the curly brackets indicate a table, while the hash sign is the length operator that grabs the size of that table, meaning that unk_var_1 will receive the value 1. The next variable is a little more complex, but again, it comes down to the length of a table with random values, plus our getbyte_7E variable holding 0x7E (126), plus the number 130922, making it look very complex. Quick recap: the table contains 23 entries, so we get 23 + 0x7E + 130922, which equals 131071, or 0x1FFFF in hexadecimal. Now look at that, 0x1FFFF is the exact value that needs to be subtracted from Bx to obtain sBx.
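If you don't feel like counting table entries by hand, the arithmetic can be checked with a few lines of plain Lua. This is just a sanity check of the numbers quoted above; the `bias` and `to_sbx` names are mine and not part of the Luraph script.

```lua
-- Recompute unk_var_2 exactly as the obfuscated script does.
local getbyte_7E = string.byte("~", 1)
local unk_var_2 = #{
    5703, 3015, 5331, 6890, 5857, 5221, 219, 1250, 2422, 4066, 2329, 3462,
    2189, 6944, 4479, 2107, 6710, 5803, 4390, 5185, 806, 3642, 5866
} + getbyte_7E + 130922

-- Bias for a signed value stored in an 18-bit field: (2^18 - 1) / 2, rounded down.
local bias = math.floor((2 ^ 18 - 1) / 2)

-- Convert an unsigned Bx into the signed sBx.
local function to_sbx(bx)
    return bx - unk_var_2
end

print(getbyte_7E, unk_var_2, bias)
print(to_sbx(0), to_sbx(131071), to_sbx(262143))
```

With this bias, the 18-bit range 0 to 262143 maps onto -131071 to 131072, so a Bx of exactly 131071 means "jump by zero".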
Now that we know which register is what, we can have a look at the IF statements. One of the first things I noticed is that those IF statements almost never use an equality check. Instead, they seem to use different kinds of operators like 'less than' or 'greater than or equal'. Basically, any operator that is not the equal operator will be used (if possible) to make reversing a bit harder, as we will see in the next part.
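Even without equality checks, every leaf of the IF chain still corresponds to a specific opcode: each comparison just narrows the range a little further. Here is a rough sketch of that idea, which replays the comparisons along one path and returns the opcodes that can still reach it. The function `resolve_opcode` and the path format are assumptions of mine; the comparisons themselves are taken from the snippet above.

```lua
-- Replay the comparisons on a path through the IF chain.
-- Each step is {operator, number, branch_taken}; lo/hi bound the opcode range.
local function resolve_opcode(path, lo, hi)
    local excluded = {}
    for _, step in ipairs(path) do
        local op, n, taken = step[1], step[2], step[3]
        if op == ">=" then
            if taken then lo = math.max(lo, n) else hi = math.min(hi, n - 1) end
        elseif op == "<" then
            if taken then hi = math.min(hi, n - 1) else lo = math.max(lo, n) end
        elseif op == "~=" then
            if taken then excluded[n] = true else lo, hi = n, n end
        else
            error("unsupported operator: " .. tostring(op))
        end
    end
    -- Collect every opcode that survives all constraints.
    local candidates = {}
    for value = lo, hi do
        if not excluded[value] then
            candidates[#candidates + 1] = value
        end
    end
    return candidates
end

-- Path into the innermost branch: >= 17, < 25, < 21, >= 19, ~= 20.
local path_a = { {">=", 17, true}, {"<", 25, true}, {"<", 21, true}, {">=", 19, true}, {"~=", 20, true} }
-- Same path, but taking the else branch of `~= 20`.
local path_b = { {">=", 17, true}, {"<", 25, true}, {"<", 21, true}, {">=", 19, true}, {"~=", 20, false} }
-- The `elseif loop_opcode ~= 18` branch, reached when `>= 19` fails.
local path_c = { {">=", 17, true}, {"<", 25, true}, {"<", 21, true}, {">=", 19, false}, {"~=", 18, true} }

local result_a = resolve_opcode(path_a, 0, 255)
local result_b = resolve_opcode(path_b, 0, 255)
local result_c = resolve_opcode(path_c, 0, 255)
print(table.concat(result_a, ","), table.concat(result_b, ","), table.concat(result_c, ","))
```

The three paths resolve to opcodes 19, 20 and 17 respectively, which matches the `-- opcode 20` and `-- opcode 17` comments in the renamed code.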
Lifting to Lua
This is it, this is what y'all have been waiting for. For those who don't know yet, lifting is basically the process of mapping one instruction set to another, and in our case, we will be lifting the Luraph instructions to the original Lua instructions. Before we can do this we must identify the Luraph instructions, which explains why the IF statements are obfuscated in the first place. Because of that, running a simple RegEx to find the Luraph bytecode and then reading the body of that IF statement to identify the corresponding Lua bytecode will be a bit harder to do. Not only do we have to figure out a way to identify the Luraph bytecode, but we also need to understand the actual functionality of the original Lua bytecodes before we can identify them.
This Lua 5.3 bytecode Reference, which is for Lua 5.3 (which isn't 5.1, I know), can be used to get a better understanding of how each Lua bytecode works. Please note that the reference listed is for Lua 5.3. Let's not forget that Lua 5.1 was released on 21 Feb 2006, so finding fancy documentation isn't easy. Lucky for you, there is in fact a Lua Bytecode Interpreter project on Github, which was written for Lua 5.1, in Lua 5.1. The file src/lbi.lua contains the Lua 5.1 bytecode interpreter at line 268, and we can use this to manually see how Luraph is interpreting each instruction. Of course, once we have done a few instructions manually, we should jump into making a parser that can automatically identify the Luraph bytecode from the IF statement and then compare the body of that IF statement in order to lift the Luraph bytecode to a Lua bytecode.
Automated Lifting
Unfortunately, the Luraph bytecodes change for every file it generates, meaning that we have to identify all Luraph bytecodes again for every new Luraph script we want to reverse. Therefore I would like to automate the process. Not only do the bytecodes change, but the table indexing (used for registers, constants, upvalues, etc.) may also change, which makes it a little more difficult and very time-consuming to do manually.
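To give a feeling for what such automation could look like, here is a deliberately naive sketch that builds a per-file lift table by searching each handler body for a telltale fragment. Everything in it is an assumption for illustration: the `FINGERPRINTS` list, the `identify` and `build_lift_table` functions, and the handler bodies for opcodes 7 and 3 are made up, and a real tool would need much sturdier matching than plain substrings.

```lua
-- Hypothetical fingerprints: a text fragment expected in one handler only.
local FINGERPRINTS = {
    { name = "POW",    fragment = " ^ " },
    { name = "CONCAT", fragment = " .. " },
}

-- Return the stock Lua opcode name whose fingerprint appears in the body.
local function identify(body)
    for _, fp in ipairs(FINGERPRINTS) do
        -- The fourth argument makes string.find do a plain substring search.
        if string.find(body, fp.fragment, 1, true) then
            return fp.name
        end
    end
    return nil
end

-- Build a per-file lift table: Luraph opcode number -> stock Lua opcode name.
local function build_lift_table(handlers)
    local lift = {}
    for opcode, body in pairs(handlers) do
        lift[opcode] = identify(body) or "UNKNOWN"
    end
    return lift
end

-- Handler bodies keyed by Luraph opcode. Entry 20 is the last line of the
-- opcode 20 branch shown earlier; entries 7 and 3 are invented examples.
local handlers = {
    [20] = "result_packed[loop_v3] = loop_v5 ^ loop_v1",
    [7]  = "result_packed[loop_v3] = loop_v5 .. loop_v1",
    [3]  = "return true",
}

local lift = build_lift_table(handlers)
print(lift[20], lift[7], lift[3])
```

Because the table is rebuilt from the handler bodies instead of hardcoded opcode numbers, the same approach can be rerun on a new file where the numbers have been shuffled.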
Conclusion
We have analyzed a Luraph obfuscated script that is virtualizing custom/unknown bytecodes. Presumably, those bytecodes have different identifiers, but their functionality is equal to that of the original Lua bytecodes. After realizing that, we concluded that we need to lift Luraph bytecode to Lua bytecode and then decompile it to get a somewhat human-readable Lua script.
In Part 2: Decompiling Lua, we will create a Lua decoder and decompiler that we will use to, first of all, get a better understanding of how compiled Lua works, and then continue to develop tools that we will use to de-virtualize Luraph.
Next article: Part 2: Decompiling Lua
Have something to say?
Contact me at admin@ferib.be