Reversing REvil Part 1: Deobfuscating the Binary
This is part 1 of a 2-part series in which we reverse REvil.
REvil is malware, and in this post we’ll resolve its dynamic IAT and decrypt its strings, which makes the binary easier to reverse. I chose REvil because this sample still has debug information inside the binary, which makes it easier for a beginner like me. You can find the sample here: REvil_malware.bin SHA-256 hash: 0dab0428b414b0440288a12fbc20dab72339ef72ff5859e8c18d76dd8b169f50
After loading the binary into Binary Ninja, we quickly notice in the Triage view that very few imports are present, which is a clue that the binary might be using a dynamic IAT. Another smoking gun is that the code calls through data, as in the picture below:
In the picture we see that the function sub_4111ef, right after its prologue, pushes two arguments onto the stack and then calls data_41fe00. The value stored at data_41fe00 is 0x42a2897c, which does not look like a valid address. Calling through a data slot like this suggests that the slot is overwritten with a real function pointer at runtime, which is how a dynamic IAT behaves.
Looking at the Strings view in Binary Ninja, we also find no meaningful strings, which leads us to suspect that the strings are encrypted.
These are the first two challenges we’ll solve in this post.
Decrypting the Strings
Let’s start with decrypting the strings. We begin at the entry point _start, where we find the malware’s main function, sub_411219. From there I opened the first function it calls, sub_415718, and from that one I hopped into sub_4145ef.
It was here that I found the decryption function, sub_413185. It stood out because all the calls to it referenced 0x41ff48 with different arguments. Inspecting the data located at 0x41ff48, I found what looked like encrypted data. Inspecting sub_413185 itself, I found only a call to sub_414420:
This means sub_413185 is likely a wrapper around the actual decryption function, sub_414420. After taking a look at sub_414420, I identified the algorithm as RC4:
With this fully annotated and analyzed, we realize that the outer wrapper function is actually a wrapper for decrypting an entry in an array of encrypted data:
Looking at the cross-references for both mw_RC4_decrypt_array and mw_RC4_decrypt_str, we see that most of the time the malware uses the array variant rather than decrypting a standalone string. "Array" here means that there is a base data blob, at either 0x41f060 or 0x41ff48, and the decrypt-array function is passed the index of the key inside that blob, along with the length of the key and the length of the data:
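For reference, here is a minimal sketch of textbook RC4 in plain Python. It is not lifted from the sample; the function name rc4 is my own, and it only shows the two loops (key scheduling and keystream generation) that are worth looking for when identifying the algorithm in a decompiler.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4. Encryption and decryption are the same operation."""
    if not key:
        raise ValueError("RC4 key must not be empty")

    # Key-scheduling algorithm (KSA): permute the 256-byte state with the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm (PRGA): XOR the keystream over the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    # Widely published RC4 test vector.
    print(rc4(b"Key", b"Plaintext").hex())
```

A 256-iteration initialisation loop followed by a swap-heavy loop over a 256-byte table is usually the giveaway in the decompiled output.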
mw_RC4_decrypt_array(lpData: 0x41f060, key_index: 0x394, key_len: 5, data_len: 0x124, result: &result)
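To make the layout concrete, here is a small sketch of how one entry could be sliced out of the blob. It assumes the key sits at lpData + key_index and the ciphertext follows it directly, which is the layout the automation script in the next section relies on. The name slice_entry and the sample blob are made up for illustration.

```python
def slice_entry(blob: bytes, key_index: int, key_len: int, data_len: int):
    """Return (key, ciphertext) for one entry of the encrypted blob.

    Assumed layout: [ ... | key (key_len bytes) | data (data_len bytes) | ... ]
    with the key starting at offset key_index from the start of the blob.
    """
    key_start = key_index
    data_start = key_start + key_len
    data_end = data_start + data_len
    if key_index < 0 or data_end > len(blob):
        raise ValueError("entry runs outside the blob")
    return blob[key_start:data_start], blob[data_start:data_end]


if __name__ == "__main__":
    # Synthetic 32-byte blob, not data from the sample.
    blob = bytes(range(32))
    key, data = slice_entry(blob, key_index=4, key_len=3, data_len=5)
    print(key.hex(), data.hex())
```

For the call shown above, that would mean a 5-byte key at 0x41f060 + 0x394 followed by 0x124 bytes of ciphertext.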
Automating the Decryption
Now that we know how the encryption works, we can start automating:
from binaryninja import MediumLevelILOperation, Transform

# `bv` is the BinaryView that Binary Ninja's scripting console provides.
TARGET_ADDR = 0x00413185  # RC4 decrypt function addr (mw_RC4_decrypt_array)


def safe_decode(buf):
    """Decode a decrypted buffer as UTF-16LE if it looks wide, else as UTF-8."""
    if not buf:
        return None
    # Heuristic: every second byte being 0x00 suggests a UTF-16LE string.
    if len(buf) >= 2 and all(buf[i] == 0x00 for i in range(1, min(len(buf), 32), 2)):
        return buf.decode("utf-16le", errors="replace").rstrip("\x00")
    return buf.decode("utf-8", errors="replace").rstrip("\x00")


def decode_rc4_call(instr):
    """Read key and data for one mw_RC4_decrypt_array call and decrypt them."""
    args = instr.params
    if len(args) < 4:
        return None
    lpData = args[0].constant
    key_index = args[1].constant
    key_len = args[2].constant
    data_len = args[3].constant
    # The key lives at lpData + key_index, the ciphertext directly after it.
    key_addr = lpData + key_index
    data_addr = key_addr + key_len
    key = bv.read(key_addr, key_len)
    data = bv.read(data_addr, data_len)
    if not key or not data:
        return None
    decoded = Transform["RC4"].decode(data, {"key": key})
    return safe_decode(decoded)


def annotate_rc4_calls():
    # get_code_refs may return a generator, so materialize it before testing it.
    refs = list(bv.get_code_refs(TARGET_ADDR))
    if not refs:
        print(f"[!] No code references found for target address: 0x{TARGET_ADDR:X}")
        return
    for ref in refs:
        func = bv.get_function_at(ref.function.start)
        if not func:
            continue
        mlil = func.medium_level_il
        if not mlil:
            continue
        for instr in mlil.instructions:
            if instr.address != ref.address:
                continue
            if instr.operation != MediumLevelILOperation.MLIL_CALL:
                continue
            decoded = decode_rc4_call(instr)
            if not decoded:
                continue
            bv.set_comment_at(instr.address, f"\"{decoded}\"")
            print(f"Annotated 0x{instr.address:X}: {decoded}")


annotate_rc4_calls()
This simple Python script loops over all references to the mw_RC4_decrypt_array function, decrypts each entry with the Transform["RC4"].decode API provided by Binary Ninja, decodes the result as UTF-16LE or UTF-8, and finally adds a comment at the call site:
Resolving the Dynamic IAT
While inspecting the decrypted strings, the following ones in particular caught my interest. We know the binary uses a dynamic IAT loader, so the names of DLLs could be relevant.
Annotated 0x414AB1: user32.dll
Annotated 0x414B23: winmm.dll
Annotated 0x414A78: shlwapi.dll
Annotated 0x414AEA: winhttp.dll
Annotated 0x41478A: gdi32.dll
Annotated 0x4147CE: mpr.dll
Annotated 0x414844: oleaut32.dll
Annotated 0x414A3F: shell32.dll
Annotated 0x414718: advapi32.dll
Annotated 0x414751: crypt32.dll
Annotated 0x414813: ole32.dll
Inspecting further and checking some of these addresses, we see that they all have the same structure:
Each one starts by decrypting the name of the DLL, then calls sub_4148e0 with the constant 0x417c8ab1, calls the address returned in eax, and returns.
Going one layer up, to where all these functions are invoked from, we find that they are all called from sub_4148e0, which after analysis looks like this:
So the function takes a hash, formats it, and derives the DLL hash from the result, which it uses to get a pointer to the DLL’s DOS header.
Afterwards it finds the Export Address Table (EAT). I recognized it by opening a DLL in 010 Editor and looking at the offset the code reads, which led to the EAT:
The EAT contains the names of the functions exported by the given DLL. The malware walks these names to find the one matching the hash. If it is found, the function returns its address: the malware has now turned the hash into a function address that it can call.
Looking at the cross-references for this function, which I’ve named mw_resolve_import, we see that it’s invoked from a loop:
The loop iterates over the table at 0x41fc88 using the esi register as its offset. It pushes the current value onto the stack and calls mw_resolve_import, which returns the address of the resolved function. That address is stored back at esi+0x41fc88, esi is incremented by 4, and the loop repeats until esi reaches 0x274. From this we can reason that 0x41fc88 holds the hashes of the functions to import, so it’s essentially a custom IAT.
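As a rough sketch of that formatting step, here is a Python port of the arithmetic used in the C++ resolver later in this post (the XOR constants, the 21-bit shift and the mask all come from there). The names format_iat_hash and split_formatted are mine, and the input value is the first entry of the table.

```python
MASK32 = 0xFFFFFFFF


def format_iat_hash(raw: int) -> int:
    """Mix the raw table value; masked to emulate 32-bit unsigned arithmetic."""
    return (((raw ^ 0x5167) << 16) ^ raw ^ 0x13B6) & MASK32


def split_formatted(formatted: int):
    """Top 11 bits select the DLL, low 21 bits identify the export name."""
    dll_selector = formatted >> 21
    name_hash = formatted & 0x1FFFFF
    return dll_selector, name_hash


if __name__ == "__main__":
    formatted = format_iat_hash(0x66FE62C3)
    selector, name_hash = split_formatted(formatted)
    print(hex(formatted), hex(selector), hex(name_hash))
```

For 0x66fe62c3 the selector comes out as 0x2AA, which the C++ switch statement maps to gdi32.dll.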
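The name walk can be sketched in a few lines of Python. This assumes the hashing routine from the C++ resolver later in this post (seed 0x2B, multiplier 0x10F, compared on the low 21 bits). The list of export names is a made-up stand-in for the real export name table, and find_export is a hypothetical helper name.

```python
def iat_hash(name: str) -> int:
    """Multiplicative string hash, kept to 32 bits like the C++ version."""
    result = 0x2B
    for ch in name.encode("ascii"):
        result = (result * 0x10F + ch) & 0xFFFFFFFF
    return result


def find_export(names, target: int):
    """Return the first export name whose low 21 hash bits equal target."""
    for name in names:
        if (iat_hash(name) & 0x1FFFFF) == target:
            return name
    return None


if __name__ == "__main__":
    # Stand-in for the names a real walk would read from the DLL's exports.
    exports = ["SelectObject", "DeleteObject", "GetStockObject"]
    print(find_export(exports, 0x1A7175))
```

The real routine would then use the matching index to look up the function's address in the export tables. The sketch stops at the name because that is all we need for annotating the binary.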
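Since the loop advances 4 bytes at a time up to 0x274, the table holds 0x274 / 4 = 157 entries. A small sketch for turning the raw bytes into a list of hashes could look like this. parse_hash_table is a hypothetical helper, and the sample bytes are just the first three values of the table packed by hand.

```python
import struct

TABLE_SIZE = 0x274  # loop bound from the disassembly, in bytes


def parse_hash_table(raw: bytes) -> list:
    """Interpret raw bytes as consecutive little-endian 32-bit hashes."""
    if len(raw) % 4:
        raise ValueError("table size must be a multiple of 4 bytes")
    return [value for (value,) in struct.iter_unpack("<I", raw)]


if __name__ == "__main__":
    sample = struct.pack("<3I", 0x66FE62C3, 0xB4499641, 0xDB45DC2D)
    print([hex(h) for h in parse_hash_table(sample)])
    print(TABLE_SIZE // 4)  # number of entries the loop processes
```

Inside Binary Ninja's console, something like parse_hash_table(bv.read(0x41fc88, TABLE_SIZE)) should give the full list to paste into the resolver.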
Resolving the IAT
The next step is resolving every hash in the IAT. For this we’ll use C++, since it’s the most straightforward option. Below is the quick and dirty code I wrote:
#include <windows.h>
#include <cstdint>
#include <cstdio>
#include <vector>
#include <string>

// Hash an export name the same way the malware does.
uint32_t mw_iat_hashing(const char* name)
{
    uint32_t result = 0x2B;
    while (*name)
    {
        result = result * 0x10F + static_cast<uint8_t>(*name);
        ++name;
    }
    return result;
}

// Mix the raw table value before it is split into DLL selector and name hash.
uint32_t mw_format_iat_hash(uint32_t h)
{
    return ((h ^ 0x5167) << 16) ^ h ^ 0x13B6;
}

// The top 11 bits of the formatted hash select the DLL.
const char* select_dll(uint32_t formatted)
{
    switch (formatted >> 21)
    {
    case 0x4D5: return "kernel32.dll";
    case 0x4E1: return "user32.dll";
    case 0x5B9: return "crypt32.dll";
    case 0x69B: return "ole32.dll";
    case 0x6B2: return "shell32.dll";
    case 0x7BA: return "shlwapi.dll";
    case 0x3C2: return "oleaut32.dll";
    case 0x32: return "winmm.dll";
    case 0x25C: return "winhttp.dll";
    case 0x27A: return "mpr.dll";
    case 0x2AA: return "gdi32.dll";
    case 0x2B0: return "advapi32.dll";
    case 0x39B: return "ntdll.dll";
    default: return nullptr;
    }
}

// Walk the export names of dll_name and return the one matching target_hash.
bool resolve_from_dll(const char* dll_name, uint32_t target_hash, std::string& out_name)
{
    // Map the DLL without running DllMain or resolving its own imports.
    HMODULE mod = LoadLibraryExA(dll_name, nullptr, DONT_RESOLVE_DLL_REFERENCES);
    if (!mod)
        return false;
    IMAGE_DOS_HEADER* dos = (IMAGE_DOS_HEADER*)mod;
    IMAGE_NT_HEADERS* nt = (IMAGE_NT_HEADERS*)((uint8_t*)mod + dos->e_lfanew);
    IMAGE_DATA_DIRECTORY& dir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    if (!dir.VirtualAddress)
        return false;
    IMAGE_EXPORT_DIRECTORY* exp = (IMAGE_EXPORT_DIRECTORY*)((uint8_t*)mod + dir.VirtualAddress);
    DWORD* names = (DWORD*)((uint8_t*)mod + exp->AddressOfNames);
    for (DWORD i = 0; i < exp->NumberOfNames; i++)
    {
        const char* fname = (const char*)mod + names[i];
        // Only the low 21 bits of the name hash are compared.
        if ((mw_iat_hashing(fname) & 0x1FFFFF) == target_hash)
        {
            out_name = fname;
            return true;
        }
    }
    return false;
}

// Raw hashes copied from the table at 0x41fc88.
static const uint32_t mw_IAT[]
{
    0x66fe62c3,
    0xb4499641,
    0xdb45dc2d,
    0xd1551a97,
    0x60a5ade8,
    // ...
};

int main()
{
    size_t count = sizeof(mw_IAT) / sizeof(mw_IAT[0]);
    printf("struct mw_IAT __packed\n");
    printf("{\n");
    for (size_t i = 0; i < count; i++)
    {
        uint32_t raw = mw_IAT[i];
        uint32_t formatted = mw_format_iat_hash(raw);
        uint32_t target = formatted & 0x1FFFFF;
        const char* dll = select_dll(formatted);
        std::string name = "UNRESOLVED";
        if (dll)
            resolve_from_dll(dll, target, name);
        printf("void* %s;\n", name.c_str());
    }
    printf("};\n");
    return 0;
}
This generates a struct we can apply at 0x41fc88 in Binary Ninja:
struct mw_IAT __packed
{
void* DeleteObject;
void* NtOpenFile;
void* CryptAcquireContextW;
void* CreateThread;
void* GetForegroundWindow;
void* ExitProcess;
void* Wow64DisableWow64FsRedirection;
void* WideCharToMultiByte;
void* NtQueryInformationFile;
void* GlobalAlloc;
void* CompareFileTime;
void* RegCreateKeyExW;
void* RegQueryValueExW;
void* GetModuleFileNameW;
void* HeapDestroy;
void* RtlGetLastWin32Error;
void* OpenSCManagerW;
void* NtClose;
void* WNetEnumResourceW;
void* WinHttpQueryDataAvailable;
void* GetCurrentProcess;
void* CreateStreamOnHGlobal;
void* MapViewOfFile;
void* ControlService;
void* CoInitializeSecurity;
void* SelectObject;
void* RegCloseKey;
void* CreateCompatibleDC;
void* CreateFileW;
void* CloseServiceHandle;
void* FillRect;
void* DeleteService;
void* SetPixel;
void* ImpersonateLoggedOnUser;
void* MoveFileExW;
void* CreateIoCompletionPort;
void* DeleteDC;
void* CreateFileMappingW;
void* FreeSid;
void* SystemTimeToFileTime;
void* GetDeviceCaps;
void* SetBkColor;
void* WinHttpConnect;
void* GetKeyboardLayoutList;
void* GetUserDefaultUILanguage;
void* SystemParametersInfoW;
void* Wow64RevertWow64FsRedirection;
void* VariantClear;
void* OpenProcessToken;
void* GlobalFree;
void* DeleteFileW;
void* GetDriveTypeW;
void* PostQueuedCompletionStatus;
void* WinHttpSetOption;
void* FindNextFileW;
void* SetFilePointerEx;
void* GetDiskFreeSpaceExW;
void* CreateFontW;
void* EnumServicesStatusExW;
void* GetProcAddress;
void* GetTempPathW;
void* SetErrorMode;
void* IsValidSid;
void* OpenServiceW;
void* StrToIntW;
void* WinHttpReadData;
void* ReleaseMutex;
void* RtlAllocateHeap;
void* SysFreeString;
void* LocalFree;
void* GetUserNameW;
void* RtlFreeHeap;
void* GetSystemInfo;
void* SHDeleteKeyW;
void* GetFileAttributesW;
void* CoUninitialize;
void* WinHttpCrackUrl;
void* SetBkMode;
void* AllocateAndInitializeSid;
void* RtlDeleteCriticalSection;
void* FindClose;
void* GetCommandLineW;
void* HeapCreate;
void* GetSystemDirectoryW;
void* RtlInitializeCriticalSection;
void* Process32FirstW;
void* LocalAlloc;
void* RevertToSelf;
void* DrawTextW;
void* GetObjectW;
void* CommandLineToArgvW;
void* CryptStringToBinaryW;
void* WaitForSingleObject;
void* GetComputerNameW;
void* OpenProcess;
void* _snwprintf;
void* GetSystemDefaultUILanguage;
void* SetThreadExecutionState;
void* RtlTimeToTimeFields;
void* MulDiv;
void* GetTokenInformation;
void* GetFileSizeEx;
void* GetWindowsDirectoryW;
void* RtlInitUnicodeString;
void* CloseHandle;
void* OpenMutexW;
void* MultiByteToWideChar;
void* CoInitializeEx;
void* WinHttpCloseHandle;
void* GetNativeSystemInfo;
void* MoveFileW;
void* PathFindExtensionW;
void* GetFileAttributesExW;
void* RtlLeaveCriticalSection;
void* GetDC;
void* ReadFile;
void* GetFileSize;
void* FindFirstFileW;
void* WinHttpOpenRequest;
void* WinHttpOpen;
void* UnmapViewOfFile;
void* CreateCompatibleBitmap;
void* GetStockObject;
void* ReleaseDC;
void* timeBeginPeriod;
void* SysAllocString;
void* GetVolumeInformationW;
void* WinHttpQueryHeaders;
void* TerminateProcess;
void* GetQueuedCompletionStatus;
void* WinHttpReceiveResponse;
void* RtlEnterCriticalSection;
void* timeGetTime;
void* RegSetValueExW;
void* GetDIBits;
void* CheckTokenMembership;
void* ShellExecuteExW;
void* SHDeleteValueW;
void* CreateProcessW;
void* WriteFile;
void* CoCreateInstance;
void* GetCurrentProcessId;
void* Sleep;
void* CreateMutexW;
void* SetFileAttributesW;
void* CreateToolhelp32Snapshot;
void* WNetOpenEnumW;
void* CryptBinaryToStringW;
void* wsprintfW;
void* WNetCloseEnum;
void* GetProcessHeap;
void* SetTextColor;
void* WinHttpSendRequest;
void* VirtualAlloc;
void* Process32NextW;
void* RegOpenKeyExW;
void* CryptGenRandom;
};
Before:
After:
This makes it way easier to statically analyze this binary.
Extracting Shellcode
With the IAT resolved and the strings decrypted, I started looking through the binary. I quickly noticed there was still some encrypted data that didn’t go through the mw_RC4_decrypt_array function. It used the inner function mw_RC4_decrypt_str directly, which indicated that this data was separate from the other strings. Thanks to the debugging information still left in the binary, we can quickly identify what it likely is:
To analyze these two payloads, we can decrypt them with the following Python code:
data = bv.read(0x00404600, 0x9600)
key = bv.read(0x0041fefc, 0x1b)
RC4 = Transform["RC4"]
data = RC4.decode(data, {"key": key})
with open("mw_enc_shell_priv_esc_64.bin", "wb") as f:
    f.write(data)
data = bv.read(0x00401000, 0x3600)
key = bv.read(0x0041ff18, 0x2f)
RC4 = Transform["RC4"]
data = RC4.decode(data, {"key": key})
with open("mw_enc_shell_priv_esc_32.bin", "wb") as f:
    f.write(data)
Now they’re ready to be analyzed. While I won’t be covering that here, it might be something we dig into in a future post.
Check for RU keyboard
While looking through the binary, we discovered that sub_412681 checks for a Russian keyboard layout. I find this interesting. If the user has a Russian keyboard layout installed on Windows, the malware will not execute. Because of this, it might actually be a good idea to have a Russian keyboard layout installed on Windows, even if you never plan to use it.
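I haven’t shown how sub_412681 performs the check, so the following is only a generic illustration of what such a check usually looks like, not a reconstruction of the malware’s code. It assumes the layout handles come from GetKeyboardLayoutList (which is in the resolved IAT) and that the low word of each handle is a language identifier, with 0x0419 being Russian. The function name and the sample handle values are made up.

```python
LANG_RUSSIAN = 0x0419  # Windows language identifier for Russian (ru-RU)


def has_russian_layout(layout_handles) -> bool:
    """Return True if any keyboard layout handle carries the Russian LANGID.

    The low 16 bits of a keyboard layout handle hold the language identifier.
    """
    return any((hkl & 0xFFFF) == LANG_RUSSIAN for hkl in layout_handles)


if __name__ == "__main__":
    # Hypothetical handle values: en-US only, then en-US plus ru-RU.
    print(has_russian_layout([0x04090409]))
    print(has_russian_layout([0x04090409, 0x04190419]))
```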
Next blog
In the next post, we’ll dig into the file encryption routine, reverse it, and hopefully develop a tool that can revert the encryption.