Reversing REvil Part 2: Reversing the File Encryption
This is part 2 of a two-part series in which we reverse REvil.
Correcting Calling Conventions
To start, I noticed that Binary Ninja has misidentified some of the calling conventions.
Take sub_412aac as an example. Binary Ninja infers its calling convention as regparm, giving it this rather ugly definition:
PVOID __convention("regparm") sub_412aac(int32_t arg1, int32_t arg2, int32_t arg3, int32_t arg4, int32_t arg5, int32_t arg6, int32_t arg7, int32_t* arg8)
The HLIL looks messy, and closer inspection reveals that the first three arguments are never actually referenced inside the function:
The regparm convention expects the first three arguments to be passed in registers (eax, edx, ecx), with the rest on the stack. But looking at the call site:
There are 4 push instructions and no register setup whatsoever. This is a __stdcall function: all arguments are passed on the stack, and the callee cleans up. Correcting the definition removes the ghost parameters entirely:
PVOID sub_412aac(int32_t arg4, int32_t arg5, int32_t arg6, int32_t arg7, int32_t* arg8)
With the calling convention fixed, the call sites clean up nicely:
Several other functions suffered from the same misidentification, all corrected before moving on.
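The call-site reasoning above can be sketched as a small heuristic. This is only an illustration, not how Binary Ninja infers conventions: the `insn` struct, the `classify_call_site` function, and the result enum are hypothetical names, and the check looks only at the instructions immediately preceding a call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one instruction before a call. */
struct insn {
    const char *mnemonic; /* e.g. "push", "mov" */
    const char *dest;     /* destination operand, or "" if none */
};

enum arg_passing { ARGS_ON_STACK, ARGS_IN_REGISTERS, ARGS_UNKNOWN };

/* Returns nonzero if reg is one of the registers regparm(3) uses. */
static int is_regparm_register(const char *reg)
{
    return strcmp(reg, "eax") == 0 || strcmp(reg, "edx") == 0 ||
           strcmp(reg, "ecx") == 0;
}

/* Classifies a call site from the instructions leading up to the call. */
enum arg_passing classify_call_site(const struct insn *insns, size_t count,
                                    size_t *push_count)
{
    size_t pushes = 0, reg_writes = 0;

    assert(insns != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(insns[i].mnemonic, "push") == 0)
            pushes++;
        else if (is_regparm_register(insns[i].dest))
            reg_writes++;
    }
    if (push_count != NULL)
        *push_count = pushes;
    if (reg_writes > 0)
        return ARGS_IN_REGISTERS;
    return pushes > 0 ? ARGS_ON_STACK : ARGS_UNKNOWN;
}
```

Stack-only argument passing does not by itself separate __stdcall from __cdecl. That depends on whether the callee returns with `ret N` or the caller adjusts esp after the call.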
Analyzing The Encryption Functionality
Locating the encryption entry point is straightforward. The binary contains the string "start encrypt files", which leads us directly to mw_encrypt_files at 0x0041004b.
Opening the HLIL, something immediately stands out. Large parts of code are greyed out, as if Binary Ninja believes they’re dead code:
That’s rarely correct. Switching to the assembly view reveals the truth:
Binary Ninja is treating a struct as raw memory offsets, so the disassembler loses track of the control flow. Defining the struct manually resolves it:
struct mw_enc_context __packed
{
HANDLE hFindFile;
int32_t (__stdcall* sub_40f90b)(int16_t* arg1, int16_t* arg2);
int32_t (__stdcall* sub_41075e)(int16_t* arg1, int32_t arg2, int32_t arg3, int32_t arg4);
uint32_t unk_0C;
void* iocp_handle;
uint32_t unk_14;
uint32_t unk_18;
uint32_t unk_1C;
uint32_t n_files_to_encrypt;
uint32_t unk_24;
int32_t (__stdcall* sub_40f8f8)(char* arg1);
int32_t (__stdcall* sub_4106dc)(int32_t* arg1, char* arg2, int32_t arg3, int32_t arg4);
};
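The unk_XX names encode each field's byte offset, which gives a quick way to sanity-check the layout. The sketch below is a simplified model rather than the sample's code: it assumes a 32-bit target and replaces every pointer and handle with a 4-byte integer so the offsets hold on any host, and the `mw_enc_context_layout` name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout model: pointers and HANDLE become uint32_t. */
#pragma pack(push, 1)
struct mw_enc_context_layout {
    uint32_t hFindFile;          /* 0x00 */
    uint32_t sub_40f90b;         /* 0x04, function pointer */
    uint32_t sub_41075e;         /* 0x08, function pointer */
    uint32_t unk_0C;             /* 0x0C */
    uint32_t iocp_handle;        /* 0x10 */
    uint32_t unk_14;             /* 0x14 */
    uint32_t unk_18;             /* 0x18 */
    uint32_t unk_1C;             /* 0x1C */
    uint32_t n_files_to_encrypt; /* 0x20 */
    uint32_t unk_24;             /* 0x24 */
    uint32_t sub_40f8f8;         /* 0x28, function pointer */
    uint32_t sub_4106dc;         /* 0x2C, function pointer */
};
#pragma pack(pop)

/* Compile-time checks that the field names match the offsets they imply. */
static_assert(offsetof(struct mw_enc_context_layout, unk_0C) == 0x0C, "unk_0C");
static_assert(offsetof(struct mw_enc_context_layout, unk_14) == 0x14, "unk_14");
static_assert(offsetof(struct mw_enc_context_layout, unk_24) == 0x24, "unk_24");
static_assert(sizeof(struct mw_enc_context_layout) == 0x30, "total size");
```

If a field were missing or mis-sized, one of the assertions would fail at compile time.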
Applying the struct definition transforms the HLIL from a mess into readable code:
Analyzing Struct Members
With the struct in place, we can start identifying each field by observing how it’s used in the code.
unk_00 → HANDLE hFindFile: This field is passed as the first argument to FindNextFileW, making it clearly the search handle used during file enumeration.
unk_20 → uint32_t n_files_to_encrypt: The field appears in a loop termination condition:
if (mw_encrypted_file_count u>= mw_enc_context.unk_20)
break;
This is clearly the total file count.
sub_40f90b → mw_check_is_whitelist_folder: Examining this function, we find two decrypted strings: "program files" and "program files (x86)". The function compares the provided path against these strings. This is a simple whitelist check that avoids encrypting installed programs and bricking the victim’s system entirely:
Updated definition:
bool (__stdcall* mw_check_is_whitelist_folder)(char* path, char* folder_name)
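A minimal sketch of this kind of folder check might look like the following. It is not the sample's implementation: the function name `is_whitelisted_folder` is hypothetical, it takes a single narrow-string argument instead of the two parameters above, the case-insensitive comparison is an assumption, and the strings are hard-coded rather than decrypted at runtime.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Folder names taken from the decrypted strings described above. */
static const char *const k_whitelisted_folders[] = {
    "program files",
    "program files (x86)",
};

/* Case-insensitive equality for ASCII strings (assumed behaviour). */
static bool equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Hypothetical stand-in for mw_check_is_whitelist_folder. */
bool is_whitelisted_folder(const char *folder_name)
{
    size_t count = sizeof k_whitelisted_folders / sizeof k_whitelisted_folders[0];

    assert(folder_name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (equals_ignore_case(folder_name, k_whitelisted_folders[i]))
            return true;
    }
    return false;
}
```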
sub_41075e → mw_is_extension_whitelisted: This function normalizes the file path to lowercase, performs what appears to be a drive check (likely filtering for C:\), and then extracts the file extension and checks it against a hash. The hashing algorithm is djb2. This will skip executable files:
Updated definition:
bool (__stdcall* mw_is_extension_whitelisted)(LPCWSTR pszPath, char* folder_name, int32_t arg3);
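To make the extension check concrete, here is a rough sketch of hashing a lowercased extension with the classic djb2 (seed 5381, multiplier 33). Several things are assumptions: the sample may use a djb2 variant, the target hash values are not given here, and the sketch works on narrow strings rather than LPCWSTR. The helper names `djb2_lower`, `file_extension`, and `extension_hash_matches` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Classic djb2 over the lowercased bytes of s (seed 5381, multiplier 33). */
uint32_t djb2_lower(const char *s)
{
    uint32_t hash = 5381;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        hash = hash * 33u + (uint32_t)tolower((unsigned char)*s);
    return hash;
}

/* Returns the text after the last '.' in the file name, or "" if none. */
const char *file_extension(const char *path)
{
    const char *dot, *sep;

    assert(path != NULL);
    dot = strrchr(path, '.');
    sep = strrchr(path, '\\');
    /* A dot that appears before the last separator belongs to a folder. */
    if (dot == NULL || (sep != NULL && dot < sep))
        return "";
    return dot + 1;
}

/* Hypothetical check: compares the extension hash against one target hash. */
bool extension_hash_matches(const char *path, uint32_t target_hash)
{
    return djb2_lower(file_extension(path)) == target_hash;
}
```

Comparing hashes instead of strings means the extension list never appears in the binary in plain text, which is presumably why the sample does it this way.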
sub_40f8f8 → mw_enc_make_note: Debug strings reveal this function logs whitelisted files and folders as they’re skipped during enumeration:
int16_t* (__stdcall* mw_enc_make_note)(char* fileName, PVOID path, int32_t arg3);
unk_0C → char* fileName: This field is passed as the fileName argument to mw_enc_make_note, confirming it holds the current file name being processed.
sub_4106dc → mw_enc_add: Debug output shows this function submits a file to the encryption queue:
int32_t mw_enc_add(void* iocp_handle, char* path, char* folder_name, int32_t arg4)
With almost all members accounted for, the final struct definition is:
struct mw_enc_context __packed
{
HANDLE hFindFile;
bool (__stdcall* mw_check_is_whitelist_folder)(char* path, char* folder_name);
bool (__stdcall* mw_is_extension_whitelisted)(LPCWSTR pszPath, char* folder_name, int32_t arg3);
char* fileName;
void* iocp_handle;
uint32_t unk_14;
uint32_t mw_n_whitelisted;
uint32_t unk_1C;
uint32_t n_files_to_encrypt;
uint32_t unk_24;
int32_t (__stdcall* mw_enc_make_note)(char* fileName, PVOID path, int32_t);
int32_t (__stdcall* mw_enc_add)(void* iocp_handle, char* path, char* folder_name, int32_t arg4);
};
Routine Analysis of mw_encrypt_files
With the struct resolved, the high-level flow of mw_encrypt_files becomes clear:
- Spawns the worker thread: mw_IOCP_enc_worker is launched immediately. This thread, backed by an I/O Completion Port (IOCP), is responsible for the actual work: reading files from disk, encrypting them, and writing the ciphertext back.
- Builds the target file list: The malware enumerates both local drives and network shares, walking the directory tree recursively. Any path that passes the whitelist checks gets added to a linked list. This list is the encryption work queue consumed by mw_IOCP_enc_worker.
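The linked-list work queue described above can be pictured with a generic FIFO sketch. This is an illustration of the data structure only, under simplifying assumptions: it is single-threaded, uses no IOCP or synchronization, and every name in it (`work_item`, `work_queue`, `queue_push`, `queue_pop`) is hypothetical rather than taken from the sample.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queue node holding one path to process. */
struct work_item {
    char *path;
    struct work_item *next;
};

/* FIFO queue as a singly linked list with head and tail pointers. */
struct work_queue {
    struct work_item *head;
    struct work_item *tail;
    size_t count;
};

/* Appends a copy of path; returns 0 on success, -1 on allocation failure. */
int queue_push(struct work_queue *q, const char *path)
{
    struct work_item *item;
    size_t len;

    assert(q != NULL && path != NULL);
    item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    len = strlen(path) + 1;
    item->path = malloc(len);
    if (item->path == NULL) {
        free(item);
        return -1;
    }
    memcpy(item->path, path, len);
    item->next = NULL;
    if (q->tail != NULL)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
    q->count++;
    return 0;
}

/* Removes the oldest path; the caller must free() it. NULL if empty. */
char *queue_pop(struct work_queue *q)
{
    struct work_item *item;
    char *path;

    assert(q != NULL);
    item = q->head;
    if (item == NULL)
        return NULL;
    q->head = item->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->count--;
    path = item->path;
    free(item);
    return path;
}
```

In this model the enumerator would call `queue_push` for each path that passes the whitelist checks, and the worker would drain the list with `queue_pop` in the order the paths were found.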
The worker thread runs a loop conditioned by mw_is_file_encrypted, pulling items from the queue and passing them through four sequential functions: sub_414e06, sub_410482, sub_4104ef, and sub_40fea5. These likely correspond to the four stages of REvil’s encryption process (key generation, file reading, encryption, and output), though this is speculation.
Understanding the Encryption
At this point, fully reversing REvil’s cryptographic implementation is beyond my skills. For a deep dive, these are excellent resources: