
Reversing REvil Part 2: Reversing the File Encryption

This is part 2 of a two-part series in which we reverse REvil.

Correcting Calling Conventions

To start, I noticed that Binary Ninja misidentified some of the calling conventions.

Take sub_412aac as an example. Binary Ninja infers its calling convention as regparm, giving it this rather ugly definition:

PVOID __convention("regparm") sub_412aac(int32_t arg1, int32_t arg2, int32_t arg3, int32_t arg4, int32_t arg5, int32_t arg6, int32_t arg7, int32_t* arg8)

The HLIL looks messy, and closer inspection reveals that the first three arguments are never actually referenced inside the function:

The regparm convention expects the first three arguments to be passed in registers (eax, edx, ecx), with the rest on the stack. But looking at the call site:

There are 4 push instructions and no register setup whatsoever, so this is a __stdcall function: all arguments are passed on the stack and the callee cleans them up. Correcting the definition removes the ghost parameters entirely:

PVOID sub_412aac(int32_t arg4, int32_t arg5, int32_t arg6, int32_t arg7, int32_t* arg8)

With the calling convention fixed, the call sites clean up nicely:

Several other functions suffered from the same misidentification, and I corrected all of them before moving on.

Analyzing the Encryption Functionality

Locating the encryption entry point is straightforward: searching for the string "start encrypt files" leads directly to mw_encrypt_files at 0x0041004b.

When we open the HLIL, something immediately stands out. Large parts of the code are greyed out, as if Binary Ninja believes they’re dead code:

That’s rarely correct. Switching to the assembly view reveals the truth:

Binary Ninja is treating a struct as raw memory offsets, so the disassembler loses track of the control flow. Defining the struct manually resolves it:

struct mw_enc_context __packed
{
    HANDLE hFindFile;
    int32_t (__stdcall* sub_40f90b)(int16_t* arg1, int16_t* arg2);
    int32_t (__stdcall* sub_41075e)(int16_t* arg1, int32_t arg2, int32_t arg3, int32_t arg4);
    uint32_t unk_0C;
    void* iocp_handle;
    uint32_t unk_14;
    uint32_t unk_18;
    uint32_t unk_1C;
    uint32_t n_files_to_encrypt;
    uint32_t unk_24;
    int32_t (__stdcall* sub_40f8f8)(char* arg1);
    int32_t (__stdcall* sub_4106dc)(int32_t* arg1, char* arg2, int32_t arg3, int32_t arg4);
};

Applying the struct definition transforms the HLIL from a mess into readable code:

Analyzing Struct Members

With the struct in place, we can start identifying each field by observing how it’s used at runtime.

unk_00 → HANDLE hFindFile: This field is passed as the first argument to FindNextFileW, making it clearly the search handle used during file enumeration.

unk_20 → uint32_t n_files_to_encrypt: This field appears in a loop termination condition:

if (mw_encrypted_file_count u>= mw_enc_context.unk_20)
    break;

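Because the loop breaks once mw_encrypted_file_count reaches this value (u>= is an unsigned comparison), the field holds the total number of files to encrypt.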

sub_40f90b → mw_check_is_whitelist_folder: Examining this function, we find two decrypted strings: "program files" and "program files (x86)". The function compares the provided path against these strings. This is a simple whitelist check that avoids encrypting installed programs and bricking the victim’s system entirely:

Updated definition:

bool (__stdcall* mw_check_is_whitelist_folder)(char* path, char* folder_name);
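Here is a minimal C sketch of such a folder whitelist. It assumes a case-insensitive exact match of a single folder name against the two recovered strings. The names is_whitelisted_folder and kWhitelistedFolders are hypothetical, as are the narrow char strings and the exact-match rule. The real function takes both a path and a folder name, and it may compare them differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical whitelist: the two strings recovered from sub_40f90b. */
static const char *kWhitelistedFolders[] = {
    "program files",
    "program files (x86)",
};

/* Case-insensitive ASCII string equality. */
static bool equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings end at the same point */
}

/* Hypothetical helper: returns true if folder_name should be skipped. */
bool is_whitelisted_folder(const char *folder_name)
{
    assert(folder_name != NULL);
    for (size_t i = 0; i < sizeof kWhitelistedFolders / sizeof kWhitelistedFolders[0]; i++) {
        if (equals_ignore_case(folder_name, kWhitelistedFolders[i]))
            return true;
    }
    return false;
}
```

The comparison is case-insensitive because Windows paths are case-insensitive, so "Program Files" and "PROGRAM FILES" are treated alike.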

sub_41075e → mw_is_extension_whitelisted: This function normalizes the file path to lowercase and performs what appears to be a drive check (likely filtering for C:\). It then extracts the file extension and compares its hash against a stored hash. The hashing algorithm is djb2, and this check is how executable files are skipped:

Updated definition:

bool (__stdcall* mw_is_extension_whitelisted)(LPCWSTR pszPath, char* folder_name, int32_t arg3);
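Here is a minimal sketch of the extension check. It assumes the extension is lowercased, hashed without its leading dot, and compared against a small table of hashes. The skip list below (exe, dll) is hypothetical, chosen only because executables are skipped. The real sample presumably stores precomputed constants, works on wide strings, and also performs the drive check, all of which this sketch leaves out. The djb2 function itself is the standard algorithm (seed 5381, then hash * 33 + byte for each character). The name is_extension_whitelisted is illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard djb2: start at 5381, then hash = hash * 33 + c for each byte. */
uint32_t djb2(const char *s)
{
    uint32_t hash = 5381;
    for (; *s; s++)
        hash = hash * 33 + (unsigned char)*s;
    return hash;
}

/* Hypothetical skip list; the real sample's hash values are not reproduced here. */
static const char *kSkippedExtensions[] = { "exe", "dll" };

/* Returns true if the file's extension hash matches a skipped extension. */
bool is_extension_whitelisted(const char *path)
{
    assert(path != NULL);

    /* The extension is the text after the last '.' in the final path component. */
    const char *dot = strrchr(path, '.');
    const char *sep = strrchr(path, '\\');
    if (dot == NULL || (sep != NULL && dot < sep))
        return false;

    /* Copy the lowercased extension (without the dot) into a small buffer. */
    char ext[16];
    size_t len = strlen(dot + 1);
    if (len == 0 || len >= sizeof ext)
        return false;
    for (size_t i = 0; i <= len; i++) /* <= also copies the terminating NUL */
        ext[i] = (char)tolower((unsigned char)dot[1 + i]);

    uint32_t h = djb2(ext);
    for (size_t i = 0; i < sizeof kSkippedExtensions / sizeof kSkippedExtensions[0]; i++) {
        if (h == djb2(kSkippedExtensions[i]))
            return true;
    }
    return false;
}
```

One plausible reason to compare hashes instead of strings is that the list of skipped extensions never appears as plaintext in the binary.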

sub_40f8f8 → mw_enc_make_note: Debug strings reveal that this function logs whitelisted files and folders as they’re skipped during enumeration:

int16_t* (__stdcall* mw_enc_make_note)(char* fileName, PVOID path, int32_t arg3);

unk_0C → char* fileName: This field is passed as the fileName argument to mw_enc_make_note, confirming it holds the name of the file currently being processed.

sub_4106dc → mw_enc_add: Debug output shows this function submits a file to the encryption queue:

int32_t (__stdcall* mw_enc_add)(void* iocp_handle, char* path, char* folder_name, int32_t arg4);

With almost all members accounted for, the final struct definition is:

struct mw_enc_context __packed
{
    HANDLE hFindFile;
    bool (__stdcall* mw_check_is_whitelist_folder)(char* path, char* folder_name);
    bool (__stdcall* mw_is_extension_whitelisted)(LPCWSTR pszPath, char* folder_name, int32_t arg3);
    char* fileName;
    void* iocp_handle;
    uint32_t unk_14;
    uint32_t mw_n_whitelisted;
    uint32_t unk_1C;
    uint32_t n_files_to_encrypt;
    uint32_t unk_24;
    int32_t (__stdcall* mw_enc_make_note)(char* fileName, PVOID path, int32_t);
    int32_t (__stdcall* mw_enc_add)(void* iocp_handle, char* path, char* folder_name, int32_t arg4);
};
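Because the sample is 32-bit, every HANDLE and pointer in the struct occupies 4 bytes, which is why the original unk_XX names line up with byte offsets. The sketch below mirrors the final layout with uint32_t fields and lets the compiler check a few of those offsets. Modeling pointers as uint32_t is an assumption that keeps the layout identical when compiling on a 64-bit host. The type name mw_enc_context32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of mw_enc_context: each 32-bit pointer or HANDLE is
 * modeled as uint32_t so the layout matches the x86 sample on any host. */
typedef struct {
    uint32_t hFindFile;                    /* HANDLE */
    uint32_t mw_check_is_whitelist_folder; /* function pointer */
    uint32_t mw_is_extension_whitelisted;  /* function pointer */
    uint32_t fileName;                     /* char* */
    uint32_t iocp_handle;                  /* void* */
    uint32_t unk_14;
    uint32_t mw_n_whitelisted;
    uint32_t unk_1C;
    uint32_t n_files_to_encrypt;
    uint32_t unk_24;
    uint32_t mw_enc_make_note;             /* function pointer */
    uint32_t mw_enc_add;                   /* function pointer */
} mw_enc_context32;

/* Each check ties a renamed field back to the offset in its original name. */
static_assert(offsetof(mw_enc_context32, fileName) == 0x0C, "unk_0C");
static_assert(offsetof(mw_enc_context32, iocp_handle) == 0x10, "iocp_handle");
static_assert(offsetof(mw_enc_context32, mw_n_whitelisted) == 0x18, "unk_18");
static_assert(offsetof(mw_enc_context32, n_files_to_encrypt) == 0x20, "unk_20");
static_assert(offsetof(mw_enc_context32, mw_enc_add) == 0x2C, "sub_4106dc slot");
static_assert(sizeof(mw_enc_context32) == 0x30, "12 fields of 4 bytes");
```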

Routine Analysis of mw_encrypt_files

With the struct resolved, the high-level flow of mw_encrypt_files becomes clear:

  1. Spawns the worker thread: mw_IOCP_enc_worker is launched immediately. This thread, backed by an I/O Completion Port (IOCP), is responsible for the actual work: reading files from disk, encrypting them, and writing the ciphertext back.
  2. Builds the target file list: The malware enumerates both local drives and network shares, walking the directory tree recursively. Any path that passes the whitelist checks gets added to a linked list. This list is the encryption work queue consumed by mw_IOCP_enc_worker.

The worker thread runs a loop conditioned by mw_is_file_encrypted, pulling items from the queue and passing each one through four sequential functions: sub_414e06, sub_410482, sub_4104ef, and sub_40fea5. These likely correspond to the four stages of REvil’s encryption process (key generation, file reading, encryption, and output), though this is speculation.
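The producer/consumer arrangement described above can be sketched as a singly linked work queue. The enumerator appends to it, and a worker drains it, running each item through four stages in order. This is a simplified, single-threaded sketch. The real sample hands work to an I/O completion port (the Win32 CreateIoCompletionPort / GetQueuedCompletionStatus mechanism) and runs the worker on its own thread, and its loop condition (mw_is_file_encrypted) is not modeled here. All names (work_item, work_queue, enqueue, run_worker, and the stage functions) are hypothetical. The stages are empty placeholders that follow the speculative mapping, not real crypto.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One queued file (hypothetical layout). */
typedef struct work_item {
    char path[260];         /* MAX_PATH-sized buffer, as on Windows */
    int stages_done;        /* how many pipeline stages have run */
    struct work_item *next;
} work_item;

typedef struct {
    work_item *head;
    work_item *tail;
} work_queue;

/* Producer side: append a path that passed the whitelist checks. */
void enqueue(work_queue *q, const char *path)
{
    work_item *item = calloc(1, sizeof *item);
    assert(item != NULL);
    snprintf(item->path, sizeof item->path, "%s", path);
    if (q->tail)
        q->tail->next = item;
    else
        q->head = item;
    q->tail = item;
}

/* Placeholders standing in for sub_414e06, sub_410482, sub_4104ef and
 * sub_40fea5; each one only records that it ran. */
static void stage_generate_key(work_item *it) { it->stages_done++; }
static void stage_read_file(work_item *it)    { it->stages_done++; }
static void stage_encrypt(work_item *it)      { it->stages_done++; }
static void stage_write_output(work_item *it) { it->stages_done++; }

typedef void (*stage_fn)(work_item *);

/* Consumer side: pop each item, run the four stages in order, then free it.
 * Returns the number of items processed. */
size_t run_worker(work_queue *q)
{
    static const stage_fn stages[] = {
        stage_generate_key, stage_read_file, stage_encrypt, stage_write_output,
    };
    size_t processed = 0;
    while (q->head) {
        work_item *it = q->head;
        q->head = it->next;
        if (q->head == NULL)
            q->tail = NULL;
        for (size_t i = 0; i < sizeof stages / sizeof stages[0]; i++)
            stages[i](it);
        assert(it->stages_done == 4);
        free(it);
        processed++;
    }
    return processed;
}
```

Keeping the stages as an array of function pointers mirrors how the worker calls the four functions strictly in sequence for every file.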

Understanding the Encryption

At this point, fully reversing REvil’s cryptographic implementation is beyond my current skill set. For a deep dive, these are excellent resources: