Using Speakeasy Emulation Framework Programmatically to Unpack Malware

Read the original article: Using Speakeasy Emulation Framework Programmatically to Unpack Malware


Andrew Davis recently announced the public release of his new Windows
emulation framework named Speakeasy. While
the introductory blog post focused on using Speakeasy as an automated
malware sandbox of sorts, this entry will highlight another powerful
use of the framework: automated malware unpacking. I will demonstrate,
with code examples, how Speakeasy can be used programmatically to:

  • Bypass unsupported Windows APIs to continue emulation and
    unpacking
  • Save virtual addresses of dynamically allocated code using API
    hooks
  • Surgically direct execution to key areas of code using code
    hooks
  • Dump an unpacked PE from emulator memory and fix its section
    headers
  • Aid in reconstruction of import tables by querying Speakeasy for
    symbolic information

Initial Setup

One approach to interfacing with Speakeasy is to create a subclass
of Speakeasy’s Speakeasy class. Figure 1 shows a Python code
snippet that sets up such a class that will be expanded in upcoming examples.

import speakeasy

class MyUnpacker(speakeasy.Speakeasy):
    def __init__(self, config=None):
        super(MyUnpacker, self).__init__(config=config)

Figure 1: Creating a Speakeasy subclass

The code in Figure 1 accepts a Speakeasy configuration dictionary
that may be used to override the default configuration. Speakeasy
ships with several configuration files. The Speakeasy class is a
wrapper class for an underlying emulator class. The emulator class is
chosen automatically when a binary is loaded, based on its PE headers,
or when the input is specified as shellcode. Subclassing Speakeasy
makes it easy to access, extend, or
modify interfaces. It also facilitates reading and writing stateful
data before, during, and after emulation.
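
As a minimal sketch of the override idea, the helper below reads a JSON configuration file into a dictionary and applies a few top-level overrides before the dictionary is handed to the subclass constructor. The helper name load_config and the keys used in the usage note are illustrative assumptions, not part of Speakeasy; consult the shipped configuration files for the keys that actually exist.

```python
import json


def load_config(path, overrides=None):
    """Load a JSON configuration file and apply top-level overrides.

    `path` and `overrides` are hypothetical inputs for illustration;
    nested keys are replaced wholesale, not merged.
    """
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    config.update(overrides or {})
    return config
```

The resulting dictionary could then be passed along the lines of MyUnpacker(config=load_config(path, {"timeout": 120})), where both the path and the "timeout" key are assumptions made for this example.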

Emulating a Binary

Figure 2 shows how to load a binary into the Speakeasy emulator.

self.module = self.load_module(filename)

Figure 2: Loading the binary into the emulator

The load_module function returns a PeFile object for the provided binary on disk. It
is an instance of the PeFile class defined
in speakeasy/windows/common.py, which is
subclassed from pefile’s
PE class. Alternatively, you can provide
the bytes of a binary using the data
parameter rather than specifying a file name. Figure 3 shows how to
emulate a loaded binary.

self.run_module(self.module)

Figure 3: Starting emulation
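
The two steps above can be combined in a small helper. The sketch below is an illustration rather than code from Speakeasy: the function name emulate_sample is hypothetical, and emu is assumed to be a Speakeasy (or subclass) instance whose load_module accepts either a file name or a data keyword argument, as described above.

```python
def emulate_sample(emu, sample):
    """Load a PE from a path or from raw bytes, then emulate it.

    `emu` is assumed to expose load_module() and run_module() as
    described in the text; this helper itself is hypothetical.
    """
    if isinstance(sample, (bytes, bytearray)):
        # Raw bytes are passed through the `data` parameter.
        module = emu.load_module(data=bytes(sample))
    else:
        # Anything else is treated as a file name on disk.
        module = emu.load_module(sample)
    emu.run_module(module)
    return module
```

Keeping the path/bytes decision in one place is convenient when samples arrive both from disk and from an earlier unpacking stage that only exists in memory.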

API Hooks

The Speakeasy framework ships with support for hundreds of Windows
APIs with more being added frequently. This is accomplished via Python
API handlers defined in appropriate files in the speakeasy/winenv/api directory. API
hooks can be installed to have your own code executed when
particular APIs are called during emulation. They can be installed for
any API, regardless of whether a handler exists or not. An API hook
can be used to override an existing handler and that handler can
optionally be invoked from your hook. The API hooking mechanism in
Speakeasy provides flexibility and control over emulation. Let’s
examine a few uses of API hooking within the context of emulating
unpacking code to retrieve an unpacked payload.
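
To make the "override, then optionally invoke the handler" idea concrete, here is a minimal sketch of a pass-through hook that records each call before delegating to the existing handler. It uses the same callback parameters (emu, api_name, func, params) as the hooks shown later in this post; the factory name make_recording_hook and the log list are assumptions for illustration, and the sketch assumes a handler exists for the hooked API so that func can be called.

```python
def make_recording_hook(log):
    """Build an API hook that records a call, then runs the real handler."""
    def hook(emu, api_name, func, params):
        # Snapshot the raw arguments first: a handler may overwrite
        # entries (for example, replacing pointers with strings).
        args = list(params)
        rv = func(params)  # delegate to the existing API handler
        log.append((api_name, args, rv))
        return rv
    return hook
```

A hook built this way changes nothing about emulation; it only leaves behind a list of (name, arguments, return value) tuples that can be inspected after the run.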

Bypassing Unsupported APIs

When Speakeasy encounters an unsupported Windows API call, it stops
emulation and provides the name of the API function that is not
supported. If the API function in question is not critical for
unpacking the binary, you can add an API hook that simply returns a
value that allows execution to continue. For example, a recent
sample’s unpacking code contained API calls that had no effect on the
unpacking process. One such API call was to GetSysColor. In order to bypass this call and
allow execution to continue, an API hook may be added as shown in
Figure 4.

self.add_api_hook(self.getsyscolor_hook,
                  'user32',
                  'GetSysColor',
                  argc=1
                  )

Figure 4: Adding an API hook

According to MSDN,
this function takes 1 parameter and returns an RGB color value
represented as a DWORD. If the calling
convention for the API function you are hooking is not stdcall, you can specify the calling convention in
the optional call_conv parameter. The
calling convention constants are defined in the speakeasy/common/arch.py file. Because the
GetSysColor return value does not impact
the unpacking process, we can simply return 0. Figure 5 shows the definition of the getsyscolor_hook function specified in Figure 4.

def getsyscolor_hook(self, emu, api_name, func, params):
    return 0

Figure 5: The GetSysColor hook returns 0

If an API function requires more finessed handling, you can
implement a more specific and meaningful hook that suits your needs.
If your hook implementation is robust enough, you might consider
contributing it to the Speakeasy project as an API handler!  
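
When a sample calls several non-critical APIs, writing one method per API gets repetitive. The sketch below shows one way to generate such hooks from a table, assuming add_api_hook accepts the arguments used in Figure 4. The names make_stub_hook, install_stub_hooks, and STUBBED_APIS are hypothetical, and the GetSystemMetrics entry is purely illustrative; whether a given API can safely return a constant depends on the sample.

```python
def make_stub_hook(retval=0):
    """Build an API hook that skips the real API and returns `retval`."""
    def hook(emu, api_name, func, params):
        # The existing handler (`func`), if any, is deliberately not called.
        return retval
    return hook


# Hypothetical table of APIs judged irrelevant to unpacking:
# (module, API name, argument count, value to return).
STUBBED_APIS = [
    ("user32", "GetSysColor", 1, 0),
    ("user32", "GetSystemMetrics", 1, 0),  # illustrative entry only
]


def install_stub_hooks(emu, table=STUBBED_APIS):
    """Register one constant-return hook per table entry."""
    for module, name, argc, retval in table:
        emu.add_api_hook(make_stub_hook(retval), module, name, argc=argc)
```

Calling install_stub_hooks(self) before emulation starts would register every entry in one pass, and new entries can be added to the table as Speakeasy reports further unsupported calls.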

Adding an API Handler

Within the speakeasy/winenv/api directory you’ll find
usermode and kernelmode subdirectories that contain Python
files for corresponding binary modules. These files contain the API
handlers for each module. In usermode/kernel32.py, we see a handler defined for
SetEnvironmentVariable as shown in Figure 6.

1: @apihook('SetEnvironmentVariable', argc=2)
2: def SetEnvironmentVariable(self, emu, argv, ctx={}):
3:     '''
4:     BOOL SetEnvironmentVariable(
5:         LPCTSTR lpName,
6:         LPCTSTR lpValue
7:         );
8:     '''
9:     lpName, lpValue = argv
10:    cw = self.get_char_width(ctx)
11:    if lpName and lpValue:
12:        name = self.read_mem_string(lpName, cw)
13:        val = self.read_mem_string(lpValue, cw)
14:        argv[0] = name
15:        argv[1] = val
16:        emu.set_env(name, val)
17:    return True

Figure 6: API handler for SetEnvironmentVariable

A handler begins with a function decorator (line 1) that defines the
name of the API and the number of parameters it accepts. At the start
of a handler, it is good practice to include MSDN’s documented
prototype as a comment (lines 3-8).

The handler’s code begins by storing elements of the argv parameter in variables named after their
corresponding API parameters (line 9). The handler’s ctx parameter is a dictionary that contains
contextual information about the API call. For API functions that end
in an ‘A’ or ‘W’ (e.g., CreateFileA), the character width can
be retrieved by passing the ctx parameter to
the get_char_width function (line 10). This
width value can then be passed to calls such as read_mem_string (lines 12 and 13), which reads the
emulator’s memory at a given address and returns a string.
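
The reason the character width matters can be seen in a small standalone sketch. The function below is not Speakeasy's read_mem_string; it is a hypothetical helper that reads a NUL-terminated string from a plain bytes buffer, treating the data as single-byte text when the width is 1 and as UTF-16LE when the width is 2.

```python
def read_string(mem, offset, width, max_chars=260):
    """Read a NUL-terminated string from `mem` starting at `offset`.

    width=1 treats the data as single-byte (ANSI-style) text,
    width=2 as UTF-16LE (wide) text. Illustrative helper only.
    """
    if width not in (1, 2):
        raise ValueError("width must be 1 or 2")
    terminator = b"\x00" * width
    units = []
    for i in range(max_chars):
        start = offset + i * width
        unit = bytes(mem[start:start + width])
        # Stop at the terminator or when the buffer runs out.
        if len(unit) < width or unit == terminator:
            break
        units.append(unit)
    raw = b"".join(units)
    encoding = "utf-16-le" if width == 2 else "latin-1"
    return raw.decode(encoding, errors="replace")
```

Reading a wide string with a width of 1 stops after the first character, because the second byte of each ASCII-range UTF-16LE code unit is zero. This is why a handler shared by the ‘A’ and ‘W’ variants has to ask for the width rather than assume it.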

It is good practice to overwrite string pointer values in the argv parameter with their corresponding string
values (lines 14 and 15). This enables Speakeasy to display string
values instead of pointer values in its API logs. To illustrate the
impact of updating argv values, examine the
Speakeasy output shown in Figure 7. In the VirtualAlloc entry, the symbolic constant string
PAGE_EXECUTE_READWRITE replaces the value
0x40. In the GetModuleFileNameA and CreateFileA entries, pointer values are replaced
with a file path.

KERNEL32.VirtualAlloc(0x0, 0x2b400, 0x3000, "PAGE_EXECUTE_READWRITE") -> 0x7c000
KERNEL32.GetModuleFileNameA(0x0, "C:\\Windows\\system32\\sample.exe", 0x104) -> 0x58
KERNEL32.CreateFileA("C:\\Windows\\system32\\sample.exe", "GENERIC_READ", 0x1, 0x0, "OPEN_EXISTING", 0x80, 0x0) -> 0x84

Figure 7: Speakeasy API logs

Saving the Unpacked Code Address

Packed samples often use functions such as VirtualAlloc to allocate memory used to store the
unpacked sample. An effective approach for capturing the location and
size of the unpacked code is to first hook the memory allocation
function used by the unpacking stub. Figure 8 shows an example of
hooking VirtualAlloc to capture the virtual
address and amount of memory being allocated by the API call.

1: def virtualalloc_hook(self, emu, api_name, func, params):
2:     '''
3:     LPVOID VirtualAlloc(
4:        LPVOID lpAddress,
5:        SIZE_T dwSize,
6:        DWORD  flAllocationType,
7:        DWORD  flProtect
8:      );
9:     '''
10:    PAGE_EXECUTE_READWRITE = 0x40
11:    lpAddress, dwSize, flAllocationType, flProtect = params
12:    rv = func(params)
13:    if lpAddress == 0 and flProtect == PAGE_EXECUTE_READWRITE:
14:        self.logger.debug("[*] unpack stub VirtualAlloc call, saving dump info")
15:        self.dump_addr = rv
16:        self.dump_size = dwSize
17:    return rv

Figure 8: VirtualAlloc hook to save memory dump information

The hook in Figure 8 calls Speakeasy’s API handler for VirtualAlloc on line 12 to allow memory to be
allocated. The virtual address returned by the API handler is saved to
a variable named rv. Since VirtualAlloc may be used to allocate memory not
related to the unpacking process, additional checks are used on line
13 to confirm the intercepted VirtualAlloc
call is the one used in the unpacking code. Based on prior analysis,
we’re looking for a VirtualAlloc call that
receives the lpAddress value 0 and the flProtect
value PAGE_EXECUTE_READWRITE (0x40). If these arguments are present, the virtual
address and specified size are stored on lines 15 and 16 so they may
be used to extract the unpacked payload from memory after the
unpacking code is finished. Finally, on line 17, the return value from
the VirtualAlloc handler is returned by the hook.
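
Once emulation of the unpacking code has finished, the saved address and size can be used to pull the payload out of emulator memory. The following is a minimal sketch under stated assumptions: emu is taken to be the Speakeasy subclass instance and is assumed to expose a mem_read(address, size) method that returns bytes, while the function name dump_unpacked and the output path are hypothetical.

```python
def dump_unpacked(emu, dump_addr, dump_size, out_path):
    """Read the saved region from emulator memory and write it to disk.

    `dump_addr` and `dump_size` are the values recorded by the
    VirtualAlloc hook; `emu.mem_read` is an assumed memory-read method.
    """
    if dump_addr is None or not dump_size:
        raise ValueError("no allocation was recorded by the VirtualAlloc hook")
    data = emu.mem_read(dump_addr, dump_size)
    with open(out_path, "wb") as f:
        f.write(data)
    return len(data)
```

The explicit check guards against the case where the hook's conditions never matched, which would otherwise surface later as a confusing read error or an empty dump file.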

Surgical Code Emulation Using API and Code Hooks

Speakeasy is a robust emulation framework; however, you may
encounter binaries that have large sections of problematic code. For
example, a sample may call many unsupported APIs or simply take far
too long to emulate. An example of overcoming both challenges is
described in the following scenario.

Unpacking Stubs Hiding in MFC Projects

A popular technique used to disguise malicious payloads involves
hiding them inside a large, open-source MFC project. MFC is short for
Microsoft Foundation Class, which is a popular library used to build
Windows desktop applications. These MFC projects are often arbitrarily
chosen from popular Web sites such as Code Project. While the MFC
library makes it easy to create desktop applications, MFC applications
are difficult to reverse engineer due to their size and complexity.
They are particularly difficult to emulate due to their large
initialization routine that calls many different Windows APIs. What
follows is a description of my experience with writing a Python script
using Speakeasy to automate unpacking of a custom packer that hides
its unpacking stub within an MFC project.

Reverse engineering the packer revealed the unpacking stub is
ultimately called during initialization of the CWinApp object, which occurs after initialization
of the C runtime and MFC. After attempting to bypass unsupported APIs,
I realized that, even if successful, emulation would take far too long
to be practical. I considered skipping over the initialization code
completely and ju

[…]

