Read the original article: WOW64!Hooks: WOW64 Subsystem Internals and Hooking Techniques
Microsoft is known for their backwards compatibility. When they
rolled out the 64-bit variant of Windows years ago they needed to
provide compatibility with existing 32-bit applications. In order to
provide seamless execution regardless of application bitness, the WoW
(Windows on Windows) system was created. This layer, which will be
referred to as ‘WOW64’ from here on out, is responsible for
translating all Windows API calls from 32-bit userspace to the 64-bit
operating system kernel. This blog post is broken up into two
sections. First we start by diving deep into the WOW64 system. To do
this, we trace a call from 32-bit userspace and follow the steps it
takes to finally transition to the kernel. The second part of the post
assesses two hooking techniques and their effectiveness. I will cover
how this system works, the ways malware abuses it, and detail a
mechanism by which all WOW64 syscalls can be hooked from userspace. Note
that all information here is accurate as of Windows 10, version 2004;
some details differ from how previous Windows versions were
implemented.
Recognition
First and foremost, this is a topic which has existing research by
multiple authors. This work was critical in efficient exploration of
the internals and research would have taken much longer had these
authors not publicly posted their awesome work. I would like to
call out the following references:
- (wbenny): An extremely detailed view of WOW64 internals on ARM
- (ReWolf): A PoC heaven’s gate implementation
- (JustasMasiulis): A very clean C++ heaven’s gate implementation
- (MalwareTech): A WOW64 segmentation explanation
WOW64 Internals
To understand how the WOW64 system works internally we will explore
the call sequence starting in 32-bit usermode before transitioning
into the kernel from within a system DLL. Within these system DLLs the
operating system will check arguments and eventually transition to a
stub known as a syscall stub. This syscall stub is responsible for
servicing the API call in the kernel. On a 64-bit system, the syscall
stub is straightforward as it directly executes the syscall
instruction as shown in Figure 1.
Figure 1: Native x64 Syscall Stub
Figure 2 shows a syscall stub for a 32-bit process running on WOW64.
Figure 2: WOW64 Syscall Stub
Notice that instead of a syscall instruction in the WOW64 version,
Wow64SystemServiceCall is called. In the
WOW64 system what would normally be an entry into the kernel is
instead replaced by a call to a usermode routine. Following this Wow64SystemServiceCall, we can see in Figure 3
that it immediately performs an indirect jmp through a pointer named
Wow64Transition.
Figure 3: Wow64SystemServiceCall transitions
through a pointer ‘Wow64Transition’
Note that the Wow64SystemServiceCall
function is found within ntdll labeled as ntdll_77550000; a WOW64
process has two ntdll modules loaded, a 32-bit one and a 64-bit one.
WinDbg differentiates between these two by placing the address of the
module after the 32-bit variant. The 64-bit ntdll can be found in
%WINDIR%\System32 and the 32-bit in %WINDIR%\SysWOW64. In the PDBs,
the 64-bit and 32-bit ntdlls are referred to as ntdll.pdb and wntdll.pdb
respectively; try loading them in a disassembler! Continuing with the
call trace, if we look at what the Wow64Transition pointer holds we can see its
destination is wow64cpu!KiFastSystemCall. As
an aside, note that the address of wow64cpu!KiFastSystemCall is held in the 32-bit
TEB (Thread Environment Block) via member WOW32Reserved. This isn’t
relevant for this trace but is useful to know. In Figure 4 we see the
body of KiFastSystemCall.
Figure 4: KiFastSystemCall transitions to
x64 mode via segment selector 0x33
The KiFastSystemCall performs a jmp using
the 0x33 segment selector to a memory location just after the
instruction. This 0x33 segment transitions the CPU into 64-bit mode
via a GDT entry as described by (MalwareTech).
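To make the role of the 0x33 value concrete, here is a minimal sketch that splits a segment selector into its architectural fields (descriptor index, table indicator, requested privilege level). The helper name decode_selector is made up for illustration; the bit layout is the standard x86 selector format. The 0x23 selector is included because the post later names it as the one used to return to 32-bit mode.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its architectural fields."""
    return {
        "index": selector >> 3,                          # descriptor index (bits 3..15)
        "table": "LDT" if selector & 0b100 else "GDT",   # table indicator (bit 2)
        "rpl": selector & 0b11,                          # requested privilege level (bits 0..1)
    }

for sel in (0x23, 0x33):
    print(hex(sel), decode_selector(sel))
```

Both selectors request ring 3 and point into the GDT; they differ only in which descriptor they pick (index 4 versus index 6), and it is the descriptor that determines whether the code segment runs in 32-bit or 64-bit mode.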
Let’s recap the trace we’ve performed to this point. We started from
a call in ntdll, NtResumeThread. This function calls the
Wow64SystemServiceCall function, which then jumps through the
Wow64Transition pointer to KiFastSystemCall. KiFastSystemCall performs the transition from
32-bit to 64-bit execution. The flow is shown in Figure 5.
Figure 5: 32-bit to 64-bit transition
The destination of the CPU transition jump is the 64-bit code shown
in Figure 6.
Figure 6: Destination of KiFastSystemCall
Figure 6 shows the first 64-bit instruction we’ve seen executed in
this call trace so far. In order to understand it, we need to look at
how the WOW64 system initializes itself. For a detailed explanation of
this refer to (wbenny). For now, we can look at the important parts in
wow64cpu!RunSimulatedCode.
Figure 7: 64-bit registers are saved in RunSimulatedCode
Figure 7 depicts the retrieval of the 64-bit TEB, which is used to
access Thread Local Storage at slot index 1, followed by the loading of a
function pointer table into register r15. The TLS data retrieved is an
undocumented data structure WOW64_CPURESERVED that contains register data and
CPU state information used by the WOW64 layer to set and restore
registers across the 32-bit and 64-bit boundaries. Within this
structure is the WOW64_CONTEXT structure, partially
documented on the Microsoft website. I have listed both
structures at the end of this post. We’ll look at how this context
structure is used later, but for our understanding of the jmp earlier
all we need to know is that r15 is a function pointer table.
It’s interesting to note at this point the architecture of the WOW64
layer. From the perspective of the 64-bit kernel the execution of
32-bit (Wow64) usermode applications is essentially a big while loop.
The loop executes x86 instructions in the processor’s 32-bit execution
mode and occasionally exits the loop to service a system call. Because
the kernel is 64-bit, the processor mode is temporarily switched to
64-bit, the system call serviced, then the mode switched back and the
loop continued where it was paused. One could say the WOW64 layer acts
like an emulator where the instructions are instead executed on the
physical CPU.
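The "big while loop" view can be sketched as a toy model. Everything in it is hypothetical: the instruction tuples, the services table and the function name run_simulated_code (borrowed loosely from the routine named above) exist only to show the shape of the loop, not how wow64cpu is actually written.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction stream: ("x86", text) runs in 32-bit mode,
# ("syscall", name) forces a round trip through 64-bit mode.
Instruction = Tuple[str, str]

def run_simulated_code(program: List[Instruction],
                       services: Dict[str, Callable[[], int]]) -> List[str]:
    """Toy model of the WOW64 loop; returns a trace of what ran in which mode."""
    trace = []
    mode = 32
    for kind, payload in program:
        if kind == "x86":
            trace.append(f"[{mode}-bit] execute {payload}")
            continue
        # Leave 32-bit execution: switch to 64-bit mode and service the call.
        mode = 64
        result = services[payload]()
        trace.append(f"[{mode}-bit] service {payload} -> {result:#x}")
        # Switch back and resume where 32-bit execution paused.
        mode = 32
    return trace

program = [("x86", "push args"), ("syscall", "NtResumeThread"), ("x86", "ret")]
for line in run_simulated_code(program, {"NtResumeThread": lambda: 0}):
    print(line)
```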
Going back to the jmp instruction we saw in Figure 6, we now know
what is occurring. The instruction jmp [r15 + 0xF8] is equivalent to
the C-like pseudocode jmp TurboThunkDispatch[0xF8 / sizeof(uint64_t)]. Looking at
the function pointer at this index we can see we’re at the function
wow64cpu!CpupReturnFromSimulatedCode
(Figure 8).
Figure 8: TurboThunk table’s last
function pointer entry is an exit routine
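The index arithmetic behind jmp [r15 + 0xF8] is simple enough to check by hand. The following sketch assumes only that each slot holds one 64-bit pointer; slot_index is a made-up helper name.

```python
POINTER_SIZE = 8  # sizeof(uint64_t): each table slot holds a 64-bit pointer

def slot_index(byte_offset: int) -> int:
    """Convert a byte displacement from r15 into a table index."""
    if byte_offset % POINTER_SIZE:
        raise ValueError("offset is not pointer-aligned")
    return byte_offset // POINTER_SIZE

index = slot_index(0xF8)
print(index)  # 31
```

If index 31 is the last entry, as the Figure 8 caption states, the table holds 32 function pointers in total.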
This routine is responsible for saving the state of the 32-bit
registers into the WOW64_CONTEXT structure
we mentioned before as well as retrieving the arguments for the
syscall. There is some trickiness going on here, so let’s examine this
in detail. First a pointer to the stack is moved into r14 via xchg;
the value at this location will be the return address from the syscall
stub where Wow64SystemServiceCall was
called. The stack pointer r14 is then incremented by 4 to get a
pointer to where the stack should be reset when it’s time to restore
all these context values. These two values are then stored in the
context’s EIP and ESP variables respectively. The r14 stack pointer is
then incremented one more time to get the location where the __stdcall
arguments are (remember stdcall passes all arguments on the stack).
This argument array is important for later; remember it. The arguments
pointer is moved into r11, so in C this means that r11 is equivalent
to an array of stack slots where each slot is an argument: uint32_t
r11[argCount]. The rest of the registers and EFlags are then saved.
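The pointer arithmetic described above can be replayed on a flat byte buffer. This is a sketch under stated assumptions: the second increment is taken to be 4 bytes (one 32-bit slot holding the return address to the stub's caller), the stack contents are invented values, and capture_32bit_frame is a hypothetical name rather than anything found in wow64cpu.

```python
import struct

def capture_32bit_frame(memory: bytes, esp: int, arg_count: int) -> dict:
    """Mirror the r14 arithmetic described above on a flat byte buffer."""
    r14 = esp
    # [r14] holds the return address back into the syscall stub.
    (eip,) = struct.unpack_from("<I", memory, r14)
    r14 += 4                 # where the stack is reset when the context is restored
    saved_esp = r14
    r14 += 4                 # assumed: skip the stub caller's 4-byte return address
    r11 = r14                # r11 points at the first stdcall argument
    args = list(struct.unpack_from(f"<{arg_count}I", memory, r11))
    return {"eip": eip, "esp": saved_esp, "args": args}

# Invented stack contents: two return addresses, then two 32-bit arguments.
stack = struct.pack("<4I", 0x7755100C, 0x00401234, 0x000000A4, 0x0019FF00)
frame = capture_32bit_frame(stack, esp=0, arg_count=2)
print(hex(frame["eip"]), frame["esp"], [hex(a) for a in frame["args"]])
```

The args list plays the role of uint32_t r11[argCount]: a view over consecutive 32-bit stack slots, read in little-endian order.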
Once the 32-bit context is saved, the WOW64 layer then calculates
the appropriate TurboThunk to invoke by grabbing the upper 16 bits of
the syscall number and dispatches to that thunk. Note that at the
beginning of this array is the function TurboDispatchJumpAddressEnd, shown in Figure 9,
which is invoked for functions that do not support TurboThunks.
Figure 9: TurboThunk table’s first
function pointer entry is an entry routine
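The thunk selection step amounts to a shift and a table lookup. The sketch below uses invented syscall numbers purely to exercise both branches; select_turbo_thunk and describe are hypothetical helper names.

```python
def select_turbo_thunk(syscall_number: int) -> int:
    """Return the TurboThunk table index held in the upper 16 bits."""
    return (syscall_number >> 16) & 0xFFFF

def describe(syscall_number: int) -> str:
    index = select_turbo_thunk(syscall_number)
    if index == 0:
        # Slot 0 is TurboDispatchJumpAddressEnd: no TurboThunk, take the full path.
        return "slot 0 (TurboDispatchJumpAddressEnd)"
    return f"slot {index} (TurboThunk)"

# Invented syscall numbers, chosen only to hit both branches.
print(describe(0x00000052))  # slot 0 (TurboDispatchJumpAddressEnd)
print(describe(0x0007002C))  # slot 7 (TurboThunk)
```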
TurboThunks are described by (wbenny)—read his blog post at this
point if you have not. To summarize the post, for functions that have
simple arguments with widths <= sizeof(uint32_t) the WOW64 layer
will directly widen these arguments to 64 bits via zero or
sign-extension and then perform a direct syscall into the kernel. As an
optimization, this all occurs within wow64cpu rather than through the more
complex path detailed below. For more complex
functions that do not support TurboThunks the TurboDispatchJumpAddressEnd stub is used, which
dispatches to wow64!Wow64SystemServiceEx to
perform the system call as shown in Figure 10.
Figure 10: Complex system calls go
through Wow64SystemServiceEx
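The widening that a TurboThunk performs on simple arguments comes down to two operations. The sketch below shows both on raw bit patterns; the function names are made up, and which extension a given argument receives depends on its type, which this example does not model.

```python
def zero_extend_32(value: int) -> int:
    """Widen an unsigned 32-bit argument to 64 bits."""
    return value & 0xFFFFFFFF

def sign_extend_32(value: int) -> int:
    """Widen a signed 32-bit argument to 64 bits, returned as a raw 64-bit pattern."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

print(hex(zero_extend_32(0xFFFFFFFE)))  # 0xfffffffe
print(hex(sign_extend_32(0xFFFFFFFE)))  # 0xfffffffffffffffe
print(hex(sign_extend_32(0x00000010)))  # 0x10
```

The difference only shows for values with the top bit set: a 32-bit -2 stays -2 after sign-extension, but becomes a large positive number after zero-extension.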
We’ll look at this routine in a moment as it’s the meat of this blog
post, but for now let’s finish this call trace. Once Wow64SystemServiceEx returns from doing the system
call the return value in eax is moved into the WOW64_CONTEXT structure and then the 32-bit
register states are restored. There are two paths for this: a common
case and a case that appears to exist only to be used by NtContinue and other WOW64 internals. A flag at
the start of the WOW64_CPURESERVED structure
retrieved from the TLS slot is checked, and controls which restore
path to follow as shown in Figure 11.
Figure 11: CPU state is restored once the
system call is done; there’s a simple path and a complex one
handling XMM registers
The simpler case will build a jmp that uses the segment selector
0x23 to transition back to 32-bit mode after restoring all the saved
registers in the WOW64_CONTEXT. The more
complex case will additionally restore some segments, xmm values, and
the saved registers in the WOW64_CONTEXT
structure and then will do an iret to transition back. The common-case
jmp, once built, is shown in Figure 12.
Figure 12: Dynamically built jmp to
transition back to 32-bit mode
At this point our call trace is complete. The WOW64 layer has
transitioned back to 32-bit mode and will continue execution at the
ret after Wow64SystemServiceCall in the
syscall stub we started with. Now that the flow of
the WOW64 layer itself is understood, let’s examine the Wow64SystemServiceEx call we glossed over before.
A little bit into the Wow64SystemServiceEx
routine, Figure 13 shows some interesting logic that we will use later.
Figure 13: Logging routines invoked
before and after dispatching the syscalls
The routine starts by indexing into service tables which hold
pointers to routines that convert the passed argument array into the
wider 64-bit types expected by the regular 64-bit system modules. This
argument array is exactly the stack location that was computed earlier in r14 (and copied into r11).
Two calls to the LogService function
exist; however, these are only called if the DLL
%WINDIR%\system32\wow64log.dll is loaded and has the exports
Wow64LogInitialize, Wow64LogSystemService, Wow64LogMessageArgList, and
Wow64LogTerminate. This DLL is not present on Windows by default, but
it can be placed there with administrator privileges.
The next section will detail how this logging DLL can be used to
hook syscalls that transition through this WOW64 layer. Because the
logging routine LogService is invoked before
and after the syscall is serviced, we can achieve a standard-looking,
inline-hook-style callback function capable of inspecting arguments
and return values.
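The before-and-after logging pattern can be modeled in a few lines. This is a toy sketch, not the real interface: the Dispatcher class and the callback signature (phase, name, args, status) are invented for illustration, since the actual export prototypes are not given here. Only the four export names and the "log before, log after" ordering come from the text above.

```python
from typing import Callable, List, Optional

REQUIRED_EXPORTS = {
    "Wow64LogInitialize",
    "Wow64LogSystemService",
    "Wow64LogMessageArgList",
    "Wow64LogTerminate",
}

class Dispatcher:
    """Toy stand-in for a dispatcher that logs around each service call."""

    def __init__(self, logger_exports: dict):
        # The logger is used only when every required export is present.
        complete = REQUIRED_EXPORTS.issubset(logger_exports)
        self.log: Optional[Callable] = (
            logger_exports["Wow64LogSystemService"] if complete else None
        )

    def dispatch(self, name: str, service: Callable[..., int], args: List[int]) -> int:
        if self.log:
            self.log("pre", name, args, None)      # arguments visible before the call
        status = service(*args)
        if self.log:
            self.log("post", name, args, status)   # return value visible afterwards
        return status

events = []
exports = {name: (lambda *a: None) for name in REQUIRED_EXPORTS}
exports["Wow64LogSystemService"] = lambda *record: events.append(record)

status = Dispatcher(exports).dispatch("NtResumeThread", lambda handle, count: 0, [0xA4, 0])
print(status, events)
```

A logger missing any of the four exports is ignored entirely in this model, matching the condition described above that the DLL must provide all of them.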
Bypassing Inline Hooks
As described in this blog post, Windows provides a way for 32-bit
applications to execute 64-bit syscalls on a 64-bit system using the
WOW64 layer. However, the segmentation switch we noted earlier can be
manually performed, and 64-bit shellcode can be written to set up a
syscall. This technique is popularly called “Heaven’s Gate”.
JustasMasiulis’ work call_function64
can be used as a reference to see how this may be done in practice
(JustasMasiulis). When system calls are performed this way the 32-bit
syscall stub that the WOW64 layer uses is completely skipped in the
execution chain. This is unfortunate for security products or tracing
tools because any inline hooks in place on these stubs are also
bypassed. Malware authors know this and utilize “Heaven’s Gate” as a
bypass technique in some cases. Figure 14 and
[…]