PRE

Introduction

In this second blog post about Practical Revere Engineering solutions I’d like to focus on the following exercise on Page 128. This one is the first related to Asynchronous and Ad-Hoc Execution kernel objects,  and specifically on how System Threads are invoked via the PsCreateSystemThread routine.

Here is the original exercise statement:

After reading some online forums, you notice some people suggesting that PsCreateSystemThread will create a thread in the context of the calling process. In other words, they are suggesting that if you call PsCreateSystemThread in an IOCTL handler, the new thread will be in the context of the requesting user-mode application. Assess the validity of this statement by writing a driver that calls PsCreateSystemThread in the IOCTL handler. Next, experiment with a non-NULL ProcessHandle and determine if the context differs.

Before jumping into answering the actual exercise, we should get acquainted with all the bits and pieces that make up the current state of affairs.

System Process and System Threads

Process ID 4, also known as System Process is a container for kernel-mode system threads: these threads have similar ETHREAD/KTHREAD attributes to a userland threads but their code runs only in system address space (that is, either ntoskrnl or a device driver). These threads are employed by the Windows kernel as part of its submodules like memory management, I/O management, Security and so forth.

As a rule, system threads live and die just within the System process. However a driver can spawn system threads in an arbitrary process context: this is done in case a thread requires userland address space interaction or, in general, access to another process resource.

Creating System Threads

Going back to the original exercise’s statement from the book, it also suggests to crawl forums containing discussions on System Threads inside the IOCTL handler’s context. A first hit is from OSR which looks exactly the one we are after.

Kernel device drivers can create system threads by using either PsCreateSystemThread or, starting from Windows 8,IoCreateSystemThread and no matter what routine we are using to spawn it, the driver must terminate the thread via PsTerminateSystemThread.

Since the exercise explicitly asks for PsCreateSystemThread we are not going to spend any extra energy on the second API.

Here’s the PsCreateSystemThread syntax breakdown from MSDN:

NTSTATUS PsCreateSystemThread(
_Out_ PHANDLE ThreadHandle,
_In_ ULONG DesiredAccess,
_In_opt_ POBJECT_ATTRIBUTES ObjectAttributes, 
_In_opt_ HANDLE ProcessHandle,
_Out_opt_ PCLIENT_ID ClientId, 
_In_ PKSTART_ROUTINE StartRoutine, S
_In_opt_ PVOID StartContext);

The fourth argument, ProcessHandle is of great interest to us as we’ll find out shortly.

MSDN hints:

If this parameter is not supplied, the thread will be created in the initial system process. This value should be NULL for a driver-created thread. Use the NtCurrentProcess macro, defined in Ntddk.h, to specify the current process.

Enough with these premises, though. It’s now time to take a closer look at IDA and get down to it. It immediately appears that PsCreateSystemThread is just a wrapper for PsCreateSystemThreadEx, so we are going to focus only on the latter.

Quite at the beginning of this function, the ProcessHandle parameter held in r9 is then saved into rbx and verified to be non-null afterwards.

ida1

From the above snippet, if ProcessHandle (rbx) is null (left branch), PspSystemPartition offset 0x68 is dereferenced and saved into rbx which, at this point, acquires new significance as it will be used as ProcessPointer parameter when calling PspCreateThread.

But now the question is, what’s really at offset 0x68 of PspSystemPartition?

It looks like PspSystemPartition is an opaque structure, but at first glance it seems that this offset is used by kernel functions to get a hold of the System’s EPROCESS value.

Let’s verify this assumption:

2: kd> dq poi(nt!PspSystemPartition)+68 L1
ffffb18c`a0a96d48  ffffb18c`a0a5f040

2: kd> !process 0 0 system
PROCESS ffffb18ca0a5f040
    SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 001ad000  ObjectTable: ffffd702008245c0  HandleCount: 2583.
    Image: System

Cool, the two EPROCESS addresses do indeed match.

Moving our attention to the right branch, we can notice that when a non-null ProcessHandle is given,ObpReferenceObjectByHandleWithTag is invoked with the original caller process handle as an argument and it will be fetch the kernel object associated with that handle. The function will return an Object of EPROCESS type, which will be used as a process context when invoking PspCreateThread.

To summarize, if we we provide a null ProcessHandle, the value of System EPROCESS will be passed when calling PspCreateThread, else will be used the one obtained after having queried ObpReferenceObjectByHandleWithTag.

Building the PoC driver

Armed with this information we can now write a quick PoC driver that creates two system threads: one in the System process' context and one in the context of the calling IOCTL process.

After having created the usual device driver and the symbolic link through IoCreateDevice and IoCreateSymbolicLink respectively, we move on and define the following Major Functions:

	DriverObject->MajorFunction[IRP_MJ_CREATE] = SysThreadCreateClose;
	DriverObject->MajorFunction[IRP_MJ_CLOSE] = SysThreadCreateClose;
	DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = SysThreadDeviceControl;
	DriverObject->DriverUnload = SysThreadUnload;

The hero of the story is - it goes without saying - the function associated with the IRP_MJ_DEVICE_CONTROL routine, that is SysThreadDeviceControl in our instance (error checks omitted).

NTSTATUS SysThreadDeviceControl(PDEVICE_OBJECT deviceObject,PIRP Irp)
	{
        irpSp = IoGetCurrentIrpStackLocation(Irp);

		if (irpSp->Parameters.DeviceIoControl.IoControlCode == IOCTL_SYSTHREAD_METHOD_BUFFERED)
		{
			// NULL ProcessHandle = System Thread created in SYSTEM(4) process context.
			threadStatus = PsCreateSystemThread(&hThreadA,THREAD_ALL_ACCESS,NULL,NULL,NULL,(PKSTART_ROUTINE)SysThreadStartSystem,NULL);
            ZwClose(hThreadA);

			// ProccessHandle of caller process retrieved via NtCurrentProcess() macro = System Thread in Caller Process context.
			threadStatus = PsCreateSystemThread(&hThreadB,THREAD_ALL_ACCESS,NULL,NtCurrentProcess(),NULL,(PKSTART_ROUTINE)SysThreadStartIOCTLHandler,NULL);
            ZwClose(hThreadB);
		}
	
		Irp->IoStatus.Status = status;
		Irp->IoStatus.Information = 0;

		IoCompleteRequest(Irp, IO_NO_INCREMENT);

		return status;
	}

Once the current IRP is processed and the IOCTL matches the one defined in IOCTL_SYSTHREAD_METHOD_BUFFERED, we assume that the user application is willing to interact with this very routine. We then call PsCreateSystemThread and start a first thread with a NULL ProcessHandle (fourth parameter) and SysThreadStartSystem as start routine. Immediately afterward, we launch another thread in the userland IOCTL context by using the NtCurrentProcess() macro to obtain the ProcessHandle parameter and SysThreadStartIOCTLHandler as start routine. Note that we have to use the Nt (and not the Zw) version of this macro since the parameters are coming from a user mode process.

Apart from facilitating the debugging process, the two thread StartRoutines are not doing that much.


VOID SysThreadStartSystem(PVOID StartContext){
	UNREFERENCED_PARAMETER(StartContext);
	DbgPrint("SysThreadStartSystem has been called.\n");
	PsTerminateSystemThread(STATUS_SUCCESS);
}

VOID SysThreadStartIOCTLHandler(PVOID StartContext) {
	UNREFERENCED_PARAMETER(StartContext);
	DbgPrint("SysThreadStartIOCTLHandler has been called.\n");
	PsTerminateSystemThread(STATUS_SUCCESS);
}

That’s about the gist of it and - sure enough - we are going to need also the userland app in order to trigger the two StartThread functions (for the entire project check the link at the end of the post).

Validating assumptions

Ok, since we know that the truth lies only in the debugger (as oppsoed to the source code) we want to validate our theories. We so place two breakpoints at both ‘ThreadStart’ functions:

0: kd> x SysThreadHandler!*ThreadStart*
fffff802`16d81350 SysThreadHandler!SysThreadStartSystem (void *)
fffff802`16d81320 SysThreadHandler!SysThreadStartIOCTLHandler (void *)

0: kd>  bp SysThreadHandler!SysThreadStartSystem
0: kd>  bp SysThreadHandler!SysThreadStartIOCTLHandler

As soon as we fire the user application we are going to hit the first function’s breakpoint:

Breakpoint 0 hit
SysThreadHandler!SysThreadStartSystem:
fffff802`16d81350 48894c2408      mov     qword ptr [rsp+8],rcx

0: kd> !process -1 0
PROCESS ffffb18ca0a5f040
    SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 001ad000  ObjectTable: ffffd702008245c0  HandleCount: 2815.
    Image: System

And by looking at theCid we can tell is the well-known System (4) process id.

Continuing execution…

Breakpoint 1 hit
SysThreadHandler!SysThreadStartIOCTLHandler:

2: kd> !process -1 0
PROCESS ffffb18ca9ec5340
    SessionId: 1  Cid: 23f8    Peb: a651337000  ParentCid: 1ff8
    DirBase: 127e5f000  ObjectTable: ffffd7021b4185c0  HandleCount:  34.
    Image: UserApp.exe

…. we hit the second breakpoint and confirm that the second kernel thread belongs to the userapp process.

Mapping calls to PsCreateSystemThread: dynamic vs static

The second part of this exercise is asking for:

Cross-reference as many calls to PsCreateSystemThread as possible in the kernel image. Determine whether any of them pass a non-NULL ProcessHandle parameter. Explain the purpose of these routines. Repeat the exercise for as many functions as possible.

As we already mentioned, the reason for creating a system-thread in a non-system process context is to be able to interact with userland address space. The only requirement is to pass a valid ProcessHandle so we craft a conditional breakpoint that checks that r9 (fourth argument) is non-null.

bp nt!PsCreateSystemThread ".if @r9 != 0x0  {.printf \"Found NON-NULL call to PsCreateSystemThread with Process Handle value of 0x%p \\n\",r9} .else {gc}"

And verify who the handle belongs to as soon as we have a hit.

Found NON-NULL call to PsCreateSystemThread with Process Handle value of 0xffffffff80000fd0 
nt!PsCreateSystemThread:
fffff802`0dc55040 4c8bdc          mov     r11,rsp

2: kd> !handle @r9

PROCESS ffffb18ca3e5b080
    SessionId: 0  Cid: 0368    Peb: bfe1bac000  ParentCid: 02d0
    DirBase: 20df61000  ObjectTable: ffffd702045793c0  HandleCount: 1301.
    Image: svchost.exe

Kernel handle table at ffffd702008245c0 with 3102 entries in use

80000fd0: Object: ffffb18ca5a6c040  GrantedAccess: 001fffff (Protected) Entry: ffffd70204a35f40
Object: ffffb18ca5a6c040  Type: (ffffb18ca0a957a0) Process
    ObjectHeader: ffffb18ca5a6c010 (new version)
        HandleCount: 1  PointerCount: 64395

We can further check call stack to pinpoint the originating function.

2: kd> k
 # Child-SP          RetAddr           Call Site
00 ffffea0d`059a99b8 fffff802`0d8c9bdc nt!PsCreateSystemThread
01 ffffea0d`059a99c0 fffff802`0d8c99cd nt!SmKmStoreHelperStart+0x3c
02 ffffea0d`059a9a10 fffff802`0dc54ce8 nt!SMKM_STORE<SM_TRAITS>::SmStStart+0x251
03 ffffea0d`059a9ad0 fffff802`0dc554cc nt!SmProcessCreateRequest+0x228
04 ffffea0d`059a9c30 fffff802`0dc55550 nt!SmpDirtyStoreCreate+0xd4
05 ffffea0d`059a9cf0 fffff802`0dcf65d9 nt!SmpKeyedStoreCreate+0x4c
06 ffffea0d`059a9d30 fffff802`0dcf5231 nt!SmProcessCreateNotification+0x45
07 ffffea0d`059a9d60 fffff802`0dc6c477 nt!PspAllocateProcess+0x1ba1
08 ffffea0d`059aa2d0 fffff802`0da071b5 nt!NtCreateUserProcess+0xa17
09 ffffea0d`059aaa90 00007ffd`1512d8a4 nt!KiSystemServiceCopyEnd+0x25
0a 000000bf`e1cfcbc8 00007ffd`12bb2d2c 0x00007ffd`1512d8a4
0b 000000bf`e1cfcbd0 00000000`00000000 0x00007ffd`12bb2d2c

In this case the system thread has been created by the Store Manager sub component, within the context of svchost.exe service process.

This approach is obviously time-consuming and error prone as it is going to depend on external dynamic events to map each and every function call. To map all the call we could use IdaPython and statically fetch all the cross references by filtering out any non-null value of ProcessHandle, for example:

ps1  = idc.get_name_ea_simple("PsCreateSystemThread")
ps2 = idc.get_name_ea_simple("PsCreateSystemThreadEx")

def filterXrf(targetFunction):
	for addr in CodeRefsTo(targetFunction, 0):
		funcName  = get_func_name(addr)
		func = ida_funcs.get_func(addr)
		funcStart = func.start_ea
		funcEnd = func.end_ea
		print("Calling function: %s at 0x%x %s - function start 0x%x " % (funcName,addr, idc.generate_disasm_line(addr, 0),funcStart))
		cur_addr  = addr

		while(cur_addr > funcStart):
			cur_asm = idc.generate_disasm_line(cur_addr, 0)
			if "xor     r9d, r9d" in cur_asm:
				break
			if "r9" in cur_asm:
				print("0x%x %s" % (cur_addr, idc.generate_disasm_line(cur_addr, 0)))
				break
			cur_addr = idc.prev_head(cur_addr)

filterXrf(ps1)
filterXrf(ps2)

If we run it in IDA we can spot a few calls from Cache and Store manager.

Calling function: SmKmStoreHelperStart
0x1402c9bbf mov     r9, rdx; ProcessHandle
Calling function: ?SmStWorkerThreadStartThread@?$SMKM_STORE@USM_TRAITS@@@@SAJPEAU1@PEAXP6AX1@ZPEAPEAU_ETHREAD@@@Z
0x1402c9c5f mov     r9, rdx; ProcessHandle
Calling function: ?SmCompressCtxCreateThread@?$SMKM_STORE_MGR@USM_TRAITS@@@@SAJPEAU_SM_COMPRESS_CONTEXT@1@PEAU1@K@Z
0x14032203e mov     r9, [rdi+740h]; ProcessHandle
Calling function: CcInitializePartition
0x1403bdb71 mov     r9, [rbp+70h]; ProcessHandle
Calling function: CcInitializeAsyncRead
0x1403be10b mov     r9, [rax+70h]; ProcessHandle

Even though win32kfull.sys imports PsCreateSystemThread I could not spot any intermodular calls during debugging: this is - guesswork on steroids here - due to the recent modifications on desktop heaps after Redstone 5.

Kudos to Felix Schwyzer for the inspirational blog post on this subject.

The complete project can be found here.