PRE

Introduction

This post is about ‘Work Items’ , the third part of my ‘Practical Reverse Engineering’ solutions series and a natural continuation to the previous one about kernel system threads. Luckily, thanks to Alex Ionescu, while researching the topic, I had the chance to get a pre-proof copy of Windows Internals 7th edition, Part 2 ahead of time so I could check my initial findings against the ones from the authors of the book. Also, shout-out to Satoshi Tanda for shedding some light along the path.

As we saw in the previous post, there are times where we want to run a large chunk of code on a different thread rather than the current one, and this it what dedicated system threads are for. On the other hand, if we need to run a smaller piece of code, it’s better to delegate execution to one of the kernel’s thread pool running within the system process, instead of having extra scheduling and memory overhead associated with additional threads in the system Work items are just asynchronous mechanisms that are queued into one of the system thread pool and, due to their lightweight nature, they are becoming more and more adopted instead of ad-hoc system threads.

Work items are employed when a driver wants do delay execution by lowering the IRQL level on a given processor and defer tasks to be executed at PASSIVE_LEVEL instead, which is the IRQL level which work items are designed to operate. As an example, writing a file on the disk is something that is not allowed at DISPATCH_LEVEL, so a Deferred Procedure Call (DPC) could delegate work to a work item which will execute the operation at PASSIVE_LEVEL. Once the work items are enqueued by the driver, a system worker thread will eventually retrieve the item from the queue and run the callback routine.

Before jumping into the exercise’s requirement and solution, let’s first get an overview of their structures and internals.

Structures overview

The main structure that compose a work item it’s aptly named IO_WORKITEM: starting from Windows 10 1507 Threshold 1, a new member calledWorkOnBehalfThread (formerly known as WorkingOnBehalfClient) has been added. It references an ETHREAD structure and it’s used whenever a routine is going to be executed on behalf of another thread.

 0: kd> dt nt!_IO_WORKITEM  -r 1
   +0x000 WorkItem         : _WORK_QUEUE_ITEM
      +0x000 List             : _LIST_ENTRY
         +0x000 Flink            : ???? 
         +0x008 Blink            : ???? 
      +0x010 WorkerRoutine    : ???? 
      +0x018 Parameter        : ????
   +0x020 Routine          : Ptr64     void 
   +0x028 IoObject         : Ptr64 Void
   +0x030 Context          : Ptr64 Void
   +0x038 WorkOnBehalfThread : Ptr64 _ETHREAD    
    +0x040 Type             : Uint4B
   +0x044 ActivityId       : _GUID

As we can see from above breakdown, the WorkItem field is just a list entry containing the actual routine and parameter as depicted here:

struct _WORK_QUEUE_ITEM
{
    struct _LIST_ENTRY List;                                                //0x0
    VOID (*WorkerRoutine)(VOID* arg1);                                      //0x10
    VOID* Parameter;                                                        //0x18
};

The parameters that are present in this structure are the actual work routine + arguments that are going to be enqueued and ultimately executed.

Enqueuing work items

A kernel object that wants to place system worker thread’s services can do so by calling either the functions ExQueueWorkItem or, to be used by device drivers only, IoQueueWorkItem. Both of these function places the work item in dedicated queues where system workers threads are waiting to pick them up.

On an historical note, the original NT work item had only two priority levels: DelayedWorkQueue and the CriticalWorkQueue :)

Whereas, on Windows 8 there were four different queue types:

 typedef enum _WORK_QUEUE_TYPE {
    CriticalWorkQueue       = 0,
    DelayedWorkQueue        = 1,
    HyperCriticalWorkQueue  = 2,
    MaximumWorkQueue        = 3
  } WORK_QUEUE_TYPE;

And their granularity has increased even more on modern Windows 10:

typedef enum _WORK_QUEUE_TYPE {
  CriticalWorkQueue,
  DelayedWorkQueue,
  HyperCriticalWorkQueue,
  NormalWorkQueue,
  BackgroundWorkQueue,
  RealTimeWorkQueue,
  SuperCriticalWorkQueue,
  MaximumWorkQueue,
  CustomPriorityWorkQueue
} WORK_QUEUE_TYPE;

Regarding increasing number of priority classes, the Windows Internal Part2 book come in handy:

Because the naming of all of these worker queues started becoming confusing, recent versions of Windows introduced custom priority worker threads, which are now recommended for all driver developers, and which allow the driver to pass-in their own priority level.

Back to analyzing work items structures, if we break on nt!IoQueueWorkItem and perform a live debug of the above structure through our PoC driver (more on that later), we can see how the _WORK_QUEUE_ITEM parameters have been filled.

Breakpoint 0 hit
nt!IoQueueWorkItem:

2: kd> dt _IO_WORKITEM @rcx
nt!_IO_WORKITEM
   +0x000 WorkItem         : _WORK_QUEUE_ITEM
   +0x020 Routine          : (null) 
   +0x028 IoObject         : 0xffffb58f`2e15e9a0 Void
   +0x030 Context          : (null) 
   +0x038 WorkOnBehalfThread : (null) 
   +0x040 Type             : 1
   +0x044 ActivityId       : _GUID {00000000-0000-0000-0000-000000000000}

2: kd> dx -id 0,0,ffffb58f2c93d080 -r1 (*((ntkrnlmp!_WORK_QUEUE_ITEM *)0xffffb58f2f8ec760))
(*((ntkrnlmp!_WORK_QUEUE_ITEM *)0xffffb58f2f8ec760))                 [Type: _WORK_QUEUE_ITEM]
    [+0x000] List             [Type: _LIST_ENTRY]
    [+0x010] WorkerRoutine    : 0xfffff8073bf5a300 [Type: void (__cdecl*)(void *)]
    [+0x018] Parameter        : 0xffffb58f2f8ec760 [Type: void *]

We can verify that the IoObject is pointing to our device object from our PoC kernel driver that is named workitem:

2: kd> !object ffffb58f`2e15e9a0
Object: ffffb58f2e15e9a0  Type: (ffffb58f27cf2900) Device
    ObjectHeader: ffffb58f2e15e970 (new version)
    HandleCount: 0  PointerCount: 2
    Directory Object: ffffa2076926e650  Name: workitem

This is expected as we have been using IoQueueWorkItem in our driver, which is telling the I/O system to add a reference to the object, which makes sure the driver cannot quit ahead of time while the thread/work-item are still executing.

Here is the function syntax:

void IoQueueWorkItem(
  __drv_aliasesMem PIO_WORKITEM IoWorkItem,
  PIO_WORKITEM_ROUTINE          WorkerRoutine,
  WORK_QUEUE_TYPE               QueueType,
  __drv_aliasesMem PVOID        Context
);

From that, we can verify the routine that is going to be executed in the system thread pool, which is passed as a second parameter in rdx

1: kd> u @rdx
workitem!KWorkItemRoutine [C:\Users\matteo\source\repos\workitem\workitem\workitem.cpp @ 12]:
fffff804`17741000 4889542410      mov     qword ptr [rsp+10h],rdx
fffff804`17741005 48894c2408      mov     qword ptr [rsp+8],rcx
fffff804`1774100a 4883ec38        sub     rsp,38h
fffff804`1774100e 488b442448      mov     rax,qword ptr [rsp+48h]
fffff804`17741013 4889442420      mov     qword ptr [rsp+20h],rax

Next, IopQueueWorkItemProlog is called from IoQueueWorkItem which returns the actual well-formed IO_WORKITEM populated structure.

1: kd> dt _WORK_QUEUE_ITEM @rax
ntdll!_WORK_QUEUE_ITEM
   +0x000 List             : _LIST_ENTRY [ 0x00000000`00000000 - 0xffffdb01`089848f0 ]
   +0x010 WorkerRoutine    : 0xfffff804`0ed5a300     void  nt!IopProcessWorkItem+0
   +0x018 Parameter        : 0xffffdb01`089848e0 Void

If we inspect the _WORK_QUEUE_ITEM returned value, we notice below that the parameter has now been correctly filled with all the correct params, including our final workitem!KWorkItemRoutine routine.

1: kd> dt _IO_WORKITEM   0xffffdb01`089848e0
nt!_IO_WORKITEM
   +0x000 WorkItem         : _WORK_QUEUE_ITEM
   +0x020 Routine          : 0xfffff804`17741000     void  workitem!KWorkItemRoutine+0
   +0x028 IoObject         : 0xffffdb01`06cb0e10 Void
   +0x030 Context          : 0xffffdb01`089848e0 Void
   +0x038 WorkOnBehalfThread : 0xffffdb01`03f5c080 _ETHREAD
   +0x040 Type             : 0
   +0x044 ActivityId       : _GUID {00000000-0000-0000-0000-000000000000}

Moving on with nt!ExQueueWorkItemFromIo, the same return value is passed as the first function parameter and checking the IO_WORKITEM structure once more, reveals the value of WorkOnBehalfThread. Since this value it’s pointing to an ETHREAD structure we can just treat it as such and query it as:

1: kd> !thread 0xffffdb01`03f5c080
THREAD ffffdb0103f5c080  Cid 2324.12a4  Teb: 000000548dba9000 Win32Thread: 0000000000000000 RUNNING on processor 1
IRP List:
    ffffdb0103c4e5b0: (0006,0118) Flags: 00060000  Mdl: 00000000
Not impersonating
DeviceMap                 ffffad041be65130
Owning Process            ffffdb01074f9080       Image:         userapp.exe
[...]

No wonder this thread belong to our original userapp.exe, and since we are in its IOCTL context, the kernel is preserving this value in case it should return anything to it. However, as we’d expect - and will confirm this shortly - the work routine itself is going to run under a system thread pool context.

Another aspect worth noting is that the WorkerRoutine inside the LIST_ENTRY is pointing to the IopProcessWorkItem, but why?

2: kd> u fffff8073bf5a300
nt!IopProcessWorkItem:

Once we have enqueued our work item, if we place a breakpoint to IopProcessWorkItem we notice that is responsible for executing the work item routine itself. And if we check the first argument being passed, we can in fact discover the familiar IO_WORKITEM structure with the same values from before.

2: kd> dt _IO_WORKITEM @rcx
nt!_IO_WORKITEM
   +0x000 WorkItem         : _WORK_QUEUE_ITEM
   +0x020 Routine          : 0xfffff804`17741000     void  workitem!KWorkItemRoutine+0
   +0x028 IoObject         : 0xffffdb01`06cb0e10 Void
   +0x030 Context          : 0xffffdb01`089848e0 Void
   +0x038 WorkOnBehalfThread : 0xffffdb01`03f5c080 _ETHREAD
   +0x040 Type             : 0
   +0x044 ActivityId       : _GUID {00000000-0000-0000-0000-000000000000}

The work item’s routine is still the same original one and pointing again to workitem!KWorkItemRoutine+0 - let’s continue execution and see if what happens next. Once we land in our routine, thread context has already switched and we are suddenly in a system thread context and not in the user IOCTL one any longer.

: kd> k
 # Child-SP          RetAddr           Call Site
00 ffff860b`5f4f4af8 fffff807`3bf5a435 workitem!KWorkItemRoutine [C:\Users\matteo\source\repos\workitem\workitem\workitem.cpp @ 12] 
01 ffff860b`5f4f4b00 fffff807`3be25975 nt!IopProcessWorkItem+0x135
02 ffff860b`5f4f4b70 fffff807`3bf17e25 nt!ExpWorkerThread+0x105
03 ffff860b`5f4f4c10 fffff807`3bffd0d8 nt!PspSystemThreadStartup+0x55
04 ffff860b`5f4f4c60 00000000`00000000 nt!KiStartSystemThread+0x28

And, no surprise, if we check the current thread it does belong to System process and ExpWorkerThread thread pool.

2: kd> !thread
THREAD ffffdb0fff49c080  Cid 0004.00d8  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 2
Not impersonating
DeviceMap                 ffffad0416a36c00
Owning Process            ffffdb0fff487040       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      72328          Ticks: 34 (0:00:00:00.531)
Context Switch Count      1979           IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.046
Win32 Start Address nt!ExpWorkerThread (0xfffff8040ec25870)

Partitions to the rescue

Now that we know more about how work items are queued, let’s see where these queues are implemented in memory. After some investigations and rabbit holes, it appears that starting from Windows10 memory partitions are used to store WorkerQueues instead of CPU structures like the ENODE from KPRCB.

A memory partition is a self-contained entity that has its own management internals like page lists, working set etc. which are isolated from other partitions. There are actually three types of partition structures used in the Windows 10 Kernel: Memory Manager, Cache System and Executive.

For our purpose, the executive partition is the one we are after, which is well described by the upcoming Windows Internals Part 2 book:

Each partition object contains an executive partition, which is the portion of the partition object relevant to the executive, namely, the system worker thread logic. It contains a data structure tracking the work queue manager for each NUMA node part of the partition (a queue manager is made up of the deadlock detection timer, the work queue item reaper, and a handle to the actual thread doing the management). It then contains an array of pointers to each of the 8 possible work queues (EX_WORK_QUEUE). These queues are associated with an individual index and track the number of minimum (guaranteed) and maximum threads, as well as how many work items have been processed so far.

Furthermore, the book mentions that two different kinds of queues exist for a given system: the ExPool and IoPool, where the first is used by system components via the ExQueueWorkItem routine and second via IoAllocateWorkItem routine made for device drivers.

Back on our quest of mapping relations between work items queues and the memory partition, we discovered that nt!ExWorkerQueue is a kernel variable which has a pointer to the System Partition Object.

0: kd> dx *(nt!_EX_WORK_QUEUE**)&nt!ExWorkerQueue
*(nt!_EX_WORK_QUEUE**)&nt!ExWorkerQueue                 : 0xffffdb0fff47cbd0 [Type: _EX_WORK_QUEUE *]
    [+0x000] WorkPriQueue     [Type: _KPRIQUEUE]
    [+0x2b0] Partition        : 0xffffdb0fff4502a0 [Type: _EX_PARTITION *]
    [...]

We can further inspect the EX_PARTITION:

0: kd> dx -r1 ((ntkrnlmp!_EX_PARTITION *)0xffffdb0fff4502a0)
((ntkrnlmp!_EX_PARTITION *)0xffffdb0fff4502a0)                 : 0xffffdb0fff4502a0 [Type: _EX_PARTITION *]
    [+0x000] PartitionObject  : 0xffffdb0fff48ba40 [Type: _EPARTITION *]
    [+0x008] WorkQueues       : 0xffffdb0fff4902f0 [Type: _EX_WORK_QUEUE * * *]
    [+0x010] WorkQueueManagers : 0xffffdb0fff490450 [Type: _EX_WORK_QUEUE_MANAGER * *]
    [+0x018] QueueAllocationMask : 248 [Type: long]

and the EPARTITION:

0: kd> dx -r1 ((ntkrnlmp!_EPARTITION *)0xffffdb0fff48ba40)
((ntkrnlmp!_EPARTITION *)0xffffdb0fff48ba40)                 : 0xffffdb0fff48ba40 [Type: _EPARTITION *]
    [+0x000] MmPartition      : 0xfffff8040f650bc0 [Type: void *]
    [+0x008] CcPartition      : 0xffffdb0fff579560 [Type: void *]
    [+0x010] ExPartition      : 0xffffdb0fff4502a0 [Type: void *]
    ...

As we saw earlier, the thread pool responsible for executing our work item routine is ExpWorkerThread, which is initialized via the ExpWorkerInitialization function and which, in turn, references PspSystemPartition at offset 0x10 (that is PspSystemPartition->ExPartition).

ida1

Disclosing the running System Worker Thread.

Moving back to the first exercise’s question

Explain how we were able to determine that ExpWorkerThread is the sys- tem thread responsible for dequeueing work items and executing them. Hint: The fastest way is to write a driver.

We can write a quick PoC driver to find out the current thread executing the work item.

Here’s the main function responsible for queing the work item via IoQueueWorkItem:

NTSTATUS WorkitemDeviceControl(PDEVICE_OBJECT device, PIRP Irp) {
	auto stack = IoGetCurrentIrpStackLocation(Irp);
	auto status = STATUS_SUCCESS;


	switch (stack->Parameters.DeviceIoControl.IoControlCode) {
	case IOCTL_WORKITEM:
	{
		auto data = (JunkData*)stack->Parameters.DeviceIoControl.Type3InputBuffer;
		if (data == nullptr) {
			status = STATUS_INVALID_PARAMETER;
			break;
		}
		KeSetBasePriorityThread(KeGetCurrentThread(), 1);
		PIO_WORKITEM pWorkItem;
		// Work item
		pWorkItem = IoAllocateWorkItem(device);
		IoQueueWorkItem(pWorkItem, KWorkItemRoutine, DelayedWorkQueue, pWorkItem);
		break;
	}

	default:
		status = STATUS_INVALID_DEVICE_REQUEST;
		break;
	}

	Irp->IoStatus.Status = status;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, 0);
	return status;
}

The routine KWorkItemRoutine is simply getting the current process/thread via two macros:

VOID KWorkItemRoutine(IN DEVICE_OBJECT* DeviceObject, IN PVOID Context)
{
	UNREFERENCED_PARAMETER(DeviceObject)
	PIO_WORKITEM pIoWorkItem;
	pIoWorkItem = (PIO_WORKITEM)Context;
	KdPrint((DRIVER_PREFIX"KWorkItemRoutine running from [%p][%p]  \n", PsGetCurrentProcessId(), PsGetCurrentThreadId()));
	IoFreeWorkItem(pIoWorkItem);
}

After executing the routine from userland, the alert us of the current system thread pool being ‘leased’ to execute the routine.

WorkItemTest: KWorkItemRoutine running from [0000000000000004][00000000000000D8]    

To verify the correctness we can dump the entire IoPool queue from NUMA node0’s ExPartition.

dx -r0 @$queue = ((nt!_EX_PARTITION*)(*(nt!_EPARTITION**)&nt!PspSystemPartition)->ExPartition)->WorkQueues[0][1],d
    [+0x000] WorkPriQueue     [Type: _KPRIQUEUE]
    [+0x2b0] Partition        : 0xffff830e2d8500c0 [Type: _EX_PARTITION *]
    [+0x2b8] Node             : 0xfffff80704f25440 [Type: _ENODE *]
    [+0x2c0] WorkItemsProcessed : 2463 [Type: unsigned long]
    [+0x2c4] WorkItemsProcessedLastPass : 1708 [Type: unsigned long]
    [+0x2c8] ThreadCount      : 7 [Type: long]
    [+0x2cc (30: 0)] MinThreads       : 0 [Type: long]
    [+0x2cc (31:31)] TryFailed        : 0 [Type: unsigned long]
    [+0x2d0] MaxThreads       : 4096 [Type: long]
    [+0x2d4] QueueIndex       : IoPoolUntrusted (1) [Type: _EXQUEUEINDEX]
    [+0x2d8] AllThreadsExitedEvent : 0x0 [Type: _KEVENT *]

Only 7 threads are currently queued, so we can check them one by one.

0: kd> dx -r0 @$queue = ((nt!_EX_PARTITION*)(*(nt!_EPARTITION**)&nt!PspSystemPartition)->ExPartition)->WorkQueues[0][1],d
0: kd> dx -r1 Debugger.Utility.Collections.FromListEntry(@$queue->WorkPriQueue.ThreadListHead, "nt!_KTHREAD", "QueueListEntry")
Debugger.Utility.Collections.FromListEntry(@$queue->WorkPriQueue.ThreadListHead, "nt!_KTHREAD", "QueueListEntry")                
    [0x0]            [Type: _KTHREAD]
    [0x1]            [Type: _KTHREAD]
    [0x2]            [Type: _KTHREAD]
    [0x3]            [Type: _KTHREAD]
    [0x4]            [Type: _KTHREAD]
    [0x5]            [Type: _KTHREAD]
    [0x6]            [Type: _KTHREAD]

The second one appears to be nt!ExpWorkerThread,the one we’re after, with an id of d8 and belonging , as expected, to the System process:

0: kd> !thread 0xffff830e2f346080
THREAD ffff830e2f346080  Cid 0004.00d8  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrQueue) KernelMode Non-Alertable
    ffff830e2d85ea20  PriQueueObject
Not impersonating
DeviceMap                 ffff9e005d4365a0
Owning Process            ffff830e2d89e040       Image:         System
Attached Process          N/A            Image:         N/A
Wait Start TickCount      8367           Ticks: 10 (0:00:00:00.156)
Context Switch Count      751            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:01.140
Win32 Start Address nt!ExpWorkerThread (0xfffff80704425870)
Stack Init fffff00852f9ac90 Current fffff00852f9a820
Base fffff00852f9b000 Limit fffff00852f95000 Call 0000000000000000
Priority 12 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child     

If we want to fetch the ExPool queue we simply have to fetch the first member of the array as such:

dx -r0 @$queue = ((nt!_EX_PARTITION*)(*(nt!_EPARTITION**)&nt!PspSystemPartition)->ExPartition)->WorkQueues[0][1],d

Given the above WinDBG commands we could upgrade them in a full-fledged kernel driver that retrieves all threads of each of the two queues. I might include this driver as final exercise to the same github repository, but in meantime feel free to share it and let me know if you are planning to make one.