• Cisco Talos has developed a custom fuzzer that targets Direct Composition in Windows, built on the popular snapshot fuzzer “WTF.”
• Talos’ vulnerability research team used Protocol Buffers, developed by Google, to serialize and deserialize test cases.
• The Bochscpu backend of WTF was patched, and other tricks were used, to make snapshot fuzzing work correctly.
• We hope that releasing the implementation details of our snapshot fuzzer gives readers new ideas for snapshot fuzzing and helps make Direct Composition more secure.

Direct Composition is a feature first introduced in Windows 8 that enables high-performance bitmap composition with transformations, effects and animations that are abstracted as kernel objects. These kernel objects are later serialized and sent to DWM (Desktop Window Manager) to be rendered on screen. 

The fact that kernel objects can be created and manipulated directly through system calls makes Direct Composition an attractive attack surface. Researchers have demonstrated how adversaries can exploit vulnerabilities in Direct Composition, for example in competitions like Pwn2Own.

Although there is public research on Direct Composition, only a few publications discuss fuzzing this feature and none, to our knowledge, covers snapshot fuzzing. Therefore, we wanted to research how to apply snapshot fuzzing to Direct Composition and implement a fuzzer using the publicly available WTF snapshot fuzzer.

Past work on fuzzing Direct Composition 

Security researchers Peng Qiu and SheFang Zhong of the Qihoo 360 Vulcan Team first published research on vulnerabilities in Direct Composition in the conference talk “Win32k Dark Composition: Attacking the Shadow Part of Graphic Subsystem.” This research showed how to trigger and eventually fuzz Direct Composition-related code. By fuzzing the exposed code, security researchers could uncover potential vulnerabilities in Direct Composition, a feature that had not previously been publicly examined for vulnerabilities.

More recently, at the HITB conference in 2023, researchers from Hillstone Networks presented a talk about using a modified version of the fuzzing framework syzkaller to fuzz Direct Composition. First, they fuzzed the kernel component with code coverage. Then, at the end of each test case in the generated corpus, they added a call to NtDCompositionCommitChannel, which triggers userland DWM to process the data. They fuzzed again with the modified corpus and, as a result, found vulnerabilities in DWM. Although syzkaller runs with full system virtualization, it is not a snapshot fuzzer, and their approach did not collect coverage on DWM.

“What The Fuzz,” snapshot-based fuzzer 

Wtf (“what the fuzz”) is a snapshot-based fuzzer for Windows targets developed by researcher Axel Souchet. After it was first released, researchers quickly started using it to fuzz various targets. It supports several backends to execute the snapshot, such as Bochscpu, Windows Hypervisor Platform and KVM. Wtf also ships with the honggfuzz and libFuzzer mutators by default.

To get fuzzing working, researchers must create a memory snapshot and create a customized fuzzer that mutates the test case, copies it into the loaded snapshot’s guest memory space and installs breakpoints strategically to detect crashes, context switches, or restore snapshots. Then, wtf handles the rest. 
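The loop described above can be modeled in a few lines. This is a minimal Python sketch, not wtf's actual C++ API: `ToySnapshotTarget`, `insert_testcase`, `run` and the single-byte mutation are all hypothetical stand-ins, and the "crash breakpoint" is reduced to a check on the first byte of guest memory.

```python
import random


class ToySnapshotTarget:
    """Hypothetical stand-in for a guest restored from a memory snapshot."""

    def __init__(self, snapshot):
        self._snapshot = bytes(snapshot)   # pristine guest memory
        self.memory = bytearray(snapshot)  # working copy dirtied by each run

    def restore(self):
        # Roll guest memory back to the snapshot state.
        self.memory[:] = self._snapshot

    def insert_testcase(self, data):
        # Copy the test case into guest memory (truncated to fit).
        data = data[:len(self.memory)]
        self.memory[:len(data)] = data

    def run(self):
        # Stand-in for breakpoint handlers: report a crash or a normal end.
        return "crash" if self.memory[0] == 0xFF else "ok"


def fuzz(target, seed_input, iterations, rng):
    """Mutate, insert, run, record crashes, restore; repeat."""
    crashes = []
    for _ in range(iterations):
        data = bytearray(seed_input)
        data[rng.randrange(len(data))] = rng.randrange(256)  # one-byte mutation
        target.insert_testcase(bytes(data))
        if target.run() == "crash":
            crashes.append(bytes(data))
        target.restore()
    return crashes
```

Every iteration starts from the same restored state, which is what makes results reproducible from a saved test case alone.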

We were also interested in using wtf to fuzz potentially high-severity targets. Naturally, we investigated using it to fuzz Direct Composition to try and build off the previous research into Direct Composition. 

Challenges in fuzzing Direct Composition 

Direct Composition consists of a kernel component and a userland component. A user can execute system calls to interface with the kernel component. The kernel component can be fuzzed using traditional Windows system call fuzzing with a tailored grammar, as was the case in the 360 Vulcan Team’s research.

However, fuzzing the userland component is a bit tricky. Direct Composition-related internal kernel objects, which are created by system calls, are serialized and sent to the userland DWM for processing when the NtDCompositionCommitChannel system call is executed. The fuzzer needs to detect whether dwm.exe has crashed while processing the data sent from the kernel. This is tricky from the perspective of snapshot fuzzing because the fuzzing harness executed inside the guest lives in a different address space than the dwm.exe process. Therefore, wtf can only catch a crash of the guest fuzzing harness itself (which is a bug in the fuzzer) but not a crash of the dwm.exe process. One workaround is to make the guest fuzzing harness monitor the status of dwm.exe after each test case execution (e.g., a PID change) and crash itself to notify wtf. However, this is very inefficient, especially when using a slow backend like Bochscpu.

Another issue is that a context switch happens between the execution of the kernel and userland components, so fuzzing must continue, ignoring any context switches, until dwm.exe has processed the data. The Bochscpu backend, for example, checks whether the CR3 value has changed and, if it has, automatically restores the snapshot, ending the test case execution.

Our fuzzing approach 

We developed a custom fuzzer for Direct Composition using wtf while adding some tricks to overcome the challenges of fuzzing Direct Composition with snapshots. The following diagram explains how our fuzzing approach works using wtf.   

[Diagram: overview of our snapshot fuzzing approach using wtf]

Each stage of our custom fuzzer is explained in detail in the following sections. All the components described in the diagram are part of wtf. 

Test case generation and mutation 

Dcgen (Direct Composition generator) implements various code generators. Each one generates an instruction for our interpreter, dcpreter (Direct Composition interpreter), and each instruction maps to a related Direct Composition system call. When wtf requests a new test case, a series of code generators is randomly selected and run, and their output forms the test case. The test case is then interpreted by dcpreter, which is executed inside the guest.

While a code generator runs, each newly generated instruction is interpreted by an abstract interpreter that understands side effects, such as a new resource being created with a handle (the handle should be saved for later reference and must not be reused for another creation). This information is saved and later used by code generators to make decisions. A simple example is the code generator for instructions that correspond to the NtDCompositionCreateChannel system call.

First, it adds a new instruction to the current test case and sets the opcode of the instruction to OP_CREATE_CHANNEL (an opcode defined in the Protocol Buffers file explained in the next section). When dcpreter interprets this instruction, it calls NtDCompositionCreateChannel.
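To make the interplay between a code generator and the abstract interpreter concrete, here is a minimal Python sketch. It is not the dcgen source: the opcode value, the `Instruction` and `AbstractState` types, and the `channel_id` operand are assumptions made for illustration.

```python
from dataclasses import dataclass, field

OP_CREATE_CHANNEL = 1  # hypothetical opcode value


@dataclass
class Instruction:
    opcode: int
    operands: dict = field(default_factory=dict)


@dataclass
class AbstractState:
    """Tracks side effects of instructions without executing any system call."""
    next_channel_id: int = 0
    live_channels: list = field(default_factory=list)

    def interpret(self, inst):
        # Side effect of channel creation: a new channel becomes available
        # for later instructions to reference.
        if inst.opcode == OP_CREATE_CHANNEL:
            self.live_channels.append(inst.operands["channel_id"])


def gen_create_channel(testcase, state):
    """Append an OP_CREATE_CHANNEL instruction and record its side effect."""
    channel_id = state.next_channel_id  # fresh id, never reused for creation
    state.next_channel_id += 1
    inst = Instruction(OP_CREATE_CHANNEL, {"channel_id": channel_id})
    testcase.append(inst)
    state.interpret(inst)
    return inst
```

Later generators can consult `state.live_channels` to pick a channel that is guaranteed to exist at that point in the test case.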

Here is a slightly more complex one.

Generating an instruction for setting an integer property on a resource marshaler.

This code generator generates an instruction that assigns a value to a randomly chosen integer property supported by the specific resource marshaler. First, it randomly selects a resource handle that should be available when the test case is executed sequentially in the snapshot, and gets a pointer to the CMarshaler object, which stores information about the specific resource type the handle represents. Using this object, dcgen knows which integer property types are valid and chooses one.
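The selection logic can be sketched as follows. This Python model is an illustration under assumptions: the `INT_PROPERTIES` table, the resource type names and the property IDs are invented, and a plain dictionary stands in for the CMarshaler object.

```python
import random

# Hypothetical table: resource type -> integer property IDs it accepts.
INT_PROPERTIES = {"visual": [1, 4, 7], "shape": [2, 3]}


def gen_set_int_property(testcase, live_resources, rng):
    """Generate an instruction that sets a valid integer property.

    live_resources maps handle -> resource type, as tracked by the
    abstract interpreter while earlier instructions were generated.
    """
    if not live_resources:
        return None  # no resource exists yet, so nothing can be targeted
    handle = rng.choice(sorted(live_resources))
    # Only property IDs valid for this resource type are candidates.
    prop = rng.choice(INT_PROPERTIES[live_resources[handle]])
    inst = {
        "op": "SET_INT_PROPERTY",
        "handle": handle,
        "property": prop,
        "value": rng.randrange(-2**31, 2**31),  # any signed 32-bit value
    }
    testcase.append(inst)
    return inst
```

Restricting the choice to handles and properties known to be valid keeps most generated instructions from being rejected early by the kernel.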

Mutation is very similar to generation. Instead of generating instructions, dcmutate (Direct Composition mutator) abstractly interprets the instructions inside the test case sequentially. For each instruction it processes, the mutator randomly decides whether to mutate it. If it decides to do so, it mutates that instruction, abstractly interprets the mutated instruction in place of the original, and continues processing the remaining instructions in the test case. A mutation can change the operands of the instruction (the arguments to the system call) or replace the instruction with a completely different type of instruction.

Mutating an instruction that sets an integer property. 

It retrieves a pointer to an object which represents part of the mutated instruction. By changing the fields referenced from this object, dcmutate mutates the operands in the instruction. In this case, it randomly chooses between three cases in which it changes the integer property type, the assigned value, or both.
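A minimal Python sketch of this mutation pass is shown below. It is an approximation, not dcmutate itself: instructions are plain dictionaries, the mutation rate is an invented parameter, and the abstract interpretation step is only marked by a comment.

```python
import random


def mutate_int_property(inst, valid_properties, rng):
    """Mutate in place: change the property type, the value, or both."""
    case = rng.randrange(3)
    if case in (0, 2):
        inst["property"] = rng.choice(valid_properties)
    if case in (1, 2):
        inst["value"] = rng.randrange(-2**31, 2**31)
    return inst


def mutate_testcase(testcase, valid_properties, rng, rate=0.2):
    """Walk the test case in order and randomly mutate some instructions."""
    mutated = []
    for inst in testcase:
        inst = dict(inst)  # work on a copy so the input test case is untouched
        if inst["op"] == "SET_INT_PROPERTY" and rng.random() < rate:
            mutate_int_property(inst, valid_properties, rng)
        # The (possibly mutated) instruction would be abstractly interpreted
        # here so that later instructions see its side effects.
        mutated.append(inst)
    return mutated
```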

Serialization and deserialization of test cases 

To pass the test case to the guest, we used Protocol Buffers to serialize and deserialize it, and defined an “inst.proto” file that models each Direct Composition system call as an interpreter instruction. The message formats defined in this file were used when generating and mutating instructions inside the test case.

Proto file for defining dcpreter instructions. 

As explained previously, dcpreter runs inside the guest, deserializes the test case using the Protocol Buffers library and starts interpretation.

Main interpretation loop. 

This big switch statement shows how each instruction is interpreted. For OP_CREATE_CHANNEL, you can see that NtDCompositionCreateChannel is actually called and the handle of the created channel is saved.
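The shape of such an interpretation loop can be sketched in Python. This is a simplified model, not dcpreter: the opcode values are assumptions, test cases are already-deserialized dictionaries, and `FakeSyscalls` is a hypothetical stand-in that only logs what the real NtDComposition* system calls would be asked to do.

```python
OP_CREATE_CHANNEL, OP_COMMIT_CHANNEL = 1, 2  # hypothetical opcode values


class FakeSyscalls:
    """Stand-in for the real system calls, for illustration only."""

    def __init__(self):
        self.next_handle = 0x100
        self.log = []

    def create_channel(self):
        handle = self.next_handle
        self.next_handle += 4
        self.log.append(("create", handle))
        return handle

    def commit_channel(self, handle):
        self.log.append(("commit", handle))


def interpret(testcase, syscalls):
    """Dispatch each instruction on its opcode, saving created handles."""
    channels = {}  # channel id used in the test case -> handle from the call
    for inst in testcase:
        op = inst["op"]
        if op == OP_CREATE_CHANNEL:
            channels[inst["channel_id"]] = syscalls.create_channel()
        elif op == OP_COMMIT_CHANNEL:
            handle = channels.get(inst["channel_id"])
            if handle is not None:  # skip references to channels never created
                syscalls.commit_channel(handle)
        # Unknown opcodes are ignored in this sketch.
    return channels
```

Instructions refer to resources by test-case-local IDs, and the interpreter translates them into the handles actually returned at run time.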

Patching the Bochscpu backend

When you call NtDCompositionCommitChannel, serialized kernel data structures are sent to userland DWM for further processing. During this process a context switch can occur, which stops Bochscpu emulation and restores the snapshot, as mentioned in the previous section. This is acceptable if you only intend to fuzz the kernel component of Direct Composition, but it is a problem if you want to fuzz dwm.exe.

To avoid this, you can change the code in the Bochscpu backend that detects CR3 change and stops the emulation. For our purpose, the current CR3 value must be tracked to know when Bochscpu has switched to the original process context. 

Ignoring CR3 changes. 

In TlbControlHook, instead of restoring the snapshot if the initial CR3 value and new CR3 do not match, we save the new CR3 value and continue with the emulation.  

Ignoring breakpoints when CR3 doesn’t match. 

BeforeExecutionHook checks whether a breakpoint is set on the current instruction pointer and dispatches the breakpoint handler accordingly. Since breakpoints are saved as virtual addresses that are valid under the initial CR3 value, the saved breakpoint addresses are invalid if Bochscpu is executing in a different context. Therefore, a check is added to do nothing if the current CR3 doesn’t match the initial CR3 value.

To reduce the overhead of edge coverage tracking, we also patched the backend so that edge coverage is not logged when the current CR3 doesn’t match the CR3 from when the snapshot was generated. This avoids tracking code that is not directly related to fuzzing.

Ignoring edge recording. 

Finally, when restoring the snapshot, the variable that stores the current CR3 value must be restored to the initial CR3. 

Restoring the CR3. 
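Taken together, the four changes amount to tracking the current CR3 and gating breakpoints and coverage on it. The Python model below is a sketch of that logic only. The class and method names loosely mirror the hooks named in the text but are hypothetical, and the real patch lives in wtf's C++ Bochscpu backend.

```python
class Cr3AwareBackend:
    """Toy model of a backend that tolerates context switches."""

    def __init__(self, initial_cr3, breakpoints):
        self.initial_cr3 = initial_cr3  # CR3 at snapshot time
        self.current_cr3 = initial_cr3  # updated on every CR3 change
        self.breakpoints = breakpoints  # virtual address -> handler
        self.edges = set()

    def in_target_context(self):
        return self.current_cr3 == self.initial_cr3

    def tlb_control_hook(self, new_cr3):
        # Patched behavior: remember the new CR3 and keep emulating,
        # instead of stopping the test case on a mismatch.
        self.current_cr3 = new_cr3

    def before_execution_hook(self, rip):
        # Breakpoint addresses are only meaningful under the initial CR3.
        if not self.in_target_context():
            return False
        handler = self.breakpoints.get(rip)
        if handler is None:
            return False
        handler(self)
        return True

    def record_edge(self, src, dst):
        # Skip coverage for code running in unrelated address spaces.
        if self.in_target_context():
            self.edges.add((src, dst))

    def restore(self):
        # The tracked CR3 must be reset together with the snapshot.
        self.current_cr3 = self.initial_cr3
```

Without the reset in `restore`, a test case that ended in a foreign context would leave breakpoints and coverage disabled for the next one.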

Injecting dcpreter to dwm.exe 

To catch any user process crashes, breakpoints should be installed on user mode crash handlers. However, if dcpreter.exe is executed as a separate process and a snapshot is created in that process context, how will wtf catch dwm.exe crashing?

To solve this problem, dcpreter must be executed in the same process context as dwm.exe. To achieve this, dcpreter is implemented as a DLL and injected into dwm.exe with an injector. Before dcpreter starts to deserialize the test case from memory, a snapshot must be created, and breakpoints need to be installed on user mode crash handlers in the dwm.exe context. Then, breakpoints will be triggered while fuzzing if dwm.exe crashes. 

However, there were additional implementation issues that needed to be solved. Libprotobuf, which is statically linked to dcpreter, uses thread local storage (TLS) internally, which makes dcpreter crash when it is injected into dwm.exe. You can define the macro GOOGLE_PROTOBUF_NO_THREADLOCAL to disable the use of TLS, but the library then falls back to using pthreads, which is not available on Windows. This can be solved by compiling with pthread-win32, a Windows port of the pthreads library.

Installing breakpoints 

Wtf has a function named SetupUsermodeCrashDetectionHooks() which will install breakpoints that will be triggered when a userland crash happens. Calling this function at the initialization stage of our custom fuzzer will allow us to detect dwm.exe crashes and immediately restore the snapshot and continue fuzzing after saving the test case. 

Furthermore, you need to make sure that an appropriate breakpoint is installed to mark the end of a single test case execution. If not, the test case always times out, slowing down the overall fuzzing speed. This should be a location inside dwmcore.dll where it has finished processing data from the kernel. By looking at the call stacks of publicly disclosed vulnerabilities, we decided to test with a breakpoint set at the end of dwmcore!CKernelTransport::DispatchBatches.

Miscellaneous 

While testing our fuzzing setup, we noticed that it always crashed in the function named D3D10Warp!ProcessorThreadSpecificData::ExecuteProgram_JIT when executed inside the snapshot. We used SimulateReturnFromFunction() from wtf’s backend to return from this function without actually doing anything, after which it no longer crashed in the snapshot.

Results 

We have fuzzed Direct Composition with our custom fuzzer and successfully caught dwm.exe crashes using generated test cases. 

The size of each test case greatly affects the execution speed, because it determines the number of system calls executed. Each system call requires Bochscpu to emulate a large number of machine instructions that are not directly related to fuzzing Direct Composition, and this overhead should be reduced for better performance. In addition, there are some unpatched denial-of-service kernel bugs that can be triggered using a small combination of system calls, so a large test case has a higher probability of terminating early. Although our fuzzer works as intended, there is still a lot of room for improvement.

Future research

Batch processing 

There are some features inside Direct Composition that can only be executed by the so-called “batch command,” such as setting a property value in a kernel object. Batch commands are handled by calling NtDCompositionProcessChannelBatchBuffer. Multiple batch commands can be executed with a single system call, which reduces the bottleneck of switching between kernel and userland. For ease of implementation, however, our fuzzer currently includes only a single batch command in every batch processing system call. Supporting multiple batch commands in a single NtDCompositionProcessChannelBatchBuffer call during test case generation and mutation will significantly increase the fuzzing speed.
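One possible way to approach this, shown purely as an illustration, is a post-processing pass that merges consecutive batch instructions on the same channel into one. The Python sketch below is not part of the fuzzer described here, and the instruction layout (`op`, `channel`, `commands`) is an assumption.

```python
def coalesce_batches(testcase):
    """Merge runs of consecutive batch instructions on the same channel."""
    out = []
    for inst in testcase:
        prev = out[-1] if out else None
        if (
            inst["op"] == "PROCESS_BATCH"
            and prev is not None
            and prev["op"] == "PROCESS_BATCH"
            and prev["channel"] == inst["channel"]
        ):
            # Same channel, no instruction in between: append to the
            # previous batch so one system call carries both commands.
            prev["commands"].extend(inst["commands"])
        else:
            merged = dict(inst)
            if "commands" in merged:
                merged["commands"] = list(merged["commands"])  # avoid aliasing
            out.append(merged)
    return out
```

Only adjacent batches are merged, so the relative order of batch commands and other system calls in the test case is preserved.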

Targeting race conditions

Since mapped shared memory used in Direct Composition can be manipulated from both the kernel and userland, Direct Composition can be vulnerable to race conditions, which was the case with some of the vulnerabilities found by Hillstone Networks researchers. Although our fuzzer can generate such test cases, it would be interesting to guide the fuzzer toward generating test cases that specifically target race conditions.

Faster and better mutation 

Although basic mutation is already available, it can be improved. For example, some resource types take structured data as a property value, and how that data is generated and mutated affects the quality of fuzzing. Because there are so many resource types, there is more work to be done to improve correctness. Furthermore, the mutation itself is quite heavy, since each instruction in the test case is abstractly interpreted again. Reducing the need to abstractly interpret every single instruction will also greatly improve overall performance.

Maintenance 

When a new Windows build is released, there is a chance that the Direct Composition component will be updated. It is tedious to check which resource types were added or removed. There are currently more than 200 types of resource marshalers, and manual maintenance requires a lot of work. Automatic extraction of the available resource types and their properties would greatly help in maintaining our fuzzer. Scripting in popular disassemblers may help with this research.