Monday, February 18, 2019

JavaScript bridge makes malware analysis with WinDbg easier

Introduction

As malware researchers, we spend several days a week debugging malware in order to learn more about it. We have several powerful and popular user mode tools to choose from, such as OllyDbg, x64dbg, IDA Pro and Immunity Debugger.

All these debuggers support some scripting language to automate tasks, such as Python or proprietary languages like OllyScript. When it comes to analyzing in kernel mode, there is really only one option: the Windows debugging engine and its interfaces CDB, NTSD, KD and WinDbg.

Unfortunately, even though WinDbg is the most user-friendly of the bunch, it is widely considered one of the least user-friendly debuggers in the world.

The learning curve for WinDbg commands is quite steep, as it combines an unintuitive and often conflicting command syntax with an outdated user interface. Adding the traditional WinDbg scripting language to this equation does not make things easier for the user as it creates an additional layer of complexity by introducing its own idiosyncrasies.

Thankfully, there is a new WinDbg Preview for Windows 10. This preview includes a new JavaScript engine and exposes the debugging data model through a set of JavaScript objects and functions.

These new features bring WinDbg in line with modern programming environments such as Visual Studio, using already familiar user interface elements. In this post, we'll go over the debugger data model in this new version of WinDbg and its new interfaces: JavaScript and the dx command.

Debugger data model

The debugger data model is an extensible object model that allows debugger extensions, as well as the WinDbg user interface, to access a number of internal debugger objects through a consistent interface.

The objects relevant for malware analysis exposed through the data model are:
  • Debugging sessions
  • Processes
  • Process environment (e.g. PEB and TEB)
  • Threads
  • Modules
  • Stack frames
  • Handles
  • Devices
  • Code (disassembler)
  • File system
  • Debugger control
  • Debugger variables
  • Pseudo registers


dx display expression

All the above types of objects are exposed through a new command, dx (display debugger object model expression), which can be used to access objects and evaluate expressions using a C++-like syntax. This is simpler and more consistent than the somewhat confusing mix of the MASM and C++ expression evaluators. Thanks to the addition of NatVis functionality to WinDbg, the results of the dx command are displayed in a much more user-friendly way, with intuitive formatting and DML as the default output.

The starting point for exploring the dx command is simply to type dx Debugger in the WinDbg command window, which will show the top level namespaces in the exposed data model. Those four namespaces are Sessions, Settings, State and Utility. DML generates output using hyperlinks, allowing the user to drill down into the individual namespaces simply by clicking on them. For example, by clicking on the Sessions hyperlink, the command dx -r1 Debugger.Sessions will be executed and its results displayed.

Drilling down from the top-level namespaces to processes

If we go a couple of layers further down, which can also be controlled with the -r dx command option, we will get to the list of all processes and their properties, including the _EPROCESS kernel object fields exposed as the member KernelObject of a Process debugger object. Users of earlier WinDbg versions will certainly appreciate the new ease of investigation available through the dx command.

The dx command also supports tab completion, which makes navigating the data model even easier and allows the user to learn about operating system and WinDbg internals such as debugger variables and pseudo-registers. For example, to iterate through the list of internal debugger variables, you can type dx @$ and then repeatedly press the Tab key, which will cycle through all defined pseudo-registers, starting from $argreg.

Pseudo-registers and internal variables are useful if we want to avoid typing full object paths after the dx command. Instead of Debugger.Sessions[0], you can simply use the pseudo-register @$cursession, which points to the current session data model object. If you need to work with the current process, you can simply type dx @$curprocess instead of the longer dx Debugger.Sessions[0].Processes[procid].


Linq queries

Linq (Language Integrated Query) is a concept already familiar to .NET software engineers. In WinDbg, it allows the user to create SQL-like queries over the object collections exposed through the dx command.

In .NET development, there are two syntaxes for writing Linq expressions, the query syntax and the method syntax, but the dx command only supports queries written in the method syntax, using lambda expressions. Linq queries allow us to slice and dice collection objects and extract the pieces of information we are interested in displaying.

The Linq function "Where" allows us to select only those objects that satisfy the condition specified by the lambda expression passed as its argument. For example, to display only processes which have the string "Google" in their name, we can type:

dx @$cursession.Processes.Where(p => p.Name.Contains("Google"))

Just like in SQL, the "Select" function allows us to choose which members of an object in the collection we would like to display. For example, for the processes we already filtered using the "Where" function, we can use "Select" to retrieve only the process name and its ID:

dx -r2 @$cursession.Processes.Where(p => p.Name.Contains("Google")).Select(p => new { Name = p.Name, Id = p.Id })

Going one level deeper, into the exposed _EPROCESS kernel object, we can choose to display a subset of handles owned by the process under observation. For example, one of the methods to find processes hidden by a user mode rootkit is to enumerate process handles of the Windows client server subsystem process (csrss.exe) and compare that list with a list generated using a standard process enumeration command.

Before we list the processes whose handles are held by csrss.exe, we need to find the csrss.exe process object(s) and, once we find them, switch into their context:

dx @$cursession.Processes.Where(p => p.Name.Contains("csrss.exe"))[pid].SwitchTo()

We can now run a Linq query to display the paths to the main module of the processes present in the csrss.exe handle table:

dx @$curprocess.Io.Handles.Where(h => h.Type.Contains("Process")).Select(h => h.Object.UnderlyingObject.SeAuditProcessCreationInfo.ImageFileName->Name)

Since ImageFileName is a pointer to a structure of type _OBJECT_NAME_INFORMATION, we need to use the arrow operator to dereference it and access the "Name" field containing the module path.

There are many other useful Linq queries. For example, users can order the displayed results based on some criteria with the "OrderBy" function, similar to the ORDER BY SQL clause, or count the results of a query using the "Count" function. Linq queries can also be used in the JavaScript extension, but their syntax is slightly different. We will show an example of using Linq within JavaScript later in the blog post.

WinDbg and JavaScript

Now that we've covered the basics of the debugger data model and the dx command used to explore it, we can move on to the JavaScript extension for WinDbg. Jsprovider.dll is a native WinDbg extension that allows the user to script WinDbg and access the data model using a version of Microsoft's Chakra JavaScript engine. The extension is not loaded into the WinDbg process space by default; it must be loaded manually. This avoids potential clashes with other JavaScript-based extensions.

Jsprovider is loaded using the standard command for loading extensions:

.load jsprovider.dll

While this post discusses conventional scripts that a threat researcher may create while analysing a malware sample, it is worth mentioning that the JavaScript extension also allows developers to create WinDbg extensions that feel just like existing binary extensions. More information about creating JavaScript-based extensions can be found by investigating the extensions provided in the official GitHub repository of WinDbg JavaScript examples.

WinDbg Preview contains a fully functional Integrated Development Environment (IDE) for writing JavaScript code, allowing the developer to refactor their code while debugging a live program or investigating a memory dump.

The following WinDbg commands are used to load and run JavaScript-based scripts. The good news is that these commands are more intuitive than the awkward standard syntax for managing WinDbg scripts:

  • .scriptload loads a JavaScript script or an extension into WinDbg, but does not execute it.
  • .scriptrun loads and runs a script.
  • .scriptunload unloads the script from WinDbg and from the debugger data model namespace.
  • .scriptlist lists all currently loaded scripts.


JavaScript entry points

Depending on the script command used to load the script, the JavaScript provider will call one of the predefined user script entry points or execute the code in the script root level.

From the point of view of a threat researcher, there are two main entry points. The first is a kind of script constructor function named initializeScript, which the provider calls when the .scriptload command is executed. This function is usually used to initialize global variables and to define constants, structures and objects.

The objects defined within the initializeScript function are bridged into the debugger data model namespaces using the functions host.namespacePropertyParent and host.namedModelParent. The bridged objects can be investigated using the dx command like any other native object in the data model.

The second, and even more important, entry point is the function invokeScript, the equivalent of the C function main. This function is called when the user executes the .scriptrun WinDbg command.
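To make the two entry points concrete, here is a minimal sketch of a script skeleton; it is not taken from the original screenshots. The version numbers passed to host.apiVersionSupport and the use of host.diagnostics.debugLog for output are assumptions on our part, and bridging objects with host.namespacePropertyParent is left out for brevity.

```javascript
"use strict";

// Minimal WinDbg JavaScript script skeleton (a sketch, not an official template).

// Called by the JavaScript provider when the script is loaded with .scriptload.
function initializeScript() {
    // Declare the data model API version the script is written against
    // (1, 2 is an assumption; adjust for your WinDbg version). Objects bridged
    // into the data model, e.g. with host.namespacePropertyParent, would also
    // be returned from this array.
    return [new host.apiVersionSupport(1, 2)];
}

// Called when the user executes .scriptrun; the script's equivalent of main().
function invokeScript() {
    host.diagnostics.debugLog("invokeScript called\n");
}
```

Saved as a .js file, .scriptload would call initializeScript and .scriptrun would print the message from invokeScript.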


Useful tricks for JavaScript exploration

Now we will assume that we have a script named "myutils.js" where we keep a set of functions we regularly use in our day-to-day research. First, we need to load the script using the .scriptload command.

Loading script functions from the user's Desktop folder

WinDbg JavaScript modules and namespaces

The main JavaScript object we use to interact with the debugger is the host object. If we are using the WinDbg Preview script editor, IntelliSense tab completion and function documentation help us learn the names of the available functions and members.

IntelliSense in action

If we just want to experiment, we can put our code into the invokeScript function which will get called every time we execute the script. Once we are happy with the code, we can refactor it and define our own set of functions.

Before we dig deeper into the functionality exposed through the JavaScript interface, it is recommended to create two essential helper functions for displaying text on the screen and for interacting with the debugger using standard WinDbg commands.

They are helpful for interacting with the user and for working around functionality that is not yet available natively in JavaScript but that we need for debugging.

In this example, we named these functions logme and exec. They are more or less just wrappers around JavaScript API functions, with the added advantage that we don't need to type the full namespace hierarchy in order to reach them.

Helper functions wrapping parts of the JavaScript WinDbg API
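The screenshot of these helpers is not reproduced here, so the following is only a minimal sketch of what logme and exec might look like. We assume logme prints through host.diagnostics.debugLog and exec calls Debugger.Utility.Control.ExecuteCommand, the same function the rest of this post relies on.

```javascript
"use strict";

// Print a line of text to the WinDbg command window.
function logme(message) {
    host.diagnostics.debugLog(message + "\n");
}

// Run a standard WinDbg command and return its output as a collection of lines.
function exec(command) {
    return host.namespace.Debugger.Utility.Control.ExecuteCommand(command);
}
```

For example, iterating over the lines returned by exec("lm") and passing each one to logme echoes the module list produced by the lm command.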

In the function exec, we see that by referencing the host.namespace.Debugger namespace, we are able to access the same object hierarchy through JavaScript as we would with the dx command from the WinDbg command line.

The ExecuteCommand function executes any of the known WinDbg commands and returns the result in plain text, which we can parse to obtain the required results. This approach is not much different from the one available in pykd, the popular Python-based WinDbg extension. However, the advantage of Jsprovider over pykd is that most of the JavaScript extension functions return JavaScript objects that do not require any additional parsing in order to be used for scripting.

For example, we can iterate over the collection of process modules by accessing the host.currentProcess.Modules iterable. Each element of the iterable is an object of class Module, and we can display its properties, in this case the name.

It is worth noting that IntelliSense is not always able to display all members of a JavaScript object, and that is when the for-in loop statement can be very useful. This loop iterates through the names of all the object's members, which we can print to help with exploration and development.


Displaying the members of a Module object
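As an illustration of the for-in approach, here is a sketch with two hypothetical helpers, listMembers and listFirstModuleMembers, which print the member names of the first Module object in the current process.

```javascript
"use strict";

// Hypothetical exploration helper: print the names of all enumerable
// members of an object. for-in yields member *names* (keys), not values.
function listMembers(obj) {
    for (let member in obj) {
        host.diagnostics.debugLog(member + "\n");
    }
}

// Hypothetical helper: inspect only the first Module of the current process.
function listFirstModuleMembers() {
    for (let mod of host.currentProcess.Modules) {
        listMembers(mod);
        break;   // one module is enough to see which members exist
    }
}
```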

On the other hand, the for-of loop statement iterates through the elements of an iterable object and returns their values. It is important to remember the distinction between these two forms of the for loop.

Printing list of modules loaded into the current process space
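The module listing itself can be sketched with for-of as follows. The function name listModuleNames is hypothetical, and we assume each Module exposes the Name and BaseAddress members, which the Linq example later in this post also uses.

```javascript
"use strict";

// Print the name and base address of every module in the current process.
// for-of yields the *values* of the iterable (Module objects), unlike for-in.
function listModuleNames() {
    for (let mod of host.currentProcess.Modules) {
        host.diagnostics.debugLog(mod.Name + " at 0x" +
            mod.BaseAddress.toString(16) + "\n");
    }
}
```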

We can also fetch the list of loaded modules by iterating through the Process Environment Block (PEB) linked list of loaded modules, although this requires more preparation: the linked list must first be converted into a collection by calling the JavaScript function host.namespace.Debugger.Utility.Collections.FromListEntry. Here is a full listing of a function which converts the linked list of loaded modules into a JavaScript collection of modules and displays their properties.

function ListProcessModulesPEB() {

    //Iterate through the list of loaded modules in the PEB using the FromListEntry utility function
    for (let entry of host.namespace.Debugger.Utility.Collections.FromListEntry(host.currentProcess.KernelObject.Peb.Ldr.InLoadOrderModuleList, "nt!_LIST_ENTRY", "Flink")) {

        //Create a new typed object at the _LIST_ENTRY address and interpret it as _LDR_DATA_TABLE_ENTRY
        //(this works because InLoadOrderLinks is the first member of _LDR_DATA_TABLE_ENTRY)
        let loaderdata = host.createTypedObject(entry.address, "nt", "_LDR_DATA_TABLE_ENTRY");

        //Print the module name, its base address and its size
        logme("Module " + host.memory.readWideString(loaderdata.FullDllName.Buffer) +
              " at " + loaderdata.DllBase.address.toString(16) +
              " Size: " + loaderdata.SizeOfImage.toString(16));
    }
}

This function reads data from process memory through the host.memory namespace, which provides the functions readMemoryValues, readString and readWideString; which one to call depends on the type of data we need to read.
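Two small hypothetical helpers around host.memory can make such code shorter. readQword reads a single 8-byte value with readMemoryValues(address, count, elementSize), and readUnicodeString reads the buffer of a UNICODE_STRING such as FullDllName, bounded by its Length field. We assume that the optional second argument of readWideString is a length in characters; please verify this against the JsProvider documentation for your WinDbg version.

```javascript
"use strict";

// Hypothetical helper: read one 8-byte (pointer-sized on x64) value.
function readQword(address) {
    return host.memory.readMemoryValues(address, 1, 8)[0];
}

// Hypothetical helper: read a UNICODE_STRING typed object.
// Length is in bytes and Buffer holds UTF-16 characters, so the character
// count is Length / 2. Bounding the read by Length avoids relying on a
// terminating NUL, which UNICODE_STRING buffers do not guarantee.
function readUnicodeString(unicodeString) {
    return host.memory.readWideString(unicodeString.Buffer, unicodeString.Length / 2);
}
```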

JavaScript 53-bit integer width limitation

Although programming WinDbg using JavaScript is relatively simple compared to standard WinDbg scripts, we need to be aware of a few facts that may cause headaches. The first is that JavaScript numbers are double-precision floating-point values, so integers can only be represented exactly up to 53 bits, which may cause issues when working with native 64-bit values. For that reason, the JavaScript extension has a special class, host.Int64, whose constructor needs to be called when we want to work with 64-bit numbers. Luckily, the interpreter will warn us when a 53-bit overflow can occur.
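The limitation is easy to reproduce in any JavaScript engine, for example Node.js, independently of WinDbg. The following sketch (fitsIn53Bits is a hypothetical helper name) shows how a typical 64-bit kernel-mode address loses its low bits when stored in a plain Number; in WinDbg, such values should be kept in host.Int64 objects instead.

```javascript
"use strict";

// JavaScript numbers are IEEE 754 doubles, so integers are exact only up to
// 2^53 - 1 (Number.MAX_SAFE_INTEGER = 9007199254740991).
const maxSafe = Number.MAX_SAFE_INTEGER;

// A typical x64 kernel-mode address uses all 64 bits...
const kernelAddressHex = "fffff80012345679";

// ...so converting it to a Number silently rounds away its lowest bits.
const asNumber = parseInt(kernelAddressHex, 16);
const roundTrip = asNumber.toString(16);   // no longer ends in "679"

// Hypothetical helper: does a hex string fit into a Number without loss?
function fitsIn53Bits(hex) {
    return parseInt(hex, 16) <= Number.MAX_SAFE_INTEGER;
}

console.log(roundTrip === kernelAddressHex);   // false
console.log(fitsIn53Bits("1fffffffffffff"));  // true (2^53 - 1)
console.log(fitsIn53Bits(kernelAddressHex));   // false
```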

A host.Int64 object has a number of functions that allow us to execute arithmetic and bitwise operations on it. When trying to create a function to iterate through the array of process creation callbacks stored in PspCreateProcessNotifyRoutine, shown later in the post, I was not able to find a way to apply a 64-bit wide And bitmask. The masking function seemed to revert to the 53-bit width, which produced an overflow if the mask was wider than 53 bits.


Masking a host.Int64 with an And mask of up to 53 bits yields a correct result, while a wider mask yields an incorrect one

Luckily, the host.Int64 class provides the functions getLowPart and getHighPart, which return the lower and upper 32 bits of a 64-bit integer, respectively. This allows us to apply the And mask we need to the 32-bit halves and get back the required 64-bit value by shifting the upper 32-bit value left by 32 bits and adding the lower 32 bits to it.
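The same split-and-mask idea can be illustrated in plain JavaScript, outside WinDbg. In this sketch, the hypothetical function maskLowPartHex masks the low 32-bit half with ordinary 32-bit bitwise operators and returns the recombined value as a hex string rather than a Number, so nothing is lost to the 53-bit limit. Inside WinDbg, the host.Int64 functions getLowPart, getHighPart, bitwiseShiftLeft and add play the same role, as in the callback enumeration code later in the post.

```javascript
"use strict";

// Apply a mask to the low 32 bits of a 64-bit value given as two 32-bit
// halves and return the recombined value as a 16-digit hex string.
// JavaScript bitwise operators work on 32-bit integers, so each half can be
// masked exactly; building a string avoids the 53-bit Number limit.
function maskLowPartHex(high, low, lowMask) {
    const maskedLow = (low & lowMask) >>> 0;   // >>> 0 converts back to unsigned 32-bit
    const highHex = (high >>> 0).toString(16).padStart(8, "0");
    const lowHex = maskedLow.toString(16).padStart(8, "0");
    return highHex + lowHex;
}

// Example: clear the lowest 4 bits of 0xFFFFA80B8000000F.
const masked = maskLowPartHex(0xFFFFA80B, 0x8000000F, 0xFFFFFFF0);
console.log(masked);   // ffffa80b80000000
```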

The 53-bit limitation of the WinDbg JavaScript implementation is an annoyance, and it would be very welcome if the WinDbg team could find a way to overcome it and support 64-bit numbers without resorting to a special JavaScript class.

Linq in JavaScript

We have already seen how Linq queries can be used with the dx command to access a subset of debugger data model objects and their members.

However, their syntax in JavaScript is slightly different: the user must supply either an expression that returns the required data type, or an anonymous function, passed as the argument to a Linq verb function call, that returns the required data type. For example, for the "Where" Linq clause, the returned value has to be a boolean. For the "Select" clause, we need to supply a member of the object we would like to select, or a new anonymous object composed of a subset of the queried object's members.

Here is a simple example that uses Linq functions to filter a list of modules, keeping only those whose name contains the string "dll", and selects only the module name and its base address to display.

function ListProcessModules() {

    //An example of how to use LINQ queries in JavaScript.
    //Instead of a lambda expression, supply a function which returns a boolean for the Where clause,
    //and a function which returns a new object with the selected members of the queried object
    //(in this case a Module) for the Select clause.
    let mods = host.currentProcess.Modules
        .Where(function (k) { return k.Name.includes("dll"); })
        .Select(function (k) { return { name: k.Name, addr: k.BaseAddress }; });

    for (let lk of mods) {
        logme(lk.name + " at " + lk.addr.toString(16));
    }
}
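Since arrow functions are JavaScript's own lambda expressions, the same query can probably be written more compactly, assuming the Chakra version bundled with your WinDbg accepts ES2015 syntax. This is a sketch: listDllModules is a hypothetical name, and output goes directly through host.diagnostics.debugLog so that the function does not depend on logme.

```javascript
"use strict";

// Same query as ListProcessModules, written with arrow functions.
function listDllModules() {
    const mods = host.currentProcess.Modules
        .Where(m => m.Name.includes("dll"))                       // boolean predicate
        .Select(m => ({ name: m.Name, addr: m.BaseAddress }));    // anonymous object
    for (const m of mods) {
        host.diagnostics.debugLog(m.name + " at " + m.addr.toString(16) + "\n");
    }
}
```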


Inspecting operating system structures

A good starting point for getting the addresses of kernel functions and structures is the function host.getModuleSymbolAddress. If we need the actual value stored at the retrieved symbol, we need to dereference the address using the host.memory.readMemoryValues function, or the dereference function for a single value.
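A small hypothetical helper combining the two steps might look like this. The size argument is the element size passed to readMemoryValues, for example 4 for a ULONG counter and 8 for a pointer.

```javascript
"use strict";

// Hypothetical helper: resolve a symbol to its address and read the value
// stored there. size is the element size in bytes (e.g. 4 or 8).
function readSymbolValue(moduleName, symbolName, size) {
    const address = host.getModuleSymbolAddress(moduleName, symbolName);
    return host.memory.readMemoryValues(address, 1, size)[0];
}
```

For example, readSymbolValue("ntkrnlmp", "PspCreateProcessNotifyRoutineCount", 4) would return the same callback count that the following example reads.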

Here is an example enumerating callbacks registered using the documented PsSetCreateProcessNotifyRoutine kernel function, which registers driver functions that are notified every time a process is created or terminated. The registered callbacks are stored in the internal kernel array PspCreateProcessNotifyRoutine. This mechanism is also abused by kernel-mode malware, for example to hide processes or to prevent the malware's user-mode modules from being terminated.

The example in the post is inspired by the C code for enumerating callbacks implemented in the SwishDbgExt extension developed by Matthieu Suiche. This WinDbg extension is very useful for analysing systems infected by kernel mode malware, as well as kernel memory dumps.

The code shows that even more complex functions can be implemented relatively easily using JavaScript. In fact, JavaScript development is ideal for malware researchers, as writing code, testing and analysis can all be performed in parallel using the WinDbg Preview IDE.

function ListProcessCreateCallbacks() {

    let PspCreateNotifyRoutinePointer = host.getModuleSymbolAddress("ntkrnlmp", "PspCreateProcessNotifyRoutine");
    let PspCallbackCount = host.memory.readMemoryValues(host.getModuleSymbolAddress("ntkrnlmp", "PspCreateProcessNotifyRoutineCount"), 1, 4)[0];
    logme("There are " + PspCallbackCount.toString() + " PspCreateProcessNotify callbacks");

    for (let i = 0; i < PspCallbackCount; i++) {

        //Read the i-th entry of the callback array (one quadword per entry)
        let CallbackRoutineBlock = host.memory.readMemoryValues(PspCreateNotifyRoutinePointer.add(i * 8), 1, 8);
        let CallbackRoutineBlock64 = host.Int64(CallbackRoutineBlock[0]);

        //A workaround seems to be required here to bitwise mask the lowest 4 bits.
        //Get the lower 32 bits of the address and mask them to get the
        //lower 32 bits of the pointer to the _EX_CALLBACK_ROUTINE_BLOCK (undocumented structure known from ReactOS)
        let LowCallback = host.Int64(CallbackRoutineBlock64.getLowPart()).bitwiseAnd(0xfffffff0);

        //Get the upper 32 bits of the address and shift them left to create a 64-bit value
        let HighCallback = host.Int64(CallbackRoutineBlock64.getHighPart()).bitwiseShiftLeft(32);

        //Add the two values to get the address of the i-th _EX_CALLBACK_ROUTINE_BLOCK
        let ExBlock = HighCallback.add(LowCallback);

        //Finally, skip the first member of the structure (a quadword) to read the address of the callback
        let Callback = host.memory.readMemoryValues(ExBlock.add(8), 1, 8)[0];

        //Use the .printf trick to resolve the symbol; "\\n" passes a literal \n to .printf,
        //and the 0x prefix makes WinDbg parse the address as hexadecimal
        let rez = host.namespace.Debugger.Utility.Control.ExecuteCommand(".printf \"%y\\n\", 0x" + Callback.toString(16));

        //Print the function name using the first line of the .printf command's output
        logme("Callback " + i + " at 0x" + Callback.toString(16) + " is " + rez[0]);
    }
}

Here we see the manipulation of the 64-bit address mentioned above. We split a 64-bit value into upper and lower 32 bits and apply the bitmask separately to avoid a 53-bit JavaScript integer overflow.

Another interesting point is the use of the standard debugger command .printf to do a reverse symbol resolution. Although the JavaScript function host.getModuleSymbolAddress allows us to get the address of a required symbol, at the time of writing there are no functions which allow us to get the symbol name from an address. That is why we use .printf with the %y format specifier as a workaround; it prints the name of the symbol corresponding to the specified address.
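The workaround can be wrapped in a reusable helper. The sketch below, with the hypothetical name symbolFromAddress, assumes that ExecuteCommand returns an iterable collection of output lines, which the callback example also relies on. Note the JavaScript string literal "\\n", which passes the two characters \n to .printf, and the 0x prefix, which makes WinDbg parse the address as hexadecimal regardless of the current default radix.

```javascript
"use strict";

// Hypothetical helper: reverse-resolve an address to "module!symbol+offset"
// by running .printf with the %y format specifier.
function symbolFromAddress(address) {
    const command = ".printf \"%y\\n\", 0x" + address.toString(16);
    const output = host.namespace.Debugger.Utility.Control.ExecuteCommand(command);
    // Return only the first line of the command's output.
    for (const line of output) {
        return line;
    }
    return "";   // no output at all
}
```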

Debugging the debugging scripts

Developers of scripts in any popular language know that successful development also requires debugging tools. The debugger needs to be able to set breakpoints and inspect the values of variables and objects. This is also required when we are writing scripts that need to access various operating system structures or analyse malware samples. Once again, the WinDbg JavaScript extension delivers the required functionality in the form of a debugging tool whose commands will be very familiar to all regular WinDbg users.

The debugger is launched by executing the command .scriptdebug, which prepares the JavaScript debugger for debugging a specific script. Once the debugger has loaded the script, we have the option to choose events which will cause the debugger to stop, as well as to set breakpoints on specific lines of script code.

The command sxe within the JavaScript debugger is used, just as in WinDbg, to define after which events the debugger will break. For example, to break on the first executed line of a script we simply type sxe en. Once the command has successfully executed we can inspect the status of all available events by using the command sx.

sx shows the JavaScript debugger's break status for various events

We can also specify the line of the script where a breakpoint should be set using the command bp, just as in standard WinDbg syntax. To set a breakpoint, the user needs to specify a line number together with a position on the line, for example bp 77:0. If the specified line position is 0, the debugger automatically sets the breakpoint on the first possible position on the line, which saves us from counting the required breakpoint positions.


Setting a breakpoint on line position 0 sets it on the first possible position

Now that we have set up all the required breakpoints, we have to exit the debugger, which is a slightly unintuitive step. Debugging continues when the script is called, either by accessing the WinDbg variable @$scriptContents and calling any of the functions of the script we wish to debug, or by launching the script using .scriptrun as usual. Naturally, the @$scriptContents variable is accessed using the dx command.

Scripts can be launched for debugging using the @$scriptContents variable

The debugger contains its own JavaScript evaluator command ??, which allows us to evaluate JavaScript expressions and inspect values of the script variables and objects.



The ? and ?? commands display the results of JavaScript expressions

JavaScript debugging is a powerful tool required for proper development. Although its functionality is already sufficient in these early versions of the JavaScript extension, we hope that it will become richer and more stable over time, as WinDbg Preview moves closer to its full release.

Conclusion

We hope that this post provided you with a few pointers to functionality useful for malware analysis available through the official Microsoft JavaScript WinDbg extension. Although the API exposed through JavaScript is not complete, there are usually ways to work around its limitations by wrapping standard WinDbg commands and parsing their output. This solution is not ideal, and we hope that new functionality will be added directly to the JavaScript provider to make the scripting experience even more user-friendly.

The Debugging Tools for Windows development team seems to be committed to adding new JavaScript modules, as was recently demonstrated by the addition of the file system interaction and Code namespace modules, which open up a whole new set of possibilities for code analysis that we may cover in one of our next posts. Interested readers are invited to check out the CodeFlow JavaScript extension available in the official examples repository on GitHub.

If you would like to learn a few more tips on malware analysis using WinDbg and JavaScript, Cisco Talos will be presenting a session at the CARO Workshop in Copenhagen in May.
