Tuesday, April 29, 2014

Internet Explorer & Adobe Flash 0-Day Coverage

Several 0-day vulnerabilities have been disclosed recently, and the VRT has released coverage for two critical ones. The SIDs below will help you protect your environment.

Microsoft Internet Explorer 0day CVE-2014-1776
SIDs 30794 & 30803
https://technet.microsoft.com/en-US/library/security/2963983

Adobe Flash 0day CVE-2014-0515
SIDs 30876 & 30877
http://helpx.adobe.com/security/products/flash-player/apsb14-13.html

Coverage for both of these vulnerabilities was released yesterday, April 28, 2014. The latest rule pack provides the updates for both of them.

http://blog.snort.org/2014/04/sourcefire-vrt-certified-snort-rules_7339.html
http://blog.snort.org/2014/04/sourcefire-vrt-certified-snort-rules_28.html

Tuesday, April 22, 2014

Snake Campaign: A few words about the Uroburos Rootkit

Over the past few days, analyzing the new Uroburos (aka Turla) rootkit has been exciting. That's because the sample dropper (MD5: a86ac0ad1f8928e8d4e1b728448f54f9) includes a lot of clever features. We don’t want to rehash research already publicly available, but we will expand on some features that have not been covered in previous publications (like the driver loading strategy and the main dropper architecture).

The dropper is compressed with a simple packer that uses integer math, such as bit shifting, unsigned multiplication, and so on, to perform data decryption. At the end of the decryption routine, we end up with a jmp ebx opcode. The jump leads to a copy stub routine that replaces the original bytes of the executable:

Figure 1. The simple Uroburos packer and data copy routine

The unpacked code first disables all possible error reporting windows from popping up by using the SetErrorMode Windows API function. The binary then checks the version of the operating system, even if the process is running in WOW64 mode. Arguments passed to the binary at execution time are checked as well: if any of the arguments is the string up, an auto-destruction routine is executed and all Uroburos files found on disk from possible previous runs are deleted. The dropper even checks for another instance of Uroburos running in memory on the target system by trying to open the following 3 mutexes:

  • "{E9B1E207-B513-4cfc-86BE-6D6004E5CB9C}" - Local setup mutex
  • "{B93DFED5-9A3B-459b-A617-59FD9FAD693E}" - Global Uroburos setup mutex
  • "shell.{F21EDC09-85D3-4eb9-915F-1AFA2FF28153}" - Global still unknown mutex

If any of these mutexes is found, the executable terminates the setup process.

Otherwise, it prepares all data structures needed for all its inter-module communication.

BypassDSEAndLoadVirusDrv is the name of the key routine of the Uroburos dropper. Its final goal is to load the Uroburos rootkit driver, which it accomplishes in different ways depending on the target's operating system. We provide an in-depth analysis of how this is done later on. After the rootkit driver is loaded, a function in a user-mode module of the dropper called format_ntfs_Win32 (identified within the binary as resource 4000) is used to format its virtual volume, which is accessible via the device \\.\Par1. As mentioned, the entire code responsible for formatting the virtual volume runs in user mode; interestingly, the malware authors decided not to use the built-in low-level Windows formatting functions. The virtual volume is backed by a file called fixdata.dat found in the main directory of Uroburos. This directory is called $NtUninstallQXXXXXX$ (where “XXXXXX” stands for 6 random digits), is located under the Windows root path, and is hidden by the kernel-mode driver. The encrypted configuration file, found in the dropper as resource 103, is extracted to a file called system in the virtual volume. Finally, the dropper is copied to a file called fdisk_mon.exe located in the main path for Uroburos, and its corresponding system service, named ultra3, is installed. This ensures that the malware survives a system reboot.

            Main Path: %systemroot%\$NtUninstallQxxxxxx$

              fdisk.sys - Main Rootkit driver

              fdisk_mon.exe - Packed dropper executed as service

              fixdata.dat - Virtual File systems file

Between this and upcoming blog posts, we will go over 3 major features found in Uroburos, which are the:

  • Kernel mode driver setup strategies
  • Patchguard disarming code
  • Virtual File System
Figure 2. A snapshot of the Virtual Volume content. Noteworthy: the “klog” file, which contains the data captured by the keylogger, and the “system” file, which is the Uroburos configuration file

Uroburos Dropper Architecture - Modules communication

We believe that to facilitate an in-depth understanding of the specific features of Uroburos, we should first go over the dropper's architecture. All Uroburos modules are DLLs embedded in the resource directory of the main dropper. As needed, the dropper gets a pointer to the target module located in the resource directory (using the Windows API functions FindResource and LockResource) and starts processing it: the VirusLoadDll routine takes the module resource buffer pointer as input, allocates a chunk of memory as big as the target PE's virtual size, and then proceeds with the needed IAT resolution, relocations and fix-ups. At the end, the Uroburos main dropper has correctly loaded the DLL module into its address space. Each of its resource modules is composed as follows:

  • DllEntryPoint implements the unpacking routine and a simple function that saves the DLL base address to a global variable
  • ee, an exported function that performs the actual module job
Figure 3. A snapshot of the simple DllEntryPoint of a Uroburos module

The routine ee is called with 3 parameters: a pointer to a synchronization routine that resides in the main dropper, and 2 custom parameters that usually point to the Uroburos driver buffer and its size. These last 2 parameters are needed for the exploit execution.

As the name implies, the “synchronization” routine initializes all synchronization data structures and an array of global function pointers that can be called from the external module. In this way, the external library can always call the main dropper's internal routines. As we proceeded with the analysis, we saw that some modules were only wrappers around some of the main dropper’s internal functions.

In summary, we have identified the following Snake (another name for the Uroburos rootkit) modules:

  1. A 32-bit and a 64-bit driver (resource numbers 101 and 161)
  2. A configuration file extracted and saved in the virtual volume as system (resource number 103)
  3. ms09_025_Win32 (resource number 1000), which exploits vulnerability CVE-2009-1123 in order to execute kernel-mode code (and automatically escalate privileges)
  4. ms10_015_Win32 (resource number 2000), which exploits vulnerability CVE-2010-0232 in order to escalate privileges and gain access to the SYSTEM account
  5. vboxdrv_Win32, which is perhaps the most interesting one: it disables Windows 7 x64 Driver Signing Enforcement (DSE) by exploiting a bug in the signed VirtualBox driver. This module requires an in-depth analysis, which we cover below
  6. format_ntfs_Win32, which implements the virtual volume initialization and NTFS formatting code

Loading the driver

All the work needed to properly load the unsigned driver is managed by the BypassDSEAndLoadVirusDrv function. This function first tries to extract and run the ms09_025_Win32 module, with the aim of exploiting CVE-2009-1123 to divert kernel code execution to a custom routine that automatically escalates privileges and directly loads the already mapped driver. If the target OS isn't vulnerable to CVE-2009-1123, or if the exploit fails, it tries to extract and run the module ms10_015_Win32, but this time with the goal of only obtaining SYSTEM privileges. By opening the following registry key (requesting the KEY_SET_VALUE access right), it can determine whether the attempt to escalate privileges was successful:

  HKLM\Software\Microsoft\Windows NT\CurrentVersion\Windows

This key can only be written to by a user or process with 'administrator' privileges. If the main dropper is still unable to open this key, the procedure exits and the entire setup process is aborted.

The function CreateTurlaPathAndExtractDrv is called to create the directory $NtUninstallQXXXXXX$ in the root directory of the operating system and to extract the kernel driver into a file called fdisk.sys. If the OS is 32-bit, the dropper creates and populates the registry key HKCR\Ultra3 and executes the unsigned driver sengoku_x64 (the main “Uroburos” driver) by calling the ZwLoadDriver native API function. This method bypasses the standard Windows Service Control Manager.

Alternatively, if the OS is 64-bit, which means that it implements DSE and Patchguard, the vboxdrv_Win32 module is extracted and executed. This DLL might be the most interesting from an analysis point of view.

The VBoxdrv module

The DLL starts by calling 2 functions located in the main dropper, with the goal of extracting and starting the signed VirtualBox driver named “usbhub.sys”. Once the VirtualBox driver has been successfully started, “pxinsi64.exe” (the executable that implements the user-mode part of the “VirtualBox exploit”) is extracted from the module and executed using the CreateProcess API function. The VBoxDrv module now waits for the spawned process to complete execution.

This spawned 64-bit process first tries to open the VirtualBox device \\.\VBoxDrv and, if successful, calls the function GetDseSymbolPtr to get the address of the kernel DSE variable g_ciEnabled. If pxinsi64.exe can't open the VirtualBox device, it immediately terminates. In fact, if the VirtualBox driver has not started correctly, Uroburos is not able to load an unsigned driver in x64 environments.

The function GetDseSymbolPtr warrants a closer look. I provide here the pseudo code:

  NTSTATUS GetDseSymbolPtr (LPVOID * pCiEnableVa) {
    DWORD dwJmpCiIatRva = 0;  // RVA of "JMP cs:_imp_CiInitialize" (the CiInitializeStub routine)

    // ... Get needed buffer size, then query the list of loaded kernel modules ...
    CALL ZwQuerySystemInformation(SystemModuleInformation, lpSysModInfo, 0, &buffSize);
    for (i = 0; i < lpSysModInfo.NumModules; i++) {
      OPEN kernel sys file directly from disk and map it  // OpenReadAndRelocModule virus routine
      Analyse on-disk module Import Table, find "CiInitialize" imported name
      if (IAT_Symbol not found)
        continue;  // go to next module

      for (offset = 0; offset < curModule.size; offset++) {
        curByte = curModuleBuff[offset];

        // Resolve the "CiInitializeStub" routine address by searching for the "JMP _imp_CiInitialize" opcode
        if ((curByte == JMP FAR opcode) &&
            (JMP FAR offset == "CiInitialize" IAT entry))
          Save this RVA in dwJmpCiIatRva

        if ((curByte == CALL FAR opcode) &&
            (CALL FAR offset == dwJmpCiIatRva)) {
          // Go backward and search for "MOV CS:g_ciEnabled, 1"
          for (back = offset; back > 0; back--) {
            curByte = curModuleBuff[back];
            if (curByte == "MOV CS:REL32, imm8" opcode &&
                sourceOperand == 1) {
              // This is the "g_ciEnabled" address
              Resolve destination REL32 operand and return it through pCiEnableVa
            }
          }
        }
      }
    }
    // Reaching this point means the symbol was not found
  }

Strictly speaking, the algorithm resolves the address of the CiInitializeStub stub function, then tries to reach the CALL CiInitializeStub instruction located in the internal NT kernel routine SepInitializeCodeIntegrity. This routine is responsible for initializing Driver Signing Enforcement when the system boots. When the Uroburos code locates this CALL, it searches backward for the mov cs:REL32, 1 instruction and, if it finds it, resolves the address of the REL32 destination operand. This symbol is the g_ciEnabled DSE kernel variable.
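
To illustrate the backward search, here is a minimal Python sketch of our own. It is not the dropper's code: it assumes the x64 encoding C6 05 <rel32> <imm8> for "mov byte ptr [rip+rel32], imm8", it assumes the buffer is laid out like the mapped image so that file offsets equal RVAs, and the name find_flag_rva is hypothetical.

```python
import struct

MOV_RIP_REL_IMM8 = b"\xC6\x05"  # mov byte ptr [rip+rel32], imm8 (x64 encoding)
INSTR_LEN = 7                    # 2 opcode bytes + 4-byte rel32 + 1-byte imm8


def find_flag_rva(image: bytes, call_offset: int, image_rva: int = 0):
    """Scan backward from call_offset for 'mov byte ptr [rip+rel32], 1'.

    Returns the RVA that the instruction writes to, or None if nothing matches.
    Assumption: offsets in `image` are RVAs relative to image_rva.
    """
    start = min(call_offset, len(image) - INSTR_LEN)
    for off in range(start, -1, -1):
        if image[off:off + 2] != MOV_RIP_REL_IMM8:
            continue
        rel32, imm8 = struct.unpack_from("<iB", image, off + 2)
        if imm8 != 1:
            continue
        # RIP-relative operands are relative to the address of the NEXT instruction.
        return image_rva + off + INSTR_LEN + rel32
    return None
```

The key detail is the last step: the REL32 displacement is added to the address of the instruction that follows the mov, not to the address of the mov itself.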

Figure 4. A snap of searched Driver Signing Enforcement code

At this point, pxinsi64.exe can exploit the VirtualBox driver by calling the Windows API function DeviceIoControl with the SUP_IOCTL_FAST_DO_NOP control code, as explained here. However, before triggering the exploit, pxinsi64.exe prepares the VirtualBox device by sending the following input/output controls (IOCTLs): SUP_IOCTL_COOKIE, SUP_IOCTL_LDR_OPEN, SUP_IOCTL_LDR_LOAD. This is important because the internal VirtualBox driver function supdrvIOCtlFast should return 0, and not an error code. The write-what-where condition then updates the g_ciEnabled variable with the value 0.

If all goes well, the Windows Driver Signature Enforcement protection is disabled and pxinsi64.exe exits with the error code 0. Otherwise, it terminates with a different error code.

The VBoxDrv module wakes up and deletes the 2 extracted files (now no longer needed): the exploit executable pxinsi64.exe, and the vulnerable VirtualBox driver usbhub.sys. It finally exits. The main Uroburos dropper can now load and start its infection driver in the same manner as it does for 32-bit systems.

Conclusion

In this brief analysis, we provided an overview of the architecture of the Uroburos rootkit. Uroburos made use of a lot of clever tricks. We also provided an in-depth description of how Uroburos bypasses Driver Signature Enforcement (DSE).

In upcoming blog posts, we'll cover Uroburos':

  • code to bypass Patchguard
  • Virtual file system

Uroburos seems to have been put together with a lot of care. Interestingly, the packer used with the dropper doesn't seem to be as sophisticated as the rest of the techniques that are employed...

One last question remains: does the DSE bypass technique work on Windows 8 and/or Windows 8.1? The answer is no. As a matter of fact, if the host OS is a 64-bit version of Windows 8 or Windows 8.1, the VBoxDrv module fails to run and the entire setup process is aborted. DSE and Patchguard are implemented differently in Windows 8 and Windows 8.1. In upcoming blog posts we will look into how the DSE and Patchguard implementations differ between Windows 7 and Windows 8, and whether the exploit mitigation techniques available on Windows 7 can be bypassed in Windows 8.

Stay tuned!


Friday, April 18, 2014

Heartbleed for OpenVPN

Core to the VRT's mission is challenging the general intrusion detection industry's view of "adequate" vulnerability coverage. One way we do this is to seek out new attack vectors for critical vulnerabilities the industry may have overlooked and take the initiative to write the proof of concept code and detection for aspects of a vulnerability that others might have missed. You no doubt have heard by now about the Heartbleed vulnerability and its implications for HTTPS servers that run the vulnerable versions of OpenSSL. Something not discussed enough is its implications for services running on protocols other than HTTP that also rely on OpenSSL. One such case is OpenVPN.

The OpenVPN protocol encapsulates the SSL/TLS session used for authentication, key exchange, and data tunneling in order to provide the reliable transport layer that the SSL/TLS session needs (OpenVPN is often run over UDP). One improvement over vanilla TLS, and a challenge to exploitation, is that OpenVPN supports optional HMAC signing of OpenVPN messages using a passphrase or key. This is a challenge to the attacker because not only do you need to properly encapsulate your malicious heartbeat message, you also (in cases where the server requires message signing) have to sign it with a valid HMAC. It is important to note that HMAC signing does not prevent the OpenVPN server from being vulnerable: an attacker who has the passphrase or key can still sign a malicious heartbeat and leak memory. Unfortunately, many OpenVPN servers have this feature disabled and are vulnerable to memory disclosure without authentication. If you are running an OpenVPN server, it is strongly recommended that you upgrade to the latest version of OpenSSL and enable HMAC signing of OpenVPN messages.
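
To show why message signing raises the bar, here is a small Python sketch of sign-then-verify with a shared key. It is only an illustration: the tag-then-message layout and the choice of HMAC-SHA1 are assumptions made for the example, not a description of OpenVPN's wire format, and the function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def sign(key: bytes, message: bytes) -> bytes:
    """Return the message prefixed with its HMAC tag (illustrative layout only)."""
    tag = hmac.new(key, message, hashlib.sha1).digest()
    return tag + message


def verify(key: bytes, packet: bytes):
    """Return the message if the tag is valid, else None (the packet is dropped)."""
    tag, message = packet[:TAG_LEN], packet[TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha1).digest()
    return message if hmac.compare_digest(tag, expected) else None
```

A server that verifies first never hands an unsigned heartbeat to OpenSSL, but anyone holding the key can still produce a valid tag for a malicious one.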

The VRT has developed working Heartbleed exploits for OpenVPN running over TCP and UDP. Detection for this vulnerability includes coverage for servers running over TCP and UDP with HMAC signing and without HMAC signing in SIDs 30711 through 30742.

Thursday, April 10, 2014

Performing the Heartbleed Attack After the TLS Handshake

Over the past several days, many IPS rules for detecting the Heartbleed attack have been suggested that attempt to compare the TLS message size to the heartbeat message size. This method works with most of the Proof-of-Concept attacks out there, which perform the Heartbleed attack before the TLS handshake has occurred. Performing the attack before the TLS handshake results in both the attack and response data being sent in plaintext. However, if a TLS handshake is performed first, all heartbeat data is encrypted, meaning that detection that compares ciphertext (encrypted data) with the unencrypted TLS message size will not work. It will almost always produce a false positive, because chances are high that the encrypted bytes will appear to encode a value larger than the TLS message size. Adding to the challenge, nothing explicit in either the heartbeat request or the heartbeat response indicates that the heartbeat data is encrypted.

Our detection has, from the beginning, ignored the heartbeat message data itself to avoid false positives arising from treating ciphertext as if it were readable on the wire. Instead, we only use the unencrypted values within the TLS header.
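
As a rough illustration of relying only on the cleartext record header, the Python sketch below parses the 5-byte TLS record header (content type, version, length) and flags heartbeat records whose length field exceeds a limit. It is not our Snort rule: the function names are hypothetical and max_len is a tunable assumption. The only protocol facts used are the header layout and content type 24 for heartbeat records (RFC 6520).

```python
import struct

HEARTBEAT = 24  # TLS ContentType for heartbeat records (RFC 6520)


def parse_record_header(data: bytes):
    """Parse the 5-byte cleartext TLS record header: type, version, length."""
    if len(data) < 5:
        return None
    content_type, major, minor, length = struct.unpack_from("!BBBH", data, 0)
    return content_type, (major, minor), length


def is_oversized_heartbeat(data: bytes, max_len: int) -> bool:
    """Flag heartbeat records whose header length exceeds max_len.

    max_len is an assumption for this sketch, not a value from the VRT rules.
    Nothing after the 5-byte header is inspected, so encryption does not matter.
    """
    header = parse_record_header(data)
    return header is not None and header[0] == HEARTBEAT and header[2] > max_len
```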

Monday night, before Heartbleed really hit the news and public exploit code became available, the VRT created a proof-of-concept to demonstrate the Heartbleed bug by analyzing the openssl-1.0.1f code and modifying it to send malicious heartbeats and dump out the response to view the exposed data.  By using this approach, the heartbeat request is sent after the TLS handshake, resulting in encrypted payloads.  It turns out that by using our own exploit as the basis for detection, we were able to avoid the mistakes made by some others that will result in false positives against legitimate traffic since we never made the assumption that we could read the heartbeat message size.

t1_lib.c.diff is a patch to the openssl-1.0.1f source tree that implements the Heartbleed attack after the TLS handshake has occurred. Steps to create the PoC are as follows:

$ wget https://labs.snort.org/files/t1_lib.c.diff
$ wget http://www.openssl.org/source/openssl-1.0.1f.tar.gz
$ tar -zxf openssl-1.0.1f.tar.gz
$ cd openssl-1.0.1f
$ patch -p0 < ../t1_lib.c.diff
$ ./config no-shared no-idea no-mdc2 no-rc5 zlib enable-tlsext no-ssl2 && make depend && make
$ apps/openssl s_client -tlsextdebug -connect <victim_server>:443


Once you connect, type 'B' to trigger a heartbeat, then 'Q' to quit. You can send a few heartbeats per session if you want. At this point, many servers out there have disabled heartbeat support, so don't be alarmed if you receive "peer does not accept heartbeats." This is a good thing!

We detect Heartbleed attacks whether they are encrypted or not by using detection_filter ("threshold") rules to discover too many heartbeat requests in a short amount of time as an attacker tries to gather memory dumps and by inspecting the TLS size in heartbeat responses for a value that is greater than the normal heartbeat response size.
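
The rate-based half of this approach can be modeled in a few lines of Python. This is only a conceptual sketch: it is not how Snort implements detection_filter, and the class name, the per-source tracking, and the count and window values are all assumptions.

```python
from collections import defaultdict, deque


class HeartbeatRateTracker:
    """Count heartbeat requests per source within a sliding time window."""

    def __init__(self, max_count: int, window_seconds: float):
        self.max_count = max_count
        self.window = window_seconds
        self.seen = defaultdict(deque)  # source -> timestamps of recent requests

    def observe(self, src: str, now: float) -> bool:
        """Record one heartbeat request; return True once the rate is exceeded."""
        times = self.seen[src]
        times.append(now)
        # Forget requests that fell out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_count
```

Because only the count of heartbeat records matters here, the check works the same whether the heartbeat payloads are encrypted or not.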

More information about how the exploit works and our detection for it can be read at our original blog post on this subject, http://blog.talosintel.com/2014/04/heartbleed-memory-disclosure-upgrade.html

Heartbleed Continued - OpenSSL Client Memory Exposed

The Heartbleed vulnerability is bad. Not only does it pose a risk to servers running the vulnerable version of OpenSSL (1.0.1 through 1.0.1f) with heartbeats enabled, it also poses a serious risk to clients running the vulnerable versions.

OpenSSL clients process heartbeats using the same vulnerable functions: tls1_process_heartbeat() and dtls1_process_heartbeat(). The same memcpy() overread detailed in our previous blog post allows malicious servers to read blocks of client memory. In internal testing we were able to extract memory from several client programs, such as curl and wget, that link against the vulnerable OpenSSL versions. It is important to note that the version of these programs does not necessarily matter if they link against a vulnerable OpenSSL version.

Research into other clients that link against the vulnerable versions of OpenSSL continues. Again, it is strongly recommended that you upgrade to OpenSSL version 1.0.1g or install a version of OpenSSL with heartbeats disabled.

We have released detection for the client side attack in SIDs 30520 through 30523, we have expanded detection port ranges to cover more vulnerable clients and servers, and last but not least, all Heartbleed rules have been added to the community ruleset - because we care.

Tuesday, April 8, 2014

Heartbleed Memory Disclosure - Upgrade OpenSSL Now!

Heartbleed is a serious vulnerability in OpenSSL 1.0.1 through 1.0.1f.   If you have not upgraded to OpenSSL 1.0.1g or installed a version of OpenSSL with -DOPENSSL_NO_HEARTBEATS it is strongly recommended that you do so immediately.

This vulnerability allows the attacker to read up to 64KB of heap memory from the victim without any privileged information or credentials. How is this possible? In short, OpenSSL's heartbeat processing functions use an attacker controlled length for copying data into heartbeat responses. Both DTLS and TLS heartbeat implementations are vulnerable.

The vulnerable functions are tls1_process_heartbeat() in ssl/t1_lib.c (for TLS) and dtls1_process_heartbeat() in ssl/d1_both.c (for DTLS). Looking at these functions you can see that OpenSSL first reads the heartbeat type and length:

hbtype = *p++;
n2s(p, payload);
pl = p;

n2s is a macro that takes two bytes from "p" and copies them to "payload". This is the length indicated by the SSL client for the heartbeat payload.  Note: The actual length of the SSL record is not checked. The variable "pl" is a pointer to the heartbeat data sent by the client.

OpenSSL allocates as much memory as the client asked for (two byte length up to 65535 bytes) plus 1 byte for heartbeat type, 2 bytes for payload length, and 16 bytes for padding:

buffer = OPENSSL_malloc(1 + 2 + payload + padding);
bp = buffer;

Then it builds the heartbeat response by copying the payload size sent in the request to the response using the macro s2n (opposite of n2s).  Finally (and here's the critical part), using the size supplied by the attacker rather than its actual length, it copies the request payload bytes to the response buffer.

*bp++ = TLS1_HB_RESPONSE;
s2n(payload, bp);
memcpy(bp, pl, payload);

If the specified heartbeat request length is larger than its actual length, this memcpy() will read memory past the request buffer and store it in the response buffer which is sent to the attacker. In internal testing we were able to successfully retrieve usernames, passwords, and SSL certificates.
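
The over-read can be simulated safely in Python by letting a bytes object stand in for the heap. This is a sketch under stated assumptions, not OpenSSL code: the function names are ours, the response padding is omitted for brevity, and the patched branch mirrors the kind of bounds check added in 1.0.1g (discard the request when 1 + 2 + payload + 16 exceeds the record length).

```python
PADDING = 16
TLS1_HB_RESPONSE = 2


def n2s(buf: bytes, off: int) -> int:
    """Read a 2-byte big-endian integer, like OpenSSL's n2s macro."""
    return int.from_bytes(buf[off:off + 2], "big")


def heartbeat_response(memory: bytes, rec_off: int, rec_len: int, patched: bool):
    """Build a heartbeat response from a request stored inside `memory`.

    `memory` stands in for the process heap; the request record occupies
    memory[rec_off:rec_off + rec_len]. Returns None if the request is dropped.
    """
    payload = n2s(memory, rec_off + 1)   # attacker-controlled length field
    pl = rec_off + 3                     # start of the payload bytes
    if patched and 1 + 2 + payload + PADDING > rec_len:
        return None                      # bounds check: silently discard
    # Unpatched path: copies `payload` bytes even past the end of the record.
    leaked = memory[pl:pl + payload]
    return bytes([TLS1_HB_RESPONSE]) + payload.to_bytes(2, "big") + leaked
```

Calling it with a 2-byte payload that claims to be 64 bytes long returns whatever happens to sit after the request in `memory`, which is exactly the disclosure described above.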

To detect this vulnerability we use detection_filter ("threshold") rules to detect too many inbound heartbeat requests, which would be indicative of someone trying to read arbitrary blocks of data. Since OpenSSL uses hardcoded values that normally result in a 61 byte heartbeat message size, we also use rules to detect outbound heartbeat responses that are significantly above this size. Note: you can't simply compare the TLS record size with the heartbeat payload size since the heartbeat message (including the indicated payload size) is encrypted.

We have released detection in SIDs 30510 through 30517 to detect attacks targeting this vulnerability.

To keep people updated, Heartbleed rules have been added to the community ruleset.

Microsoft Update Tuesday: April 2014, two final XP and Office 2003 fixes



It’s the last Microsoft Update Tuesday before the end-of-life of both Windows XP and Office 2003 and Microsoft is patching two vulnerabilities that also impact XP and two that also impact Office 2003 this month. All-in-all it’s a relatively light month this time around with only four bulletins covering eleven CVEs.

The first bulletin this month, MS14-017, deals with Word and covers three CVEs. One fix is for a 0-day vulnerability, CVE-2014-1761, that Microsoft previously addressed in advisory 2953095 and a “Fix it” that disables support for RTF completely in Word. The vulnerability results from an incorrect “listoverridecount” value in an “overridetable” structure in the RTF file.  This value is not properly checked by Word and setting it to an invalid value causes a type confusion bug, which can be exploited by an attacker to gain remote code execution.  The vulnerabilities addressed in this bulletin also cover Word 2003.

The requisite Internet Explorer bulletin, MS14-018, only covers six CVEs this month. As usual most of the issues are the result of use-after-free vulnerabilities. This time, none of the vulnerabilities that are being patched were publicly known. Given that IE runs on XP as well, this is one of the two bulletins that covers XP.

MS14-019 fixes a vulnerability (CVE-2014-0315) in the way that Windows handles files that can result in remote code execution. This is the second bulletin that also covers XP.

The final bulletin this month is MS14-020 and deals with Publisher, where a maliciously crafted file can result in remote code execution due to an arbitrary pointer dereference (CVE-2014-1759). As with the Word bulletin, this one also covers 2003.

Rules SID 24974-24975, 30497-30502, 30508-30509 address these vulnerabilities.

CVE-2014-1761, Oh did you mean CVE-2012-2539?

When the VRT first received word of a new Microsoft Word 0-day I anxiously awaited details and the ever important hash of the in-the-wild exploit to be able to research it and provide coverage through Snort, ClamAV and the FireAmp suite of products. I was especially interested when word came that it was an RTF vulnerability, as I have spent a lot of time looking at high profile RTF vulnerabilities such as the ever popular CVE-2012-0158.

When the in-the-wild sample finally arrived I thought someone was playing an early April Fool's joke on us: I knew this vulnerability already. More than that, I had written the coverage for it almost a year and a half ago! The vulnerability appeared to be CVE-2012-2539, which was released December 11th 2012 as Microsoft Security Bulletin MS12-079. I checked blogs and looked for any mistakes in the hash I had gotten but, no, this WAS the dreaded vulnerability that prompted Yahoo Finance to tell everyone not to open any RTF files. So I did some searching in my old research and found that I had written Snort rules 24974 and 24975 way back in December of 2012 for this vulnerability. The release posts on Snort.org's blog confirmed this (blog|rule changes). The rule even specifies the vulnerable element of the RTF specification, listoverridecount, in the message.

I enjoyed this hilarious state of affairs and we kept it to ourselves until someone else found it out, for dramatic effect if you will. Lo and behold, this week's blog posts by other security vendors popped up, pointing to listoverridecount as the exploitation vector. This confirmed what we already knew: that this vulnerability was centered around the listoverridecount value. The blog posts rightly deduced that the only legal values for this element are 0, 1 or 9 and that other values could cause a crash. Our detection in both Snort and ClamAV already covered that. Interestingly though, there seem to be some RTF-generating programs out there that emit values for listoverridecount other than 0, 1 or 9, as we found out when someone submitted a sample to ClamAV that has the SHA256 hash:

3fbffe29252df6a87f37962afe72576ea2a7a5540d6c7993cbbff265fcd2734d

as a potential false positive for a signature we have to detect attacks leveraging CVE-2012-2539.
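
The underlying check is simple to express. The Python sketch below is our own illustration, not the Snort or ClamAV signature: it scans raw RTF bytes for \listoverridecount control words and reports any value other than 0, 1 or 9. The helper name is hypothetical, and the regular expression assumes the control word is not obfuscated.

```python
import re

LEGAL_COUNTS = {0, 1, 9}
PATTERN = re.compile(rb"\\listoverridecount(-?\d+)")


def suspicious_override_counts(rtf: bytes):
    """Return every \\listoverridecount value that is not 0, 1 or 9."""
    values = (int(m.group(1)) for m in PATTERN.finditer(rtf))
    return [v for v in values if v not in LEGAL_COUNTS]
```

As the submitted sample shows, a non-empty result is not proof of an attack, since some benign generators also emit out-of-range values.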



ClamAV was the only vendor to detect it before we decided it was prudent to turn the signature into a PUA (Potentially Unwanted Application) signature since no one seemed to be exploiting it actively. The Snort rules have now been updated with new references and a non PUA ClamAV signature that references CVE-2014-1761 has gone out (I can only hope that alternate RTF generators stop using invalid values in their listoverridecounts).

All in all this 0-day has been a little bit disappointing since it was a rehash of a known vulnerability we already covered, but what I can console myself with is the fact that someone, somewhere is probably majorly annoyed because the exploit they built or bought is not working against Sourcefire/CISCO customers!

Monday, April 7, 2014

Dynamically Unpacking Malware With Pin

A common approach that malware takes to hide itself is packing. Traditionally, packing was a means to compress your executable, then unpack and execute it at run time. Packing can also be used as an obfuscation technique for those who wish to hide their executable code. For a while I have been mulling over how to write a generic unpacker. A general rule I came up with is that the unpacked code would have to be written to memory then that memory would be executed. Since I was looking at a sample that did exactly this, I wrote a Pintool to retrieve the unpacked memory regions.

It is a fairly tedious task to follow execution in a debugger in order to retrieve unpacked code. You need to skim thousands of instructions, set breakpoints, watch calls to functions, unset breakpoints, accidentally allow the malware to execute, revert your VM, get back to where you were, read more disassembly, then finally dump memory and analyze that when you get to something interesting. This can take hours, sometimes days or more.

The Dropper

The dropper (MD5: 2E57C0CA7553263E7B6010B850FF2E48) is covered by an NDB signature, Win.Trojan.Zbot-30983. This signature targets bytes from the first stage’s unpacking loop as these bytes were seen to be consistent among all similar samples.


Win.Trojan.Zbot-30983:1:*:8b95a0f6ffff33c08a8415a7f6ffff83f00233858cf6ffff8b8da0f6ffff88840da7f6ffff{-20}410f95c0ff75203bc68d8d6cfdffff59e815ffffff{-75}8b95a0f6ffff33c08a8415a7f6ffff83f0028b8da0f6ffff88840da7f6ffff
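
For readers unfamiliar with ClamAV's extended (NDB) signature format, the leading fields are Name:TargetType:Offset:HexSignature. Here target type 1 restricts the match to PE files, an offset of * matches anywhere in the file, and {-20} is a gap of at most 20 arbitrary bytes. The Python sketch below splits a signature of this shape into its parts; parse_ndb is a hypothetical helper of ours, not part of ClamAV, and it ignores the optional trailing fields.

```python
import re


def parse_ndb(line: str):
    """Split a ClamAV extended (NDB) signature into its four leading fields."""
    name, target_type, offset, hex_sig = line.strip().split(":")[:4]
    # Split the body into fixed hex runs and {...} gap specifiers.
    parts = re.split(r"(\{[^}]*\})", hex_sig)
    return {
        "name": name,
        "target_type": int(target_type),
        "offset": offset,
        "parts": [p for p in parts if p],
    }
```

Applied to the signature above, this yields three fixed byte runs separated by the {-20} and {-75} gaps.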

This initial unpacking function opens the binary (itself), seeks and ftells for the size, mallocs a buffer, then reads its bytes into the buffer. Beginning at offset 0x4FD8 the function searches for the byte pattern:

   NN ?? (NN+1) ?? (NN+2) ?? (NN+3) ?? (NN+4)

Writing the same in Python we can identify the offset 0x51A9C, which places us 0x89D bytes from the end of the file. The matching pattern:

   9C 54 9D 91 9E FB 9F 69 A0

There is then a loop that copies the 0x956 bytes immediately following that pattern to a local buffer. It then xor decodes the first 0x84A bytes of that buffer with the 6th byte of the 9 bytes extracted above, 0xFB. That is the variable labeled as xor_byte in the above screenshot. Once this memory is decoded, it is executed.
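
The search and decode steps can be sketched in Python as follows. This is a reconstruction from the description above rather than the script we actually used: the function names are hypothetical, and wrapping NN+k at 0xFF is an assumption. The lengths 0x956 and 0x84A and the choice of the 6th marker byte as the XOR key come from the analysis above.

```python
def find_marker(data: bytes, start: int = 0):
    """Find NN ?? NN+1 ?? NN+2 ?? NN+3 ?? NN+4 at or after `start`."""
    for off in range(start, len(data) - 8):
        nn = data[off]
        # Every second byte must count up from NN (assumed to wrap at 0xFF).
        if all(data[off + 2 * k] == (nn + k) & 0xFF for k in range(1, 5)):
            return off
    return None


def extract_stage(data: bytes, marker_off: int, copy_len: int = 0x956,
                  decode_len: int = 0x84A) -> bytes:
    """Copy the bytes after the 9-byte marker and XOR-decode the leading part."""
    xor_byte = data[marker_off + 5]      # 6th byte of the 9-byte marker
    blob = bytearray(data[marker_off + 9:marker_off + 9 + copy_len])
    for i in range(min(decode_len, len(blob))):
        blob[i] ^= xor_byte
    return bytes(blob)
```

For the sample discussed here, the search would start at offset 0x4FD8, as the unpacking function does.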

The Pintool

Pin enables you to instrument binaries. That is, you can write code to execute between each instruction, basic block, or routine, and you can instrument threads as well. There is a lot more functionality that would be difficult to list here, such as hooking system calls. The goal of this Pintool was simply to execute this malware and retrieve the unpacked code.

To achieve this, I started with one of Pin’s examples which records memory reads and writes. I only cared about the writes, so I cut out the code for handling reads.

Any time an opcode for writing memory is detected, the program retrieves the write's target address. It then takes that write address and scans memory regions using VirtualQuery() in order to find the base address of the page that the write address belongs to. Once the owning page is found, that page's info is returned. The page's start and end addresses are stored in a map. Rather than storing every single address that was written to, we store ranges of memory, which saves a significant amount of space.

// Records a memory write
VOID RecordMemWrite(VOID * ip, VOID * addr) {
    map<VOID *, VOID *>::iterator i;

    for(i=writtenMap.begin(); i != writtenMap.end(); ++i) {
        if(addr >= i->first && addr < i->second) {
            return;
        }
    }

    WINDOWS::MEMORY_BASIC_INFORMATION *info = getAddrInfo(addr);
    if(info == NULL)
        return;

    writtenMap[info->BaseAddress] = ((UINT8 *)info->BaseAddress) + info->RegionSize;

    return;
}

In addition to recording what memory is written to, the tool checks the address of every basic block executed. If this address falls within one of the memory regions that was previously written to, that memory is dumped to file. The tool then removes the record of that write so that the memory will not be dumped to file again unless it is subsequently written to then executed. This avoids writing to the disk as every single basic block inside a memory region is executed.

VOID checkBBL(ADDRINT addr) {
    map<VOID *, VOID *>::iterator i;
    FILE *memdump;
    char fname[30];

    // Check if basic block (eip / rip) is in memory that was written to
    for(i=writtenMap.begin(); i != writtenMap.end(); ++i) {
        if(addr >= (ADDRINT)i->first && addr < (ADDRINT)i->second) {      
            // Dump memory to file
            sprintf(fname, "dumps\\%p.dump", i->first);

            memdump = fopen(fname, "wb");

            fwrite(i->first, sizeof(char), (size_t)((ADDRINT)i->second - (ADDRINT)i->first), memdump);

            fclose(memdump);

            // Remove write record so we don't dump at every bb
            writtenMap.erase(i->first);

            break;
        }
    }

    return;
}
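
To make the bookkeeping in RecordMemWrite() and checkBBL() easier to follow, here is a small Python model of the same idea. It is only a sketch: the WriteTracker class is hypothetical, a fixed 0x1000-byte region size stands in for the VirtualQuery() lookup, and "dumping" is reduced to recording the range that would be written to disk.

```python
PAGE = 0x1000  # assumed region size; the Pintool asks VirtualQuery() instead


class WriteTracker:
    def __init__(self):
        self.written = {}  # region start -> region end (exclusive)
        self.dumped = []   # ranges that would have been dumped to file

    def record_write(self, addr):
        """Remember the region containing addr, unless it is already tracked."""
        for start, end in self.written.items():
            if start <= addr < end:
                return
        base = addr & ~(PAGE - 1)
        self.written[base] = base + PAGE

    def check_block(self, addr):
        """Called per basic block: dump once if addr lies in written memory."""
        for start, end in list(self.written.items()):
            if start <= addr < end:
                self.dumped.append((start, end))
                # Forget the region so it is not dumped for every block in it.
                del self.written[start]
                return True
        return False
```

Only a fresh write to the region followed by another execution inside it triggers a second dump, which matches the behavior described above.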

The Result

Running the Pintool on the sample 2E57C0CA7553263E7B6010B850FF2E48, we get a total of 12 memory files.


Of the memory dumps highlighted above, the smallest two (0018D000 and 0018E000) contain the second stage of unpacking (first stage discussed above), and the two larger files are the third unpacking stage. In the third stage, there is one rather lengthy, hideous function. This function calls itself recursively in order to run through different stages. We see some anti-analysis from the strings vmtoolsd.exe, VBoxService.exe, and SbieDll.dll (Sandboxie). The first two are checked when the function is called with 6 as the first argument. Sandboxie is checked when it is called with a 5.


Eventually, the function calls itself with 9 and that leads to the last stage of unpacking. The final stage uses the RunPE method. It calls CreateProcess on Internet Explorer. It then calls WriteProcessMemory a few times in order to replace code in the newly created process. Finally, it calls ResumeThread to begin execution.

The final Zbot payload is detected by a signature dating back to 2011.

Trojan.Spy.Zbot-142:1:*:4973576f77363450726f6365737300002200250073002200000000002200250073002200200025007300000075736572656e762e646c6c00437265617465456e7669726f6e6d656e74426c6f636b000044657374726f79456e7669726f6e6d656e74426c6f636b003a640d0a64656c20222573220d0a6966206578697374202225732220676f746f206400006200610074000000406563686f206f66660d0a25730d0a64656c202f4620222573220d0a000000002f006300200022002500730022

Conclusion

This Pintool was able to get me all of the stages of the unpacker. However, since the sample used RunPE as the final stage, I had to dump that manually. The memory dumps did allow me to quickly identify where to break and reach the right functions. Jurriaan Bremer has done some work on unpacking RunPE malware with Pin by hooking the system calls that are used during this process. Another useful addition to this tool would be dumping the call stack when the unpacked code is called. This would allow rapid identification of the unpacking functions at each stage.

Pin is a powerful tool for dynamic malware analysis. This Pintool acts as a good proof of concept to justify further work in this area. Setting up an unpacking environment with a powerful, generic unpacker will speed up analysis and classification of malware samples.

Wednesday, April 2, 2014

Using the Immunity Debugger API to Automate Analysis

While analyzing malware samples I came across many simple but annoying problems that should be solved through automation. This post will cover how to automate a solution to a common problem that comes up when analyzing malware.


The application uses GetProcAddress() to get the address of a function located within a library. That address is stored in a variable and saved for later use. This becomes an issue when I analyze the application and come across a call instruction that references a generic memory address. There is barely any information to indicate which function is being called. Although I could make some intelligent guesses about what is being called, it would be better to know the exact function.


The tool that I am going to use to automate the solution is Immunity Debugger. This debugger, like a lot of others, provides the capability to automate analysis through scripting.


Solving GetProcAddress()

To restate the problem, there are a number of calls that cannot be traced back to an actual function. To set this up, a call is made to GetProcAddress() with both the library and the function name passed as parameters. The return value is the address of the requested function, and it is stored in a variable. Figures 1, 2, and 3 are pulled directly from the disassembly in IDA.

 

Figure 1: Unknown Function Call
Figure 1 is the unknown function call. As the reader can tell, there is currently no way to know what is being called. Some details surrounding the function can be pulled together to help explain the purpose of the function call. However, that isn’t reliable. 

Figure 2: 4266FC Memory Address
Before going to Immunity, I first want to locate the memory address in IDA. I accomplished this by double clicking the “dword_4266FC” XREF link in IDA to show the memory address where the function address is stored. Figure 2 shows us the details at .data:004266FC.


Okay, now I need to track down where this variable is set. There are a multitude of ways to get the answer. I just opened up the cross-references to the variable in IDA and found the function where the variable is set after a GetProcAddress() call. The function, sub_410610, is the culprit. This function contains multiple GetProcAddress() calls. Each call has a return value that is stored in a separate variable.



Figure 3: GetProcAddress()


In Figure 3, there are two of several calls that are made to GetProcAddress(). The return value of GetProcAddress() is the memory address of the specified function, and it is stored in the EAX register. Looking at the first instruction in Figure 3, the address for GetProcAddress() is moved into EDI. A few instructions below that is the call to GetProcAddress() (CALL EDI). The next thing to figure out is what happens to EAX after the function call. Within five instructions EAX is moved into dword_4266FC.

Since the function name is a parameter to GetProcAddress(), why can’t I just grab the name of the function from another spot in memory? Well, the malware author has come up with a method to obfuscate the function names.
 

Take a look at Figure 4. This is the beginning of a very long series of moving bytes around to construct a hex string that is an encoded form of the function name. Once everything is in order, these hex strings are run through a decoding routine (sub_401610). Once the decoding is complete, the name is stored in a variable that is used for the GetProcAddress() call. In Figure 3, that variable is part of the ‘lea edx, [esp+0E4h+var_74]’ instruction.



Figure 4: Obfuscated Function Names


To get the decoded strings, initially I started with the manual process of stepping through the debugger and recording the decoded strings. As soon as I started stepping, I stopped, and decided to write a script to complete this process.


Since it’s always a good idea to have a list of what needs to be accomplished, here is mine:


  1. Hook the function that does the GetProcAddress() calls
  2. Get a list of where the GetProcAddress() calls are being made
  3. Look for where EAX is being stored in a variable
  4. Record the address of the variable
  5. Record the function name
  6. Dump this info into a file
  7. Use an IDA script to read the file and load the data into IDA


I used this list as a guideline for creating the script. One more thing I wanted to make sure doesn’t happen: I don’t want the script to slowly step through the application while reading each instruction.


I created a PyCommand to solve this problem. PyCommands are plugins for Immunity Debugger that help automate various tasks. These commands are launched using the debugger’s command box at the bottom of the window (Figure 5). PyCommands are saved in the Immunity Debugger\PyCommands path located in the application’s install directory. These commands are called using an “!” followed by the name of the file.


Figure 5: Command Box
One last thing. The documentation isn’t the best. Anybody using the API will need to use the following sources: the source code, current pyCommands deployed with the application, or one of several resources on the web. I’ve added reference links at the end of this post.


Hooking the Function

To start, I decided to use one of the hook classes provided by Immunity. These can be found in the libhook.py file. I went with the LogBpHook class. This will hook a function of my choice, and pause execution inside the function.


To set up a hook, I need to create a main method and a hook class with init and run methods. Here is a skeleton of the hook I created.


import immlib
from libhook import LogBpHook

class HookFunc(LogBpHook):

    def __init__(self):
        LogBpHook.__init__(self)
        return

    def run(self, regs):
        # left blank for now
        pass

def main(args):
    if not args:
        return "No arguments provided."

    imm = immlib.Debugger()

    # The address arrives as a hex string on the command line
    hookAddr = int(args[0], 16)
    funcName = imm.getFunction(hookAddr).getName()
    hook = HookFunc()
    hook.add(funcName, hookAddr)
    return funcName + " Hooked."

The HookFunc class isn’t a provided class, but one that I created. It is inheriting from the LogBpHook class. The init and run methods are required. The run method is what is going to happen once the hook is triggered.

The main method accepts an address as an argument. This is the address of the function I am going to hook. After the args are checked, the first thing that needs to be done is to instantiate a Debugger object. This is stored in imm. Next is the code used to add the hook.

The string argument needs to be converted to an integer address. This is accomplished by calling int() with a base of 16. Next, I get the name of the function. After the HookFunc object is created, the hook needs to be added. The hook.add(funcName,hookAddr) call adds the hook at the appropriate address.

Figure 6 shows us where execution pauses after the hook is triggered. The execution is paused inside the function that I wanted to hook.



Figure 6: Hook Breakpoint in Function
Other than adding code to the run() method, that is all it takes to create a hook.



Getting a List of the Calls Made Inside of the Function

Because of the similarity between all of the calls to GetProcAddress(), scripting a solution to get the address of the calls was easily accomplished. It doesn’t do much for making the script work for multiple situations, but it solves this problem. Figure 3 shows the CALL EDI instruction. This is used for every GetProcAddress() call within this function. It is also only used for the GetProcAddress() call. In addition, this function is just one long basic block. Based on this, I felt the easiest way to grab what I needed was to parse a list of the call instructions being executed and then grab the instructions following those calls.


funcAddr = imm.getCurrentAddress()

curFunc = imm.getFunction(funcAddr)

basicBlocks = curFunc.getBasicBlocks()
calls = basicBlocks[0].getCalls()

It’s not the best solution, but I use the first two lines to grab a function object of the current function. The function object contains all of the function data and is explained in the libanalyze.py file in the lib directory. Functions are typically split up into several basic blocks. Since this function is just one long basic block, I decided to just grab all of the function’s basic blocks with getBasicBlocks(). Once I get that data, I can grab a list of call instruction addresses.

The getCalls() method returns a list of call instructions over which I can iterate.

Before starting the next section, there is a problem that is easily overlooked (I know I did). The program execution is paused at the start of the function. If I attempt to read the memory addresses where the function addresses are stored, they will not contain the correct value. The instructions to populate those memory addresses have not been executed. In order to continue execution until the end of the function, I use the following method:

imm.runTillRet()

Figure 7 shows that the execution has stopped at the end of the function. I also had Immunity log the results of the above four method calls (Figure 8).


Figure 7: Breakpoint at End of Function

Figure 8: Current Address, Basic Blocks, Call Instructions


Now that the variables will be populated when the script reads the memory address, I can proceed.


Getting Address where EAX is Stored and Saving it to a File

Once again, Figure 3 shows that the value in EAX is stored within at most seven instructions of the initial CALL EDI. In order to get the disassembly, I’ll need to iterate over the list of calls.

for c in calls:
    oc = imm.disasm(c)
    call = oc.getDisasm()

Each call will be disassembled and checked to see if EDI is in the instruction.

if 'EDI' in call:
    flag = 7
    i = 0
    while i <= flag:
        instr = imm.disasmForward(c, i).getDisasm()
        if ',EAX' in instr:
            # body added in the next step
            pass
        i += 1

If the call matches, I’ll need to set up a loop to iterate over the next several instructions to locate one containing ‘,EAX’. Once located, I know that I have found the MOV instruction. This is accomplished with a while loop and the imm.disasmForward(address,number of lines) method. This method is described in the immlib.py file. I’ve attached .getDisasm() to the end of the disasmForward(c,i) call to get the disassembly of that line. See Figure 9.

funcStrSaveAddr = '0x' + instr[instr.index('[')+1:instr.index(']')]
funcSaveAddr = int(funcStrSaveAddr,16)
calledFuncName = imm.getFunction(imm.readLong(funcSaveAddr)).getName()
imm.log("* " + funcStrSaveAddr + "," + calledFuncName + "\n")
f.write(funcStrSaveAddr + "," + calledFuncName+"\n")
break

The first line grabs the address that appears in the instruction string. This can be accomplished with Python string manipulation. Since the getDisasm() method returns a string, this address needs to be converted to an integer. Once again, int() with a base of 16 does the conversion.
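
The string handling can be tried outside the debugger. The sketch below is a hypothetical, regex-based variant of the slicing shown above; the sample instruction text is an assumption about what the disassembly string looks like, and stored_address and csv_line are illustrative names, not part of the Immunity API.

```python
import re

# Assumed operand shape: a bracketed hex address immediately followed by ",EAX".
_STORE_RE = re.compile(r"\[([0-9A-Fa-f]+)\],EAX")


def stored_address(instr):
    """Return the integer address EAX is written to, or None if absent."""
    match = _STORE_RE.search(instr)
    if match is None:
        return None
    return int(match.group(1), 16)


def csv_line(instr, func_name):
    """Build one 'address,function name' line for the output file."""
    addr = stored_address(instr)
    if addr is None:
        return None
    return "0x%X,%s\n" % (addr, func_name)
```

Compared with slicing on the bracket positions, the regular expression skips stores whose operand is not a plain address, such as a register-relative one, instead of raising an error in int().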


Figure 9: CALL EDI and MOV Instructions
The third line pulls the function name that was called by GetProcAddress(). On the first run through, I used the following code to get the name of the function:


calledFuncName = imm.getFunction(funcSaveAddr).getName()


This returned the following values:



Figure 10: Function Name Return Values



Figure 10 shows the stored address along with the supposed name of the function at that address. That is obviously not the value that I need. Virus-20.00426718 is just a reference to the memory address where the function address I am looking for is stored.


Because funcSaveAddr is just the address of the variable and not the value, I need to read the value stored at that memory location. This is accomplished using the imm.readLong(funcSaveAddr) method:


calledFuncName = imm.getFunction(imm.readLong(funcSaveAddr)).getName()


This is a simple problem with a simple solution. I was tired and spent a little too long troubleshooting the issue.


The next two lines write both the address of the variable and the function name to Immunity’s log window (imm.log()) and to a file. Figure 11 shows us the output in the log window. The file uses a simple CSV format: address,function name.


Figure 11: Memory Address and Function Name
The entries in Figure 11 are functions used to set up network functionality later on in the application. The file is now ready to be read into IDA.


Using IDA Script to Rename Variables
The IDC script is going to read in the lines of the file. On each read, it is going to split up the CSV data, and use that to rename the variable in IDA. Here is the script:

#include <idc.idc>
static main() {
    auto fh, line, addr, name, actAddr;

    // Open file
    fh = fopen("getprocaddr.txt","r");

    // Loop through file using readstr()
    while ((line = readstr(fh)) != -1) {
        // Split CSV values
        addr = line[0:strstr(line,',')];
        name = line[strstr(line,',')+1:];

        // Convert hex string to long
        actAddr = xtol(addr);

        // Change the name of the variable
        MakeNameEx(actAddr,name,0);
    }
    fclose(fh);
}


The script itself should be self-explanatory. I've commented the relevant sections. It's fairly simple.
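
For readers who prefer Python, the file-parsing half of that script can be sketched as below. This is only an illustration under stated assumptions: parse_rename_lines is a hypothetical helper, it expects the address,function name lines written by the PyCommand, and the actual rename call inside IDA is deliberately left out.

```python
def parse_rename_lines(lines):
    """Turn 'address,function name' lines into (int address, name) pairs.

    Blank or malformed lines are skipped rather than treated as errors.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "," not in line:
            continue
        addr_text, name = line.split(",", 1)
        try:
            addr = int(addr_text, 16)  # accepts an optional 0x prefix
        except ValueError:
            continue
        pairs.append((addr, name.strip()))
    return pairs
```

Each resulting pair corresponds to one MakeNameEx(actAddr,name,0) call in the IDC script. Stripping the trailing newline before renaming is a choice made in this sketch.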


Figures 13, 14, and 15 show the renamed sections and calls in IDA. Compare these with Figures 1, 2, and 3. The code is now much easier to understand.

Figure 13: Location of Function Call
 
Figure 14: Location of Memory Address
Figure 15: EAX being Stored at Memory Address

From here I can continue to analyze the application without wasting time on manually stepping through with Immunity to identify the functions being called. 

Conclusion
There are multiple ways to solve the problem outlined in this blog post. I went with what was easiest for me. The Immunity script is nothing fancy. Hopefully, this blog helps others out there looking for ways to automate various mundane tasks.


Additional Resources: