Thursday, June 26, 2014

Exceptional behavior: the Windows 8.1 X64 SEH Implementation

In my last post, you may remember how the latest Uroburos rootkit was able to disarm Patchguard on Windows 7. I was recently looking into how Patchguard is implemented in Windows 8.1 and decided to dig into exception handling on x64. All 64-bit versions of Windows manage error conditions (C++ exceptions and OS Structured Exception Handling) in an entirely different way from the older 32-bit versions. There are a lot of papers on 64-bit Windows exception handling available on the web, but I decided to deepen my knowledge of the topic in order to understand how it is implemented and to correctly characterize some strange behavior associated with the implementation of Patchguard on Windows 8.1.

Here are some interesting articles that can be found online:

  1. Exceptional Behavior - x64 Structured Exception Handling - OSR Online. The NT Insider, Vol 13, Issue 3, 23 June 2006.
  2. Skape, Improving Automated Analysis of Windows x64 Binaries - Uninformed, June 2006. A great article from Matt Miller
  3. Johnson, Ken. "Programming against the x64 exception handling support." - Nynaeve. N.p., 5 Feb. 2007. A very good series of articles on the Windows Vista x64 SEH implementation, written by Ken Johnson (Skywing)

I strongly recommend that all the readers check out these 3 papers. I won't be rehashing any of the work there.

I will also assume that the reader already knows how Windows Structured Exception Handling and C++ exception handling can be used to manage error conditions. If not, I personally recommend the following book, which explains very well how this is done:

Quick introduction

As the 3 articles mentioned above explain, x64 exception handling is not stack-based. Therefore, a lot of Structured Exception Handling (SEH) attacks have become ineffective against 64-bit binaries. 64-bit Windows Portable Executables (PE) contain a data directory called the "Exception Directory", which implements the 64-bit version of exception handling. It’s the compiler’s duty to add the relevant RUNTIME_FUNCTION structure to the exception directory for each chunk of code directly or indirectly involved with exception handling. Here's what this structure looks like:

  typedef struct _RUNTIME_FUNCTION {
    DWORD BeginAddress;    // Start RVA of SEH code chunk
    DWORD EndAddress;      // End RVA of SEH code chunk
    DWORD UnwindData;      // RVA of an UNWIND_INFO structure that describes this code frame
  } RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;
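
To make the layout concrete, here is a minimal Python sketch that decodes a raw exception directory into RUNTIME_FUNCTION entries. It only assumes what the structure above shows (three little-endian DWORDs per entry); the names `RuntimeFunction` and `parse_runtime_functions`, and the sample bytes, are made up for illustration.

```python
import struct
from typing import List, NamedTuple


class RuntimeFunction(NamedTuple):
    begin_address: int  # start RVA of the covered code chunk
    end_address: int    # end RVA of the covered code chunk
    unwind_data: int    # RVA of the UNWIND_INFO structure


def parse_runtime_functions(pdata: bytes) -> List[RuntimeFunction]:
    """Decode raw exception directory bytes into RUNTIME_FUNCTION tuples."""
    entry = struct.Struct("<III")  # three little-endian DWORDs, 12 bytes
    usable = len(pdata) - len(pdata) % entry.size  # drop a trailing partial entry
    return [RuntimeFunction(*fields) for fields in entry.iter_unpack(pdata[:usable])]


# Hypothetical two-entry directory, built by hand for illustration only.
sample = struct.pack("<6I", 0x1000, 0x1040, 0x3000, 0x1040, 0x10A0, 0x300C)
entries = parse_runtime_functions(sample)
for e in entries:
    print(hex(e.begin_address), hex(e.end_address), hex(e.unwind_data))
```

In a real binary the bytes would come from the exception data directory of the PE file rather than from a hand-built buffer.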

Each runtime function points to an UNWIND_INFO structure that describes one of the most important features of Windows error handling: the Frame Unwind. Before I describe what frame unwinding is, let’s take a look at the key structures related to the stack unwind (the “UnwindData” member of the RUNTIME_FUNCTION structure points to an UNWIND_INFO):

// Unwind info flags
#define UNW_FLAG_EHANDLER 0x01
#define UNW_FLAG_UHANDLER 0x02
#define UNW_FLAG_CHAININFO 0x04

// UNWIND_CODE 2 bytes structure
typedef union _UNWIND_CODE {
  struct {
    UBYTE CodeOffset;
    UBYTE UnwindOp : 4;
    UBYTE OpInfo : 4;
  };
  USHORT FrameOffset;
} UNWIND_CODE, *PUNWIND_CODE;

typedef struct _UNWIND_INFO {
  UBYTE Version : 3;          // + 0x00 - Unwind info structure version
  UBYTE Flags : 5;            // + 0x00 - Flags (see above)
  UBYTE SizeOfProlog;         // + 0x01
  UBYTE CountOfCodes;         // + 0x02 - Count of unwind codes
  UBYTE FrameRegister : 4;    // + 0x03
  UBYTE FrameOffset : 4;      // + 0x03
  UNWIND_CODE UnwindCode[1];  // + 0x04 - Unwind code array
  UNWIND_CODE MoreUnwindCode[((CountOfCodes + 1) & ~1) - 1];
  union {
    OPTIONAL ULONG ExceptionHandler;    // Exception handler routine
    OPTIONAL ULONG FunctionEntry;
  };
  OPTIONAL ULONG ExceptionData[];       // C++ Scope table structure
} UNWIND_INFO, *PUNWIND_INFO;
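
As a rough illustration of how these fields pack together, the following Python sketch decodes the fixed 4-byte UNWIND_INFO header and computes where the optional handler RVA (or chained function entry) starts. It assumes the bit-fields are allocated from the least significant bit, as MSVC does, and that each UNWIND_CODE slot occupies 2 bytes; `UnwindHeader`, `parse_unwind_header` and the sample bytes are hypothetical.

```python
import struct
from typing import NamedTuple

UNW_FLAG_EHANDLER = 0x01
UNW_FLAG_UHANDLER = 0x02
UNW_FLAG_CHAININFO = 0x04


class UnwindHeader(NamedTuple):
    version: int
    flags: int
    size_of_prolog: int
    count_of_codes: int
    frame_register: int
    frame_offset: int
    trailer_offset: int  # offset of the handler RVA / chained entry


def parse_unwind_header(data: bytes) -> UnwindHeader:
    ver_flags, prolog, count, frame = struct.unpack_from("<4B", data, 0)
    # The unwind code array is padded to an even number of 2-byte slots.
    padded = (count + 1) & ~1
    return UnwindHeader(
        version=ver_flags & 0x07,     # low 3 bits
        flags=ver_flags >> 3,         # high 5 bits
        size_of_prolog=prolog,
        count_of_codes=count,
        frame_register=frame & 0x0F,  # low nibble
        frame_offset=frame >> 4,      # high nibble
        trailer_offset=4 + 2 * padded,
    )


# Hand-built sample: version 1, UNW_FLAG_EHANDLER, 4-byte prolog, one unwind
# code (plus one padding slot), followed by a made-up handler RVA of 0x2000.
sample = bytes([0x09, 0x04, 0x01, 0x00]) + bytes(4) + struct.pack("<I", 0x2000)
header = parse_unwind_header(sample)
handler_rva = None
if header.flags & (UNW_FLAG_EHANDLER | UNW_FLAG_UHANDLER):
    handler_rva = struct.unpack_from("<I", sample, header.trailer_offset)[0]
```

The sketch stops at the handler RVA; the language-specific data that follows it (the scope table for SEH) is not decoded here.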

The compiler produces a RUNTIME_FUNCTION structure (and the related unwind info data) for almost all procedures directly or indirectly related to SEH (or C++ exceptions). The only exception, as outlined in "Programming against the x64 exception handling support", is leaf functions: these functions are not enclosed in any SEH blocks, don’t call other functions and make no direct modifications to the stack pointer (this is very important).

Let’s assume that a parent function whose body is wrapped in a __try/__except block calls another function from its __try code block. When an exception occurs in the unprotected sub function, a stack unwind occurs. The Windows kernel MUST be able to restore the original context and find the value the RIP register (instruction pointer) had before the call to the sub function (or before a call to a function that subsequently jumps to a leaf function). This procedure of unwinding the stack is called Frame Unwind. The result of Frame Unwind is that the state of the key registers, including the stack pointer, is restored to what it was before the call to the exception-causing function. This way, Windows can securely detect whether an exception handler (or termination handler) is present, and subsequently call it if needed. The frame unwind process is the key feature of the entire error management mechanism. The unwind process is managed in the Windows kernel and can also be used in other ways (take a look at the RtlVirtualUnwind kernel function, which is highlighted in Programming against the x64 exception handling support).

Exception Handling implementation - Some internals

The Skywing articles (mentioned at the beginning of the paper - Programming against the x64 exception handling support) cover the nitty gritty details of the internals of stack unwind in the 64-bit version of Windows Vista. The current implementation of unwind in Windows 8.1 is a bit different but the key concepts remain the same. Let's take a look at exception handling.

The internal Windows function RtlDispatchException is called whenever an exception occurs. This function is implemented in the NTDLL module for user mode exceptions, and in the NTOSKRNL module for kernel mode exceptions, although in a slightly different manner. The function begins its execution by performing some initial checks: if a user-mode “vectored” exception handler is present, it will be called; otherwise standard SEH processing takes place. The thread context at the time of the exception is copied, and the RtlLookupFunctionEntry procedure is used to perform an important task: starting from a RIP value, which usually points to the instruction that raised the exception, it retrieves the target image base address and the matching RUNTIME_FUNCTION structure. Another structure is used here: the Unwind History Table. This is, as the name implies, a table used by the Windows kernel to speed up the lookup of the runtime function structure. It is not of particular interest, but for the sake of completeness, here's its definition:

#define UNWIND_HISTORY_TABLE_SIZE   12
typedef struct _UNWIND_HISTORY_TABLE_ENTRY {
  ULONG64 ImageBase;
  PRUNTIME_FUNCTION FunctionEntry;
} UNWIND_HISTORY_TABLE_ENTRY, *PUNWIND_HISTORY_TABLE_ENTRY;

typedef struct _UNWIND_HISTORY_TABLE {
  ULONG Count;          // + 0x00
  USHORT Search;        // + 0x04
  USHORT bHasHistory;   // + 0x06
  ULONG64 LowAddress;   // + 0x08
  ULONG64 HighAddress;  // + 0x10
  UNWIND_HISTORY_TABLE_ENTRY
    Entries[UNWIND_HISTORY_TABLE_SIZE];
} UNWIND_HISTORY_TABLE, *PUNWIND_HISTORY_TABLE;
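
The core of the lookup can be sketched in a few lines of Python. This is only a simplified model, not the actual RtlLookupFunctionEntry code: it assumes the exception directory is sorted by BeginAddress (as the PE format requires), treats EndAddress as exclusive, and ignores the history table cache and dynamically registered function tables. `lookup_function_entry` and the sample values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (BeginAddress, EndAddress, UnwindData), all RVAs
RuntimeFunction = Tuple[int, int, int]


def lookup_function_entry(entries: List[RuntimeFunction],
                          image_base: int,
                          rip: int) -> Optional[RuntimeFunction]:
    """Binary-search the sorted entries for the one covering rip."""
    rva = rip - image_base
    begins = [begin for begin, _, _ in entries]
    idx = bisect_right(begins, rva) - 1  # last entry starting at or before rva
    if idx >= 0 and entries[idx][0] <= rva < entries[idx][1]:
        return entries[idx]
    return None  # no entry: the caller treats the code as a leaf function


# Made-up image base and entries, for illustration only.
IMAGE_BASE = 0x140000000
ENTRIES = [(0x1000, 0x1040, 0x3000), (0x1040, 0x10A0, 0x300C)]

found = lookup_function_entry(ENTRIES, IMAGE_BASE, IMAGE_BASE + 0x1050)
missing = lookup_function_entry(ENTRIES, IMAGE_BASE, IMAGE_BASE + 0x2000)
```

A `None` result corresponds to the "no runtime function found" case discussed next.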

If no runtime function is found, the exception was raised in a leaf function, and the process is repeated using the return address stored at the top of the stack (the value RSP points to) as the new RIP. If the stack pointer is outside its limits (as in the rare case when a non-leaf function has no RUNTIME_FUNCTION structure associated with it), the condition is detected and the process exits.

Otherwise, if the RUNTIME_FUNCTION structure is found, the code calls the RtlVirtualUnwind procedure to perform the virtual unwind. This function is the key of exception dispatching: starting with the Image base, the RIP register value, the saved context and a RUNTIME_FUNCTION structure, it unwinds the stack to search for the requested handler (exception handler, unwind handler or chained info) and returns a pointer to the handler function and the correct stack frame. Furthermore, it returns a pointer to something called the “HandlerData”. This pointer is actually the SCOPE TABLE structure, used for managing C++ exceptions. This kind of stack unwind is virtual because no unwind handler or exception handler is actually called in the entire process: the stack unwind process is actually stopped only when a suitable requested handler is found.

With all the data available, the NT kernel code now builds the DISPATCHER_CONTEXT structure and uses RtlpExecuteHandlerForException to perform the transition to the language handler routine (_C_specific_handler in the case of SEH and C++ exceptions). It is now the duty of the language handler routine to correctly manage the exception.

// Call Language specific exception handler.
// Possible returned values:
// ExceptionContinueExecution (0) - Execution must continue over saved RIP
// ExceptionContinueSearch - The language specific dispatcher has not found any handler
// ExceptionNestedException - A nested exception is raised
// ExceptionCollidedUnwind - Collided unwind returned code (see below)
// NO Return - A correct handler has processed exception
EXCEPTION_DISPOSITION RtlpExecuteHandlerForException(EXCEPTION_RECORD *pExceptionRecord,
  ULONG64 *pEstablisherFrame, CONTEXT *pExcContext, DISPATCHER_CONTEXT *pDispatcherContext);

The implementation of RtlDispatchException in kernel mode is much the same, with 3 notable differences:

  1. No Vectored exception handling in kernel mode
  2. A lot of further checks are done, like data alignment and buffer type checks
  3. RtlVirtualUnwind is not employed (except for collided unwinds); inlined unwind code is used instead (it relies on the internal procedures RtlpUnwindOpSlots and RtlpUnwindEpilogue)

SEH and C++ Language specific Handler

The standard SEH and C++ exception handler is implemented in the _C_specific_handler routine. This routine is, like RtlDispatchException, implemented both in user mode and in the kernel.

It starts by checking if it was called due to a normal or collided unwind (we will see what a collided unwind is later on). If this is not the case, it retrieves the Scope Table and iterates over all of its entries: if the exception address is located inside a C++ scope entry segment, and if the target member of the scope table entry is not zero, the exception will be managed by this entry. The handler member of the scope entry points to an exception filter block. If the member is not a valid pointer but holds the special value 1, the exception handler always has to be called. Otherwise the exception filter is called directly:

DWORD ExceptionFilter(PEXCEPTION_POINTERS pExceptionPointers, LPVOID EstablisherFrame);

The filter can return one of these three possible dispositions:

  • EXCEPTION_CONTINUE_EXECUTION - The C specific handler exits with the value ExceptionContinueExecution; code execution is then resumed at the point where the exception occurred (the context is restored by the internal routine RtlRestoreContext)
  • EXCEPTION_CONTINUE_SEARCH - The C specific handler ignores this Scope item and continues the search in the next Scope table entry
  • EXCEPTION_EXECUTE_HANDLER - The exception will be managed by the _C_specific_handler code

If the filter returns the code EXCEPTION_EXECUTE_HANDLER, the C specific handler prepares all the data needed to execute the corresponding exception handler and finally calls the routine RtlUnwindEx. This function unwinds the stack and calls any intermediate __finally handlers, followed by the proper C exception handler. The routine is called by the C-specific handler in a particular way: the target C++ exception handler pointer is passed in the “TargetIp” parameter, while the original exception pointer is located in the exception record structure. This is a very important fact, because it ensures that all the intermediate termination handlers are called. If the C-specific handler had called the exception handler directly, all the intermediate __finally handlers would have been skipped, and collided unwinds (a particular unwind case) would have been impossible to manage. RtlUnwindEx doesn’t return to the caller if it’s able to identify the real exception handler.

Here we provide all the data structures related to the Scope table:

// C Scope table entry
typedef struct _C_SCOPE_TABLE_ENTRY {
  ULONG Begin;        // +0x00 - Begin of guarded code block
  ULONG End;          // +0x04 - End of target code block
  ULONG Handler;      // +0x08 - Exception filter function (or “__finally” handler)
  ULONG Target;       // +0x0C - Exception handler pointer (the code inside __except block)
} C_SCOPE_TABLE_ENTRY, *PC_SCOPE_TABLE_ENTRY;

// C Scope table
typedef struct _C_SCOPE_TABLE {
  ULONG NumEntries;               // +0x00 - Number of entries
  C_SCOPE_TABLE_ENTRY Table[1];   // +0x04 - Scope table array
} C_SCOPE_TABLE, *PC_SCOPE_TABLE;
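
The dispatch-time scan over the scope table can be modelled roughly as follows. This Python sketch is a simplification under stated assumptions, not the real handler code: entries are treated as half-open RVA ranges, the filter is represented by a caller-supplied callback (`run_filter`, hypothetical), and the unwind phase is left out entirely. The three disposition values are the standard ones from the Windows headers.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

EXCEPTION_EXECUTE_HANDLER = 1
EXCEPTION_CONTINUE_SEARCH = 0
EXCEPTION_CONTINUE_EXECUTION = -1


class ScopeEntry(NamedTuple):
    begin: int    # start RVA of the guarded block
    end: int      # end RVA of the guarded block
    handler: int  # filter RVA, 1 for "always execute", or __finally RVA
    target: int   # __except block RVA, or 0 for a __finally scope


def dispatch_scope_table(table: List[ScopeEntry],
                         fault_rva: int,
                         run_filter: Callable[[int], int]
                         ) -> Tuple[int, Optional[ScopeEntry]]:
    """Return (disposition, entry chosen to handle the exception)."""
    for entry in table:
        if not (entry.begin <= fault_rva < entry.end):
            continue  # fault is outside this guarded block
        if entry.target == 0:
            continue  # __finally scope: only relevant while unwinding
        if entry.handler == 1:
            disposition = EXCEPTION_EXECUTE_HANDLER  # no filter to run
        else:
            disposition = run_filter(entry.handler)
        if disposition == EXCEPTION_EXECUTE_HANDLER:
            return disposition, entry
        if disposition == EXCEPTION_CONTINUE_EXECUTION:
            return disposition, None
        # EXCEPTION_CONTINUE_SEARCH: try the next entry
    return EXCEPTION_CONTINUE_SEARCH, None


# Made-up table: one __finally scope, one filtered __except, one catch-all.
TABLE = [
    ScopeEntry(0x10, 0x20, 0x100, 0),
    ScopeEntry(0x10, 0x30, 0x200, 0x300),
    ScopeEntry(0x00, 0x40, 1, 0x400),
]
declined = dispatch_scope_table(TABLE, 0x18, lambda rva: EXCEPTION_CONTINUE_SEARCH)
accepted = dispatch_scope_table(TABLE, 0x18, lambda rva: EXCEPTION_EXECUTE_HANDLER)
```

When the filter declines, the search falls through to the catch-all entry; when it accepts, the first matching __except entry wins.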

The important thing to note is that if there is a valid handler routine in the Scope Table entry but the target pointer is NULL, it means that the related target code is enclosed by a "finally" block (and only managed by the unwinding process). In this case the handler member points to the code located in the finally block.

Particular Cases

Frame Consolidation Unwinds

As outlined in " Programming against the x64 exception handling support ", this is a special form of unwind that is indicated to RtlUnwindEx with a special exception code, STATUS_UNWIND_CONSOLIDATE. This exception code slightly changes the behavior of RtlUnwindEx; it suppresses the behavior of substituting the TargetIp argument to RtlUnwindEx with the Rip value of the unwound context (as already seen in the C-specific handler routine). Furthermore, there is special logic contained within RtlRestoreContext (used by RtlUnwindEx to realize the final, unwound execution context) that detects the consolidation unwind case, and enables a special code path that treats the ExceptionInformation member of ExceptionRecord structure as a callback function, and calls it.

Essentially, consolidation unwinds can be thought of as a normal unwind, with a conditionally assigned TargetIp whose value is not determined until after all unwind handlers have been called, and the specified context has been unwound. This special form of unwind is often used in C++ exceptions.

Collided Unwinds

A collided unwind, as the name implies, occurs when an unwind handler routine initiates a secondary unwind operation. An unwind handler could be, for example, an SEH termination handler (the routine that implements the __finally block). A collided unwind is what occurs when, in the process of a stack unwind, one of the call frames changes the target of an unwind. This definition is taken from "Programming against the x64 exception handling support", and I found it quite difficult to understand at first sight. Let’s see an example:

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

// Forward declarations
BOOLEAN TestUnwinds();
BOOLEAN TestFinallyFunc();
// Defined elsewhere (not shown in this post); signature inferred from the call below
BOOLEAN FaultingSubfunc1(LPBYTE buff);

int _tmain(int argc, _TCHAR* argv[])
{
  // Let's test normal unwind and collided unwind
  TestUnwinds();
  return 0;
}

// Test unwind and Collided Unwinds
BOOLEAN TestUnwinds() {
  BOOLEAN retVal = FALSE;   // Returned value
  DWORD excCode = 0;        // Exception code

  __try {
    // Call a function with an enclosed finally block
    retVal = TestFinallyFunc();
  } __except(                 // Filter routine
    excCode = GetExceptionCode(), EXCEPTION_EXECUTE_HANDLER
    ) {
    wprintf(L"Exception 0x%08X in TestUnwinds.\r\n "
      L"\tThis message is not shown in a Collided Unwind.\r\n", excCode);
  }
  wprintf(L"TestUnwinds func exiting...\r\n");
  return retVal;
}

// Test unwind and Collided Unwinds
BOOLEAN TestFinallyFunc() {
  LPBYTE buff = NULL;
  BOOLEAN retVal = FALSE;
  BOOLEAN bPerformCollided = 0;   // Let’s set this to 1 afterwards

  buff = (LPBYTE)VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_READWRITE);

  do {
    __try {
      // Call Faulting subfunc with a bad buffer address
      retVal = FaultingSubfunc1(buff + 3590);

      // Produces CALL _local_unwind assembler code
      if (!retVal) return FALSE;    // <-- 1. Perform a regular unwind
      // Produces JMP $LN17 label (finally block inside this function)
      //if (!retVal) __leave;
    } __finally {
      if (!_abnormal_termination())
        wprintf(L"Finally handler for TestFinallyFunc: Great termination!\r\n");
      else
        wprintf(L"Finally handler for TestFinallyFunc: Abnormal termination!\r\n");
      if (buff) VirtualFree(buff, 0, MEM_RELEASE);

      if (bPerformCollided) {     // <-- 2. Perform COLLIDED Unwind
        // The three statements below are alternatives; each one alone
        // triggers a collided unwind.
        // Here we go; first example of COLLIDED unwind
        goto Collided;
        // Second example of a collided unwind
        break;
        // Other example of collided unwind:
        return FALSE;
      }
    }
    Sleep(5000);
  } while (!retVal);
  return TRUE;

Collided:
  wprintf(L"Collided unwind: \"Collided\" exit label.\r\n");
  return 0;
// Std_Exit:
}

The example shows some of the concepts explained in this analysis. TestUnwinds is the main routine that implements a structured exception handler. For this routine, the compiler generates a RUNTIME_FUNCTION structure, followed by a C_SCOPE_TABLE. The scope table entry contains both a valid handler pointer and a valid target pointer. The protected code block transfers execution to the TestFinallyFunc procedure. The latter shows how a normal unwind works: when FaultingSubfunc1 raises an exception, a normal stack unwind takes place: the stack is unwound and the first __finally block is reached. Keep in mind that in this case only the code in the __finally block is executed (the line with the “Sleep” call is never reached); the stack frame unwind then goes ahead until it reaches the __except block (exception handler) of the main TestUnwinds procedure. This is the normal unwind process. A normal unwind can even be initiated manually by forcing an exit from a __try block: the “return FALSE;” line in the __try block is roughly translated by the compiler to the following:

mov   byte ptr [bAutoRetVal], 0
lea   rdx, $LN21
mov   rcx,qword ptr [pCurRspValue]
call  _local_unwind

$LN21:
mov   al, byte ptr [bAutoRetVal]
goto  Std_Exit

The compiler uses the _local_unwind function to start the stack unwind. The _local_unwind function is only a wrapper around the internal routine RtlUnwindEx, called with only the first 2 parameters: TargetFrame is set to the current RSP value after the function prolog; TargetIp is set to the address of the exit code chunk, as highlighted above. This starts a local unwind that transfers execution to the __finally block and then returns to the caller. The stack unwind process is quite an expensive operation. This is why Microsoft encourages the use of the “__leave” keyword to exit from a protected code block. The “__leave” keyword is translated by the compiler into a much faster “jmp FINALLY_BLOCK” instruction (no stack unwind).

Now let’s test what happens when the bPerformCollided variable is set to 1.

In the latter case, FaultingSubfunc1 has already launched a stack unwind (due to an exception) that has reached the inner __finally block. The three example lines that trigger a collided unwind generate almost the same assembly code as the manually initiated normal stack unwind (but with a different TargetIp pointer). What happens now? A stack unwind process begins from inside an unwind that is already in progress. As a result, the RtlpUnwindHandler internal Nt routine (the handler associated with RtlpExecuteHandlerForUnwind) manages this case. It restores the original DISPATCHER_CONTEXT structure (except the TargetIp pointer) and returns the ExceptionCollidedUnwind constant to the caller (the second call to RtlUnwindEx). We don’t cover the nitty gritty implementation details here, but we encourage the reader to check the Skywing articles (http://www.nynaeve.net/?p=113).

A side effect of the Collided unwinds in the SEH implementation is that we lose the parent function exception handler: the code flow is diverted and the compiler informs the developer with the C4532 warning message. The message located in the TestUnwinds exception handler routine of our example is indeed never executed when a collided unwind occurs.
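
The observable effect (a cleanup block diverting control flow and the outer handler never running) is not specific to Windows. As a loose analogy only, and not a model of the RtlUnwindEx mechanics, the same behavior can be reproduced in Python, where a `return` inside `finally` discards the exception that is currently propagating. The function names below are made up for this sketch.

```python
def inner(diverted: bool) -> str:
    try:
        raise ValueError("fault")  # stands in for the faulting sub function
    finally:
        if diverted:
            # Leaving the cleanup block with a jump replaces the
            # in-flight exception, like the goto/break/return above.
            return "inner returned normally"


def outer(diverted: bool) -> str:
    try:
        return inner(diverted)
    except ValueError:
        return "outer handler ran"


normal = outer(False)    # exception reaches the outer handler
collided = outer(True)   # outer handler is never executed
print(normal)
print(collided)
```

In both environments the lesson is the same: jumping out of a termination block silently changes where an in-progress unwind ends up.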

Conclusion

In this blog post we took a hard look at the implementation of the Windows 8.1 64-bit Structured Exception handling. We even analysed one of the most important concepts related to SEH: the stack unwind process, and two of its particular cases. The implementation of the last case, the so called “Collided unwind”, is very important for the Windows 8.1 Kernel, because the Kernel Patch Protection feature uses it heavily, rendering its analysis much more complicated.

In the next blog post we will talk about how Patchguard is implemented in Windows 8.1. I'll also go over how the Uroburos rootkit defeated Patchguard in Windows 7 and how those techniques no longer work on Windows 8.1. Stay tuned!

Friday, June 13, 2014

Detection for PutterPanda, we got this.

Recently a post by Crowdstrike was released detailing an attack being used, allegedly, by the Chinese Military "PLA Unit 61486".  The post is a great demonstration of the use of OSINT (Open Source Intelligence) to track an adversary in this increasingly digital world.

You can read Crowdstrike's post here:
http://www.crowdstrike.com/blog/hat-tribution-pla-unit-61486/index.html

Naturally, we started receiving questions about whether we cover one of the malware/tools mentioned in the post:
15cae06fe5aa9934f96895739e38ca26

(there are others like it)

The VRT can confirm that we've had coverage for the malware/tools mentioned here since 2012.

The Sourcefire IPS/Snort detects the outbound traffic with rules: 21240 and 21241, along with a similar variant at sid 21242.

Etumbot Detection, more prior coverage

Arbor Networks recently posted details about a backdoor they named Etumbot. The post provides technical detail about the functionality of the malware and includes hashes of known samples.

The Arbor write up is available here:
http://www.arbornetworks.com/asert/2014/06/illuminating-the-etumbot-apt-backdoor/

Using the list of hashes provided by Arbor, the malware was run through our sandbox. This allows us to see files created or downloaded, registry modifications, network traffic and other malware behavior. After the samples have completed their runs in the sandbox, pcap files are retrieved and run against the existing Sourcefire NGIPS/Snort ruleset provided by the VRT.  The following rules generated alerts.

24115 - MALWARE-BACKDOOR Win.Backdoor.Demtranc variant outbound connection
24235 - MALWARE-CNC Win.Trojan.Wuwo initial infection variant outbound connection
26072 - MALWARE-CNC Win.Trojan.Locati variant outbound connection
28914 - MALWARE-CNC Win.Trojan.Anony variant connection
29471 - BLACKLIST DNS request for known malware domain cht.strangled.net
29473 - BLACKLIST DNS request for known malware domain finance.yesplusno.com

The VRT has had coverage for this malware since 2012 with the rules listed above.

Thursday, June 12, 2014

The never ending Exploit Kit shift - Bleeding Life

Recently we've been able to observe several shifts in exploit kit techniques, so I thought it would be good to share the IOC information for the exploit kits so that administrators and network defenders can take a look at their devices and logs to remediate on their networks.

Bleeding Life

Bleeding Life, traditionally, was not one of the more subtle exploit kits.

In the past, the exploit kit would attempt to get the exploits through fairly obvious URI methods.  For example:

"/load_module.php?e=Adobe-2010-2884"
"/load_module.php?e=Java-2010-3552"
"/modules/helpers/Java-2010-0842.jar"

The URI would be explicit about which vulnerability the kit was going to download and run on the client.  However, as of the beginning of May, subtlety increased slightly, as we've seen a shift in this technique.  The jar and swf files now have much simpler names.  So, for example:

"/modules/2.swf"
"/modules/1.swf"
"/modules/nu.swf"
"/modules/n3.swf"
"/modules/1.jar"
"/modules/2.jar"

The vulnerabilities have been updated to more modern exploits as well.  (I'll detail the hashes and vulnerabilities here in a second.)

The landing page appears to have shifted format in URI as well.  For example:

"/load_module.php?user=", where the value assigned to user is one of "n1", "1", "2", or "11", or, for those of you that speak regular expression: user=(n1|11?|2)

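For anyone who wants to run that expression over proxy or web server logs, here is a minimal Python sketch. The `(n1|11?|2)` alternation comes from the text above; the trailing `(?:&|$)` boundary, which stops values such as "111" from matching, and the helper name `is_landing_uri` are assumptions added for this example.

```python
import re

# Landing page pattern: user=n1, user=1, user=11 or user=2.
# The (?:&|$) boundary is an assumption, not part of the observed traffic.
LANDING_RE = re.compile(r"/load_module\.php\?user=(n1|11?|2)(?:&|$)")


def is_landing_uri(uri: str) -> bool:
    """Return True when the URI looks like the new landing page format."""
    return LANDING_RE.search(uri) is not None


for uri in ("/load_module.php?user=n1", "/load_module.php?user=3"):
    print(uri, is_landing_uri(uri))
```

Tune the boundary to whatever your log format appends after the parameter value.
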
Now for some hashes:

nu.swf, 1.swf, and 2.swf appear to share a single hash:
4788CCA43F06752BD6D52978CBF8058FA4A3AEB76BC5242EE83DA4223EC2DE13 -- CVE-2013-0634

n3.swf, however, is a different hash:
8A5EDD1E23DB8054E6B7B76193A70EDC7C0924320F4D26AB963AA53CEA35AB90 -- CVE-2014-0515

1.jar appears to be several hashes:
3C3172A47915FE77EF1F2D38CCB5C786D30F13D8C5161FD0F2411C3B0459A036 -- CVE-2011-3544
C35A5AA55C911F1F1CFF733E0F422C0DE316CFFAF3B285ABA57A4CFDB7188341 -- CVE-2012-1723
4525F4FE895D887AE354CE6221BAD424690503DAFEBC87A43CF54092FAA9CBE8 -- CVE-2012-1723
C1806E59BAE8CD3A320FB249223852D25DD62299844CF045D5AF4AE1DF0452AF -- CVE-2012-1723
C43DBBADD79F2C50F67BFC265825FBAC3887F6840B1DBB2E2556148F597D80C7 -- CVE-2013-2465
7F04E3B43FA259984AEE7CF9FBE83A2C0994FB321D650E5B9FDFDFB11435F05E -- CVE-2013-2465

2.jar appears to be a single hash: 
C9450462F9A58C2C854E93FF8A6782C7AF677653097347F20DD679939EA19B5A -- CVE-2013-2465


The hostnames where this exploit kit has been hosted in the past 30 days (that we've seen) are the following:
www.rouleta.org
tsp-team.com
www.air-bilet.ru
www.cook-n-eat.net
www.preotech.ru

With the following as "Referers":
www.vz.ru
tvzvezda.ru
www.westernbeef.com
paranormal-news.ru
rollen.ru
www.insur-info.ru

Sharing

However, what I find most interesting about this exploit kit is the set of exploits that it shares with at least one other exploit kit.

The following hashes, for example, are shared between Bleeding Life and the Nuclear exploit kit:
4788CCA43F06752BD6D52978CBF8058FA4A3AEB76BC5242EE83DA4223EC2DE13
7F04E3B43FA259984AEE7CF9FBE83A2C0994FB321D650E5B9FDFDFB11435F05E
C35A5AA55C911F1F1CFF733E0F422C0DE316CFFAF3B285ABA57A4CFDB7188341
4525F4FE895D887AE354CE6221BAD424690503DAFEBC87A43CF54092FAA9CBE8
C43DBBADD79F2C50F67BFC265825FBAC3887F6840B1DBB2E2556148F597D80C7

The fact that so many exploits are shared between the two, in my mind, draws a connection.  I don't know if it's a connection in the same way that Cool and Blackhole were related (written by the same person), but I find it interesting.

All these hashes are detected and prevented with both ClamAV and FireAMP, and Sourcefire IPS/Snort's detection will ship in the form of SIDs:

31229-31232

This blog was made possible by contributions and assistance from Emmanuel Tacheau from our Cisco TRAC team.

Tuesday, June 10, 2014

Microsoft Update Tuesday June 2014: Internet Explorer, Internet Explorer, Internet Explorer



Once again it’s time for Microsoft’s Update Tuesday and this time it’s almost all about Internet Explorer. We had a bit of a lull in the past months with respect to IE vulnerabilities, especially due to the out-of-band patch that Microsoft released last month, which delayed some of the regularly scheduled fixes. However, this month more than makes up for it: we have a total of seven advisories this month, fixing 66 vulnerabilities, 59 of which are in IE.

There are two advisories that are marked as critical:

The first critical bulletin is MS14-035 and is the IE bulletin that covers 59 total vulnerabilities. Of these 59 vulnerabilities, two are information disclosure issues: CVE-2014-1777 and CVE-2014-1771. The latter was publicly known and is a TLS renegotiation vulnerability that could be exploited by a man-in-the-middle attacker. There are also 3 escalation of privilege vulnerabilities, while the remaining 54 vulnerabilities are memory corruption vulnerabilities. Once again many of these memory corruption vulnerabilities are use-after-frees. Of these memory corruption vulnerabilities, one was publicly known: CVE-2014-1770. Microsoft is also adding a defense in depth protection to IE this month to better protect against these use-after-free vulnerabilities.

MS14-036 is the second and final critical bulletin this month and is for GDI+. This bulletin covers two CVEs. The first CVE (CVE-2014-1817) is related to a vulnerability when processing Unicode Script while the other one is related to image parsing (CVE-2014-1818).

The remaining bulletins are all marked as important and each fixes a single vulnerability.

Remote Desktop is the subject of the first important bulletin (MS14-030, CVE-2014-0296), which fixes a vulnerability that could allow an attacker to disclose and modify a session. One mitigating factor for this vulnerability is that an attacker must be able to perform a MITM attack at the beginning of the session to be able to influence it.

The next bulletin, MS14-031, is for a Denial of Service issue in the way that TCP (CVE-2014-1811) is handled. An attacker can send a sequence of crafted TCP packets to cause a DoS.

MS14-032 covers Lync Server and fixes CVE-2014-1823, which is a reflected XSS vulnerability, where an attacker modifies a parameter to an existing meeting, which allows JavaScript to be injected in the target’s browser.

There is an information disclosure in MSXML (CVE-2014-1816) that is fixed by MS14-033, where an attacker can obtain paths (and thus usernames), when a malicious XML file is loaded in the browser.

The final bulletin for this month, MS14-034, covers a flaw in the handling of embedded fonts in Word that can result in remote code execution (CVE-2014-2778).

VRT is releasing the following rules to address these issues: SID  31188-31194, 31196-31209, 31215-31217.

Tuesday, June 3, 2014

An Introduction to Recognizing and Decoding RC4 Encryption in Malware

There is something that we come across almost daily when we analyze malware in the VRT: RC4. We recently came across CVE-2014-1776 and, as with many malware samples and exploits we analyze, RC4 is used to obfuscate or encrypt what it is really doing. There are many ways to implement RC4 and it is a very simple, small algorithm. This makes it very common in the wild and in various standard applications. Open-source C implementations can be found on several websites such as Apple.com and OpenSSL.org.

What is RC4? 

RC4 was designed by Ron Rivest of RSA Security in 1987. RC4 is a fast and simple stream cipher that uses a pseudo-random number generation algorithm to generate a key stream. This key stream can be used in an XOR operation with plaintext to generate ciphertext. The same key stream can then be used in an XOR operation against the ciphertext to generate the original plaintext.
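
The symmetry of the XOR step is easy to see in a couple of lines of Python. The key stream bytes below are arbitrary placeholders chosen for illustration, not real RC4 output, and `xor_with_keystream` is a hypothetical helper name.

```python
def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """XOR each data byte with the key stream byte at the same position."""
    if len(keystream) < len(data):
        raise ValueError("key stream is shorter than the data")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Arbitrary placeholder bytes standing in for a generated key stream.
keystream = bytes([0x3A, 0x91, 0x5C, 0x07, 0xE2])
plaintext = b"hello"

ciphertext = xor_with_keystream(plaintext, keystream)
recovered = xor_with_keystream(ciphertext, keystream)  # same operation decrypts
```

Because `x ^ k ^ k == x` for every byte, applying the same key stream twice always returns the original data.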

While it is still common in malware, RC4 has been legitimately implemented in a number of areas where speed and privacy are of concern. In the past, both WEP and TLS used RC4 to protect data sent across the wire. However, last Fall, Microsoft recommended that customers disable RC4 by enabling TLS1.2 and AES-GCM.

For more information including a detailed history of RC4, check out the Wikipedia article.

Why is it used in malware? 

Increasingly, we find that RC4 is used to encode data that is sent to a remote server to be decrypted on the other side using a pre-shared key.  This makes detection a bit trickier (but not impossible) and also makes it harder to determine exactly what is being sent across the wire. What we will usually do when we think we’ve come across some sort of encryption is determine the source of it and whether the data being sent is static (for matching purposes) and what exactly that data is.

How does it work?

*Note: For these examples, I will be using a variant of the Coremex Search Engine Hijacker (MD5: 70E2090D5DEE18F3E45D38BF254EFF87) after it has resumed its suspended child process.

RC4 is implemented in two main phases:
1. A Key Scheduling Algorithm is executed using a symmetric key to create an array of 256 bytes (0x100).
2. This array is then used in a pseudo-random number generation algorithm to generate a cipher stream that can be decoded using the same key.

Many books and internet articles represent the Key Scheduling Algorithm (KSA) with the following pseudocode:

for i from 0 to 255
   S[i] := i
endfor


j := 0
for i from 0 to 255
   j := (j + S[i] + key[i mod keylength]) mod 256
   swap values of S[i] and S[j]
endfor
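As a rough Python equivalent of the pseudocode above (a minimal sketch, not the code from the sample; the function name `ksa` is my own choice):

```python
def ksa(key: bytes) -> list:
    """RC4 Key Scheduling Algorithm: return the scrambled 256-entry SBox."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 keys are 1 to 256 bytes long")
    s = list(range(256))              # identity permutation: S[i] = i
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]       # swap S[i] and S[j]
    return s

sbox = ksa(b"0006")
# The result is always a permutation of 0..255: values move, none are lost.
print(len(sbox), sorted(sbox) == list(range(256)))
```

Because the loop only ever swaps entries, every byte value from 0x00 to 0xFF still appears exactly once after scrambling.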

To better understand how the algorithm works, split it into multiple sections.


Section 1: 

Create and Initialize the Substitution Box

for i from 0 to 255
   S[i] := i
endfor

This section creates an array (or an “SBox”/Substitution Box) where each value equals its position in the array, from 0 to 255 (0x00-0xFF). This is also known as the identity permutation:




This initial table-creation is a key indicator when looking for this type of encryption in malware samples. For this sample, the RC4 KSA was initialized using the following loop in x86 assembly code:

100020E5 xor eax, eax     ; Initialize counter to 0

loop:
100020E7                    ; Give each array index its identity value
100020E7 mov [eax+ecx], al ; using EAX as a counter/value:
100020E7                    ; S[0] = 0x00 ... S[255] = 0xFF
100020EA inc eax          ; Increment counter by 1
100020EB cmp eax, 100h    ; Compare counter value to 256 (0x100) // NOTE THE 100h!
100020F0 jl   short loop ; Loop around if counter < 256

Note the instruction at 0x100020EB. 100h is a great value to search a binary for in a disassembler like IDA Pro. Looking for an instruction that is comparing a register to 100h can often point you in the right direction, especially if you know the malware is using RC4 ahead of time.
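Outside of a disassembler, the same search can be approximated on raw bytes. The sketch below assumes 32-bit x86 code and only looks for the two register forms of the compare (`3D imm32` for EAX and `81 F8+reg imm32` for any 32-bit register). The helper name and the synthetic input are illustrative, and a raw byte match can land in the middle of unrelated instructions or data, so treat every hit as a candidate only:

```python
import re

# cmp eax, imm32 is encoded as 3D <imm32>; cmp r32, imm32 as 81 F8+reg <imm32>.
# The immediate 0x100 is stored little-endian as 00 01 00 00.
CMP_100H = re.compile(rb"(?:\x3D|\x81[\xF8-\xFF])\x00\x01\x00\x00")

def find_cmp_100h(code: bytes, base: int = 0) -> list:
    """Return the address of every 'cmp <reg32>, 100h' candidate in code."""
    return [base + m.start() for m in CMP_100H.finditer(code)]

# Synthetic bytes for illustration: inc eax; cmp eax, 100h; nop; cmp esi, 100h
sample = bytes.fromhex("40" "3D00010000" "90" "81FE00010000")
print([hex(addr) for addr in find_cmp_100h(sample, base=0x100020EA)])
```

Two hits a few dozen bytes apart, one closing a tiny initialization loop and one closing a loop full of byte moves, are a reasonable hint that a KSA is nearby.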

While looking at the memory dump that [eax+ecx] points to after this loop completes, you can see the newly-constructed SBox that looks like the one above:


0012FBB0  00 01 02 03 04 05 06 07  08 09 0A 0B 0C 0D 0E 0F  ................
0012FBC0  10 11 12 13 14 15 16 17  18 19 1A 1B 1C 1D 1E 1F  ................
0012FBD0  20 21 22 23 24 25 26 27  28 29 2A 2B 2C 2D 2E 2F   !"#$%&'()*+,-./
0012FBE0  30 31 32 33 34 35 36 37  38 39 3A 3B 3C 3D 3E 3F  0123456789:;<=>?
0012FBF0  40 41 42 43 44 45 46 47  48 49 4A 4B 4C 4D 4E 4F  @ABCDEFGHIJKLMNO
0012FC00  50 51 52 53 54 55 56 57  58 59 5A 5B 5C 5D 5E 5F  PQRSTUVWXYZ[\]^_
0012FC10  60 61 62 63 64 65 66 67  68 69 6A 6B 6C 6D 6E 6F  `abcdefghijklmno
0012FC20  70 71 72 73 74 75 76 77  78 79 7A 7B 7C 7D 7E 7F  pqrstuvwxyz{|}~.
0012FC30  80 81 82 83 84 85 86 87  88 89 8A 8B 8C 8D 8E 8F  Ç.éâäàåçêëèïî.Ä.
0012FC40  90 91 92 93 94 95 96 97  98 99 9A 9B 9C 9D 9E 9F  .æÆôöòûùÿÖÜ¢£.Pƒ
0012FC50  A0 A1 A2 A3 A4 A5 A6 A7  A8 A9 AA AB AC AD AE AF  áíóúñѪº¿¬¬½¼¡«»
0012FC60  B0 B1 B2 B3 B4 B5 B6 B7  B8 B9 BA BB BC BD BE BF  ¦¦¦¦¦¦¦++¦¦+++++
0012FC70  C0 C1 C2 C3 C4 C5 C6 C7  C8 C9 CA CB CC CD CE CF  +--+-+¦¦++--¦-+-
0012FC80  D0 D1 D2 D3 D4 D5 D6 D7  D8 D9 DA DB DC DD DE DF  ---++++++++¦_¦¦¯
0012FC90  E0 E1 E2 E3 E4 E5 E6 E7  E8 E9 EA EB EC ED EE EF  aßGpSsµtFTOd8fen
0012FCA0  F0 F1 F2 F3 F4 F5 F6 F7  F8 F9 FA FB FC FD FE FF  =±==()÷˜°··vn²¦
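If you have a memory dump taken between the initialization and scramble loops, the identity table can also be located programmatically. This is a minimal sketch with a synthetic dump; the function name and the base address are assumptions for illustration:

```python
IDENTITY_SBOX = bytes(range(256))   # 00 01 02 ... FF

def find_identity_sbox(dump: bytes, base: int = 0) -> int:
    """Return the address of the first identity table in dump, or -1."""
    offset = dump.find(IDENTITY_SBOX)
    return -1 if offset < 0 else base + offset

# Synthetic dump for illustration: 16 filler bytes, the table, 16 filler bytes.
dump = b"\xCC" * 16 + IDENTITY_SBOX + b"\xCC" * 16
print(hex(find_identity_sbox(dump, base=0x0012FBA0)))
```

This only works in the short window before the table is scrambled; afterwards the bytes are still a permutation of 0x00-0xFF but no longer in order.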


Now that the table has been initialized, it’s time to scramble the box.

Section 2:

Scramble SBox with Key “0006” (ASCII 0x30303036)

j := 0
for i from 0 to 255
   j := (j + S[i] + key[i mod keylength]) mod 256
   swap values of S[i] and S[j]
endfor

This routine takes the initialized table and performs various byte-swaps against the table using the key and its length (keys can range from 1 to 256 bytes in length). Here is how this sample implemented this routine. Note that the exact assembly instructions will vary among compilers, platforms and languages.


100020F4 loop:               ; ECX = &S[0] | ESI = i | EDI = j
100020F4 mov    eax, esi         ; Initialize EAX
100020F6 cdq                         ; EAX -> EDX:EAX (with sign)
100020F7 idiv   [esp+0Ch+keylen]         ; EDX = i mod keylen
100020FB mov    bl, [esi+ecx]        ; BL = S[i]
100020FE mov    eax, [esp+0Ch+key]   ; EAX = key
10002102 movzx  eax, byte ptr [edx+eax] ; EAX = key[i mod keylen]
10002106 add    eax, edi                 ; EAX = (j + key[i mod keylen])
10002108 movzx  edx, bl             ; EDX = S[i]
1000210B add    edx, eax                 ; EDX = (j + S[i] + key[i mod keylen])
1000210D and    edx, 0FFh                ; Another way to mod 256
10002113 mov    edi, edx             ; j = (j + S[i] + key[i mod keylen]) mod 256
10002115 mov    al, [edi+ecx]        ; AL = s[j]
10002118 mov    [esi+ecx], al    ; S[i] = S[j]
1000211B inc    esi                  ; i++
1000211C cmp    esi, 100h            ; Check if i < 256 // NOTE THE 100h!
10002122 mov    [edi+ecx], bl        ; S[j] = S[i]
10002125 jl     short loop ; Loop if Less
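The `and edx, 0FFh` at 0x1000210D is worth a second look, because masking with 0xFF keeps only the low byte, which is a reduction modulo 256 rather than 255. A quick check in Python (the helper name is mine):

```python
def mod256(value: int) -> int:
    """Keep the low byte, as 'and edx, 0FFh' does."""
    return value & 0xFF

# For non-negative values the mask and the modulo always agree.
for value in (0x61, 0xFF, 0x100, 0x1FF, 0x230):
    assert mod256(value) == value % 256

print(hex(mod256(0x230)), hex(0x230 % 256), hex(0x230 % 255))
```

Compilers prefer the mask because a single AND is far cheaper than a division, so an `and reg, 0FFh` next to table lookups is another small RC4 hint.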

In IDA Pro, the SBox Scramble loop following the Initialization loop may resemble these basic blocks:




Manually calculating at least the first few bytes of this example with a pencil and a piece of paper will help make it more clear how the bytes are swapped to generate this new SBox:

Initialized SBox:


The first byte of the key “0006”, Key[0], is “0”. Remember that this is ASCII 0x30:


j := (j + S[i] + key[i mod keylength]) mod 256
   swap values of S[i] and S[j]

i = 0 // first round
j = (j + S[i] + key[i mod keylength]) mod 0x100
 = (0 + S[0x00] + key[0 mod 4]) mod 0x100
 = (0 + 0 + key[0]) mod 0x100
 = (0 + 0x30) mod 0x100
 = 0x30 mod 0x100
 = 0x30
S[0x0] = 0x30
S[0x30] = 0x00

After bytes S[0x00] and S[0x30] are swapped, the resulting table looks like this:


The second byte of the key “0006”, Key[1], is also “0”, or ASCII 0x30:

i = 1 // second round
j = (j + S[i] + key[i mod keylength]) mod 0x100
 = (0x30 + S[0x01] + key[1 mod 4]) mod 0x100
 = (0x30 + 1 + key[1]) mod 0x100
 = (0x31 + 0x30) mod 0x100
 = 0x61 mod 0x100
 = 0x61
S[0x01] = 0x61
S[0x61] = 0x01

After bytes S[0x01] and S[0x61] are swapped, the resulting table looks like this:



The algorithm will continue to perform this calculation 256 times. Note that these values will continue to be swapped out and will even swap previously-swapped bytes as well. Using the key “0006”, the malware sample will end up generating the following SBox on the stack (I added the corresponding SBox array indexes for visualization purposes only):


S[00] | 0012FBB0  18 8A 98 7B|16 35 F4 A8|C0 A5 53 94|D0 0D 87 90| 
S[10] | 0012FBC0  2B 11 BA 26|08 25 C7 75|EB C6 83 D4|20 12 73 DB|
S[20] | 0012FBD0  1B 4E FF D3|EF 72 50 2E|B9 33 AF DC|6C C9 42 8C|
S[30] | 0012FBE0  BC 29 3A E8|EC 3B E7 54|44 F5 C3 3F|3C A9 32 17|
S[40] | 0012FBF0  59 60 DF 23|F0 6A B7 89|8B 43 7E C2|47 A3 37 A6|
S[50] | 0012FC00  34 A7 67 95|D8 B1 46 D9|56 28 A2 5B|7D 4C 41 7F|
S[60] | 0012FC10  5E AE 85 88|B2 9C 9B 0F|0A AB 8D 6E|ED 96 40 92|
S[70] | 0012FC20  45 1A F9 CE|B0 3E 9D 1D|68 1E E3 13|2A 51 D6 B4|
S[80] | 0012FC30  EE 58 D5 E1|D1 BB 39 4A|4F 15 07 B8|80 69 E4 FC|
S[90] | 0012FC40  5A 21 A1 1C|7C 9A 0E 5F|FD CB 02 B5|FA BD 57 86|
S[A0] | 0012FC50  E9 8E CA E5|5D 19 6F AA|4D CD 71 F2|BE 49 0B E2|
S[B0] | 0012FC60  F1 79 A0 D2|B6 DD F6 F8|2F E6 78 C1|52 CF 05 04|
S[C0] | 0012FC70  E0 6D 70 97|99 24 FE 06|4B 91 76 A4|B3 FB 63 09|  
S[D0] | 0012FC80  81 64 00 82|5C C5 EA 36|AD 03 C8 0C|1F 84 48 C4|
S[E0] | 0012FC90  74 31 01 55|62 66 8F 9F|38 61 F7 BF|27 7A 22 AC|
S[F0] | 0012FCA0  9E 65 77 F3|6B 2C DE DA|30 14 3D CC|2D 93 D7 10|
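The pencil-and-paper rounds worked through above can be double-checked with a short trace. This is a sketch using my own helper name, `trace_ksa`, which stops after a chosen number of rounds and reports the `i` and `j` used in each swap:

```python
def trace_ksa(key: bytes, rounds: int):
    """Run the first `rounds` KSA iterations; return (sbox, [(i, j), ...])."""
    s = list(range(256))
    j = 0
    steps = []
    for i in range(rounds):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
        steps.append((i, j))
    return s, steps

sbox, steps = trace_ksa(b"0006", rounds=2)
for i, j in steps:
    print(f"i=0x{i:02X}  j=0x{j:02X}  S[i]=0x{sbox[i]:02X}  S[j]=0x{sbox[j]:02X}")
```

For the key “0006” the first two rounds give j = 0x30 and then j = 0x61, matching the hand calculation. Raising `rounds` to 256 runs the complete schedule.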

Section 3:

Generate the Key Stream and Encode Data

i := 0
j := 0
for x from 0 to len(plaintext) - 1
   i := (i + 1) mod 256
   j := (j + S[i]) mod 256
   swap values of S[i] and S[j]
   K := S[(S[i] + S[j]) mod 256]
   output K ^ plaintext[x]
endfor

The next step is to use this newly-created SBox to encode the data. This is done by creating a keystream using the SBox and this algorithm. The result, K, is then used in an XOR operation with each byte of the plaintext to generate the encrypted data.
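Here is a minimal Python sketch of the whole cipher, key schedule plus the keystream loop above. The function names are my own and this is not the sample's code. The check at the end uses a widely published RC4 test vector (key “Key”, plaintext “Plaintext”):

```python
def ksa(key: bytes) -> list:
    """Key schedule: build the scrambled SBox from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s

def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Generate one keystream byte K per input byte and XOR it in."""
    s = ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        k = s[(s[i] + s[j]) % 256]
        out.append(byte ^ k)
    return bytes(out)

print(rc4_crypt(b"Key", b"Plaintext").hex().upper())   # BBF316E8D940AF0AD3
```

Because the operation is a plain XOR, calling `rc4_crypt` a second time on the output with the same key returns the original bytes.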


This routine takes the modified SBox and again performs various byte-swaps against the table. It then uses this information to generate the keystream (K). This stream is XOR’d against the plaintext until all of the plaintext has been encoded. Because one keystream byte is generated for every byte of plaintext, the keystream is always exactly as long as the data being encoded. Here is how this sample implemented the routine:


Note that this sample used the following structure (other implementations may use u_char for indexes) to store the SBox and its two counters:

struct rc4_state
{
 u_char perm[256]; // SBox
 __int32 index1; // i
 __int32 index2; // j
};
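To see where the two counters sit relative to the SBox, the structure can be mirrored with `ctypes`. This is a sketch that assumes default alignment with no extra padding; the class name `Rc4State` is illustrative:

```python
import ctypes

class Rc4State(ctypes.Structure):
    """Python mirror of the rc4_state layout shown above."""
    _fields_ = [
        ("perm", ctypes.c_ubyte * 256),   # SBox
        ("index1", ctypes.c_int32),       # i
        ("index2", ctypes.c_int32),       # j
    ]

state = Rc4State()
for n in range(256):
    state.perm[n] = n                     # identity permutation

print(ctypes.sizeof(Rc4State), Rc4State.index1.offset, Rc4State.index2.offset)
```

Under those assumptions the structure is 264 bytes long, with `index1` at offset 256 and `index2` at offset 260, immediately after the table.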

This sample encodes various data about the victim's machine and sends the data encoded with this RC4 stream to its Command and Control server. This section of the malware just happens to be encoding a hash of one of my system files. The original hash that it encodes is EA497F6BD6555BA85127CE083A513BE8:

10002174 loop:       
10002174 mov ecx, [ebp+68h+state.index1] ; ECX = i
10002177 inc ecx                     ; i += 1
10002178 and ecx, esi             ; i = i mod 0x100
1000217A mov [ebp+68h+state.index1], ecx ; Store i
1000217D lea edx, [ebp+ecx+68h+state]     ; EDX = *S[i]
10002184 movzx     ecx, byte ptr [edx]     ; ECX = S[i]
10002187 add ecx, [ebp+68h+state.index2]   ; ECX = j + S[i]
1000218A and ecx, esi                 ; ECX = (j + S[i]) mod 0x100
1000218C mov [ebp+68h+state.index2], ecx ; j = (j + S[i]) mod 0x100
1000218F mov al, [ebp+ecx+68h+state.perm] ; AL = S[j]
10002196 movzx      ebx, byte ptr [edx]    ; EBX = S[i]
10002199 mov [edx], al                ; S[i] = S[j]
1000219B mov eax, [ebp+68h+state.index2] ; EAX = j
1000219E mov [ebp+eax+68h+state.perm], bl ; S[j] = S[i]
100021A5 mov eax, [ebp+68h+Plaintext]      ; EAX = Plaintext
100021A8 mov edx, [ebp+68h+state.index1]    ; EDX = i
100021AB movzx     edx, [ebp+edx+68h+state.perm] ; EDX = S[i]
100021B3 lea ecx, [edi+eax] ; ECX = *Plaintext[x]
100021B6 mov eax, [ebp+68h+state.index2] ; EAX = j
100021B9 movzx     eax, [ebp+eax+68h+state.perm] ; EAX = S[j]
100021C1 add eax, edx ; EAX = S[i] + S[j]
100021C3 and eax, esi ; EAX = (S[i] + S[j]) mod 0x100
100021C5 mov al, [ebp+eax+68h+state.perm] ; AL = S[(S[i] + S[j]) mod 0x100]
100021CC xor [ecx], al ; Plaintext[x] ^ Output K
100021CE inc edi ; x++
100021CF cmp edi, [ebp+68h+arg_4] ; Check if x < len(Plaintext)
100021D2 jb   short loop ; Loop if x < len(Plaintext)

In IDA Pro, the RC4_Crypt loop may resemble these basic blocks:



Once the length of the plaintext is met, the keystream K is completely generated. As each value of K was generated, it was used to XOR the corresponding byte of the plaintext. In this case it looked like this:



To decrypt the ciphertext, simply run the same process over it with the same key, since XORing with the same keystream restores the original plaintext:



Exercise:

Putting it all together with Python

I implemented RC4 in Python, treating input as strings and outputting the SBox contents both before and after scrambling.

*Note: since this script treats input as a string, you would have to send raw bytes for non-ASCII characters. In the example above, this can be accomplished like this:


./rc4Gen.py 0006 `perl -e 'print "\xEA\x49\x7F\x6B\xD6\x55\x5B\xA8\x51\x27\xCE\x08\x3A\x51\x3B\xE8"'`


I've linked the Python code here: rc4Gen.py