Wednesday, March 21, 2012

ClamAV vs. Content IQ Test, part 2

This is the second post in a series of blog posts about the Content IQ Test. Please see ClamAV vs. Content IQ Test, part 1.

Let's see how ClamAV does with test files that contain auto-executing embedded active content.

Test file 10 contains the target string in an obfuscated, auto-executing JavaScript object embedded in a PDF file.

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:7:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ clamscan -d test.ndb Test_File_10_Target_String_in_JS_in_PDF.pdf 
Test_File_10_Target_String_in_JS_in_PDF.pdf: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.010 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ clamscan -d test.ndb Test_File_10_Negative_Control.pdf 
Test_File_10_Negative_Control.pdf: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 1.00:1)
Time: 0.010 sec (0 m 0 s)
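To see what TestSig1 actually matches, the hex fragments in test.ndb can be decoded back to text. This is a small Python sketch, not part of ClamAV; the helper name decode_sig_body is made up for illustration. It relies on the ClamAV signature convention that {-200} stands for a gap of up to 200 bytes between fragments.

```python
import re
from urllib.parse import unquote

# Body of TestSig1 as it appears in test.ndb (everything after the third colon).
SIG_BODY = ("6576616c{-200}756e657363617065{-200}"
            "282725363525373625363925366325323825323927")

def decode_sig_body(body):
    """Split the body on {...} wildcards and decode each hex fragment to ASCII."""
    fragments = re.split(r"\{[^}]*\}", body)
    return [bytes.fromhex(fragment).decode("ascii") for fragment in fragments]

parts = decode_sig_body(SIG_BODY)
print(parts)

# The last fragment is a percent-encoded string literal. urllib's unquote is
# used here as a stand-in for what JavaScript's unescape() would produce.
print(unquote(parts[2].strip("('")))
```

The three fragments decode to eval, unescape and ('%65%76%69%6c%28%29', and the percent-encoded part unescapes to evil().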

ClamAV generates an alert because it's able to parse some PDF objects. In this particular case, it's able to "see":

What ClamAV sees for test file 10

It's worth noting that official ClamAV signatures would have flagged this file:

azidouemba@ubuntu:~/Downloads$ clamscan Test_File_10_Target_String_in_JS_in_PDF.pdf 
Test_File_10_Target_String_in_JS_in_PDF.pdf: Heuristics.PDF.ObfuscatedNameObject FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1151473
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 1.00:1)
Time: 10.331 sec (0 m 10 s)


Test file 11 contains the target string in an obfuscated, auto-executing JavaScript object embedded in a PDF file compressed with Zip.

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:7:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ clamscan -d test.ndb Test_File_11_Target_String_in_JS_in_PDF_in_ZIP.zip
Test_File_11_Target_String_in_JS_in_PDF_in_ZIP.zip: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.013 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ clamscan -d test.ndb Test_File_11_Negative_Control.rar 
Test_File_11_Negative_Control.rar: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: 0.97.3
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.012 sec (0 m 0 s)


Archives are treated like the containers they are. Here, ClamAV extracted the contents of the ZIP file and scanned them.

Test file 12 contains the target string in ActionScript code in an auto-executing Flash (SWF) file. To detect this string, we use a ClamAV feature that is currently undergoing testing and is not available in the latest stable release. You will need to download the development release and uncomment the following in libclamav/scanners.c before compiling:

case CL_TYPE_SWF:
            if(DCONF_DOC & DOC_CONF_SWF)
                ret = cli_scanswf(ctx);

            break;

We run clamav-devel/clamscan/clamscan with the option --leave-temps. ClamAV "sees":

What ClamAV sees for test file 12

We go ahead and scan Test file 12:

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:7:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_12_Target_String_in_Swf.swf
Test_File_12_Target_String_in_Swf.swf: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.012 sec (0 m 0 s)


ClamAV did NOT alert on the file. That's because, in this case, it did not treat the code that contains the evil string as ASCII normalized text. Therefore, we need to change our signature so that it applies to all files, not just ASCII normalized text files. We do so by changing the target type from 7 to 0:

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:0:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_12_Target_String_in_Swf.swf
Test_File_12_Target_String_in_Swf.swf: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.030 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_12_Negative_Control.swf 
Test_File_12_Negative_Control.swf: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 0.026 sec (0 m 0 s)


Success!
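The signature line used here has four colon-separated fields: name, target type, offset and hex body. The change from target type 7 (ASCII normalized text) to 0 (any file) can be sketched in Python as below. The names NdbSig, parse_ndb and with_target_type are hypothetical helpers, and the sketch ignores any optional trailing fields a .ndb line may carry.

```python
from typing import NamedTuple

class NdbSig(NamedTuple):
    name: str
    target_type: int   # 0 = any file, 7 = ASCII normalized text (the two types used in this post)
    offset: str        # "*" means the match may start anywhere
    body: str          # hex signature with optional wildcards

def parse_ndb(line):
    """Split one .ndb line into its four basic fields."""
    name, target, offset, body = line.strip().split(":", 3)
    return NdbSig(name, int(target), offset, body)

def with_target_type(line, new_type):
    """Return the same signature line with a different target type."""
    sig = parse_ndb(line)._replace(target_type=new_type)
    return ":".join([sig.name, str(sig.target_type), sig.offset, sig.body])

OLD = ("TestSig1:7:*:6576616c{-200}756e657363617065{-200}"
       "282725363525373625363925366325323825323927")
print(with_target_type(OLD, 0))
```

The printed line is the signature used for test files 12 through 16.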

Test file 13 contains the target string in ActionScript code in an auto-executing Flash (SWF) file embedded in an Excel file.

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:0:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_13_Target_String_in_Swf_in_Excel.xlsm 
Test_File_13_Target_String_in_Swf_in_Excel.xlsm: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.02 MB
Data read: 0.02 MB (ratio 1.50:1)
Time: 0.072 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_13_Negative_Control.xlsm 
Test_File_13_Negative_Control.xlsm: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.04 MB
Data read: 0.02 MB (ratio 2.50:1)
Time: 0.040 sec (0 m 0 s)

The Flash file is extracted from the Excel file. Then, the ActionScript code is extracted from the Flash file and ClamAV alerts on the target string.

Test file 14 contains the target string in ActionScript code in an auto-executing Flash (SWF) file embedded in an Excel file compressed with Zip.

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:0:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_14_Target_String_in_Swf_in_Excel_in_Zip.zip 
Test_File_14_Target_String_in_Swf_in_Excel_in_Zip.zip: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.02 MB
Data read: 0.01 MB (ratio 2.00:1)
Time: 0.043 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_14_Negative_Control.zip 
Test_File_14_Negative_Control.zip: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.05 MB
Data read: 0.01 MB (ratio 4.33:1)
Time: 0.024 sec (0 m 0 s)

The Excel file is extracted from the Zip file. Then the Flash file is extracted from the Excel file. Next, the ActionScript code is extracted from the Flash file and ClamAV alerts on the target string.

Test file 15 contains the target string in ActionScript code in an auto-executing Flash (SWF) file embedded in a Powerpoint file.

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:0:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_15_Ts_in_Swf_in_Ppt.pptx 
Test_File_15_Ts_in_Swf_in_Ppt.pptx: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.07 MB
Data read: 0.03 MB (ratio 2.25:1)
Time: 0.032 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_15_Negative_Control.pptx 
Test_File_15_Negative_Control.pptx: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.11 MB
Data read: 0.03 MB (ratio 3.38:1)
Time: 0.053 sec (0 m 0 s)

The Flash file is extracted from the PowerPoint file. Then, the ActionScript code is extracted from the Flash file and ClamAV alerts on the target string.

Test file 16 contains the target string in ActionScript code in an auto-executing Flash (SWF) file embedded in a PDF file.

azidouemba@ubuntu:~/Downloads$ cat test.ndb 
TestSig1:0:*:6576616c{-200}756e657363617065{-200}282725363525373625363925366325323825323927
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_16_Ts_in_Swf_in_Pdf.pdf 
Test_File_16_Ts_in_Swf_in_Pdf.pdf: TestSig1.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.01 MB (ratio 0.00:1)
Time: 0.030 sec (0 m 0 s)
azidouemba@ubuntu:~/Downloads$ ~/Programs/clamav-devel/clamscan/clamscan -d test.ndb Test_File_16_Negative_Control.pdf 
Test_File_16_Negative_Control.pdf: OK

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: devel-clamav-0.97-434-gd510390
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.01 MB
Data read: 0.01 MB (ratio 1.00:1)
Time: 0.037 sec (0 m 0 s)

The Flash file is extracted from the PDF. Then, the ActionScript code is extracted from the Flash file and ClamAV alerts on the target string.

In the next post I'll take a look at how ClamAV does against polymorphic test files.

Tuesday, March 20, 2012

MIDI Karaoke Background or Malware Vector?

In late January, we started seeing a new piece of malware based on the MIDI file format. This was the first in-the-wild attempt at leveraging a vulnerability that Microsoft publicly disclosed in January under security bulletin MS12-004 (CVE-2012-0003). The infection vector was an exploit file, baby.mid, embedded in a malicious webpage. Upon opening the webpage under versions of Windows other than Windows 7 or Windows Server 2008 R2, Windows Media Player would open baby.mid. The exploit it contained would cause a heap overflow that allowed shellcode to be executed.

Below is a series of screenshots showing the embedding of baby.mid and the JavaScript (and shellcode) that is later invoked.

Windows Media Player calling baby.mid

Pic.1 - Here we see the HTML where the embedded Windows Media Player is called with baby.mid as content followed by the start of obfuscated shellcode

Shellcode

Pic.2 - Here we see the end of the shellcode and the call that deobfuscates it.


In Pic.2, we can see an artifact left behind by the person behind this attack: “/*Encrypt By Dadong's JSXX 0.41 VIP*/”. This is the signature of a JavaScript obfuscating tool.


Exploit

Pic.3 - Finally, this function starts the exploit with a call to play the MIDI file on launch.

The importance of this exploit cannot be overstated: MIDI is a relatively uncomplicated format, and a simple 1-byte overflow was found to be enough to allow remote code execution in the context of Windows Media Player. This is not an exploit that takes hours of testing and sophisticated knowledge to use. All that's needed is rudimentary knowledge of the (open) MIDI file format and basic HTML, as you can embed MIDI in webpages so that it plays when the page is loaded. In this blog post I will take you through how detection is achieved and how the VRT (in particular Alain Zidouemba, Patrick Mullen and myself) worked to parse and detect MIDI files leveraging CVE-2012-0003, including MIDI files using an encoding designed to reduce file size, which caused false positives for other detection devices.

Without going into too much detail, the MIDI file format, simple and lightweight, hearkens back to the year 1982 with MIDI 1.0. Since then, there have been revisions and updates, even competing sub-standards. A MIDI file is a big-endian file format that starts with a 4-byte identifier: "MThd" (0x4D546864). Inside the file, individual tracks can be identified by the bytes "MTrk" (0x4D54726B). We are interested in the data inside these tracks. After a 4-byte chunk size field (the size of the track), we get into the track event data where MIDI events are defined. MIDI events consist of a delta-time field, an event type field and up to 3 bytes as parameters for that event type. Delta-time fields are variable length, with a flag at bit 7 of each byte to determine whether the next byte is a continuation of the delta-time or the actual event type. Some events have no delta-time values and their delta-time fields are always set to 00.
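The chunk layout and the variable-length delta-time described above can be sketched in a few lines of Python. This is an illustration only, not the detection code discussed later; iter_chunks and read_delta_time are made-up names, and the tiny MIDI file is hand-built for the example.

```python
import struct

def iter_chunks(data):
    """Yield (chunk_id, payload) for each chunk in a MIDI file."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        # The chunk size is a 4-byte big-endian integer right after the identifier.
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size

def read_delta_time(data, pos=0):
    """Decode a variable-length delta-time; return (value, position after it)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)   # the low 7 bits carry the value
        if (byte & 0x80) == 0:                 # bit 7 clear: this was the last byte
            return value, pos

# Hand-built file: a header chunk plus one track holding only an End of Track event.
header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)
track_data = b"\x00\xff\x2f\x00"
midi = header + b"MTrk" + struct.pack(">I", len(track_data)) + track_data

for chunk_id, payload in iter_chunks(midi):
    print(chunk_id, len(payload))
print(read_delta_time(b"\x81\x40"))
```

The two-byte delta-time 81 40 decodes to 192: the first byte contributes 1 shifted left by 7 bits, and the second contributes 0x40.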

Using this blog as a reference we can be more specific about the vulnerability. Of all the MIDI event types, the ones we will concentrate on are: Note Off (0x8), Note On (0x9) and Note Aftertouch (0xA). All three of these have 2 parameters of 1 byte each. The first of these two parameters is the note number, and is valid when between 0 and 127. However, with a 1-byte field, the range of values is twice that, and that's where Windows Media Player has a problem. So, for a start, detection has to be able to find all event types that have high nibbles of 0x8, 0x9 and 0xA to be able to judge if the note number field is over 0x7F. Detection must then contend with variable delta-time fields. And as if that wasn't enough, there is another type of encoding for note events that complicates things, called MIDI Running Status. This is a type of run-length encoding where, as long as the status byte/event type doesn't change, the parsing assumes all subsequent events are of the same type. Additionally, in an attempt to be even more efficient, when using running status, MIDI files can simply set the 2nd parameter of the note on event to 00 to turn off the note instead of using the note off event. This complicates detection, as we peg all our content matching on those status types because they are the vulnerable ones, yet the vulnerable fields are now no longer at a predictable distance from the identifying content.

For both Snort and ClamAV, we have the ability to write detection routines in C. ClamAV uses LLVM, and code is JIT-compiled at load time on systems that are supported by the JIT. On systems that aren't, it's interpreted. This allows for detailed parsing of a file type. Alain Zidouemba wrote the following detection code, and goes through it with us. I encourage you to download the BC.Exploit.CVE_2012_0003-1.c and follow along.

We start by declaring sig1 and sig2, two signatures that we will use to identify MIDI files. Sig1 will match "MThd" at the beginning of a file and sig2 will match "MTrk" anywhere in the file. Our trigger condition is a file that matches both sig1 and sig2. If that trigger condition is met, through our code, ClamAV will do the following for every track ("MTrk") in the MIDI file:

- Read the chunk size, which is a 4-byte big-endian value that comes immediately after “MTrk”
- Skip the delta-time, which is a variable-length value field. It determines when an event should be played relative to the track's last event. For our parsing purposes, we don’t need to know what the delta-time is. We only need to know where it ends so that we can skip it.
- Next, we read the event type. It can be a value between 0x80 and 0xFF.
- For event types between 0xB0 and 0xEF, we properly parse the MIDI channel events and parameters.
- For event type 0xFF, we are dealing with what are called meta-events. They are events that aren’t sent or received over MIDI ports, yet we need to parse them properly. And that’s what we do.
- Event type 0xF0 usually defines a Normal System Exclusive Event. These are events used to control MIDI hardware or software that require special data bytes that will follow their manufacturer's specifications. These are the most common type of SysEx event and are used to hold a single block of manufacturer-specific data. The last byte transmitted is 0xF7 to indicate the end of the event.
- Event type 0xF0 sometimes defines a Divided System Exclusive Event. This is when a large amount of SysEx data in a Normal SysEx Event could cause following MIDI Channel Events to be transmitted after the time they should be played. In that case, the last byte is not 0xF7, to indicate that the SysEx data is not finished and will be continued in an upcoming Divided SysEx Event. Any following Divided SysEx Events before the final one use a similar format as the first, only the start byte is 0xF7 instead of 0xF0 to signal continuation of SysEx data. The final block follows the same format as the continuation blocks, except the last data byte is 0xF7 to signal the completion of the divided SysEx data. Again, we are most concerned about just parsing this data properly.
- If the event type is “Note On”, “Note Off”, or “Note Aftertouch” (in other words, if the higher nibble is 0x8, 0x9 or 0xA), check to see if the velocity (in the case of “Note On” or “Note Off”) or the aftertouch value (in the case of “Note Aftertouch”) is greater than 0x7F. If that’s the case, we have the vulnerable condition!
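The steps in this list can be approximated with the Python sketch below. It is not the BC.Exploit.CVE_2012_0003-1 bytecode, only an illustration of the same walk over one track's event data, including running status. The names read_varlen and find_suspicious_notes are hypothetical, and the sketch flags a note event when either of its two parameter bytes is above 0x7F. The test bytes reuse the 9F and B2 values from the Metasploit sample; the remaining bytes are assumptions made up for the example.

```python
def read_varlen(data, pos):
    """Decode a variable-length quantity; return (value, position after it)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos

def find_suspicious_notes(track):
    """Return offsets in the track data of note events with a parameter > 0x7F."""
    hits, pos, status = [], 0, 0
    try:
        while pos < len(track):
            _, pos = read_varlen(track, pos)      # skip the delta-time
            if track[pos] & 0x80:                 # explicit event type byte
                status = track[pos]
                pos += 1
            kind = status >> 4                    # otherwise running status applies
            if status == 0xFF:                    # meta event: type, length, data
                length, pos = read_varlen(track, pos + 1)
                pos += length
                status = 0                        # meta events end running status
            elif status in (0xF0, 0xF7):          # SysEx: length, data
                length, pos = read_varlen(track, pos)
                pos += length
                status = 0
            elif kind in (0x8, 0x9, 0xA):         # Note Off, Note On, Note Aftertouch
                if track[pos] > 0x7F or track[pos + 1] > 0x7F:
                    hits.append(pos)
                pos += 2
            elif kind in (0xB, 0xE):              # other two-parameter channel events
                pos += 2
            elif kind in (0xC, 0xD):              # one-parameter channel events
                pos += 1
            else:
                break                             # nothing this sketch can parse
    except IndexError:
        pass                                      # truncated track: stop quietly
    return hits

exploit_like = bytes.fromhex("009fb27f00ff2f00")    # Note On 9F with note number B2
benign = bytes.fromhex("00903c40103c0000ff2f00")    # second note uses running status
print(find_suspicious_notes(exploit_like), find_suspicious_notes(benign))
```

Working on the track data alone keeps the sketch short; a full scanner would first locate each "MTrk" chunk and read its 4-byte size, as described in the first step of the list.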

Metasploit

Pic.4 - The Metasploit exploit with note on event, “9F” and overflow of the note number field “B2”.

The Windows Media Player MIDI overflow vulnerability is a great example of a vulnerability being disclosed by the vendor, hackers realizing the usefulness of the vulnerability, and in-the-wild code appearing a short while after. Thankfully, both ClamAV and Snort cover the baby.mid in-the-wild exploit, exploits generated by the Metasploit module, as well as POC exploits. On the ClamAV side, the signature BC.Exploit.CVE_2012_0003-1 provides coverage for the MIDI file, while CVE_2012_0003-1 through 3 cover the HTML, payload and the .exe that is downloaded. On the Snort side, there is coverage with rule sids 20900, 21159, and 21167.

MD5s of samples found in the wild up to now:

- 6249ac0674574c7df2f81801a41b85a5
- 9d63609e49e18f87973e66bdbc4236b4
- d3410dd27ba25c780abcd5c4df573303
- 1a4c84227cbf6da8724699b9b6fbb71b
- bbc2d8cb3f8ed9a3a5292408d476af14
- c91703bc8d5509003c1d0a634dcbbd06
- 2b988374bb9c0ac7d04a2999959fa978
- 17145972a2116660580f879ac690315f
- df5e0faae726386b7d2ee0fce0bfcbde
- dd47870ac7970ca8b00080d2626f7e2a
- 72d39c6837503e36b2ccec381e191b78
- f82fdcd9f1bc2caf0ffa3928648d356d
- ef8e3898330c9c4af29402776544038c

Wednesday, March 7, 2012

Some Snort discussion about Murofet, Kazy, or whatever we're calling it..

One of the fun parts about malware analysis is the name you give it.  I try to name my coverage in ClamAV similar to what other vendors are naming the same samples so there is some correlation and consistency.  Sometimes it works...this is one of the cases where it doesn't.  Various vendors call this family of malware different things, but they all seem to exhibit similar characteristics.

I've been looking into this family of malware recently, and since it has a very distinct method of operation, I thought I'd talk about it a bit, along with providing you some Snort rules to help find the malware on your network.  I'm not going to go into in-depth binary analysis here.  I'll keep it simple: since most people in the Snort world will read this blog wondering about the malware from a traffic perspective, that's what I'll focus on.  Looking at what information you can glean just from watching the malware work is most of the battle.

Let's start by looking at the packet dump, as that's what we are most interested in - how it behaves on the network.

A connectivity check

It starts off with a simple connectivity check.  There is nothing of value in the check; it's simply to see if the malware can reach the internet.  Immediately following this check, it starts doing DNS lookups.  This is a really noisy piece of malware: these requests are done very quickly, and there's a large number of them.

DNS lookups galore
These lookups are very simple to write a sig against using the detection_filter keyword, as we only want to be alerted when there is a huge flood of similar requests.  Since flooding DNS like this is generally a bad thing to do, we'll write this up.


alert udp $HOME_NET any -> $EXTERNAL_NET 53 (msg:"BOTNET-CNC Possible host infection - excessive DNS queries for .eu"; flow:to_server; byte_test:1,!&,0xF8,2; content:"|02|eu|00|"; fast_pattern:only; detection_filter:track by_src, count 100, seconds 10; classtype:trojan-activity; sid:21544; rev:1;)
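To make the rule's logic concrete, here is a rough Python sketch of its three checks: the byte_test on the DNS flags byte at offset 2, the content match on the label-encoded "eu" TLD, and a per-source rate limit in the spirit of detection_filter. The names is_eu_query, RateFilter and build_query are hypothetical, and the sliding window is a simplification that does not reproduce Snort's own bookkeeping.

```python
import struct
from collections import defaultdict, deque

def is_eu_query(payload):
    """Mimic byte_test:1,!&,0xF8,2 plus content:"|02|eu|00|" on a DNS payload."""
    if len(payload) < 12:                          # shorter than a DNS header
        return False
    standard_query = (payload[2] & 0xF8) == 0      # QR bit and opcode all zero
    return standard_query and b"\x02eu\x00" in payload

class RateFilter:
    """Report True once a source exceeds `count` matches within `seconds`."""
    def __init__(self, count=100, seconds=10):
        self.count, self.seconds = count, seconds
        self.seen = defaultdict(deque)

    def hit(self, src, now):
        window = self.seen[src]
        window.append(now)
        while window and now - window[0] > self.seconds:
            window.popleft()                       # forget matches outside the window
        return len(window) > self.count

def build_query(name, flags=0x0100):
    """Build a minimal DNS query for `name` (demo helper for this sketch)."""
    header = struct.pack(">HHHHHH", 0x1234, flags, 1, 0, 0, 0)
    labels = b"".join(bytes([len(part)]) + part.encode("ascii")
                      for part in name.split("."))
    return header + labels + b"\x00" + struct.pack(">HH", 1, 1)

print(is_eu_query(build_query("example.eu")))           # query for a .eu name
print(is_eu_query(build_query("example.eu", 0x8180)))   # a response, not a query
```

The mask 0xF8 covers the QR bit and the four opcode bits, so the byte_test only lets standard queries through; responses and other opcodes are ignored before the content match is considered.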

After searching around in DNS for a bit, outbound SYN packets start flying around with a destination port of 22292.  (As I said, I'm keeping this simple.)  Also easily sig-able using a similar methodology as the detection_filter above.  But I'm not going to write this rule as it's very easy to evade just by changing the port.  When writing Snort rules you want to focus on things that will be consistent, so you'll catch more than one variant.


I've pulled several samples of this and they all seem to exhibit the same activity.

Eventually after what seems like some switching back and forth between these types of outbound traffic, an HTTP request is made.  I've blacked out portions of this just in case these numbers identify a system in some way:


Looking at this dump there are some excellent distinguishing characteristics that we may want to use in a rule.

1. HTTP/1.0
2. The URI is rather unique.
3. Host is in China.
4. No Referer.

Here's our rule:

alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"BOTNET-CNC Win32.Trojan.Murofet variant outbound connection"; flow:established,to_server; content:".php?w="; nocase; http_uri; content:"&n="; distance:0; http_uri; pcre:"/\.php\x3fw\x3d\d+\x26n\x3d\d+/U"; content:"HTTP/1.0"; metadata:policy balanced-ips drop, policy security-ips drop, service http; reference:url,www.virustotal.com/file/aeab4913c8bb1f7f9e40258c323878969b439cf411bb2acab991bba975ada54e/analysis/; classtype:trojan-activity; sid:21440; rev:2;)

The above "Murofet" rule has been in the hands of customers since the 27th of February.

Content match number 1:
".php?w="

Content match number 2:
"&n="

Both could seem rather common, so let's use our PCRE to get rid of false positives.

\.php\x3fw\x3d\d+

verifies that content match #1 has at least one digit after it until it hits:

\x26n\x3d\d+

Our second content match, also verifying there is at least one digit after it.  We restrict all of those to http_uri so we don't wind up looking at the URI in the Referer field, cookie, body, or whatever.
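The way the two content matches and the PCRE work together can be tried out with a short Python sketch. The request lines below are invented for illustration (the real URI is blacked out above), looks_like_murofet is a hypothetical helper, and Python's re module stands in for Snort's PCRE engine and URI normalization.

```python
import re

# Same expression as the rule's pcre option, without the /U modifier.
URI_RE = re.compile(r"\.php\x3fw\x3d\d+\x26n\x3d\d+")

def looks_like_murofet(request_line):
    """Apply the rule's URI checks and the HTTP/1.0 check to one request line."""
    try:
        _method, uri, version = request_line.split(" ", 2)
    except ValueError:
        return False
    # content:".php?w="; nocase;
    first = uri.lower().find(".php?w=")
    if first == -1:
        return False
    # content:"&n="; distance:0;  (must appear after the first match)
    if "&n=" not in uri[first + len(".php?w="):]:
        return False
    # The PCRE then insists on digits after both parameters.
    return bool(URI_RE.search(uri)) and version.strip() == "HTTP/1.0"

print(looks_like_murofet("GET /gate.php?w=123&n=4567 HTTP/1.0"))
print(looks_like_murofet("GET /gate.php?w=abc&n=1 HTTP/1.0"))
```

The second request passes both content matches but fails the PCRE, which is the false-positive filtering described above.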

Our last content match is for "HTTP/1.0".  While HTTP/1.0 is still common in today's "web 2.0" world, HTTP/1.1 is much more common.