Thursday, April 22, 2010

A New Detection Framework

We just completed a talk here in Dubai on some detection capability research the VRT has been doing.  The subtitle of the presentation, "What would you do with a pointer and a size?" pretty much sums up the potential of the project.  It all started last December at the SANS IDS conference.  In talking to both attendees and presenters, it became clear there was a lack of capability for high-end security and response personnel.  Repeatedly we were asked about providing a greater depth of detection, dropping a file to disk for longer analysis and logging packets for an extended period of time.  In short, there were solutions needed that weren't being provided.

So Patrick Mullen and I sat down and started fiddling with some ideas.  I worked on deep parsing and detection on PDF files and Patrick worked on ways to provide me the full file data.  Initially we had an SO rule that grabbed PDF files and called my PDF parser.  We got it working, and it was pretty sexy.  But it blocked the Snort process and clearly wasn't the way to go.  It did, however, show that we were on to something.

Lurene, Patrick, Nigel and I then locked ourselves in a room and hammered out the initial design of what would come to be known as NRT, the Near Real Time detection project.  The project goals were straightforward, if not easy:  Create a system that allowed arbitrary data sources to pass data to specialized detection systems and provide every scrap of data we could back to the incident response teams.

With this laid out, I got a hold of Mike Cloppert, one of the guys we had spoken to at the IDS conference.  We scheduled a call with the team he works with and discussed with them what they wanted out of a detection system.  At the completion of the call, we all were quite pleased.  Everything they had asked for was already in the design, and quite a bit more as well.  We were on the right track.

Coding began.  This involved every person on the VRT and a lot of late nights.  Our goal for the first phase of POC was to prove that we could use Snort as a data source for a system that would then provide analysis out of band with network traffic and alert back into the system.  At the end of a hectic month of coding (along with all of our other work) we had a static preprocessor that pulled files off the wire and passed them to a PDF detection module, a ClamAV engine and a pure logging module.  The end result was the capability to thread out (non-blocking) detection of PDF files, handle the common evasion techniques for the format, and then alert back into Snort:

04/21-11:17:58.1271873878 [**] [300:3221225473:1] URL:/wrl/first.pdf Hostname:wrl Alert Info:Probable exploit of CVE-2009-0658 (JBIG2) detected in object 8, declared as /Length 29/Filter [/FlateDecode/ASCIIHexDecode/JBIG2Decode ]  [**] 

{TCP} ->
04/21-11:17:58.1271873878 0:0:0:0:0:0 -> 0:0:0:0:0:0 type:0x800 len:0x0 -> TCP TTL:240 TOS:0x10 ID:0 IpLen:20 DgmLen:1280
***AP*** Seq: 0x0  Ack: 0x0  Win: 0x0  TcpLen: 20
55 52 4C 3A 2F 77 72 6C 2F 66 69 72 73 74 2E 70 URL:/wrl/first.p
64 66 20 48 6F 73 74 6E 61 6D 65 3A 77 72 6C 20 df Hostname:wrl 
41 6C 65 72 74 20 49 6E 66 6F 3A 50 72 6F 62 61 Alert Info:Proba
62 6C 65 20 65 78 70 6C 6F 69 74 20 6F 66 20 43 ble exploit of C
56 45 2D 32 30 30 39 2D 30 36 35 38 20 28 4A 42 VE-2009-0658 (JB
49 47 32 29 20 64 65 74 65 63 74 65 64 20 69 6E IG2) detected in
20 6F 62 6A 65 63 74 20 38 2C 20 64 65 63 6C 61 object 8, decla
72 65 64 20 61 73 20 2F 4C 65 6E 67 74 68 20 32 red as /Length 2
39 2F 46 69 6C 74 65 72 20 5B 2F 46 6C 61 74 65 9/Filter [/Flate
44 65 63 6F 64 65 2F 41 53 43 49 49 48 65 78 44 Decode/ASCIIHexD
65 63 6F 64 65 2F 4A 42 49 47 32 44 65 63 6F 64 ecode/JBIG2Decod
65 20 5D 20                                      e ] 
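The threaded-out, non-blocking handoff described above boils down to a producer/consumer queue: the packet path copies the file and returns immediately, while a worker thread runs the slow detection out of band. Here's a minimal sketch assuming POSIX threads; the shutdown sentinel and the files_inspected counter exist only for this illustration and are not part of the actual preprocessor.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct work_item {
    unsigned char *data;        /* copy of the captured file (NULL = shutdown) */
    size_t len;
    struct work_item *next;
};

static struct work_item *queue_head, *queue_tail;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;
static size_t files_inspected;

/* Called from the packet path: one O(len) copy, never waits on detection.
   Error handling omitted for brevity. */
static void submit_file(const unsigned char *data, size_t len)
{
    struct work_item *w = malloc(sizeof(*w));
    w->data = data ? memcpy(malloc(len), data, len) : NULL;
    w->len = len;
    w->next = NULL;
    pthread_mutex_lock(&queue_lock);
    if (queue_tail)
        queue_tail->next = w;
    else
        queue_head = w;
    queue_tail = w;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_lock);
}

/* Worker thread: drains the FIFO, exits on the NULL-data sentinel. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        while (!queue_head)
            pthread_cond_wait(&queue_cond, &queue_lock);
        struct work_item *w = queue_head;
        queue_head = w->next;
        if (!queue_head)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);
        if (!w->data) {             /* shutdown sentinel */
            free(w);
            break;
        }
        /* ... hand w->data / w->len to the PDF, ClamAV and logging modules ... */
        files_inspected++;
        free(w->data);
        free(w);
    }
    return NULL;
}
```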


Detection was extremely accurate and specific to the triggering condition of the vulnerability.  The PDF parser inflated the JBIG2 stream, handled any encoding and then looked at the specific conditions required to exploit the reader.  It fully detects attacks generated by the Metasploit framework.  In fact, it was good enough to uncover a bug in the Metasploit JBIG2 module which has now been fixed.  By allowing additional detection, above what is done by the Snort engine now, to occur outside of the packet stream, we are able to provide much more data back to the user.  Which got us to thinking about JavaScript...

Anyone who has looked at the JavaScript associated with exploits knows that variables are often assigned long, random names.  We decided to check for that by jamming all of the variable names together and then doing an entropy check.  If the combined string was too random, we'd alert.  In one attack file, for example, concatenating all of the JavaScript variable names produced exactly that sort of high-entropy string.


Which, in turn, leads the NRT to fire:

[**] [300:2147483653:1] URL:/wrl/first.pdf Hostname:wrl Alert Info:The JavaScript variables in object 6, declared as /Length 5994/Filter [/FlateDecode/ASCIIHexDecode ] , show a high degree of entropy [**]

We were in detection nirvana.  Anything we wanted to do, no matter how much processor it took, was available to us.

While sitting in Dubai on day one of HitB, Lurene came up with an idea of how to analyze unescaped data to find shellcode.  The process went like this:  Grab the PDF off the wire, inflate the JavaScript object, determine that it is JavaScript, normalize the unescape() calls and pass the data to a custom nugget written by Lurene.  This nugget then uses heuristics to discover the encoder type, decodes the shellcode and then returns data about the shellcode found.  The result:

[**] [300:3221225482:1] URL:/wrl/first.pdf Hostname:wrl Alert Info:Reverse TCP connectback shellcode detected. Connecting to on port 4444  [**]

This data didn't come from seeing traffic to port 4444 on a host; it came from interpreting shellcode that was unescaped inside a compressed object in a PDF we pulled off the wire.  We're excited.
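The unescape() normalization step can be sketched as follows: "%uXXXX" sequences expand to two little-endian bytes (JavaScript's unescape() treats them as UTF-16 code units) and "%XX" to a single byte. This is an illustrative stand-in for the real normalizer, which has to cope with obfuscated and malformed input.

```c
#include <stddef.h>
#include <stdint.h>

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Expand unescape()-style sequences into raw bytes.  Output never
   exceeds the input length, so `out` sized at `len` is sufficient.
   Returns the number of bytes written. */
static size_t unescape_normalize(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        if (in[i] == '%' && i + 5 < len &&
            (in[i + 1] == 'u' || in[i + 1] == 'U') &&
            hexval(in[i + 2]) >= 0 && hexval(in[i + 3]) >= 0 &&
            hexval(in[i + 4]) >= 0 && hexval(in[i + 5]) >= 0) {
            unsigned v = (hexval(in[i + 2]) << 12) | (hexval(in[i + 3]) << 8) |
                         (hexval(in[i + 4]) << 4)  |  hexval(in[i + 5]);
            out[n++] = v & 0xff;            /* low byte first (UTF-16LE) */
            out[n++] = (v >> 8) & 0xff;
            i += 6;
        } else if (in[i] == '%' && i + 2 < len &&
                   hexval(in[i + 1]) >= 0 && hexval(in[i + 2]) >= 0) {
            out[n++] = (uint8_t)((hexval(in[i + 1]) << 4) | hexval(in[i + 2]));
            i += 3;
        } else {
            out[n++] = in[i++];             /* pass through untouched */
        }
    }
    return n;
}
```

Once the buffer holds raw bytes like this, shellcode heuristics (NOP sleds, decoder stubs) can run over it directly.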

But this system had to be open and it had to be extensible.  It had to be flexible and it had to be verbose in its logging.  So here is what we came up with:


The Dispatcher

This component is the heart of the system.  It handles data sources and detection nuggets.  It manages a central database of all known good and known bad files and URLs.  Additionally, it keeps track of known good and bad sub-components (JavaScript in a PDF, for example), so that detection speed is improved and so that we can alert on data subsequently found to be bad.  Finally, it creates a complete log of detection by writing out not just the original file, but also the normalized versions of the segments of code that created the alerts.
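The known-good/known-bad bookkeeping might look something like the in-memory sketch below, keyed by a file or sub-component digest. The real Dispatcher uses a central database; the verdict names and the fixed-size probing table here are illustrative only.

```c
#include <string.h>

typedef enum { V_UNKNOWN, V_KNOWN_GOOD, V_KNOWN_BAD } verdict;

#define CACHE_SLOTS 1024

struct cache_entry {
    char digest[65];        /* hex digest of a file or sub-component */
    verdict v;
};

static struct cache_entry cache[CACHE_SLOTS];

static unsigned slot_for(const char *digest)
{
    unsigned h = 5381;                      /* djb2 string hash */
    for (; *digest; digest++)
        h = h * 33 + (unsigned char)*digest;
    return h % CACHE_SLOTS;
}

/* Linear probing; a fixed table (assumed never full) keeps the sketch short. */
static void cache_set(const char *digest, verdict v)
{
    unsigned i = slot_for(digest);
    while (cache[i].digest[0] && strcmp(cache[i].digest, digest) != 0)
        i = (i + 1) % CACHE_SLOTS;
    strncpy(cache[i].digest, digest, sizeof(cache[i].digest) - 1);
    cache[i].v = v;
}

static verdict cache_get(const char *digest)
{
    unsigned i = slot_for(digest);
    while (cache[i].digest[0]) {
        if (strcmp(cache[i].digest, digest) == 0)
            return cache[i].v;
        i = (i + 1) % CACHE_SLOTS;
    }
    return V_UNKNOWN;
}
```

A hit on V_KNOWN_GOOD lets the system skip detection entirely; flipping an entry to V_KNOWN_BAD later is what makes it possible to alert on data subsequently found to be bad.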


Data Handlers

We want to be able to provide data into the system from any arbitrary location.  Capture a file off the wire with Snort, grab it via a milter, pass it into the system from ClamAV, or just hook on-open on a Windows system and pass it along?  All of that should be handled and available through an API.


Detection Nuggets

For any given data handler, one or more nuggets should be available.  The nuggets should be able to pass data to other nuggets.  For example, a PDF nugget that finds embedded JavaScript data should be able to pass just that block into a JavaScript system.
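Conceptually, a nugget is just a name plus a function that accepts a pointer and a size. The sketch below is a hypothetical interface, not the actual API from src/preprocessor/nrt_*; the toy JavaScript nugget simply flags any block containing an unescape() call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { NUGGET_CLEAN, NUGGET_ALERT } nugget_verdict;

/* Every nugget gets exactly what the talk's subtitle promises:
   a pointer and a size. */
typedef nugget_verdict (*nugget_inspect_fn)(const uint8_t *data, size_t len);

struct nugget {
    const char *name;               /* e.g. "pdf", "javascript" */
    nugget_inspect_fn inspect;
};

/* Toy JavaScript nugget: alert if the block contains "unescape(".
   A real nugget would normalize and analyze the script instead. */
static nugget_verdict js_inspect(const uint8_t *data, size_t len)
{
    static const char needle[] = "unescape(";
    const size_t nlen = sizeof(needle) - 1;

    for (size_t i = 0; i + nlen <= len; i++)
        if (memcmp(data + i, needle, nlen) == 0)
            return NUGGET_ALERT;
    return NUGGET_CLEAN;
}

static const struct nugget js_nugget = { "javascript", js_inspect };
```

Chaining falls out naturally: a PDF nugget that extracts a JavaScript block just calls the next nugget's inspect() with the sub-block's pointer and size.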


Snort registers with the Dispatcher as a Data Handler.  The Nugget Farm is populated by both a PDF and a JavaScript nugget.  Snort grabs the file and sends it to the PDF nugget.  The PDF parser finds the JavaScript block and sends it to the JavaScript nugget.  When the JavaScript nugget alerts, it sends the normalized data back to the Dispatcher.  When the PDF nugget alerts on the JBIG2 section, it sends the data in that section as well as the entire file back to the Dispatcher.  The Dispatcher writes each section and the associated alerts to disk in addition to the full file.  Finally, it alerts into the Snort system.

There are more details, such as how we alert back in time (no sonic screwdriver required).  But we'll get to that.  For now we want to see what you would do if we handed you a pointer and a size.  So we've put up some rough (very rough) POC code.  Review the code in src/preprocessor/nrt_* to see what we're up to.  Using that code as a model, you should be able to write your own C code to do detection against files pulled in by the system.

We've got a long way to go, with a ton of research in front of us.  There is no time-line for full release, but we're interested in seeing what you come up with.  As we create additional documentation and nail down more functionality, we'll continue updating the code.  Keep an eye on labs and the VRT blog for updates.  In the meantime, go poke around and let us know what you come up with.

Code & Dubai Presentation available at:


  1. Nice! What about dynamically submitting those files to a sandbox system, such as Norman, and reporting the results back? For data files, submit them to the normal clients (Office, Acrobat, Flash) in VMs that monitor a set of conditions (processes running, connections opened, etc.) and report if something triggers those conditions. The VMs are cycled for each new file.

  2. That's powerful stuff! However, if you want people to actually contribute plugins for this, you need to devise a way for these to be coded in something other than C. My recommendation: embed a Perl interpreter so that all of CPAN is available as a source-code library. That would've saved all the work you did writing a PDF parser, so you could focus on the nuggets.

  3. Sounds like this would be pretty cool in a combination with honeyd :)

  4. Augusto: Yes.
    Martin: APIs for C, Perl, Ruby and Python will be available.
    Ry: Yes.

