Tuesday, May 3, 2011

Razorback Roadmap and Status Report

In which we get our first introduction to Tom Judge, the Amish Hammer.

Yep, you're right, we've been kinda quiet lately.  Some of that has been because we are the VRT in addition to the developers of Razorback and we had some big things to tackle in our other roles.  But we've also been just thinking and taking in some feedback.  We now have full-time developers (and are still hiring, hint, hint) working on the project and we took the opportunity to revisit the architecture and, at a substantially more reasonable pace, decide if we were on track and where we wanted to go.  We have been working a lot on review and design, so you'll find this document somewhat lengthy.  Take a break after each list and go get some tea or something.

First, we thought again about what our goals were for Razorback.  These are the first things that developers are told when they get here and a list we frequently reference when discussing architecture changes.  These are our initial goals:

  1.  Don't get in the user's way -- By this we mean that we don't make assumptions about a user's network or needs.  We don't want to make lazy decisions by making end users jump through hoops or have to work within unnecessary and opaque requirements.  The goal is to make a framework, not a prison.

  2.  Take care of the common stuff -- We initially designed this system to allow talented people and teams to rapidly develop detection.  For example, many of the teams we talked to showed a particular (and not surprising) interest in web, mail and DNS traffic.  Since this is a broad requirement, we added specific database tables and API calls to track these traffic types.

  3.  Hide nothing and never make the user duplicate work -- Ultimately this will be up to nugget builders, but we've provided the alerting and database framework so that every piece of analyzed information or normalized data can be stored and linked to the source files.  The classic example we use to explain to incoming developers what we mean by this is the PDF file.  If we decompress an object, store that object for the user in decompressed form.  If we analyze JavaScript, fix it up and rename variables for ease of use, store that normalized JavaScript block.  If we find shell code, store it separately and provide an explanation of what the shell code would do if it ran.

  4.  Build from the bottom up to scale big -- From the beginning we knew that this system, if implemented on any fair-sized network, would require some horsepower.  We're taking the time to do work on arbitrary amounts of inbound data.  So we don't short cut code, we think about the speed impact up front and we minimize the work we do in the dispatcher.  It has a few discrete functions, and everything else is handed out to nuggets.  We continually review architecture for excessive network traffic, unnecessary scope creep and silly (some argue "clever") design decisions.  We have a strong admiration for the simple and a profound dislike for "magic".

  5.  Let the user do what he needs to do -- You might think this is covered under 1, but this addresses the core functionality of Razorback, as opposed to operational considerations.  We try to make every component, capability and user-facing operation configurable.  We try to ensure that we provide timely and readable logging.  But most of all, when it comes to detection, we want to let entities build on the framework to get it to do what they need it to do.  We know there are really smart people out there, and we want to let them be smart quickly.  So we provide user-definable flag fields in data-block, IP address, IP block, user and AS blocks to track information in an enterprise-specific way.  We allow analyst notes to be attached to just about any database entity.  We allow users to define their own data types that we route as easily as any of our provided types.  We allow arbitrary reporting.  In short, we want to be a framework, not a prison. (Yes, I know I'm repeating myself)

So where is Razorback now?  We've got a 0.1.5 release tagged and available as a tarball up on SourceForge (http://sourceforge.net/projects/razorbacktm/files/Razorback/razorback-0.1.5.tbz/download).  This is, pending something awesome in the way of a bug, the last of our development on the POC code we initially released at DEFCON last year.  It is fairly functional, and works well enough to demonstrate our thinking.  But we're in the midst of reimplementing it at a more sane pace, hopefully for the better.

Sourcefire has given us not just developers, but also the time to continue this as a research application.  We have our final design pretty much laid out and we think we know where we're going.  But we have the leeway to spend time testing solutions, constantly refactoring code and proving to each other that we are headed in the right direction.  If we find we've developed ourselves into a corner, we'll be able to back up and rethink our approach.  We're taking full advantage of this.

First, a caveat:  This is a list of things we hope to get done.  This is not a contract, this is not a guarantee.  But this is the way we're currently headed.  That being said, here is what we hope to get done this year:

  1. Packaging -- Razorback right now is ridiculously hard to install and configure.  Our goal is to build out a saner way to keep up with updates (all new files are in a single repository, not split between dispatcher and nuggets).  We also now provide a single configuration point to begin the build process.  We're also targeting an install process to help a new user get up to speed quickly so they can start playing.

  2. Security -- Both encryption and AAA services are being built into the system.  Authorization must occur at the earliest point possible and encryption options must be available for all traffic that could reveal data blocks, reporting or forensic data.  Also, silly implementation errors result in extreme mocking of developers and we hand over code to our tame hackers to generate more embarrassment.

  3. Networking -- IPv6 must be supported throughout the platform.  We must be nice to network operations folks and not transmit things on the network we don't have to.

  4. Operational interfaces -- Razorback 0.1.5 doesn't do a good job of communicating at an operational level what is going on.  Incident response teams can get a lot of information, but the admin of the box is short on options to get insight into what is going on.  During architecture of components in the new build of Razorback, we're keeping a close eye on configuration options, verbose logging, metrics and fault-tolerance.  To assist in this, a real-time admin interface will be made available to the dispatcher.

  5. Data transfer -- We're implementing both caching and queueing services to ensure that we get data off of collectors as quickly as possible.  The queueing approach reflects the non-interactive nature of the collector-dispatcher-detection architecture and provides support for horizontal scaling.

  6. Database improvements -- I did the database schema which means two things:  1) It's wrong and 2) Someone else needs to fix it.  We're going to work on building out a database interaction that is implementation agnostic.  We should support more than just MySQL.  The schema needs to be normalized to the maximum extent possible while ensuring it still supports enterprise-specific needs.  Also, we need to move to UTF-8 to support international language sets.

  7. API -- The API is going to be updated to support high-latency detection by returning to the dispatcher a deferred response.  This means that the nugget will take a while to process and the dispatcher should check back.  One example would be submission to a web-based analysis front end.  The nugget would store the information necessary to return to that site later in the dispatcher.  The dispatcher would manage a queue of deferred detection so any compatible nugget can pull from the queue and query the website to see if the response is ready.

  8. Scripting -- So I was jacking around with Maltego and I built some custom database connectors.  The way they allow this is you call something and pass arguments via stdin and then return results via stdout and stderr.  It is genius in its simplicity and ease of implementation.  This allows us to provide scripting support under any language provided they accept and return data in well-formed blocks.  This is actually one of my favorite updates and should help response teams rapidly roll out detection.

  9. Data storage -- We store the binary data blocks in the relational database.  There are many ways to describe this practice, but the way Watchinski (our boss) described it was by somehow simultaneously rolling his eyes, laughing like a hyena and demanding we fix it.  As it works out, this was my idea.  So we're looking at a number of solutions for data storage from the mundane (FTP, HTTP) to the exotic(ish) NoSQL and map-reduce sort of things.  This is a key research area because we want to allow searching of files to find indicators of compromise, etc.  This change will also affect how we submit data to the system, so we aren't clogging up the dispatcher with huge blocks of data while we wait for processing.

  10. Scalability -- This is a late-year requirement, but we want to go huge with wide sets of dispatchers and deep sets of detection.  There isn't really a timeline for this, but development is going on with an eye towards this requirement.
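To make the deferred-response idea in item 7 a little more concrete, here is a rough Python sketch of how a dispatcher-side queue of deferred detections might behave.  None of this is the actual Razorback API: `DeferredItem`, `DeferredQueue`, `defer`, `claim`, `check_back` and the `poll` callback are hypothetical names made up for illustration, and the real thing would live behind the dispatcher rather than in one process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DeferredItem:
    """State a nugget leaves with the dispatcher so the lookup can resume later."""
    block_id: str     # data block the pending verdict belongs to
    nugget_type: str  # kind of nugget that knows how to finish the job
    token: str        # whatever the remote service needs to find the result


class DeferredQueue:
    """Hypothetical dispatcher-side queue of detections that are not done yet."""

    def __init__(self) -> None:
        self._pending: List[DeferredItem] = []
        self.verdicts: Dict[str, str] = {}

    def defer(self, item: DeferredItem) -> None:
        """Record that a nugget needs the dispatcher to check back later."""
        self._pending.append(item)

    def claim(self, nugget_type: str) -> Optional[DeferredItem]:
        """Hand the oldest item of a compatible type to a nugget, if any."""
        for index, item in enumerate(self._pending):
            if item.nugget_type == nugget_type:
                return self._pending.pop(index)
        return None

    def check_back(self, nugget_type: str,
                   poll: Callable[[str], Optional[str]]) -> bool:
        """Let any compatible nugget poll once; True if a verdict was stored."""
        item = self.claim(nugget_type)
        if item is None:
            return False
        verdict = poll(item.token)  # None means "not ready yet"
        if verdict is None:
            self.defer(item)        # put it back for a later attempt
            return False
        self.verdicts[item.block_id] = verdict
        return True
```

The point of the sketch is that the nugget itself holds no state between attempts: everything needed to return to the web-based analysis front end sits in the queued item, so any nugget of the right type can pick it up.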

So...that's our small list of things we're up to.  We're working on two-to-four-week development cycles and we will be releasing stable (for some value of stable) tarballs each quarter.  The Q2 release is already being built and currently is in trunk.  The Q2 dev cycle is laid out like this:

 — Implement and prove end-to-end data transfer via the queueing system and updated API

 — Implement and prove database and local and global caching systems that relate to datablock handling

 — Architect and implement the response capability for detection nuggets including alerting and data block judgement

 — Provide preliminary support for large data-block transfers to dedicated file storage and notification to detection systems to pull the block instead of getting it over the queue

(Don't go for tea now, we're almost done.  You can do it!)

We've finished phase 1 and are working on design and testing requirements for phase 2.  Our goal is to have all of these completed by the end of June.  Phase 1 uses ActiveMQ and the Stomp protocol to manage data transmission and command and control.  This allows us to use the ActiveMQ authentication system so that nuggets cannot communicate with the system unless they have the proper credentials.  Routing is also functional now, with multiple data types routing to multiple application types.  We also now support the "any" data type so that a nugget will receive any data that is provided to the system.  This supports logging nuggets and anti-virus solutions.
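As a rough illustration of the routing behavior just described, here is a small in-process Python sketch.  It does not use ActiveMQ or Stomp and is not how the trunk code is written; the `Router` class, its `register` and `route` methods, and the handler signature are assumptions invented for this example.  The only thing taken from the text above is the idea that several nuggets can subscribe to a data type and that "any" acts as a catch-all.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

ANY = "any"  # the catch-all data type mentioned above

# A handler stands in for a nugget: it gets the data type and the raw block.
Handler = Callable[[str, bytes], None]


class Router:
    """Hypothetical stand-in for type-based routing to nuggets."""

    def __init__(self) -> None:
        self._routes: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, data_type: str, handler: Handler) -> None:
        """Subscribe a nugget to one data type, or to ANY for everything."""
        self._routes[data_type].append(handler)

    def route(self, data_type: str, block: bytes) -> int:
        """Deliver a block to every interested nugget; return how many got it."""
        handlers = list(self._routes.get(data_type, []))
        if data_type != ANY:
            # Catch-all subscribers see every block regardless of its type.
            handlers += self._routes.get(ANY, [])
        for handler in handlers:
            handler(data_type, block)
        return len(handlers)
```

With a PDF nugget registered for `"pdf"` and a logging nugget registered for `ANY`, routing a `"pdf"` block reaches both, while a block of some other type reaches only the logger.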

Well, that is where we are and where we think we're headed.  We'll let you know when we have an RC for the Q2 release, and as we flesh out specific requirements for future releases we'll provide those as well.  In the meantime, check out the outstanding documentation provided by Mr. Tom Judge, our newest developer and total svn ninja.  It lays out everything you need to know about the code currently being worked on in trunk.  You can find it at https://sourceforge.net/apps/trac/razorbacktm/.

This is an open source project.  If you want to contribute code, either in the form of nuggets or other functionality, we welcome your participation.  If you have a question or comment about either participating in the development of the project or the project road map, hit the mailing list.  We'd love to hear from you.
