One of the simpler ways to identify malware-infected machines communicating with their command and control servers is to watch for known-malicious User-Agent strings in HTTP requests. For those not familiar with them, User-Agent strings accompany almost all HTTP requests on the Internet, and are designed to identify the particular browser, tool, or other piece of software generating the request, so that appropriate data can be returned based on the type of system issuing it. As an example, an intelligent web server might supply different JavaScript to Internet Explorer vs. Chrome or Firefox to deal with cross-browser compatibility issues, or redirect a user to the mobile version of a web page if their User-Agent string declares the request to be coming from an iPhone or an Android device. While it might seem counter-intuitive that malicious programs would overtly declare themselves in a header that servers are trained to inspect regularly, the VRT has seen countless such declarations in the wild, including "User-Agent: Malware" in one request highlighted in my Malware Mythbusting talk last year.
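To make the mechanics concrete, here's a quick illustration of how trivially a client controls this header - a hedged sketch, not anything taken from an actual sample. The target URL is purely illustrative, and the "Malware" string is borrowed from the request mentioned above:

```python
import urllib.request

# Every HTTP client stamps its requests with whatever User-Agent it
# chooses; malware is free to pick something as blatant as "Malware".
req = urllib.request.Request(
    "http://example.com/",              # illustrative target
    headers={"User-Agent": "Malware"},  # overt self-identification
)

# On the wire, the request starts out looking roughly like this:
#   GET / HTTP/1.1
#   Host: example.com
#   User-Agent: Malware
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.headers.get("Content-Type"))
```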

Unfortunately, web proxies sometimes alter the original User-Agent string generated by a client system as a given HTTP request traverses the proxy. For example, we recently ran into a scenario where the BlueCoat Proxy was rewriting all client User-Agent strings to "Mozilla/4.0 (compatible;)", which was generating false positives with SID 21444. The rule has since been fixed - a crucial space had been unintentionally stripped from the spot just before the closing parenthesis during an automated rule-checking process - but knowing that the User-Agent strings were all being mangled in this way leads us to a very important point about the efficacy of these types of rules in production environments.
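To see why that one character mattered, consider a stripped-down version of the match. These string patterns are only illustrative stand-ins for the rule's actual content match, but they show the failure mode:

```python
# The proxy rewrote every client's User-Agent to this fixed string:
proxied_ua = "Mozilla/4.0 (compatible;)"

# Stand-ins for the rule's content match before and after the fix;
# only the space before the closing parenthesis differs.
buggy_pattern = "Mozilla/4.0 (compatible;)"   # space accidentally stripped
fixed_pattern = "Mozilla/4.0 (compatible; )"  # space restored

print(buggy_pattern in proxied_ua)  # True  -> fires on all proxied traffic
print(fixed_pattern in proxied_ua)  # False -> proxied traffic ignored
```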

Realistically, what this means for you is that if you've got an IDS inspecting client traffic headed out to the Internet, and that IDS sits after a proxy server but before the final outbound link - a typical configuration, as many IDS users place their devices just inside the firewall, at the choke point all internal devices must pass through on their way out - you're going to be missing valuable data if your proxy server is rewriting User-Agent strings. Certainly, many proxy servers do valuable detection of their own based on User-Agent strings; while researching this post, for example, I noted that BlueCoat systems can be configured to block outbound HTTP requests that lack a User-Agent string entirely, a policy I would strongly encourage given things like binary C&C channels operating over HTTP (see the sketch after this paragraph). That said, if you're expecting your IDS to help you find infected clients based on User-Agent strings in this type of configuration, you'll end up with a false sense of security, since the IDS will never see the indicators it's looking for.
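For the curious, the "no User-Agent, no outbound HTTP" idea is easy to express. The following is a minimal sketch of the policy in Python, not BlueCoat configuration syntax; the header parsing is simplified and the known-bad list is a placeholder you'd seed from real intelligence:

```python
KNOWN_BAD_UAS = {"Malware"}  # placeholder; populate from real intel feeds

def allow_outbound(raw_request: bytes) -> bool:
    """Return False for outbound HTTP requests that should be blocked."""
    headers = {}
    for line in raw_request.split(b"\r\n")[1:]:
        if not line:
            break  # blank line ends the header block
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    ua = headers.get(b"user-agent")
    if ua is None:
        return False  # e.g. a bare binary C&C channel riding over HTTP
    if ua.decode("latin-1", "replace") in KNOWN_BAD_UAS:
        return False  # overtly self-identified malware
    return True

# No User-Agent header at all -> blocked:
print(allow_outbound(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # False
```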

Do yourself a favor the next time you've got a free moment, and go double-check this on your network if you're running a web proxy of some kind and you're not entirely certain how it behaves. If you are suffering from this lack of visibility, consider repositioning your IDS so it sees the original requests, disabling User-Agent rewriting on your particular proxy server, or training the proxy server itself to look for known-malicious User-Agent strings. You may be surprised at what you find once you're properly configured for detection.
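If you want a concrete way to run that double-check, one option is to push a canary request through the proxy to a service that echoes back the User-Agent it received, and compare the two. A minimal sketch, assuming outbound HTTP is permitted, that httpbin.org's /user-agent echo endpoint is reachable from your network, and with a placeholder proxy address you'd swap for your own:

```python
import json
import urllib.request

CANARY = "UA-rewrite-canary/1.0"              # distinctive test string
PROXY = "http://proxy.example.internal:8080"  # placeholder; your proxy here

# Route the request through the proxy under test.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY})
)
req = urllib.request.Request(
    "http://httpbin.org/user-agent",
    headers={"User-Agent": CANARY},
)
with opener.open(req) as resp:
    seen = json.load(resp)["user-agent"]  # what the far side received

if seen != CANARY:
    print(f"Proxy is rewriting User-Agent: sent {CANARY!r}, server saw {seen!r}")
else:
    print("User-Agent survived the proxy intact")
```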