Monday, March 21, 2016

Malware Word Search: Identifying Angler's Dictionary

This post authored by Steve Poulson with contributions from Nick Biasini.

Exploit kits are constantly evolving. We recently wrote about some subtle Angler changes, but on March 8 Angler changed drastically. In this blog post, we briefly cover these changes, examining characteristics of Angler's URL structure and the origins of the words used to build the URLs.

New Angler
Beginning on March 8, Talos noticed some major changes to the URL structure for Angler. These changes were drastic and altered every part of the landing page URLs. Let's first look at the old syntax:

This landing page includes all the recent changes we observed. This particular landing page was delivered at 9:09 UTC on March 8. Then, at 9:12 UTC, the change began, and we started seeing landing page URLs similar to:

Finally at 9:28 UTC we saw it evolve into what was the final version (until the next day):

A single server hosted all this, so in the space of 20 minutes we saw a proxy server rotate through several different iterations of the landing page until arriving at the final version listed above. This was true for all the Angler proxy servers we observed during the period. 

This speaks to the resiliency of Angler and how well distributed its functions currently are. There is a reason the attackers separate the proxy and exploit servers, and this behavior points largely to some sort of automatic updating, since the server wasn't down for an extended period of time and appeared to continue functioning while it was being updated.

Since that time we have seen one additional subtle change, after which Angler landed at its final URL structure:

The only difference between the last two landing pages is the number of digits at the beginning of the subfolder; it dropped from ten to five. We finally ended up with the following syntax: /topic/[5 digit number]-word1-word2-word3-word4-word5-word6-word7-word8/

UPDATE: We recently began seeing Angler change again; it is now using seven words instead of eight in the URL for the landing page.
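The syntax described above can be expressed as a regular expression. The following is a minimal sketch, assuming the words are lowercase ASCII letters and that both the eight-word and the later seven-word forms should match; the names ANGLER_TOPIC_RE, looks_like_angler_topic, and extract_words are hypothetical and are not part of any released rule. It checks only the shape of the path, so it would also match legitimate forum URLs with the same layout.

```python
import re

# Hypothetical pattern built only from the syntax described in this post:
# /topic/<5 digits>-<7 or 8 hyphen-separated words>/
# Assumption: words consist of lowercase ASCII letters only.
ANGLER_TOPIC_RE = re.compile(r"^/topic/\d{5}(?:-[a-z]+){7,8}/$")


def looks_like_angler_topic(path):
    """Return True if the URL path has the described landing page shape."""
    return ANGLER_TOPIC_RE.match(path) is not None


def extract_words(path):
    """Return the hyphen-separated words of a matching path, else an empty list."""
    if not looks_like_angler_topic(path):
        return []
    slug = path.strip("/").split("/", 1)[1]  # e.g. "12345-w1-w2-...-w8"
    return slug.split("-")[1:]               # drop the leading number


if __name__ == "__main__":
    # Made-up path for illustration, not an observed Angler URL.
    sample = "/topic/12345-alpha-bravo-charlie-delta-echo-foxtrot-golf-hotel/"
    print(looks_like_angler_topic(sample), extract_words(sample))
```

Extracting the words this way is what makes the dictionary comparison later in the post possible.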

This change to a topic folder followed by a five-digit number and eight hyphen-separated words resembles the URL format of a popular forum engine, IP.Board. Below are a couple of URLs that use this forum software, which illustrate how closely Angler has impersonated it:

Over the next several days we analyzed the landing page URLs and began focusing on the words that were being used.

Dictionary Analysis

We extracted thousands of words from the landing page URLs, all of them unique and a number of them quite obscure, such as epigrammatic, atropine, and umbrageous.

The question is where these words originate. It is easy enough to find dictionaries on the Internet, and we can reject a number of them simply because some words in the URLs do not appear in them. Ideally, we want the smallest dictionary that still contains every word seen in the URLs.

However, after some investigation three dictionaries were good fits: wordlist-d.txt, 2of12inf.txt, and corncob.dict.
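The rejection step described above amounts to a subset test followed by a size comparison. The following is a minimal sketch, assuming each candidate dictionary is a plain text file with one word per line; load_wordlist, covering_dictionaries, and smallest_covering_dictionary are hypothetical helper names, not the tooling used for this analysis.

```python
def load_wordlist(path):
    """Read a one-word-per-line file into a set of lowercase words."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def covering_dictionaries(observed, dictionaries):
    """Keep only the dictionaries that contain every observed word.

    observed:     set of words extracted from the landing page URLs
    dictionaries: mapping of dictionary name -> set of words
    """
    return {name: words for name, words in dictionaries.items()
            if observed <= words}


def smallest_covering_dictionary(observed, dictionaries):
    """Return the name of the smallest dictionary covering all observed words.

    Returns None when no candidate contains every observed word.
    """
    covering = covering_dictionaries(observed, dictionaries)
    if not covering:
        return None
    return min(covering, key=lambda name: len(covering[name]))
```

A dictionary that misses even one observed word is rejected outright; among the survivors, the smallest one is the most interesting candidate.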

To find the best match, we collected the frequencies of two characteristics of the words observed in these samples (the observed values):

  • Length of the words
  • Starting letter

Similarly, for each of the dictionaries the expected frequencies of those characteristics were calculated (the expected values) and plotted against the observed values to show the similarity. Pearson’s correlation coefficient was also calculated as an objective measure of similarity; the dictionary whose coefficient is closest to 1.0 is the most similar.
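The comparison described above can be sketched in a few lines. This is an illustrative version only, assuming non-empty word lists of alphabetic words; length_profile, first_letter_profile, and pearson are hypothetical names, and the max_len cutoff of 20 is an arbitrary choice for the sketch rather than a value taken from the analysis.

```python
from collections import Counter
from math import sqrt


def length_profile(words, max_len=20):
    """Relative frequency of word lengths 1..max_len (longer words are clamped)."""
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def first_letter_profile(words):
    """Relative frequency of the starting letter, a through z."""
    counts = Counter(w[0].lower() for w in words if w)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in "abcdefghijklmnopqrstuvwxyz"]


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

For each candidate dictionary, pearson(length_profile(observed_words), length_profile(dictionary_words)) gives the word-length score, and the same call with first_letter_profile gives the starting-letter score.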

Length of Words
All dictionaries show a good fit, although wordlist-d.txt underpredicts seven-, eight-, and nine-letter words and expects more long words. 2of12inf.txt gives a better fit, and corncob.dict looks like the best fit.

Looking at the correlation, corncob.dict gives the value closest to 1.

Starting Letter Frequency
Again, all dictionaries fit the observed pattern, with wordlist-d.txt slightly under- and overpredicting some letters. Again, corncob.dict looks the best.

The correlation coefficients show the same thing: all are close to 1.0, with corncob.dict the closest.

We conclude that corncob.dict is a likely candidate for the source of words used by the Angler URL generation code because all the words seen in the URLs are found in that dictionary, even the obscure ones, and the distributions of lengths and starting letters are very close. Additionally, corncob.dict has the fewest words, which means that finding all 1776 words from the URLs in it by chance is unlikely. This insight may lead to better detection and may also explain the source of some of the other generated words seen in Angler traffic. It also offers a technique for identifying the generation mechanisms in other traffic, such as DGAs.
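One way to read the "by chance" argument is as a counting exercise. The sketch below assumes a simplified model that is not stated in the post: the URL words are distinct and drawn uniformly from some larger word list, and the candidate dictionary is a subset of that list. The function name log10_chance_all_covered is hypothetical, and the dictionary sizes in the usage example are placeholders, not measured values.

```python
from math import log10


def log10_chance_all_covered(k, subset_size, superset_size):
    """log10 of the probability that k distinct words, drawn uniformly
    without replacement from a superset, all fall inside a given subset.

    Assumption: the subset is fully contained in the superset.
    """
    if k > subset_size:
        return float("-inf")  # impossible: more words than the subset holds
    return sum(log10((subset_size - i) / (superset_size - i))
               for i in range(k))


if __name__ == "__main__":
    # Placeholder sizes for illustration only.
    print(log10_chance_all_covered(1776, 60_000, 120_000))
```

Under this model, the smaller the candidate dictionary is relative to the pool the words could have come from, the less plausible it is that every observed word lands in it by coincidence.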

As a result of the recent Angler exploit kit changes we are releasing the following updated rule: 38228. The domains observed are blocked via Cisco’s domain reputation systems as soon as we see them, as are the Flash and Silverlight files being used for exploitation in the wild.
For the most current rule information, please refer to your Defense Center, FireSIGHT Management Center or

Advanced Malware Protection (AMP) is ideally suited to prevent the execution of the malware used by these threat actors.

CWS or WSA web scanning prevents access to malicious websites and detects malware used in these attacks.

The Network Security protection of IPS and NGFW has up-to-date signatures to detect malicious network activity by threat actors.
