Recently, we've been researching several threat actors operating in South Asia: Transparent Tribe, SideCopy, etc., that deploy a range of remote access trojans (RATs). After a hunting session in our malware sample repositories and VirusTotal while looking into these actors, we gathered a small collection of VBA code samples that eventually allowed us to connect certain IOCs to individual threat actors based on the final payload, victimology and submission locations. For example, if the final payload was a CrimsonRAT or ObliqueRAT sample, we would attribute the VBA code to the Transparent Tribe group.
We then created specific rules to hunt for earlier Transparent Tribe related malicious documents and found several samples attributed to the group. Interestingly, we also found a smaller subset of samples that could not be immediately attributed to the Transparent Tribe. We decided to dig a bit deeper into the anomaly and conducted additional analysis of the VBA code and payload, which could not be easily attributed to any known group.
Initial assertions on the origins of the samples we were researching.
When these samples were first discovered, security researchers attributed them to either the Sidewinder or Donot groups. Now, considering that Transparent Tribe focuses on targets in India and Sidewinder and Donut have been reported to focus their activities on targets in Pakistan, it was particularly interesting to find out that VBA code of potentially opposing groups reuse portions of the code.
Historical Transparent Tribe VBA code
Transparent Tribe (APT36) is a group operating in South Asia for more than five years. The lures they use in their malicious documents indicate that their targets are Indian government and military organizations. The main payload employed by Transparent Tribe in most of associated campaigns are variants of Crimson and Oblique RATs, which indicates that the objective of their activity is obtaining foothold and persistent remote access.
Their initial infection vector is usually email, purporting to come from official sources and containing a lure, which can be a Word document or more often, an Excel spreadsheet. For the purposes of this post, we're focusing on Excel VBA code and its evolution over time. Here, we see an early example of the VBA code employed by Transparent Tribe in May 2019.
The code establishes a hardcoded destination path for one or more payload files and then uses a Text field of a user-defined form to get the content of the payload to be dropped to disk. The payload is stored as an ASCII-encoded string of hexadecimal byte values separated by a specific separator. The string needs to be converted into a binary byte array before it is written to disk as a ZIP compressed file. The conversion is executed in a for-each loop iterating over the content of the string and converting every two characters into a binary byte with the help of the CByte conversion function.
May 2019 Transparent Tribe VBA code.
There are at least three fairly unique elements of the code we can use to hunt for additional similar samples: the unique destination folder path, the user-defined form containing the payload in an executable or a ZIP format and the for-each loop iterating over the content of the form's text field. We used those elements to retroactively hunt for similar samples and found samples that were attributed to other groups.
Later Transparent Tribe example from April 2021.
Historical Donot team VBA code
The Donot threat actor (aka APT-C-35) has operated in South Asia for more than five year. Although some of their TTPs are similar to Transparent Tribe, they focus their operations on Pakistani government and military organizationThey also use emai, but are known to be focused on installing malicious Android apps. Several samples containing the remote code exploit for CVE-2017–11882 have also been attributed to the group. Here, we are looking at VBA code of the malicious spreadsheets used to install custom made remote access trojans, keylogging and exfiltration tools.
We found a few examples of Donot VBA code, specifically from 2018 and September 2021. The similarities in the code are immediately visible, as are similarities to the Transparent Tribe code. Our records indicate that the Donot VBA code predates Transparent Tribe code, although the confidence for that is low.
Early 2018 Donot VBA code example.
One interesting aspect of the Donot VBA code is that it often contains a fake error message which is displayed to the user before or after the payload is dropped and executed. The fake message is shown below as part of a September 2021 Powerpoint sample discovered in December 2021.
Fake Powerpoint message displayed by the Donot VBA code.
September 2021 Donot VBA code sample.
Hangover VBA code
Operation Hangover may be another alias for the Donot group but it is sometimes referred to as a separate group on its own. Since sometimes the operations of Donot and Operation Hangover are tracked as separate attacks, we cannot be certain they are the same group, though based on code similarities we can assume they're closely related.
Early July 2019 Operation Hungover example.
In June 2021, while hunting for Transparent Tribe samples, we discovered malicious Excel spreadsheet "Exports promotion highlits may 2021.xls" that attempted to dropped a previously unknown RAT. This was followed in July by the discovery of the closely related spreadsheet "List of Nomination of the Candidates1.xltm". We have decided to call these samples internally "SDuser" sample, based on the specific PDB string left in the binary payload.
At the time, we quickly realized there are similarities between the VBA code of these samples and Transparent Tribe VBA code. However, we can see that the similarity between SDuser VBA code and Donot VBA code is also strong, with all the well known elements:
- Setting of the path to the folder for the payload.
- Using VBA forms to store the payload with a specific separator character.
- Using Cbyte to convert the hexadecimal strings into a binary byte array.
- Displaying a fake Excel error message.
SDUser (Donot?) VBA dropper June 2021.
SDUser (Donot?) VBA dropper July 2021.
Not only was this interesting because it was similar code, but it was shared between APT groups that had completely opposing targets. While Transparent Tribe is focusing on organizations in India, Donut is focusing on organizations in Pakistan and China.
Sduser binary payloads
Although the majority of this post is dedicated to similarities in the VBA code of the opposing APT groups, we feel it is important to document binary payloads discovered in the same period as they can shed some light on the objectives of actors behind SDuser campaigns.
The samples are found on Virustotal based on the PDB path "C:\Users\SDUSER\source\repos", which was left by the developers in the payload. Overall, there were almost 30 samples uploaded to VirusTotal, belonging to seven different projects:
- Mal – January 2021
- Test – January 2021
- 12324 – January 2021
- Evading____ - February 2021
- ConsoleApplication4 – March 2021
- Obfuscating shellcode – March 2021
- WindowsSecurity – June 2021
The functionality of the payloads was quite similar. From the upload of the first sample in January, there were three main areas where the group has experimented:
- Command and control protocol
- Anti-sandboxing techniques
- Reverse shell mechanism
The majority of samples uploaded to VirusTotal were a part of the WindowsSecurity project and contained parts of the functionality developed by the earlier project.
Anti-sandboxing checks in the Main function of the WindowsSecurity project.
The Windows security payload starts with the following anti-sandboxing checks:
- There are two or more processor cores in the system.
- There is more than 2GB of RAM installed on the system.
- The mouse cursor has moved sufficiently far between checking of two positions.
- There are more than 50 processes running on the infected system.
ConsoleApplication4, anti sandboxing checks in the most recent payload.
In the most recent payload, ConsoleApplication4 project, the attackers added anti-sandboxing checks to check for the sufficient capacity of the hard drive of the infected system as well as the presence of Virtual Box artefacts.
The payload also checks the registry for the history of USB devices added to the system. If no USB devices are detected, the payload will simply exit without connecting to the C2 server. A similar check runs for the recently opened files. The malware will continue with execution unless there are at least two recently opened files.
The objective of most of the fully functioning payloads is to launch a reverse shell to connect to an attacker controlled C2 server. The attackers are experimenting with Metasploit payloads but in the payloads dropped by the Sduser malicious documents a simple redirection of cmd.exe standard input and output to a socket connected to an attacker controlled host is used.
Some payloads also employ Telegram API and may use it to communicate with the attackers.
Setting C2 channels and launching reverse shell in the Main function of WindowsSecurity.
Finding similarity with computer algorithms
Although it is very easy for a human to spot similar patterns in the code this process is not so simple and obvious for computer programs. A lot of work has been done to improve the code and text similarity in the fields of malicious code analysis and plagiarism detection to find the closest similarities.
For the purpose of this research, we chose a few algorithms that were relatively easy to implement in Python, focusing on the Normalized Compression Distance, Jaccard index, the Winnowing algorithm for fingerprinting documents and the Python standard difflib comparison library.
For demonstration, we choose a small subset of all samples and compare them using various similarity/distance metrics.
Normalized Compression Distance
Normalized compression distance (NCD) is derived from the information distance, a concept that is related to Kolmogorov complexity. The absolute information distance between two binary strings can be expressed as the length of a minimal program which can be created to convert one string into another.
For the purpose of comparison of two strings, the distance should be normalized to show values from 0 to 1, with smaller values indicating stronger similarity between the strings we are comparing.
Unfortunately, calculating information distance as well as normalized information distance is proven to be not computable due to the non-computability of the halting problem. Luckily, it has been found that the normalized compression distance can be a good approximation of normalized information distance.
NCD can be expressed as a ratio of lengths of strings compressed with a compression function. The compression function can be any well known compression algorithm such as deflate or bzip2 although in practice, for shorter strings, better results are obtained by using lzma.
NCD works well for any binary string and it can also successfully be applied to unpacked PE executables. However, in our testing, for all comparison methods, we decided to first extract the VBA code from the generated olevba JSON output.
As the compression algorithm we used lzma and expressed NCD as the python code as a normalized ratio of lengths of compressed concatenated strings and lengths of standalone compressed strings.
NCD expressed in python code
Jaccard index is a metric which is often used in clustering algorithms and is a measure of the number of common elements in 2 sets. It is defined as the ratio of the intersection of two sets divided by the union of the two sets.
It's a measure of similarity for the two sets of data, with a range from 0% to 100%. The higher the percentage, the more similar the two populations. Although it's easy to interpret, it is extremely sensitive to small sample sizes and may give erroneous results, especially with very small samples or data sets with missing observations.
The Jaccard distance is simply a complement of the Jaccard index. In our research we simply split the code into words and use words as the elements of the sets we are comparing. Nltk, Python's framework for natural language processing contains a function that calculates the Jaccard distance.
Winnowing algorithm is often used to detect code similarity and plagiarism in students works at universities, but it can also be used for general code similarity.
The advantage of the algorithm is that it can scale by reducing the number of comparisons of n-gram fingerprints by a significant number compared to naive comparison of hashes generated for each n-gram in the document.
In our case, we can limit the size of the window for fingerprint generation to create a larger number of fingerprints for comparison as the number of files in our test is small. In addition to that, a lexer is used to tokenize the code so that the simple substitution of variable and function names is easily detected.
For our test we have used the Python copydetect module which can be used to visualize the similarity matrix from the Winnowing comparison and highlight detected similarities in the code, which cannot be done with NCD and Jaccard similarity algorithms we used.
Copydetect uses the Pygments syntax-highlighting module for lexing, which is convenient, as it supports over 300 languages. Specifically, we used lexers for vb.net and vbscript and obtained similar results.
Similarities between the sduser1 and tt2 sample as detected by copydetect
Difflib does not perform and scale as well as Winnowing, but it is still acceptable for the purpose of comparing a small number of code snippets. The module is based on a string comparison algorithm published in the late 1980's by Ratcliff and Obershelp and it also has the ability to display similarities detected between the two compared documents.
Similarity algorithm comparison
We have run all algorithms over the small subset of 7 VBA code samples (from the IOC section) and compared their results to estimate their success in detecting similarities in the code of the opposing APT groups.
Comparison of similarity algorithms on a small sample subset
The findings show that algorithm results are comparable and that they indicate code reuse between the following samples:
- Significant overlap between sduser1 with tt1 and tt2 code.
- Significant overlap between sduser1 and sduser2 code.
- Significant overlap between donot2 and hang1 code.
- Smaller overlap between hang1 and sduser1 code.
In tests, multiple algorithms can be used to improve the reliability of the results.
Similarities of the VBA code between different groups in South Asia have been previously mentioned by Tencent but here we discussed them in detail. The code reuse, which is easily visible to a human eye, is confirmed by objective code similarity detection methods.
One of the strengths of software engineering is the ability to share code, to build applications on top of libraries written by others, and to learn from the success and failures of other software engineers. The same is true for threat actors. Two different threat actors may use code from the same source in their attacks, which means that their attacks would display similarities, despite being conducted by different groups.
False flag operations are common in warfare, when one of the sides mimics TTPs of their opponents in order to falsely attribute their activities and improve chances of operation success. This is also commonly seen with cyber threat actors, with known examples, such as the Olympic Destroyer campaign and it is likely to continue in the future.
Code sharing between threat actors is to be expected. Open-source tools are a useful source of functionality, and adopting techniques from successful attacks conducted by other groups are likely to be sources of misleading evidence leading to false attribution.
We can expect sophisticated threat actors to continue to take advantage of code reuse and false flags, to integrate evidence designed to fool analysts and lead to attribution of their attacks to other groups. Attribution is already difficult and it is unlikely to become easier.
Ways our customers can detect and block this threat are listed below.
Cisco Secure Endpoint (formerly AMP for Endpoints) is ideally suited to prevent the execution of the malware detailed in this post. Try Secure Endpoint for free here.
Cisco Secure Web Appliance web scanning prevents access to malicious websites and detects malware used in these attacks.
Cisco Secure Email (formerly Cisco Email Security) can block malicious emails sent by threat actors as part of their campaign. You can try Secure Email for free here.
Cisco Secure Firewall (formerly Next-Generation Firewall and Firepower NGFW) appliances such as Threat Defense Virtual, Adaptive Security Appliance and Meraki MX can detect malicious activity associated with this threat.
Cisco Secure Malware Analytics (Threat Grid) identifies malicious binaries and builds protection into all Cisco Secure products.
Umbrella, Cisco's secure internet gateway (SIG), blocks users from connecting to malicious domains, IPs and URLs, whether users are on or off the corporate network. Sign up for a free trial of Umbrella here.
Cisco Secure Web Appliance (formerly Web Security Appliance) automatically blocks potentially dangerous sites and tests suspicious sites before users access them.
Additional protections with context to your specific environment and threat data are available from the Firewall Management Center.
Cisco Duo provides multi-factor authentication for users to ensure only those authorized are accessing your network.
Open-source Snort Subscriber Rule Set customers can stay up to date by downloading the latest rule pack available for purchase on Snort.org.
9ce56e1403469fc74c8ff61dde4e83ad72597c66ce07bbae12fa70183687b32d - Feb 2018 Donot sample (label donot1)
5efde4441e4184c36a0dec9e7da4b87769a574b891862acdb4c3321d18cbca69 - Sept 2021 Donot sample (donot2)
386ed7ba502e7bf0e60c546476c1c762cbc951eb2a2ba1f5b505be08d60310ef - May 2019 Transparent Tribe sample (label tt1)
dbb9168502e819619e94d9dc211d5f4967d8083ac5f4f67742b926abb04e6676 - April 2021 Transparent Tribe sample (label tt2)
56349cf3188a36429c207d425dd92d8d57553b1f43648914b44965de2bd63dd6 - July 2019 "Operation Hangover"sample (label hang1)
a3c020bf50d39a58f5345b671c43d790cba0e2a3f631c5182437976adf970633 - June 2021 SDuser sample (label sduser1)
3bbae53fc00449166fd9255b3f3192deba0b81b41b6e173d454c398a857b5094 - July 2021 SDuser sample (label sduser2)
188.8.131.52 - possible connection with the Sidewinder group