What is RC4?
RC4 was designed by Ron Rivest of RSA Security in 1987. RC4 is a fast and simple stream cipher that uses a pseudorandom number generation algorithm to generate a key stream. This key stream can be used in an XOR operation with plaintext to generate ciphertext. The same key stream can then be used in an XOR operation against the ciphertext to generate the original plaintext.
While it is still common in malware, RC4 has been legitimately implemented in a number of areas where speed and privacy are of concern. In the past, both WEP and TLS both used RC4 to protect data sent across the wire. However, last Fall, Microsoft recommended that customers disable RC4 by enabling TLS1.2 and AESGCM.
For more information including a detailed history of RC4, check out the Wikipedia article.
Why is it used in malware?
Increasingly, we find that RC4 is used to encode data that is sent to a remote server to be decrypted on the other side using a preshared key. This makes detection a bit trickier (but not impossible) and also makes it harder to determine exactly what is being sent across the wire. What we will usually do when we think we've come across some sort of encryption is determine the source of it and whether the data being sent is static (for matching purposes) and what exactly that data is.
How does it work?
*Note: For these examples, I will be using a variant of the Coremex Search Engine Hijacker (MD5: 70E2090D5DEE18F3E45D38BF254EFF87) after it has resumed its suspended child process.
RC4 is implemented in two main phases:
1. A Key Scheduling Algorithm is executed using a symmetric key to create an array of 256 bytes (0x100h).
2. This array is then used in a pseudorandom number generation algorithm to generate a cipher stream that can be decoded using the same key.
Many books and internet articles will represent the Key Scheduling Algorithm (KSA) with the following Pseudocode:
for i from 0 to 255
S[i] := i
endfor
j := 0
for i from 0 to 255
j := (j + S[i] + key[i mod keylength]) mod 256
swap values of S[i] and S[j]
endfor

To better understand how the algorithm works, split it into multiple sections.
Section 1:
Create and Initialize the Substitution Box
for i from 0 to 255
S[i] := i
endfor

This section creates an array (or an "SBox"/Substitution Box) where each value equals its position in the array from 0255 (0x000xFF) , this is also known as its identity permutation:
This initial tablecreation is a key indicator when looking for this type of encryption in malware samples. For this sample, the RC4 KSA was initialized using the following loop in x86 assembly code:
100020E5 xor eax, eax ; Initialize counter to 0
loop:
loop:
100020E7 ; Give each array index its identity value
100020E7 mov [eax+ecx], al ; using EAX as a counter/value:
100020E7 ; S[0] = 0x00 ... S[256] = 0xFF
100020EA inc eax ; Increment counter by 1
100020EB cmp eax, 100h ; Compare counter value to 256 (0x100h) // NOTE THE 100h!
100020F0 jl short loop ; Loop around if counter < 256
Note the instruction at 0x100020EB 100h is a great value to search a binary for in a disassembler like IDA Pro. Looking for an instruction that is comparing a register to 100h can often point you in the right direction, especially if you know the malware is using RC4 ahead of time.
While looking at the memory dump that [eax+ecx] points to after this loop completes, you can see the newlyconstructed SBox that looks like the one above:
0012FBB0 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ................
0012FBC0 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F ................
0012FBD0 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F !"#$%&'()*+,./
0012FBE0 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 0123456789:;<=>?
0012FBF0 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F @ABCDEFGHIJKLMNO
0012FC00 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F PQRSTUVWXYZ[\]^_
0012FC10 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F `abcdefghijklmno
0012FC20 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F pqrstuvwxyz{}~.
0012FC30 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F Ã‡.Ã©Ã¢Ã¤Ã Ã¥Ã§ÃªÃ«Ã¨Ã¯Ã®.Ã„.
0012FC40 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F .Ã¦Ã†Ã´Ã¶Ã²Ã»Ã¹Ã¿Ã–Ãœ¢£.PÆ’
0012FC50 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF Ã¡ÃÃ³ÃºÃ±Ã‘ÂªÂº¿¬¬½¼¡«»
0012FC60 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF ¦¦¦¦¦¦¦++¦¦+++++
0012FC70 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF +++¦¦++¦+
0012FC80 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF ++++++++¦_¦¦¯
0012FC90 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF aÃŸGpSsÂµtFTOd8fen
0012FCA0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF =±==()÷˜°··vn²¦
Now that the table has been initialized, it’s time to scramble the box.
Section 2:
Scramble SBox with Key “0006” (ASCII 0x30303036)
j := 0
for i from 0 to 255
j := (j + S[i] + key[i mod keylength]) mod 256
swap values of S[i] and S[j]
endfor

This routine takes the initialized table and performs various byteswaps against the table using the key and its length (keys can range from 1>255 bytes in length). Here is how this sample implemented this routine. Note that the exact assembly instructions will vary amongst compilers, platforms and languages.
100020F4 loop: ; ECX = S[0]  EDI = j
100020F4 mov eax, esi ; Initialize EAX
100020F6 cdq ; EAX > EDX:EAX (with sign)
100020F7 idiv [esp+0Ch+keylen] ; EDX = i mod keylen
100020FB mov bl, [esi+ecx] ; BL = S[i]
100020FE mov eax, [esp+0Ch+key] ; EAX = key
10002102 movzx eax, byte ptr [edx+eax] ; EAX = key[i mod keylen]
10002106 add eax, edi ; EAX = (j + key[i mod keylen])
10002108 movzx edx, bl ; EDX = S[i]
1000210B add edx, eax ; EDX = (j + S[i] + key[i mod keylen])
1000210D and edx, 0FFh ; Another way to mod 255
10002113 mov edi, edx ; j = (j + S[i] + key[i mod keylen])
10002115 mov al, [edi+ecx] ; AL = s[j]
10002118 mov [esi+ecx], al ; S[i] = S[j]
1000211B inc esi ; i++
1000211C cmp esi, 100h ; Check if i < 256 // NOTE THE 100h!
10002122 mov [edi+ecx], bl ; S[j] = S[i]
10002125 jl short loop ; Loop if Less
In IDA Pro, the SBox Scramble loop following the Initialization loop may resemble these basic blocks:
Manually calculating at least the first few bytes of this example with a pencil and a piece of paper will help make it more clear how the bytes are swapped to generate this new SBox:
Initialized SBox:
For the first byte of the Key “0006”, ( Key[0] ) is “0", remember this is ASCII “0x30>”:
j := (j + S[i] + key[i mod keylength]) mod 256
swap values of S[i] and S[j]
i = 0 // first round
j = (j + S[i] + key[i mod keylength]) mod 0x100
= (0 + S[0x00] + key[0 mod 4]) mod 0x100
= (0 + 0 + key[0]) mod 0x100
= (0 + 0x30) mod 0x100
= 0x30 mod 0x100
= 0x30
S[0x0] = 0x30
S[0x30] = 0x00
After bytes S[0x00] and S[0x30] are swapped, the resulting table looks like this:
i = 1 // second round
j = (j + S[i] + key[i mod keylength]) mod 0x100
= (0x30 + S[0x01] + key[1 mod 4]) mod 0x100
= (0x30 + 1 + key[1]) mod 0x100
= (0x31 + 0x30) mod 0x100
= 0x61 mod 0x100
= 0x61
S[0x1] = 0x100
S[0x61] = 0x100
After bytes S[0x01] and S[0x61] are swapped, the resulting table looks like this:
The algorithm will continue to perform this calculation 256 times. Note that these values will continue to be swapped out and will even swap previouslyswapped bytes as well. Using the key “0006”, the malware sample will end up generating the following SBox on the stack (I added the corresponding SBox array indexes for visualization purposes only):
The next step is to use this newlycreated SBox to encode the data. This is done by creating a keystream using the SBox and this algorithm. The result, K is then used in an XOR operation with each byte of the plaintext to generate the encrypted data.
This routine takes the modified SBox and again performs various byteswaps against the table. It then uses this information to generate the keystream(K). This stream is XOR'd against the plaintext until all of the plaintext has been encoded. If the length of the plaintext exceeds the length of the keystream, the stream starts over at K[0]. Here is how this sample implemented the routine:
Note that this sample used the following structure (other implementations may use u_char for indexes) to store the SBox and its two counters:
In IDA Pro, the RC4_Crypt loop may resemble these basic blocks:
Once the length of the plaintext is met, the keystream K is completely generated. As each value of K was generated, it was used to XOR the complimentary byte of the plaintext, in this case it looked like this:
I implemented RC4 in Python, treating input as strings and outputting the SBox contents both before and after scrambling.
After bytes S[0x01] and S[0x61] are swapped, the resulting table looks like this:
The algorithm will continue to perform this calculation 256 times. Note that these values will continue to be swapped out and will even swap previouslyswapped bytes as well. Using the key “0006”, the malware sample will end up generating the following SBox on the stack (I added the corresponding SBox array indexes for visualization purposes only):
S[00]  0012FBB0 18 8A 98 7B16 35 F4 A8C0 A5 53 94D0 0D 87 90
S[10]  0012FBC0 2B 11 BA 2608 25 C7 75EB C6 83 D420 12 73 DB
S[20]  0012FBD0 1B 4E FF D3EF 72 50 2EB9 33 AF DC6C C9 42 8C
S[30]  0012FBE0 BC 29 3A E8EC 3B E7 5444 F5 C3 3F3C A9 32 17
S[40]  0012FBF0 59 60 DF 23F0 6A B7 898B 43 7E C247 A3 37 A6
S[50]  0012FC00 34 A7 67 95D8 B1 46 D956 28 A2 5B7D 4C 41 7F
S[60]  0012FC10 5E AE 85 88B2 9C 9B 0F0A AB 8D 6EED 96 40 92
S[70]  0012FC20 45 1A F9 CEB0 3E 9D 1D68 1E E3 132A 51 D6 B4
S[80]  0012FC30 EE 58 D5 E1D1 BB 39 4A4F 15 07 B880 69 E4 FC
S[90]  0012FC40 5A 21 A1 1C7C 9A 0E 5FFD CB 02 B5FA BD 57 86
S[A0]  0012FC50 E9 8E CA E55D 19 6F AA4D CD 71 F2BE 49 0B E2
S[B0]  0012FC60 F1 79 A0 D2B6 DD F6 F82F E6 78 C152 CF 05 04
S[C0]  0012FC70 E0 6D 70 9799 24 FE 064B 91 76 A4B3 FB 63 09
S[D0]  0012FC80 81 64 00 825C C5 EA 36AD 03 C8 0C1F 84 48 C4
S[E0]  0012FC90 74 31 01 5562 66 8F 9F38 61 F7 BF27 7A 22 AC
S[F0]  0012FCA0 9E 65 77 F36B 2C DE DA30 14 3D CC2D 93 D7 10
Section 3:
Generate the Key Stream and Encode Data
i := 0
j := 0
for x from 0 to len(plaintext)
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap values of S[i] and S[j]
K := S[(S[i] + S[j]) mod 256]
output K ^ plaintext[x]
endfor

The next step is to use this newlycreated SBox to encode the data. This is done by creating a keystream using the SBox and this algorithm. The result, K is then used in an XOR operation with each byte of the plaintext to generate the encrypted data.
This routine takes the modified SBox and again performs various byteswaps against the table. It then uses this information to generate the keystream(K). This stream is XOR'd against the plaintext until all of the plaintext has been encoded. If the length of the plaintext exceeds the length of the keystream, the stream starts over at K[0]. Here is how this sample implemented the routine:
Note that this sample used the following structure (other implementations may use u_char for indexes) to store the SBox and its two counters:
struct rc4_state
{
u_char perm[256]; // SBox
__int32 index1; // i
__int32 index2; // j
};
This sample encodes various data about the victims machine and sends the data encoded with this RC4 stream to its Command and Control server. This section of the malware just happens to be encoding a hash of one of my system files. The original hash that it encodes is: EA497F6BD6555BA85127CE083A513BE8:
10002174 loop:
10002174 mov ecx, [ebp+68h+state.index1] ; ECX = i
10002177 inc ecx ; i += 1
10002178 and ecx, esi ; i = i mod 0x100
1000217A mov [ebp+68h+state.index1], ecx ; Store i
1000217D lea edx, [ebp+ecx+68h+state] ; EDX = *S[i]
10002184 movzx ecx, byte ptr [edx] ; ECX = S[i]
10002187 add ecx, [ebp+68h+state.index2] ; ECX = j + S[i]
1000218A and ecx, esi ; ECX = (j + S[i]) mod 0x100
1000218C mov [ebp+68h+state.index2], ecx ; j = (j + S[i]) mod 0x100
1000218F mov al, [ebp+ecx+68h+state.perm] ; AL = S[j]
10002196 movzx ebx, byte ptr [edx] ; EBX = S[i]
10002199 mov [edx], al ; S[i] = S[j]
1000219B mov eax, [ebp+68h+state.index2] ; EAX = j
1000219E mov [ebp+eax+68h+state.perm], bl ; S[j] = S[i]
100021A5 mov eax, [ebp+68h+Plaintext] ; EAX = Plaintext
100021A8 mov edx, [ebp+68h+state.index1] ; EDX = i
100021AB movzx edx, [ebp+edx+68h+state.perm] ; EDX = S[i]
100021B3 lea ecx, [edi+eax] ; ECX = *Plaintext[x]
100021B6 mov eax, [ebp+68h+state.index2] ; EAX = j
100021B9 movzx eax, [ebp+eax+68h+state.perm] ; EAX = S[j]
100021C1 add eax, edx ; EAX = S[i] + S[j]
100021C3 and eax, esi ; EAX = (S[i] + S[j]) mod 0x100
100021C5 mov al, [ebp+eax+68h+state.perm] ; AL = S[(S[i] + S[j]) mod 0x100]
100021CC xor [ecx], al ; Plaintext[x] ^ Output K
100021CE inc edi ; x++
100021CF cmp edi, [ebp+68h+arg_4] ; Check if x < len(Plaintext)
100021D2 jb short loop ; Loop if x < len(Plain
In IDA Pro, the RC4_Crypt loop may resemble these basic blocks:
Once the length of the plaintext is met, the keystream K is completely generated. As each value of K was generated, it was used to XOR the complimentary byte of the plaintext, in this case it looked like this:
To decrypt the ciphertext, simply reverse the process:
Exercise:
Putting it all together with Python
I implemented RC4 in Python, treating input as strings and outputting the SBox contents both before and after scrambling.
*Note: since this script treats input as a string, you would have to send raw bytes for nonASCII characters. In the example above, this can be accomplished like this:
./rc4Gen.py 0006 `perl e 'print "\xEA\x49\x7F\x6B\xD6\x55\x5B\xA8\x51\x27\xCE\x08\x3A\x51\x3B\xE8"'`

I've linked the Python code here: rc4Gen.py
Very nice explanation! Thanks for posting. How is the key, "0006" in your example, typically protected? Is it usually obfuscated in some way? Is it changed for every instance of the code?
ReplyDeleteFor your example, you can use:
ReplyDelete`echo ne "\xEA\x49\x7F\x6B\xD6\x55\x5B\xA8\x51\x27\xCE\x08\x3A\x51\x3B\xE8"`
instead of invoking perl.
Thanks for the replies.
ReplyDeleteThe key can definitely be obfuscated until it is needed. As far as how it is protected, there are endless possibilities in how that can be accomplished. However, it would have to be in the clear during the key stream generation. Setting breakpoints around that section should reveal the key. Though unpacking and using only one byte of the key at a time wouldn't be impossible. In that situation, setting logging breakpoints would be needed to reveal the key.
Excellent call on using echo instead of perl. I'm not sure why I went with perl for the example. I appreciate the suggestion!