One of the most fundamental tasks performed by many software programs involves the reading, writing, and general processing of files. In today's highly networked environments, files and the programs that process them can be found just about everywhere: FTP transfers, HTTP form uploads, email attachments, et cetera.
Because computer users interact with files of so many different varieties on such a regular basis, Oracle Corporation has designed tools to assist programmers with writing software that will support these everyday tasks: Outside In Technology (OIT). From the OIT website: "Outside In Technology is a suite of software development kits (SDKs) that provides developers with a comprehensive solution to extract, normalize, scrub, convert and view the contents of 600 unstructured file formats."
In April, Talos blogged about one of the OIT-related arbitrary code execution bugs patched by Oracle. The impact of that vulnerability, plus these additional eighteen OIT bugs disclosed in this post, is severe because so many third-party products use Oracle's OIT to parse and transform files. A review of an OIT-related CERT advisory from January 2016 reveals a large list of third-party products, especially security and messaging-related products, that are affected. The list of products that, according to CERT, rely on Oracle's Outside In SDK includes:
- Avira AntiVir for Exchange - antivirus protection for Microsoft Exchange
- IBM WebSphere Portal - provides enterprise web portals
- Google Search Appliance - search all content in an enterprise through a single search box
- Guidance Encase - forensic investigation software
- Microsoft Exchange - enterprise email and productivity software
- Novell Groupwise - a collaboration tool for large enterprise
- Raytheon SureView - software designed for enterprise visibility and user activity monitoring
- Veritas (Symantec) Enterprise Vault - a program for information governance through archiving
Talos has not confirmed that each of the third-party products listed above is affected. We have, however, confirmed that some are running vulnerable OIT-related code. For example, if WebReady Document Viewing is enabled for Microsoft Exchange 2013 (and earlier), an attacker could exploit these vulnerabilities by sending a malicious email attachment to a victim who then opens the email using web preview.
Further, if Data Loss Prevention is enabled, the vulnerability can be triggered simply by sending an email with a malicious attachment outbound from the affected Exchange server. If Avira AntiVir for Exchange (v12.0.2775.0 & earlier) is in place, just sending or receiving a malicious email is sufficient, since this program will scan all inbound and outbound email. Additionally, multiple OIT vulnerabilities could conceivably be exploited in a chained fashion for a more effective approach. Talos therefore encourages users to follow up with these vendors directly for more information regarding the scope of the impact of these vulnerabilities.
Table of Contents
- PDF /Size Integer Overflow
- TIFF ExtraSamples Code Execution
- TIFF Photometric Interpretation Code Execution
- GIF ImageWidth Code Execution
- Gem_Text Code Execution
- PSI Image File Code Execution
- Word DggInfo Code Execution
- Mac Works Database VwStreamSection Code Execution
- Mac Word ContentAccess libvs_word+63AC Code Execution
- BMP Heap Buffer Overflow & Code Execution
- Mac Works VwStreamReadRecord Memory Corruption
- PDF /Kids Information Leakage
- PDF NULL Pointer Dereference Denial of Service
- PDF Recursion Stack Overflow Denial of Service
- PDF /FlateDecode /Colors Denial of Service
- PDF /Type /Xref Denial of Service
- PDF Xref Offset Denial of Service
- Mac Word ContentAccess libvs_word Denial of Service
The trailer object gives the "location of the cross-reference table and of certain special objects within the body of the file". It contains several fields, such as /ID, /Root, /Size and /Info. /Size holds the number of objects in the PDF.
A "large" /Size value will cause issues with the Oracle OIT PDF parser. Although Oracle's parser checks for integer overflow, it later shifts the result left by 4 bits (multiplying it by 16), negating any protection offered by the previous overflow checks.
    .text:B74ECE59 mov edi, eax
    .text:B74ECE5B shl edi, 4
    .text:B74ECE5E mov [esp+6BCh+s], edi
    .text:B74ECE61 call _SYSNativeAlloc
    .text:B74ECE66 mov edx, [esp+6BCh+arg_10]
    .text:B74ECE6D mov [edx+1D6Ch], eax
    .text:B74ECE73 test eax, eax
    .text:B74ECE75 jz loc_B7

At B74ECE59, the value in `eax` comes straight from the 32-bit rounded value of the /Size element. At B74ECE5B, it is shifted left by 4 bits (multiplied by 16), invalidating the integer overflow check that was done previously. A `malloc` wrapper is called at B74ECE61 and the returned pointer is saved at B74ECE6D. If a /Size value is chosen carefully, it can lead to an integer overflow at B74ECE5B in the first basic block such that a small value is passed to SYSNativeAlloc at B74ECE61. The problem arises when, due to rounding, the heap allocator returns a pointer to a bigger heap chunk than requested.
For example, if the /Size value is 0x10000001, it passes the check before allocation, but when shifted left by 4 bits it becomes 0x10, resulting in a small allocation. Depending on the underlying allocator, the actual size of the allocated chunk may be bigger. On Linux, the returned chunk will be 24 bytes long and the subsequent `memset` will only initialize the first 16 bytes. Because only the first 16 bytes of the buffer are initialized, the code will access memory that has not been zeroed. This leftover data in uninitialized memory can cause memory corruption, potentially leading to code execution.
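The wrap-around described above can be reproduced with plain integer arithmetic. The following is a minimal sketch, not Oracle's code: `shl32`, `allocation_request`, and `checked_allocation_request` are hypothetical names, and the 32-bit mask simply models the width of the `edi` register in the listing.

```python
MASK32 = 0xFFFFFFFF  # models the width of the 32-bit edi register


def shl32(value, bits):
    """Left shift as a 32-bit register performs it: high bits are discarded."""
    return (value << bits) & MASK32


def allocation_request(size_field):
    """Size that reaches the allocator after `shl edi, 4` (hypothetical model)."""
    return shl32(size_field, 4)


def checked_allocation_request(size_field):
    """Same computation, but rejecting values whose shifted result would wrap."""
    if size_field > (MASK32 >> 4):
        raise OverflowError("/Size too large: size << 4 would wrap")
    return size_field << 4


size_field = 0x10000001
print(hex(size_field << 4))                   # 0x100000010, the mathematically correct size
print(hex(allocation_request(size_field)))    # 0x10, what the allocator is asked for
```

Performing the range check on the value after the multiplication has been accounted for, as `checked_allocation_request` does, is one way to keep an earlier overflow check from being undone by later arithmetic.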
TIFF files are also capable of triggering vulnerabilities that can lead to remote code execution. This vulnerability in the Oracle OIT SDK is a result of insufficient memory allocation on the heap when parsing TIFF files with the 'ExtraSamples' tag present in the Image File Directory (IFD). In this case, the ImageWidth, SamplesPerPixel, BitsPerSample, and ExtraSamples values are all considered standard for a TIFF file, but the inclusion of ExtraSamples is key to triggering the vulnerability. The ExtraSamples tag allows for a potential heap-based overflow because the additional bits are not accounted for upon allocation.
In 1992, the TIFF file format specification was updated, and extensions were added to accommodate new image types. Originally, TIFF files only supported four image types: Black & White, Grayscale, RGB, and Palette-Color. The updated TIFF specification included a new CMYK (color-separated) image type. To specify the TIFF image type, a field called "PhotometricInterpretation" is used. A TIFF file with the "PhotometricInterpretation" field set to 5 (CMYK/color-separated format) causes the Oracle SDK to follow a different code path than it does for other settings. This alternative code path allows the ImageWidth value to be used in an unchecked allocation, which eventually results in a heap overflow.
Besides PDF and TIFF, GIF files can also be a source of danger. The ImageWidth value should describe the absolute width of a given GIF, and should be smaller than the Logical Screen Width value present in the same file. This vulnerability in Oracle's Outside In SDK is triggered when parsing a GIF with an ImageWidth in an Image Descriptor block set to 0xFFFF. An ImageWidth set to 0xFFFF triggers an integer overflow, and leads to an unbounded memory write in two branches of the same function in libvs_gif.so.
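The relationship between ImageWidth and the Logical Screen Width can be checked before any buffer is sized from those fields. The sketch below is only an illustration of such a check, not the OIT code path: `parse_logical_screen` and `check_image_descriptor` are hypothetical helpers, and the field layout (little-endian 16-bit values, `0x2C` as the Image Descriptor separator) follows the published GIF format.

```python
import struct

IMAGE_SEPARATOR = 0x2C  # first byte of a GIF Image Descriptor


def parse_logical_screen(data):
    """Return (width, height) from the GIF header and Logical Screen Descriptor."""
    signature, width, height = struct.unpack_from("<6sHH", data, 0)
    if signature not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    return width, height


def check_image_descriptor(desc, screen_width, screen_height):
    """Validate an Image Descriptor against the logical screen; return (width, height)."""
    separator, left, top, width, height = struct.unpack_from("<BHHHH", desc, 0)
    if separator != IMAGE_SEPARATOR:
        raise ValueError("not an Image Descriptor")
    if width == 0 or height == 0:
        raise ValueError("empty image")
    # Python integers do not wrap, so these sums are safe to compare directly.
    if left + width > screen_width or top + height > screen_height:
        raise ValueError("image does not fit inside the logical screen")
    return width, height


header = b"GIF89a" + struct.pack("<HH", 100, 50) + b"\x00\x00\x00"
screen = parse_logical_screen(header)
good = struct.pack("<BHHHHB", IMAGE_SEPARATOR, 0, 0, 100, 50, 0)
print(check_image_descriptor(good, *screen))
```

This is a deliberately strict policy. A real decoder might prefer to clamp the image to the screen instead of rejecting the file, but either way an ImageWidth of 0xFFFF would be handled before it takes part in size arithmetic.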
GEM metafiles are files containing instructions for rendering pictures in the vector drawing program Gem Draw. An integer overflow vulnerability exists in the file parsing code of the Oracle Outside In Technology libim_gem2 library. While parsing Gem metafile data, an unchecked memory allocation is performed. As a result, a specially crafted Gem file can trigger an integer overflow, leading to multiple heap-based buffer overflows and, potentially, remote code execution.
A parsing vulnerability exists in Oracle's Outside In Technology libim_psi2 library. Specifically, there is an integer overflow which leads to an erroneous memory allocation, and subsequently a large-sized memory copy operation. While parsing a PSI image file, a 2 byte size field is read and sign extended. This value is then used in memory allocation and a subsequent `memmove` call. The read size value is increased by 8 before an area of memory is allocated, but the original size is used in the `memmove` call.
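The mismatch between the allocation size and the copy length is easiest to see with concrete numbers. This is a rough model under stated assumptions: `psi_sizes` is a hypothetical name, the size field is assumed to be little-endian, and both results are shown as the 32-bit unsigned values an allocator and `memmove` would receive.

```python
import struct

MASK32 = 0xFFFFFFFF


def psi_sizes(raw):
    """Return (allocation size, memmove length) as 32-bit unsigned values.

    `raw` is the 2-byte size field; "<h" reads it as a signed 16-bit value,
    which models the sign extension (little-endian order is an assumption).
    """
    (size,) = struct.unpack("<h", raw)
    alloc_size = (size + 8) & MASK32   # size is increased by 8 before allocating
    copy_length = size & MASK32        # the original size is used for the copy
    return alloc_size, copy_length


alloc_size, copy_length = psi_sizes(b"\xff\xff")
print(hex(alloc_size), hex(copy_length))   # 0x7 0xffffffff
```

With a size field of 0xFFFF, the sign-extended value is -1: adding 8 yields a 7-byte allocation, while the same -1 interpreted as an unsigned length asks `memmove` to copy almost 4 GB.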
While parsing a malformed OLE file with crafted DggInfo element contents, a vulnerability in the Escher drawing parsing library, libvs_eshr, can be triggered. When the ID of the first child of DggContainer is changed from 0xF006 (Dgg) to 0xF007 (BSE), this leads to parser confusion and, ultimately, a 4-byte value from the file is used as a pointer in a `cmp` instruction. If the comparison fails, the same pointer is used in an indirect `call` instruction, leading to arbitrary code execution.
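A parser can avoid this kind of type confusion by verifying the record ID before interpreting the record body. The sketch below is an assumption-laden illustration rather than the libvs_eshr logic: the 8-byte record header layout (4-bit version, 12-bit instance, 16-bit type, 32-bit length) and the 0xF000 container ID are taken from Microsoft's published Office drawing format documentation, and the function names are hypothetical.

```python
import struct

DGG_CONTAINER = 0xF000  # assumed container record ID (from the published format)
DGG = 0xF006            # expected first child, per the text above
BSE = 0xF007            # record ID substituted in the malformed file

HEADER_SIZE = 8


def parse_record_header(buf, offset=0):
    """Decode one Escher record header at `offset`."""
    if len(buf) < offset + HEADER_SIZE:
        raise ValueError("truncated record header")
    ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", buf, offset)
    return {
        "version": ver_inst & 0xF,
        "instance": ver_inst >> 4,
        "type": rec_type,
        "length": rec_len,
    }


def check_dgg_container(buf):
    """Raise ValueError unless the container's first child is a Dgg record."""
    container = parse_record_header(buf, 0)
    if container["type"] != DGG_CONTAINER:
        raise ValueError("not a DggContainer")
    child = parse_record_header(buf, HEADER_SIZE)
    if child["type"] != DGG:
        raise ValueError("unexpected first child 0x%04X" % child["type"])
    return child


well_formed = struct.pack("<HHI", 0x000F, DGG_CONTAINER, 8) + struct.pack("<HHI", 0, DGG, 0)
print(hex(check_dgg_container(well_formed)["type"]))
```

Rejecting the record when the type does not match means the body is never read through the wrong structure definition.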
When parsing a Mac Works Database document, memory is written in a loop using a counter with an upper value read from a byte in the file. No size checks are performed after the arithmetic operations, resulting in an out-of-bounds memory write.
When parsing a Mac Word document, a single-byte value from the file is used as the starting value for a counter, which is then used in arithmetic operations for memory access. No size checks are performed after the arithmetic operations, resulting in an out-of-bounds 4-byte memory write.
While parsing a specially crafted ICO file, an unchecked value specifying bitmap width is used to calculate the size of a memory write operation. The compression method must be set to 0x01 (BI_RLE8). While reading the file, a piece of memory on the heap is effectively overwritten with zeros. The size of this overwrite is unchecked and comes straight from the bitmap width, so heap data structures can be overwritten with NULL bytes. In the supplied test case, the out-of-bounds NULL byte write overwrites a function pointer, which leads to a crash. By carefully tweaking the size of the overwrite, a function pointer on the heap can be manipulated and arbitrary code execution achieved.
When parsing a Mac Works Database document, memory is being written in a loop using a counter in destination address calculations. No size checks are performed after the arithmetic operations resulting in a partially controlled 2 byte overwrite.
The vulnerability is present in the `VwStreamReadRecord` function in the libvs_mwkd.so library (with image base at 0xB7F89000), specifically starting in the following basic block:
    .text:B7F8ACF6 movzx eax, [esp+3Ch+var_12]
    .text:B7F8ACFB mov edx, [edi+31Ch]
    .text:B7F8AD01 mov ecx, ebp
    .text:B7F8AD03 mov [edx+eax], cl
    .text:B7F8AD06 movzx eax, word ptr [esp+3Ch+var_10]
    .text:B7F8AD0B movzx esi, [esp+3Ch+var_12]
    .text:B7F8AD10 mov [edi+eax*2+298h], si
    .text:B7F8AD18 add word ptr [esp+3Ch+var_10], 1
    .text:B7F8AD1E add esi, 1
    .text:B7F8AD21 mov [esp+3Ch+var_12], si
    .text:B7F8AD26 cmp bp, 0F9h
    .text:B7F8AD2B ja loc_B7F8AE1A
    .text:B7F8AD31 test bp, bp
    .text:B7F8AD34 jz loc_B7F8ADEB
    .text:B7F8AD3A mov [esp+3Ch+var_1A], 0
    .text:B7F8AD41 jmp short loc_B7F8AD71

At B7F8AD06 and B7F8AD0B, pre-calculated values of `eax` and `esi` are read from the stack and zero extended. At B7F8AD10, `eax` is used in the destination address calculation and the value of `si` is written there. The initial values of `eax` and `esi` are related, with `eax` serving as a counter. No bounds checking is in place, resulting in a possible 2-byte out-of-bounds overwrite.
A specially crafted file could be used to shift the to-be-freed pointer to an attacker controlled area which can then be used to subvert the `free()` and achieve code execution.
The pages of a PDF document are accessed through the page tree, which defines all the pages in a document. Each node in a page tree typically has entries for /Type, /Parent, /Kids, and /Count. The /Kids reference is intended to specify all the child elements directly accessible from the current node.
However, there is a vulnerability in the way the Oracle OIT PDF parser handles the /Kids reference. While parsing a PDF file with an object that contains a malformed /Kids reference, the value right after the /Kids element is interpreted as a string where an array of references should be located. This leads to the parser expecting a pointer where the string copied from the file is located, resulting in an arbitrary read access violation. In a properly formatted PDF file, an array of at least one reference must follow the /Kids element. The bug appears in libvs_pdf.so (with base address 0xB74BF000):
    .text:B74E71DB mov eax, [eax]
    .text:B74E71DD mov edi, [esp+5Ch+var_24]
    .text:B74E71E1 mov eax, [eax+edi*4]
    .text:B74E71E4 mov [esp+5Ch+var_4C], eax
    .text:B74E71E8 mov ecx, [esp+5Ch+var_34]
    .text:B74E71EC mov edx, [esp+5Ch+var_48]

At B74E71DB, `eax` points to the string copied from the file into the heap. The first four bytes of the string are used in the memory access calculation at B74E71E1, causing an arbitrary read access violation. If the calculated address ends up pointing to valid memory, the read will succeed at the controlled address. However, if the value after the /Kids element is a pure integer, a different code path is reached and the integer value is interpreted as a pointer, resulting in a fully controlled arbitrary read at:

    .text:B74E718A mov eax, [esp+5Ch+var_18]
    .text:B74E718E mov eax, [eax]
    .text:B74E7190 xor edx, edx
    .text:B74E7192 mov edi, [eax+4]
    .text:B74E7195 test edi, edi
    .text:B74E7197 jz loc_B74E72A2
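To make the first case concrete, the address computed by `mov eax, [eax+edi*4]` can be worked out from the bytes of the string. This is a small sketch under assumptions: `read_address` and its `index` parameter (standing in for whatever `edi` holds at that point) are hypothetical, and the little-endian interpretation reflects the x86 target.

```python
import struct

MASK32 = 0xFFFFFFFF


def read_address(string_bytes, index):
    """Address dereferenced when the string's first 4 bytes are used as a pointer."""
    (base,) = struct.unpack_from("<I", string_bytes, 0)  # little-endian load, as on x86
    return (base + index * 4) & MASK32                   # [eax + edi*4] in 32-bit arithmetic


print(hex(read_address(b"AAAA-rest-of-string", 0)))   # 0x41414141
print(hex(read_address(b"AAAA-rest-of-string", 2)))   # 0x41414149
```

Because the four bytes come directly from the file, whoever crafts the file chooses the base of the address that is read.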
When parsing a specially crafted PDF document, a NULL pointer dereference occurs, leading to process termination. After the parser successfully decodes the /FlateDecode encoded stream data, it proceeds to execute the operators contained within. While executing a `Tj` operator on a piece of text contained in a stream, a memory structure, probably containing charset mappings, is referenced. No NULL pointer check is made, and since the structure is zero initialized, this can result in a crash.
The root of a PDF document's hierarchy is the catalog dictionary, located by means of the /Root entry in the Trailer object of the PDF file. The catalog dictionary must have the /Catalog type. While parsing a malformed PDF file that contains a reference to the /Root element with a malformed or missing xref table, a recursive call to a function is made each time with the same parameters. This eventually leads to a crash due to process stack exhaustion.
While parsing a PDF file that contains a /FlateDecode encoded stream with /Predictor set to a value other than 1, a malformed value for /Colors causes a NULL pointer dereference in the libsc_ut.so library while de-initializing the decoder.
When parsing a PDF file with an object containing a stream, a missing object type specification can lead to arbitrary pointer access. An ASCII integer value appearing after the /Type element is converted into a 32-bit integer and subsequently used as a pointer in a comparison operation. When the pointer is invalid, a process crash occurs.
A vulnerability exists in the PDF parser of the OIT SDK that, under specific conditions, results in out-of-bounds heap memory access following an unchecked memory allocation operation.
In a PDF file, an xref table contains multiple rows, each containing three values (except for the first row, which specifies the first object being referenced and the number of objects). The first value represents the 10-digit offset into the file where the object is to be found. In a specially crafted PDF file, the OIT PDF parser uses the specified value as a parameter in a call to `realloc()`, which can fail. The return value is checked for errors but is subsequently ignored. The original numerical value is then used as an upper bound in a loop, where an out-of-bounds read happens during process cleanup.
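One way to keep an attacker-chosen offset from driving later allocations is to validate each xref row against the file size as it is read. The following is a hedged sketch of such a check, not the OIT parser: `parse_xref_entry` is a hypothetical helper, and the row layout it expects (10-digit offset, 5-digit generation number, and an `n` or `f` flag) follows the PDF specification's cross-reference table format.

```python
import re

# One cross-reference row: 10-digit offset, 5-digit generation, in-use/free flag.
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def parse_xref_entry(line, file_size):
    """Parse one xref row and reject in-use entries that point past the file end."""
    match = XREF_ENTRY.match(line)
    if match is None:
        raise ValueError("malformed xref entry")
    offset = int(match.group(1))
    generation = int(match.group(2))
    kind = match.group(3).decode("ascii")
    if kind == "n" and offset >= file_size:
        raise ValueError("xref offset %d lies beyond the end of the file" % offset)
    return offset, generation, kind


print(parse_xref_entry(b"0000000017 00000 n \n", 1000))   # (17, 0, 'n')
```

Bounding the offset by the file size also bounds anything derived from it, so a failed allocation or an oversized loop limit cannot originate from this field.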
When parsing a Mac Word document, a single-byte value from the file is used as the maximum value for a counter, which is then used in arithmetic operations for memory access. No size checks are performed after the arithmetic operations, resulting in an out-of-bounds memory access. The calculated memory address is used as the destination operand in an `or byte` instruction.
Over and over again, we see problems that arise from software using untrusted data as input without proper validation of that data. Because not all software developers are experts in the multitude of file formats in existence, they are forced to rely on SDKs such as Oracle's OIT. The unfortunate reality, however, is that vulnerabilities found in an SDK used by third parties take additional time to patch: first the organization that maintains the SDK issues a fix, and some time later the third parties that use the SDK provide an update containing that fix to their customers. This leaves a rather large window of time in which miscreants can exploit vulnerabilities in third-party products.