By Sam Dytrych and Jason Royes.
Executive summary
Modern automobiles are complex machines, merging both mechanical and computer systems under one roof. As automobiles become more advanced, additional sensors and devices are added to help the vehicle understand its internal and external environments. These sensors provide drivers with real-time information, connect the vehicle to the global fleet network and, in some cases, actively use and interpret this telemetry data to drive the vehicle.
These vehicles also frequently integrate both mobile and cloud components to improve the end-user experience. Functionality such as vehicle monitoring, remote start/stop, over-the-air updates and roadside assistance is offered to the end user as additional services and quality-of-life improvements.
All these electronic and computer systems introduce many different attack vectors into connected vehicles: Bluetooth, Digital Radio (HD Radio/DAB), USB, CAN bus, Wi-Fi and, in some cases, cellular. Like any other embedded system, connected vehicles are exposed to cyber attacks and security threats, including software vulnerabilities, hardware-based attacks and even remote control of the vehicle. During some recent research, Cisco's Customer Experience Assessment & Penetration Team (CX APT) discovered a memory corruption vulnerability in GNU libc for ARMv7 that leaves Linux ARMv7 systems open to exploitation. This vulnerability is identified as TALOS-2020-1019/CVE-2020-6096.
CX APT represents the integration of experts from the NDS, Neohapsis, and Portcullis acquisitions. This team provides a variety of security assessment and attack simulation services to customers around the globe. The CX APT IoT security practice specializes in identifying vulnerabilities in connected vehicle components. For more on this vulnerability, read the full Talos advisory, TALOS-2020-1019. CX APT worked with Cisco Talos to disclose the vulnerability, and the glibc maintainers plan to release an update that fixes it in August.
Analyzing the initial overflow
As engineers and programmers, we hold many inherent assumptions about the behavior of library functions, particularly those in well-established standard libraries such as libc. When these assumptions prove false, program execution can be altered, resulting in undefined behavior. In this case, Cisco uncovered a vulnerability in the ARMv7 implementation of memcpy() that can cause a program to enter an undefined state and create the conditions for remote code execution in the target application. Ultimately, this vulnerability causes program execution to continue in scenarios where a segmentation fault or crash should have occurred, so the program keeps running with corrupted runtime state, which opens up exploitation opportunities.
While conducting a recent penetration test of a connected vehicle, we identified an integer underflow vulnerability in an embedded web server. This web server was externally exposed through the vehicle's Wi-Fi network, meaning that anyone with access to the network could reach it. While this integer underflow ultimately led to remote code execution on the vehicle, the behavior of the memcpy() function on the embedded device was far more interesting.
The vulnerable embedded web server was found to be written in C++. When the web server received a large GET request, it was observed to crash and generate a segmentation fault. Naturally, this crash proved interesting and warranted further investigation.
Further analysis traced the crash to the code snippet shown below. This code was reconstructed from the embedded web server executable image.
Figure 1 - Reconstructed Code Snippet
The logic above parses an HTTP request until the end-of-line characters are found. In HTTP requests, the end of a line is marked by the CR/LF character sequence, represented in hexadecimal as 0x0D and 0x0A. Note that this implementation only needs to find one of these characters (either CR or LF) to end the line.
The state of the parsed GET request is kept in the sLineBuffer structure. This structure is composed of four elements:
Figure 2– sLineBuffer Structure from Disassembly
- bufsz – The size of the buffer
- nl_pos – The offset into the buffer that the CR/LF characters were found
- len – The length of the current line
- buf – The current buffer being parsed
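For readers following along in C, the four fields can be written down as a struct. This is a hypothetical reconstruction: the type name line_buffer, the 32-bit signed field types and the inline 2048-byte array are assumptions chosen to match the values reported in this walkthrough, not the vendor's actual definition.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BUF_SIZE 2048   /* default bufsz reported in the article */

/* Hypothetical layout of sLineBuffer. Field order follows the article;
 * the signed 32-bit types and the inline array are assumptions. */
struct line_buffer {
    int32_t bufsz;               /* size of buf */
    int32_t nl_pos;              /* offset at which the CR/LF was found */
    int32_t len;                 /* length of the current line (signed) */
    char    buf[LINE_BUF_SIZE];  /* data received from the socket */
};

/* Under this layout the three scalar fields sit directly below buf. */
_Static_assert(offsetof(struct line_buffer, buf) == 3 * sizeof(int32_t),
               "scalar fields immediately precede buf");

/* Default state at the start of the first loop iteration. */
static void line_buffer_init(struct line_buffer *lb) {
    lb->bufsz = LINE_BUF_SIZE;
    lb->nl_pos = 0;
    lb->len = 0;
}
```

Under this assumed layout, any write that starts at a negative offset from buf lands on bufsz, nl_pos and len first, which matches the overwrite described later in the third iteration.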
To go from integer underflow to remote code execution, the HTTP request parsing loop above is iterated four times to set up the stack for control of the program counter (PC).
First iteration
On the first iteration of the loop, the sLineBuffer structure is populated entirely with the default values:
- bufsz = 2048
- nl_pos = 0
- len = 0
Figure 3– State of sLineBuffer Structure at Beginning of Exploit
Because both sLineBuffer->len and recv_len are 0, the for() loop at line 10 is skipped and execution continues down to the recv() call at line 23. recv() then reads 2048 bytes from the socket and writes them starting at sLineBuffer->buf[0].
When recv() returns, the recv_len variable will be set to the return value, 2048, and execution will continue downward to line 31, where sLineBuffer->len will be set to be equal to recv_len.
Second iteration
At the beginning of the second iteration, the sLineBuffer structure contains the values set at the end of the previous iteration:
- bufsz = 2048
- nl_pos = 0
- len = 2048
Figure 4– State of sLineBuffer Structure at Start of Second Iteration
Execution then reaches the for() loop on line 10, which searches the received request for the CR/LF characters. The loop iterates from the start of the buffer up to the offset sLineBuffer->len + recv_len. Because sLineBuffer->len and recv_len are both 2048, that bound is 4096, twice the size of the 2048-byte buffer, so the loop runs past the end of the buffer and searches the stack for CR/LF characters.
Figure 5 - The for() Loop Will Search for CRLF Characters Past the End of the Buffer
If the request contains no CR/LF characters within the first 2048 bytes, the next CR/LF character found will be one sitting on the stack past the end of the buffer. In our case, a newline character was found at offset 2760.
Figure 6 – Newline Character Found at Offset of 2760
When the newline is found, the nl_pos and len fields of sLineBuffer are updated accordingly (see lines 14 and 17). sLineBuffer->nl_pos is set to 0xAC9 (2761), and sLineBuffer->len is set to 0xfffffd37 (-713), which equals 2048 - 2761.
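The over-read in the second iteration can be modeled without actually reading past a real buffer. In this sketch, a single 4096-byte array stands in for the 2048-byte buffer followed by the adjacent stack bytes. The helper scan_for_eol() and the update len = len - nl_pos are assumptions, chosen because they reproduce the reported values.

```c
#include <stdint.h>

/* Result of one scan: where the line ended and the updated length. */
struct scan_result {
    int32_t nl_pos;
    int32_t len;
};

/* Hypothetical model of the line-10 loop. `region` must hold at least
 * len + recv_len bytes: the first 2048 play the role of sLineBuffer->buf,
 * the rest play the role of whatever follows it on the stack. */
static struct scan_result scan_for_eol(const unsigned char *region,
                                       int32_t len, int32_t recv_len) {
    struct scan_result r = { 0, len };
    /* The flaw being modeled: the bound is len + recv_len (4096),
     * not the 2048-byte buffer size. */
    for (int32_t i = 0; i < len + recv_len; i++) {
        if (region[i] == '\r' || region[i] == '\n') {
            r.nl_pos = i + 1;          /* one past the CR/LF character */
            r.len = len - r.nl_pos;    /* assumed update: 2048 - 2761 = -713 */
            break;
        }
    }
    return r;
}
```

Called with len and recv_len both 2048 and a region whose only newline sits at offset 2760, it reports nl_pos = 2761 and len = -713, which is 0xfffffd37 when viewed as an unsigned 32-bit value.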
Third iteration
Shortly into the third loop iteration, memcpy() is called, as seen on line 7. However, as shown in the previous step, sLineBuffer->len now contains a negative value.
Figure 7 – Register Contents at Call to memcpy()
As shown above, the num argument (the number of bytes to copy) is set to 0xfffffd37. This hexadecimal value equals 4,294,966,583 when read as an unsigned integer and -713 when read as a signed integer. Because memcpy() takes its length as a size_t, which is unsigned, this call should attempt to copy 4,294,966,583 bytes and cause a segmentation fault.
However, in our case the segmentation fault did not occur and memcpy() returned successfully.
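The gap between the two readings of 0xfffffd37 comes from C's integer conversions. The sketch below, with illustrative helper names, reinterprets the 32-bit pattern as signed and shows what memcpy() receives when a negative int32_t length is converted to size_t: on 32-bit ARMv7, where size_t is 32 bits wide, that is 4,294,966,583; on a 64-bit target the same -713 becomes 2^64 - 713.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a signed value by copying its bytes,
 * avoiding an implementation-defined narrowing conversion. */
static int32_t bits_as_signed(uint32_t bits) {
    int32_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* What a size_t parameter receives for a negative length: the conversion
 * is performed modulo 2^N, where N is the width of size_t. */
static size_t length_as_size_t(int32_t len) {
    return (size_t)len;
}
```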
Analysis of memcpy
When memcpy() is called, its address is loaded into the PC register and execution moves to its first instruction. On ARMv7, glibc uses its own architecture-specific assembly implementation of memcpy().
Figure 8– Assembly of ARMv7 memcpy() Implementation
Due to ARM calling conventions, the memcpy() function parameters are stored in the following registers.
- R0 – Destination Address
- R1 – Source Address
- R2 – Number of Bytes to Copy ('num')
As seen on the second line, the CMP instruction compares num (the number of bytes to copy) with sixty-four (64). By definition, the ARM CMP instruction subtracts the value of the operand from the register value to set the condition codes.
CMP{cond} Rn, Operand2
The CMP instruction subtracts the value of Operand2 from the value in Rn. This is the same as a SUBS instruction, except that the result is discarded.
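These condition codes are stored in the four most significant bits of the Current Program Status Register (CPSR): N (negative) in bit 31, Z (zero) in bit 30, C (carry) in bit 29 and V (overflow) in bit 28.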
Figure 9– Layout of ARM CPSR Register with Condition Codes Set in Most Significant Four Bits
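Because the flags live in bits 31 to 28, decoding them from a raw CPSR value, such as those in the register dumps that follow, is only a matter of shifting. A minimal sketch with illustrative helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition flags occupy the top four bits of the CPSR. */
static bool cpsr_n(uint32_t cpsr) { return (cpsr >> 31) & 1u; } /* negative */
static bool cpsr_z(uint32_t cpsr) { return (cpsr >> 30) & 1u; } /* zero */
static bool cpsr_c(uint32_t cpsr) { return (cpsr >> 29) & 1u; } /* carry */
static bool cpsr_v(uint32_t cpsr) { return (cpsr >> 28) & 1u; } /* overflow */
```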
After num is compared with sixty-four (64), the BGE instruction on the third line will interpret these results and branch accordingly.
If 64 or more bytes need to be copied, the branch is taken and execution shifts to address 0x405ffcb4. If fewer than 64 bytes need to be copied, the branch is not taken and execution continues downward. The remaining lines of this implementation are a series of ARM NEON instructions optimized for copying amounts under 64 bytes.
In this scenario, however, the num value passed to memcpy() is 0xfffffd37 (set by the caller before memcpy() was called). Read as a signed 32-bit value, R2 holds -713, so the subtraction performed by CMP (0xfffffd37 - 64 = 0xfffffcf7) produces a negative result and sets the condition codes accordingly. The condition codes before and after the CMP instruction are shown below.
Figure 10 - Contents of CPSR Register Before CMP Instruction
Figure 11– Contents of CPSR Register After CMP Instruction
As shown, the N (negative) condition flag changes from 0 to 1.
BGE is a signed branch: it branches when the comparison is greater than or equal in signed terms, which ARM evaluates as N == V. The CMP here leaves the overflow (V) flag clear, so with the negative (N) flag set, the branch is not taken. As a result, when given a negative length value, memcpy() skips the large-copy path and falls through to the code that copies fewer than 64 bytes.
This means that instead of copying the full 4,294,966,583 bytes (the unsigned value of num), memcpy() copies only a small number of bytes determined by the low-order bits of num.
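The branch decision can be reproduced by computing the condition flags the way CMP does. This is a simplified software model, not glibc code: arm_cmp() derives N, Z, C and V for Rn - Operand2, and the two predicates encode BGE (taken when N == V) and its unsigned counterpart BHS (taken when C is set). short_path_len() is an assumption that the short path copies num & 63 bytes, which is consistent with the 55 bytes observed for 0xfffffd37.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition flags produced by CMP (a SUBS whose result is discarded). */
struct arm_flags { bool n, z, c, v; };

static struct arm_flags arm_cmp(uint32_t rn, uint32_t op2) {
    uint32_t res = rn - op2;              /* wraps modulo 2^32 like the ALU */
    struct arm_flags f;
    f.n = (res >> 31) & 1u;               /* sign bit of the result */
    f.z = (res == 0);
    f.c = (rn >= op2);                    /* carry set means "no borrow" */
    /* Signed overflow: operands differ in sign and the result's sign
     * differs from rn's. */
    f.v = (((rn ^ op2) & (rn ^ res)) >> 31) & 1u;
    return f;
}

static bool arm_bge_taken(struct arm_flags f) { return f.n == f.v; } /* signed >= */
static bool arm_bhs_taken(struct arm_flags f) { return f.c; }        /* unsigned >= */

/* Assumed model of the short path: only the low six bits of num count. */
static uint32_t short_path_len(uint32_t num) { return num & 63u; }
```

For arm_cmp(0xfffffd37, 64) the model gives N = 1, V = 0 and C = 1, so BGE falls through while BHS would branch to the large-copy path, and short_path_len() yields 55.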
To demonstrate this vulnerability and the difference between the memcpy() implementation on ARMv7 and on other platforms, we wrote a small test program. It attempts to copy 0xfffffd37 bytes (4,294,966,583 unsigned, or -713 signed) to a location in memory.
Figure 12 - Code of Test Program to Show Differences in memcpy()
When run on other platforms, this program crashes with a segmentation fault, because the num argument to memcpy(), 0xfffffd37, is correctly interpreted as a size_t value, which is unsigned.
Figure 13– Test memcpy() Program Running on ARMv7 Architecture
Figure 14 - Test memcpy() Program Running on x64 Architecture
Surprisingly, the ARMv7 implementation of memcpy() does not treat the num parameter as a size_t value, but as a signed integer. The num value 0xfffffd37 is interpreted as -713, so the large-copy branch is skipped and only 55 bytes are copied. The program then completes and exits normally.
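The test program itself appears only as a screenshot, so the following is a sketch in the same spirit rather than a copy of it. It runs the oversized copy in a forked child so the harness survives a crash, and reports whether the child was killed by a signal. It assumes a POSIX system and a build without _FORTIFY_SOURCE, since a fortified memcpy() would abort on its own length check before the copy starts; the function and buffer names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#undef _FORTIFY_SOURCE   /* keep memcpy() from being routed to __memcpy_chk() */

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char src_buf[64];
static char dst_buf[64];

/* Returns 1 if the oversized memcpy() killed the child with a signal,
 * 0 if memcpy() returned and the child exited normally,
 * -1 if the child could not be started or waited for. */
static int oversized_memcpy_crashes(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* volatile stops the compiler from reasoning about the length */
        volatile int32_t len = -713;            /* 0xfffffd37 as 32 bits */
        size_t num = (size_t)(uint32_t)len;     /* 4,294,966,583 */
        memcpy(dst_buf, src_buf, num);
        _exit(0);                               /* reached only if memcpy() returns */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}
```

On x64, or on ARMv7 with a fixed glibc, the child should be killed by a signal and the function returns 1; on an affected ARMv7 glibc, memcpy() returns after the short copy and the function returns 0. Depending on the system's core dump settings, the crashing child may leave a core file behind.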
Because memcpy() expects the number of bytes to copy (num) to be an unsigned integer, its implementation should never apply signed branch operations to num. In place of BGE, the unsigned greater-than-or-equal branch (BHS, also written BCS) should be used. This ensures that num is treated as unsigned, so memcpy() behaves as expected and does not introduce undefined behavior into the calling program.
Finishing the exploit
With memcpy() copying only 55 bytes instead of 4,294,966,583, program execution returns from memcpy() (instead of crashing with a segmentation fault) and continues processing the GET request.
Figure 15- Reconstructed Code Snippet
Third iteration (continued)
After memcpy() returns, sLineBuffer->len still holds the negative value. The fields of the sLineBuffer structure are set as follows:
Figure 16 - State of sLineBuffer Structure After Call to memcpy()
- bufsz = 2048
- nl_pos = 2761
- len = -713
As execution continues, the recv() call at line 24 is reached again. To determine where to write the incoming data, the program uses sLineBuffer->len as an offset into sLineBuffer->buf. Because sLineBuffer->len is negative, recv() writes to an address before the start of the buffer, overwriting the sLineBuffer structure on the stack.
Figure 17 - Stack Before and After recv() Function Overwrite
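Both the memcpy() in the third iteration and this recv() call trust sLineBuffer->len. A caller-side check along the following lines, a hypothetical helper that is not taken from the vendor code, would refuse a negative or oversized offset before any pointer arithmetic on buf takes place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard: compute where the next recv() may write (offset
 * into buf) and how much room is left, rejecting len outside [0, bufsz]. */
static bool recv_window(int32_t len, int32_t bufsz,
                        size_t *offset, size_t *space) {
    if (bufsz < 0 || len < 0 || len > bufsz)
        return false;                   /* e.g. len = -713 is refused here */
    *offset = (size_t)len;              /* safe: len is non-negative */
    *space = (size_t)(bufsz - len);     /* cannot underflow: len <= bufsz */
    return true;
}
```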
Fourth iteration
With the sLineBuffer structure overwritten to contain the values we want, the final iteration of the HTTP parsing loop uses those values to gain complete control over the recv() call.
Shortly after the final call to recv() is made, recv() saves registers R0 to R3 on the stack along with the link register (LR). The stack contents before and after this operation are shown below.
Figure 18 - Contents of the Stack Before recv() Saves the Registers
Figure 19 - Contents of the Stack After recv() Saves the Registers
As shown, the return address (the contents of register LR) is saved on the stack at location 0x762806f8.
Because recv() takes its arguments from the fields of the sLineBuffer structure, our overwritten values control where recv() writes and how much it writes.
Figure 20 - The Stack After Overwriting the Return Address with recv()
With this, we can overwrite the saved return address, take control of the PC and execute arbitrary code.
Just before returning, recv() will attempt to POP some of the saved registers (specifically the LR register) off the stack. Because the saved LR register value was overwritten with our buffer, we now have control of where recv() returns to.
Figure 21- End of recv() Function Overwriting the LR Register with Overwritten Value Before Returning
As you can see, the value at the address that the stack pointer (SP) points to is the address of our ROP gadget located within the mcount() function.
Figure 22 - Return Address from recv() Overwritten with Address of ROP Gadget
After this value is popped off the stack, the LR register holds our controlled return address, and we have successfully overwritten the return address of recv(). The gadget in mcount() then pops our arguments off the stack and transfers control to system(), giving us remote code execution on the connected vehicle.
Figure 23 - Reverse Shell Returned from Connected Vehicle
Conclusion
As engineers and programmers, we have many inherent assumptions about how standard library functions operate. When these assumptions prove false, the resulting behavior can damage program integrity and result in exploitable vulnerabilities.
In our case, a vulnerability in the ARMv7 implementation of memcpy() caused the program to enter an undefined state and ultimately allowed remote code execution. When exploited, this memcpy() vulnerability causes program execution to continue in scenarios where a segmentation fault or crash should have occurred. The program then keeps running with corrupted runtime state, which opens up exploitation opportunities.
The Customer Experience Assessment & Penetration Team (CX APT) specializes in identifying vulnerabilities in connected vehicle components and represents the integration of experts from NDS, Neohapsis, and Portcullis acquisitions. This team provides a variety of security assessment and attack simulation services to customers around the globe. More information on the CX APT IoT security practice can be found here.