Tuesday, October 6, 2020

90 days, 16 bugs, and an Azure Sphere Challenge




Cisco Talos reports 16 vulnerabilities in Microsoft Azure Sphere's sponsored research challenge.


By Claudio Bozzato, Lilith [-_-]; and Dave McDaniel. 


On May 15, 2020, Microsoft kicked off the Azure Sphere Security Research Challenge (ASSRC), a three-month initiative aimed at finding bugs in Azure Sphere. As one of the teams selected to participate, Cisco Talos conducted a three-month research sprint into the platform and reported 16 vulnerabilities of varying severity, including a privilege escalation bug chain to acquire Azure Sphere Capabilities, the most valuable Linux normal-world permissions in the Azure Sphere context.

The Azure Sphere platform is a cloud-connected, custom SoC platform designed specifically for IoT application security. Internally, the SoC is made up of several ARM cores with different roles (e.g. running different types of applications, enforcing security, and managing encryption). Externally, the platform is supported by Microsoft’s Azure Cloud, which handles secure updates, app deployment, and periodic verification of device integrity to determine whether Azure Cloud access should be allowed. Note, however, that while Azure Sphere devices are updated and receive deployments through the Azure Cloud, customers can still interact with their own servers independently.

Customers push signed applications to their devices, which are grouped in an Azure Sphere Cloud Tenant (or sideload them if in development mode), and these applications are granted extremely limited permissions by default. To use even such basic features as connecting to an IP address or hostname, storing any data to disk, or delaying software updates, an application must declare these needs in its app_manifest.json. Materially, these declarations cause the user ID (UID) of the application (which is different on every installation) to be granted the specific Linux group IDs (GIDs) and/or Linux capabilities needed to interact with the requested feature.

Zooming out, all system applications (networkd, azcore, azured, etc.) have specific Linux and/or Azure Sphere capabilities to limit their access to only what they need. These Azure Sphere capabilities, stored and treated differently than normal Linux capabilities, limit access to critical Azure Sphere-specific interfaces and for the most part are used to limit access specifically to the ioctls of /dev/pluton and /dev/security-monitor. It's in these two devices that we find the most critical functionality, and execution in them is considered the highest level of permissions one can gain on the device.

[Figure: simplified logical chart of the system]



Since the Azure Sphere platform is designed as a secure IoT environment in which customers can flash arbitrary applications (less than ~600KB in size), the most relevant question for the ASSRC was: "Assuming a customer application has been compromised and code execution gained, what can be done from there?"  This is reflected in the official scope for the challenge:
  • Ability to execute code on Pluton.
  • Ability to execute code on Secure World (security-monitor).
  • Ability to execute code on NetworkD through local attack (compromised customer application) or remotely (external network).
  • Anything allowing execution of unsigned code that isn’t pure return oriented programming (ROP) under Linux.
  • Anything allowing elevation of privilege outside of the capabilities described in the application manifest (e.g. changing user ID, adding access to a binary).
  • Ability to modify software and configuration options (except full device reset) on a device in the manufacturing state DeviceComplete when claimed to a tenant you are not signed into and have no saved capabilities for.
  • Ability to alter the firewall allowing communication out to other domains not in the app manifest (note: not DNS poisoning).
For the purposes of this writeup, we will separate the 16 vulnerabilities by the above in-scope categories, and will also have a section for denial-of-service vulnerabilities which were not considered in scope (no matter the severity or if they were required for other bug chains or not). 

Unsigned code execution

For context as to why "Unsigned Code Vulnerabilities" are even worth mentioning at all: according to the Azure Sphere security model, all code running on the Azure Sphere device must be signed, either by Microsoft or the app developer. Practically speaking, this means all executable data on the device is located only within the ASXipFS partitions that make up all Azure Sphere applications (and the RootFS). Since the ASXipFS filesystem kernel driver does not support any sort of writes, and since the Littlefs filesystem (used for storing non-volatile data) is mounted as noexec, Azure Sphere extends the concept of W^X from process memory to the disk itself, creating a de facto W^X protection for the entire device via the filesystems. For those with Android knowledge, ASXipFS effectively replaces DM-Verity, guaranteeing disk integrity without the need to constantly verify blocks and wear out flash memory.

To keep an attacker from changing page permissions and injecting code into process memory, protections for the mprotect and mmap syscalls were implemented in a custom Azure Sphere Linux Security Module (LSM). Using the VM_MAY* flags of virtual memory pages, Azure Sphere prohibits changing a given page’s permissions to executable if it has ever been writable in the past (it also prohibits a writable and executable page from ever being mapped in the first place). While this protection is effective in most cases, we found five different ways to execute unsigned code, something we prioritized looking into because we assumed these would be low-hanging fruit (and for the most part, they were). By the third week of the challenge, we had filed the following two vulnerabilities against version 20.05:
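As a rough illustration of the rule described above, the following Python sketch models a mapping that loses its right to become executable as soon as it is writable. The numeric flag values mirror the Linux kernel's PROT_* and VM_MAYEXEC bits, but the `Mapping` class and the two check functions are hypothetical names of our own and are not the Azure Sphere LSM code.

```python
# Flag values mirror the Linux kernel; everything else is a hypothetical,
# simplified model of the policy described in the text, not the real LSM.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
VM_MAYEXEC = 0x40


class Mapping:
    """Toy stand-in for a virtual memory area (VMA)."""

    def __init__(self, prot):
        self.prot = prot
        self.may_flags = VM_MAYEXEC  # a fresh mapping may still become executable


def check_mmap(prot):
    """Refuse W+X mappings outright; otherwise return the new mapping."""
    if (prot & PROT_WRITE) and (prot & PROT_EXEC):
        return None
    mapping = Mapping(prot)
    if prot & PROT_WRITE:
        mapping.may_flags &= ~VM_MAYEXEC  # writable now, never executable later
    return mapping


def check_mprotect(mapping, prot):
    """Apply a permission change if the policy allows it; return True or False."""
    if (prot & PROT_WRITE) and (prot & PROT_EXEC):
        return False
    if (prot & PROT_EXEC) and not (mapping.may_flags & VM_MAYEXEC):
        return False
    if prot & PROT_WRITE:
        mapping.may_flags &= ~VM_MAYEXEC
    mapping.prot = prot
    return True
```

In this model, a page that was mapped read+write can be dropped to read-only but can never be promoted to read+exec afterwards, which is the "ever been writable" property the text describes.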

Microsoft Azure Sphere Normal World application ptrace unsigned code execution vulnerability (TALOS-2020-1090)

Microsoft Azure Sphere Normal World application /proc/self/mem unsigned code execution vulnerability (TALOS-2020-1093)

TALOS-2020-1090 uses ptrace to attach to a fork() of the current process and modifies the executable (non-writable) memory region where the .text section lies, via ptrace's POKETEXT command, while TALOS-2020-1093 opens and modifies the process’ memory by writing to /proc/self/mem.

While these were admittedly incredibly simple vulnerabilities, the number of researchers participating in the ASSRC made us consider it important to submit them as soon as possible. That opinion was later reinforced by the fact that the /proc/self/mem vulnerability had been found and reported by Trail of Bits in a previous exercise, but had not been fixed in time for the ASSRC's start. These issues were fixed in 20.07.

We then discovered the following two unsigned code execution vulnerabilities in 20.06:

Microsoft Azure Sphere Normal World application READ_IMPLIES_EXEC personality unsigned code execution vulnerability (TALOS-2020-1128)

Microsoft Azure Sphere Normal World application PACKET_MMAP unsigned code execution vulnerability (TALOS-2020-1134)

TALOS-2020-1128 uses the READ_IMPLIES_EXEC Linux personality to make all mmap calls requesting read permissions to also have exec permissions. This allows for creating an rwx map by only asking for read and write permissions in specific syscalls. This has been fixed in 20.08.
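The flag arithmetic behind this can be sketched in a few lines of Python. READ_IMPLIES_EXEC and the PROT_* values below are the real Linux constants, while `effective_prot` and `looks_wx` are hypothetical helper names. The sketch only shows how a read+write request turns into read+write+exec; where the vulnerable check sat relative to this step is not described in this post, so the sketch makes no claim about it.

```python
# Real Linux constant values; the functions are a simplified model only.
READ_IMPLIES_EXEC = 0x0400000
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def effective_prot(requested_prot, personality):
    """Return the protection bits a mapping ends up with."""
    if (personality & READ_IMPLIES_EXEC) and (requested_prot & PROT_READ):
        return requested_prot | PROT_EXEC
    return requested_prot


def looks_wx(prot):
    """True if the protection bits are both writable and executable."""
    return bool(prot & PROT_WRITE) and bool(prot & PROT_EXEC)
```

A W^X check that inspects only the requested bits would see nothing wrong with `PROT_READ | PROT_WRITE`, even though the resulting mapping is writable and executable once the personality flag is applied.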

TALOS-2020-1134 uses the AF_PACKET socket type (which requires the CAP_NET_RAW capability, held only by networkd) to create a read+exec memory map holding a ring-buffer, populated by the kernel. Since the ring-buffer gets populated by the network packets received, an attacker controls the contents of the memory map, effectively gaining both write and read+exec permissions on the map. This has been fixed in 20.09.

The above two methods of unsigned code execution were definitely more interesting than the previous two, requiring us to do more than heuristically test and to actually read kernel source code. But shortly after the 20.07 release, we found another unsigned code execution method that resulted from an incomplete fix of the /proc/self/mem vulnerability:

Microsoft Azure Sphere Normal World application /proc/thread-self/mem unsigned code execution vulnerability (TALOS-2020-1138)

While the /proc/self/mem bug from TALOS-2020-1093 was fixed in 20.07, by targeting /proc/thread-self/mem it was possible to trigger the same root issue. This has been fixed in 20.08.

Denial-of-Service

It's first worth noting that, although arguments were made that certain privilege escalation chains would require a reboot to fully complete, and that two of these denial-of-service vulnerabilities would require device recovery to fix, these issues were still considered out of scope. We start with the simplest of service denials: in Azure Sphere 20.05, any user could just send repeated asynchronous ioctl requests to /dev/pluton and knock the device over.

Microsoft Azure Sphere asynchronous ioctl denial-of-service vulnerability (TALOS-2020-1117) 

By repeatedly issuing asynchronous ioctls, it was possible to completely fill the ring buffer used by the Linux kernel to communicate with the Pluton M4 core. Pluton would continuously look for an empty spot in the ring buffer to place these asynchronous messages, but since there were none, the watchdog would soon trigger and reboot the device. This vulnerability would probably only be useful in situations where a reboot was needed to complete another bug chain, and it was fixed in 20.07 with the complete removal of asynchronous ioctls for /dev/security-monitor and /dev/pluton.
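The failure mode can be modeled with a toy ring buffer. This is a minimal sketch under our own assumptions: the slot count, the watchdog limit, and the names `RingBuffer`, `submit_async` and `poll_for_slot` are all hypothetical and do not reflect the actual kernel or Pluton implementation.

```python
class RingBuffer:
    """Toy fixed-size message ring; `capacity` is a hypothetical slot count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = 0

    def submit_async(self):
        """Claim a slot for an asynchronous request that is never completed."""
        if self.in_flight == self.capacity:
            return False
        self.in_flight += 1
        return True


def poll_for_slot(ring, watchdog_limit):
    """Poll for a free slot; give up when the (hypothetical) watchdog fires."""
    for _ in range(watchdog_limit):
        if ring.in_flight < ring.capacity:
            return "ok"
    return "watchdog_reset"
```

Once every slot is held by an outstanding asynchronous request, the polling side can never make progress, and in this model the only way out is the watchdog reset.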

Microsoft Azure Sphere Littlefs Quota denial of service vulnerability (TALOS-2020-1129)

In TALOS-2020-1129, any user who can write to the littlefs partition (/mnt/config) can use a filesystem quota bypass via the truncate() syscall to put the littlefs filesystem into a state where basically any filesystem operation in /mnt/config results in an infinite loop inside the littlefs codebase and a subsequent reboot by the hardware watchdog. Since /mnt/config is, by design, persistent, this vulnerability functionally disables the device until it is manually recovered. This has been fixed in 20.09.

Microsoft Azure Sphere Pluton SIGN_WITH_TENANT_ATTESTATION_KEY memory corruption vulnerability (TALOS-2020-1139)

Another denial-of-service vulnerability was found in Pluton in Azure Sphere 20.07. Between 20.06 and 20.07, a very large amount of code changed in the /dev/pluton and /dev/security-monitor devices, with ioctl object definitions being moved from the open source Azure Sphere kernel into the respective binary blobs comprising Security Monitor and Pluton. This shifting of code also reduced the number of Pluton ioctls from 13 down to two: PLUTON_SYSCALL (which subsumed all the old ioctls), and the lone survivor, the SIGN_WITH_TENANT_ATTESTATION_KEY ioctl (which eventually ends up hitting the same code path as the Pluton syscalls anyway). The submitted vulnerability lies within the latter ioctl:

While there are inherent size checks on all Pluton syscalls, the struct azure_sphere_digest argument passed into the SIGN_WITH_TENANT_ATTESTATION_KEY ioctl also contained a nested structure with a size field that was not checked, resulting in an out-of-bounds write in the Pluton memory space and a denial of service as the device panics. This has been fixed in 20.09.
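The general bug class, an outer length that is validated while a nested length is trusted, can be illustrated with a small Python parser. The wire layout below (two little-endian 32-bit fields followed by digest bytes) and the `MAX_DIGEST` limit are invented for illustration and are not the real struct azure_sphere_digest layout.

```python
import struct

MAX_DIGEST = 32  # hypothetical capacity of the destination buffer


def parse_request(blob):
    """Parse <u32 total_size><u32 digest_len><digest bytes>, checking both sizes."""
    if len(blob) < 8:
        raise ValueError("short header")
    total_size, digest_len = struct.unpack_from("<II", blob, 0)
    if total_size != len(blob):      # outer check, the kind that was present
        raise ValueError("bad outer size")
    if digest_len > MAX_DIGEST:      # nested check, the kind that was missing
        raise ValueError("digest too large for destination")
    if 8 + digest_len > total_size:
        raise ValueError("digest overruns request")
    return blob[8:8 + digest_len]
```

Dropping the `digest_len > MAX_DIGEST` test from this sketch reproduces the shape of the problem: a request that is internally consistent still asks the receiver to copy more bytes than its fixed-size destination can hold.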

Information Disclosure

From the information disclosure side of things, one low-hanging issue was submitted within the first couple weeks:

Microsoft Azure Sphere kernel message ring buffer information disclosure vulnerability (TALOS-2020-1089)

This issue details how access to the kernel ring buffer can leak sensitive memory contents, such as kernel or process addresses. In the advisory, we demonstrated how to leak the ASLR offset from application-manager (the init process). This has been fixed in 20.07.

In a more convoluted sequence, we were also able to dump kernel memory via the littlefs filesystem: 

Microsoft Azure Sphere Littlefs truncate information disclosure vulnerability (TALOS-2020-1130)

By writing a few bytes to a new file to cause its backing memory to be cached, invoking sys_truncate() to extend it, and then reading from that same file, we could read kernel virtual memory pages that had been freed but not cleared upon reuse. This has been fixed in 20.09.
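The underlying pattern, recycled memory that is exposed when a file grows past what was actually written, can be shown with a toy model. `ToyBlockStore`, its 16-byte page size, and `extend_and_read` are hypothetical and stand in for the kernel page handling only loosely.

```python
class ToyBlockStore:
    """Toy page pool that recycles freed pages; all names are hypothetical."""

    PAGE = 16

    def __init__(self):
        self.free_pages = []

    def release(self, page):
        self.free_pages.append(page)  # freed without being cleared

    def alloc(self, zero):
        page = self.free_pages.pop() if self.free_pages else bytearray(self.PAGE)
        if zero:
            page[:] = bytes(self.PAGE)  # clear stale contents before reuse
        return page


def extend_and_read(store, data, new_size, zero_on_alloc):
    """Write `data` to a fresh file, extend it to `new_size`, and read it back."""
    page = store.alloc(zero_on_alloc)
    page[:len(data)] = data
    return bytes(page[:new_size])
```

With `zero_on_alloc=False`, everything after the few written bytes is whatever the previous owner of the page left behind; with `zero_on_alloc=True`, the extended region reads back as zeros.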

Privilege escalation chain

First off, it's important to note that McAfee ATR submitted a very similar chain before us that overlapped with ours in the following two vulnerabilities, so we won't really cover these (but there’s more info in the individual writeups).

Microsoft Azure Sphere ASXipFS inode type privilege escalation vulnerability (TALOS-2020-1131)
Microsoft Azure Sphere mtd character device driver privilege escalation vulnerability (TALOS-2020-1132)

These were fixed in 20.07, and while they collided with McAfee ATR's findings, we ended up diverging in how exactly we each corrupted the /mnt/config/uid_map file and subsequently ended up with AZURE_SPHERE_CAP_* capabilities. Instead of causing UID wrapping, we corrupted the uid_map in such a manner that our application's UID would be the same as that of the azured system application.

Microsoft Azure Sphere uid_map UID uniqueness privilege escalation vulnerability (TALOS-2020-1137)

After this occurred, we used one of our many denial-of-service vulnerabilities to reboot the device, after which our application would be running with the UID of another application. This bug has been fixed in 20.08.
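The property that was violated here is simply that no two applications share a UID. A minimal sketch of such a uniqueness check follows. The `(component_id, uid)` pairs are a hypothetical in-memory view, since the real on-disk layout of uid_map is not described in this post, and the UID values are made up.

```python
from collections import defaultdict


def duplicate_uids(entries):
    """Return {uid: [component ids]} for every UID claimed more than once.

    `entries` is a hypothetical list of (component_id, uid) pairs, not the
    real uid_map file format.
    """
    owners = defaultdict(list)
    for component_id, uid in entries:
        owners[uid].append(component_id)
    return {uid: ids for uid, ids in owners.items() if len(ids) > 1}
```

For example, `duplicate_uids([("azured", 1001), ("customer-app", 1001)])` flags UID 1001 as shared between the two components, which is the state the corrupted map described above would be in.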

Microsoft Azure Sphere Capability access control privilege escalation vulnerability (TALOS-2020-1133)

We also discovered there was no permissions check on whether a process could ptrace another process with higher AZURE_SPHERE_CAP_* capabilities (but the same UID). Thus, we skipped root entirely and went straight to the higher-privilege /dev/security-monitor and /dev/pluton ioctls by ptracing the azured process (or another system app) and injecting a shell.
This vulnerability has been fixed in 20.09.
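To make the missing check concrete, the sketch below contrasts a UID-only ptrace decision with one that also compares capability sets. It is a model under our own assumptions: `Proc`, both functions, and the capability name `AZURE_SPHERE_CAP_EXAMPLE` are hypothetical, and the stricter variant is not necessarily how the issue was fixed.

```python
from collections import namedtuple

# uid: numeric user ID; sphere_caps: frozenset of AZURE_SPHERE_CAP_* names
Proc = namedtuple("Proc", ["uid", "sphere_caps"])


def may_ptrace_uid_only(tracer, target):
    """The permissive behavior described above: matching UIDs are enough."""
    return tracer.uid == target.uid


def may_ptrace_with_caps(tracer, target):
    """Stricter variant: the tracer must also hold every capability of the target."""
    return tracer.uid == target.uid and target.sphere_caps <= tracer.sphere_caps
```

Under the UID-only rule, an application that has taken over the UID of a system app can attach to it regardless of the capabilities that app holds; the subset comparison is one way such an attach could be refused.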

Privilege escalation (non-chain)

An additional memory corruption vulnerability, which we did not end up using in a chain, was also discovered within the first month:

Microsoft Azure Sphere AF_AZSPIO socket memory corruption vulnerability (TALOS-2020-1118)

Simply by binding an Azure Sphere-specific AF_AZSPIO socket twice, one could cause a double free in a kernel list, resulting in a null dereference and potential memory corruption in the kernel (assuming one is capable of mapping the null page). This has been fixed in 20.07.

Researchers’ retrospective

Azure Sphere was an interesting device to work on due to the constraints imposed upon it, both in specs and security. We flashed an .imagepackage of busybox to get a decent feel for the device internals before setting up an incomplete QEMU instance that was mostly used for more in-depth testing of the kernel. For anything dealing with other cores or chips of the device, we mainly relied on flashing C programs and gdbserver for testing; using ready-made tools did not seem useful or time-efficient, due to both the complex nature and the limited specs of the chipset and environment. Even a simple task like running strace within busybox was a non-trivial endeavour, since there wasn't space to fit both binaries: the space for user applications is limited to around 600KB. In the ASSRC's time-limited context, keeping our tools simple helped significantly in getting to the bugs quickly. Our toolbox was basically just tmux, Vim, Binary Ninja, QEMU (eventually), and GDB (although one researcher used SublimeText, which seemed excessive).

With regard to the ASSRC itself, we felt the dual facets of monthly (but sometimes faster) updates and having around 70 people looking at the product simultaneously ended up favoring those who could work quicker and with less tooling. Going for the extremely high-value targets (pluton/security-monitor) strictly could not be done without either having an emulation setup or having a privilege escalation chain, both of which are large time-sinks in the context of a three-month competition (granted, a third option was also available: hoping that Microsoft would accept an "Assuming I have XYZ capability…" bug). And so when 20.07 rolled around, all the privilege escalation chains were broken and there was no longer really any way to dynamically test pluton or security-monitor.

In fairness we must state that monthly updates are part of Azure Sphere’s security model which keeps the device up to date with the latest security patches. In our opinion, while this tactic works very well for fixing the known bugs promptly, it also results in a much less complete examination of the device than might have been possible, due to researchers being handicapped in ways that an attacker would never be. Lower-value reported targets got fixed periodically, repeatedly leaving researchers to find new routes to the higher level problems. We posit that this type of CTF would just result in everyone going ham on low-hanging targets, leaving the higher level and more critical attack surfaces mostly unexamined.

As a parting thought, we'd like to thank the organizers of the event and also the Azure Sphere technical staff who were all extremely helpful in this collaboration. While there were difficulties in this process regarding higher level concepts (e.g. consistency of bug submission responses, both in quality and haste), we consider the ASSRC to have been an overall positive experience and boon for the security posture of the Azure Sphere platform as a whole. 

Conclusion

During the three-month Azure Sphere Security Research Challenge, Cisco Talos found and reported 16 vulnerabilities in the Azure Sphere platform (TALOS-2020-1141 is not yet fixed, so it remains unpublished). For an in-depth look at each vulnerability, please refer to the individual writeups listed above.

Coverage

The following SNORTⓇ rules will detect exploitation attempts. Note that additional rules may be released at a future date and current rules are subject to change pending additional vulnerability information. For the most current rule information, please refer to your Firepower Management Center or Snort.org.

Snort Rules: 54501-54504, 54645-54648, 54680-54683, 54701-54702, 54729-54732, 54829-54830.
