
Preamble

 

Logical exploitation is becoming more and more prevalent in place of memory exploitation in the face of advancing hardware and software protections. This is most apparent in XNU-based operating systems, where hardware and software protections are by far the most abundant, but it also appears as a platform-agnostic form of exploitation where supply chains are concerned (log4j), as well as on Linux and Windows (the recent polkit vulnerability disclosed by Qualys, and the infamous pop ss vulnerability in earlier versions of Windows 10). This article will focus mainly on local exploits, though it should be noted that remote ones, such as log4j, do exist.

Logic bugs are also a point of hot contention between researchers and vendors – “Rotten Apples”, our macOS code-signing vulnerability, as initially demonstrated on x64 hardware, does not constitute a vulnerability in Apple’s eyes by virtue of the limitations of execution on x64 hardware, where code is allowed to execute without a certificate at all. Had it been demonstrated on arm64e hardware, perhaps Apple’s arbitration would have been different.

We will be exploring three notable vulnerabilities: goto fail;, pop ss, and pwnkit.

goto fail;: How a Single Line of Code Undermined iOS 7’s Security Framework

 

CVE-2014-1266, or “goto fail;”, was a logic bug in XNU’s Security.framework circa iOS 7. As described by Apple1:

“Secure Transport failed to validate the authenticity of the connection. This issue was addressed by restoring missing validation steps.”

The code in question?

goto fail;

goto fail;

The existence of two consecutive goto fail; statements in the SSLVerifySignedServerKeyExchange function is the source of the issue.

Quoting the root-cause analysis from security vendor Synopsys2:

Although the indentation of the lines makes it appear as though they’ll both get executed only if the predicate in the if-statement is true, the second one gets executed regardless of whether the predicate is true or false. If the indentation is corrected, the problem becomes more obvious:

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
  goto fail;
goto fail;
...

Since the SSLHashSHA1.update call will generally not return an error, the value of err will almost always be zero when the second goto fail; statement is executed. And what happens when goto fail is executed? The return value of zero is provided to the caller, who believes that the signature verification on the “Server Key Exchange” message passed.
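To make that control flow concrete, here is a small, self-contained C sketch – not Apple’s actual code; the helper names hash_update_ok and verify_signature are purely illustrative – that distills the shape of SSLVerifySignedServerKeyExchange and prints the bogus “success” result:

#include <stdio.h>

/* Stand-ins for the real Secure Transport calls; names are illustrative. */
static int hash_update_ok(void)   { return 0; }   /* like SSLHashSHA1.update succeeding */
static int verify_signature(void) { return -1; }  /* the check that is never reached    */

static int verify_key_exchange(void)
{
    int err;

    if ((err = hash_update_ok()) != 0)
        goto fail;
        goto fail;                  /* unconditional: always skips the signature check */

    err = verify_signature();       /* dead code */

fail:
    return err;                     /* err is still 0, i.e. "verification passed" */
}

int main(void)
{
    printf("verify_key_exchange() returned %d\n", verify_key_exchange());  /* prints 0 */
    return 0;
}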

Circa 2014, this vulnerability was something of a meme. What’s important to take away from this, however, is that the core issue is not one of memory corruption of the stack or heap followed by a code-reuse attack (ROP or JOP), but an issue in program flow. Should this issue be present today on an ARMv8.5 chip with MTE and PAC, assuming the compiler did not optimize it out, these much-lauded hardware protections would serve as nothing but a paperweight. Software hardening could reduce the impact via sandbox profiles, but the up-and-coming MTE and the already-present PAC would do nothing to prevent this bug, as it is not reliant on memory corruption or even arbitrary code execution.

pop ss: Gaining Elevated Code Execution From Userland

 

THE FOLLOWING IS A SIMPLIFIED, YET STILL HIGHLY TECHNICAL SUMMARY AND ASSUMES FAMILIARITY WITH OPERATING SYSTEM INTERNALS AND X86/X64 ISA SEMANTICS, AND MAY BE PARTIALLY INCORRECT IF ONE WERE TO CONSIDER EVERY DETAIL.

 

Logic bugs are not limited to operating system code itself – they can also arise from how software implements behavior described in hardware documentation. This is where the pop ss (or mov ss) bug comes into play: a vulnerability affecting operating systems on all major x86_64 processors, found by Nick Peterson (everdox) of Riot Games and Nemanja Mulasmajic (0xNemi), formerly of Riot Games. The bug is centered on the pop ss (or mov ss) instruction when “(it) is executed with debug registers set for break on access to a relevant memory location and the following instruction is an INT N or SYSCALL.”

For those not well versed in x86/x64 assembly: SYSCALL does as it says – it executes a syscall – and INT N calls the interrupt handler for vector N (with 0x80 often being used for syscalls on 32-bit Linux). In this case, SYSCALL would let you enter the #DB (interrupt 1) handler on the user stack, because the SYSCALL instruction does not change RSP and the deferred #DB fires immediately after SYSCALL finishes execution. With INT 3, or any INT N instruction whose gate the OS has set to DPL 3 (descriptor privilege level 3), you would likewise enter the interrupt 1 handler with a potentially bad state. As detailed later on and in the paper, both of these scenarios were leveraged for local privilege escalation.
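As a rough illustration of the trigger sequence, here is a minimal C sketch (assumptions: x86_64 Linux with GCC inline assembly; the debug-register setup – arming DR0/DR7 on ss_selector from a tracing parent, e.g. via ptrace with PTRACE_POKEUSER – is omitted, and the names are purely illustrative). Run as-is, with no breakpoint armed and on a patched kernel, it merely takes a harmless SIGTRAP:

#include <signal.h>
#include <stdint.h>
#include <stdio.h>

/* A hardware read/write breakpoint (DR0/DR7) is assumed to be armed on this
 * variable by a tracing parent process before trigger() runs. */
static volatile uint16_t ss_selector;

static void on_trap(int sig) { (void) sig; }   /* swallow the SIGTRAP from INT 3 */

static void trigger(void)
{
    __asm__ volatile (
        "movw %%ss, %0\n\t"   /* stash the current, valid SS selector           */
        "movw %0, %%ss\n\t"   /* reload SS from the watched location: a pending */
                              /* #DB is suppressed for exactly one instruction  */
        "int  $3\n\t"         /* ...so it is delivered only after INT 3 (or     */
                              /* SYSCALL) has already switched to ring 0        */
        : "+m" (ss_selector)
        :
        : "memory");
}

int main(void)
{
    signal(SIGTRAP, on_trap);
    trigger();
    puts("back in userland (no hardware breakpoint was armed)");
    return 0;
}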

Syscalls are functions within an operating system usually reserved for low-level work or for use by the kernel, but they may be exposed to the user via a wrapper – for example, ptrace on Unix systems. It should be noted that the SYSCALL instruction and syscalls are not interchangeable, though it may seem so – SYSCALL is an instruction, while ptrace would be the code implementing the syscall. These operations are often security-sensitive and may occur in kernel mode. Continuing on, the paper explains that operating system developers often assume an uninterruptible state when executing this code. However, this can cause OS supervisor software – privileged code such as the kernel or a hypervisor – built with these assumptions in mind to erroneously use state information controlled by unprivileged software.
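To illustrate the wrapper-versus-instruction distinction on Linux, here is a minimal sketch (using getpid rather than ptrace, purely for brevity): both calls request the same kernel service, one through the libc wrapper and one through the generic syscall(2) entry point, which ultimately executes the SYSCALL instruction.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* libc wrapper: the "syscall" most programmers actually call */
    printf("getpid() wrapper : %ld\n", (long) getpid());

    /* generic entry point: ends in a raw SYSCALL instruction */
    printf("raw SYS_getpid   : %ld\n", syscall(SYS_getpid));

    return 0;
}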

Many are familiar with the ring (0–3) model when it comes to operating system internals. For the most part, modern operating systems (here referring to modern Microsoft Windows) do not use all four rings – in practice there is only user mode (ring 3) and kernel mode (ring 0), with rings 1 and 2 unused, and while the hypervisor is a separate security domain, it does not map onto the x86 privilege-model concept of “rings”. In a system where the hypervisor is enabled (Hyper-V on Windows, or KVM on Linux), privileged software could end up consuming state information provided by unprivileged software running in ring 3. The easiest explanation is that, due to unclear documentation in the reference manuals of both AMD and Intel CPUs at the time, an attacker could enter an interrupt handler from a userland stack, leading to local privilege escalation, EDR bypass, or execution of arbitrary code if performed correctly.

The example provided has the #DB (interrupt 1) handler invoked on the user’s stack pointer, with information provided by a lesser-privileged program, such as a malicious GSBASE. As most operating systems determine the need to SWAPGS based on the previous execution mode, things can get messy. The vulnerability, when leveraged with additional pointer leaks inside the Windows kernel, gave a user arbitrary unsigned kernel code execution. Due to the mitigations introduced in response to Meltdown, such as KPTI, changes to the exploit development process are needed to achieve success – for example, using a “KPTI trampoline”, a technique based on the idea that “if a syscall returns normally, there will be a piece of code in the kernel to swap pages back to userland ones” – so reusing that code achieves our goal. That exit path swaps the page tables back and then performs SWAPGS and IRETQ.

The short of it is that even Redmond’s best (and Linux kernel contributors, among others) implemented handling of the pop ss / mov ss instructions in a way that allowed undefined behavior and, eventually, kernel code execution. The vulnerability was later disclosed as both CVE-2018-8897 and CVE-2018-1087. In the end, the code is only as good as the developer, their understanding of the instruction set, and the documentation provided by the vendor – all factors that led to this multi-OS bug.

 

pwnkit: The Latest and Greatest

 

Recently, Qualys disclosed a vulnerability in polkit dubbed pwnkit – a logic bug in a component of polkit known as pkexec. In short, executing pkexec, a SUID binary, with argc < 1 would:

  • Set integer n to 1;
  • Cause an out-of-bounds pointer read from argv[1];
  • Cause an out-of-bounds pointer write to argv[1];
  • Allow a tainted environment to reintroduce usually-sanitized environment variables such as LD_PRELOAD, as seen with the gconv primitive Qualys provides.
435 main (int argc, char *argv[])
 436 {
 ...
 534   for (n = 1; n < (guint) argc; n++)
 535     {
 ...
 568     }
 ...
 610   path = g_strdup (argv[n]);
 ...
 629   if (path[0] != '/')
 630     {
 ...
 632       s = g_find_program_in_path (path);
 ...
 639       argv[n] = path = s;
 640     }

 

INITIAL LOGIC FLOW FOR PKEXEC BUG (SOURCE: QUALYS PWNKIT ADVISORY).
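The out-of-bounds indexing is possible because the kernel lays out argv[] and envp[] back to back on the new program’s stack, so when argc is 0, argv[1] is literally envp[0]. The following standalone sketch (assuming Linux and glibc; the INJECTED variable and the /proc/self/exe re-exec are purely illustrative, and kernels patched after pwnkit force a non-empty argv, which the sketch detects) demonstrates the aliasing that pkexec stumbled over:

#include <stdio.h>
#include <unistd.h>

extern char **environ;

int main(int argc, char *argv[])
{
    if (argc == 0) {
        /* Re-executed with an empty argv: argv[0] is the NULL terminator,
         * so argv[1] lands on the first environment entry. */
        printf("argv[1] = %s\n", argv[1]);
        printf("envp[0] = %s\n", environ[0]);
        return 0;
    }

    if (argv[0] != NULL && argv[0][0] == '\0') {
        /* Kernels >= 5.18 rewrite an empty argv to { "", NULL }, closing
         * off the argc == 0 case that pkexec relied on. */
        puts("kernel forced a non-empty argv; aliasing not reachable here");
        return 0;
    }

    /* First run: re-execute ourselves with argc == 0 and a single env var. */
    char *empty_argv[] = { NULL };
    char *envp[]       = { "INJECTED=hello", NULL };
    execve("/proc/self/exe", empty_argv, envp);
    perror("execve");
    return 1;
}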

 

Wait… that sounds an awful lot like memory corruption, doesn’t it? The question of whether pwnkit is or is not a memory corruption issue at heart isn’t exactly something one would call “hotly contested”, but it does raise a few eyebrows. The belief that it is not memory corruption but rather a logic bug stems from the fact that it is once again a control-flow problem, with little beyond the out-of-bounds read and write. Moreover, the exploit primitive widely used does not rely on memory corruption. To quote Qualys:

– if our PATH environment variable is “PATH=name”, and if the directory “name” exists (in the current working directory) and contains an executable file named “value”, then a pointer to the string “name/value” is written out-of-bounds to envp[0];

– or, if our PATH is “PATH=name=.”, and if the directory “name=.” exists and contains an executable file named “value”, then a pointer to the string “name=./value” is written out-of-bounds to envp[0];

– In other words, this out-of-bounds write allows us to re-introduce an “unsecure” environment variable (for example, LD_PRELOAD) into pkexec‘s environment; these “unsecure” variables are normally removed (by ld.so) from the environment of SUID programs before the main() function is called. We will exploit this powerful primitive in the following section.

It’s also worth noting that this bug could have been avoided with a simple change of which exec wrapper is called – calling one that does not take envp[0] (the payload) into account would prevent the bug entirely (execvp, execv, and execl, to name a few), as the exploit depends on execve taking envp into account when executing.

In the end, all one needs to achieve arbitrary code execution, and thus LPE, is a folder, a file, and a correctly put-together environment. There are potentially other ways that aren’t as dependent on shared libraries and could be used for a “one-file” exploit, but this exercise is left to the reader (hint: the /proc pseudo-filesystem and ln(1)).

The beauty of the bug, unlike pop ss / mov ss, is how easy it is to both trigger and understand. As mentioned for goto fail;, with the rise of ARM computing, hardware memory protections such as PAC would likely not trigger unless envp[0] were a PAC-protected pointer, and on x86, due to the lack of a code-reuse attack, Intel’s CET shadow stack would do nothing.

 

Epilogue

 

With these three exploits, one can see how logic bugs are a powerful tool in any exploit engineer’s toolkit due to their ease of use and the general lack of any major memory corruption. Going forward, with Apple’s shift entirely to ARM, Intel implementing new hardware features to mitigate code-reuse attacks, and more, exploit developers will be forced to think creatively about the problems posed – challenging existing preconceptions of what does and doesn’t work due to regressions, or simply starting from undefined behavior and compounding on it from there. One such approach for PAC has already been found – so-called “PAC gadgets” are instructions in existing ARMv8.3+ code that can be used to deprotect a register or pointer for use in an exploit chain, though they are far from plentiful.

Nobody truly knows what the future will bring, but we can expect a slow and steady uptick in logic bugs, whether as exploits in their own right or as links in a chain – to say nothing of those that can lead to remote code execution, such as log4j.

 

Sources Cited

1, 2 https://www.synopsys.com/blogs/software-security/understanding-apple-goto-fail-vulnerability-2/