TRAVERTINE

pmap_ro_zone_memcpy -> memcpy -> rep movsb
“`

Clearly, as `zalloc_ro_mut` eventually uses `rep movsb`, it is not atomic, and should not be used in places where atomic writes are required (remember SMR’s requirement for writers?).
The XNU authors seem to have accounted for the non-atomicity of `zalloc_ro_mut` by providing the private `zalloc_ro_mut_atomic` API, which takes an argument for which kind of atomic operation should be performed.

So, if we can find a place where `zalloc_ro_mut` was used where `zalloc_ro_mut_atomic` (or one of its variants) should have been, there’s a good chance we have a race condition bug. (Hint: this is foreshadowing).

### Usage of Read-Only Objects in XNU

A general pattern I have observed with these read-only objects is that usually they are “paired” with a complementary read-write struct.
For example, the `struct proc` that represents a process has a matching read-only `struct proc_ro`.
`proc.p_proc_ro` points to a given proc’s proc_ro, and `proc_ro.pr_proc` points back to its matching proc.

The really important bits of the structure are stored in the read-only object, and the relatively unimportant stuff is stored in the read-write version.

This diagram shows how `proc`s and `ucred`s are related.

```
   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
   │ proc_task_zone  │    │  proc_ro zone   │    │ kauth_cred zone │    │  ucred_rw_zone  │
   └────────┬────────┘    └────────┬────────┘    └────────┬────────┘    └────────┬────────┘
            ▼                      ▼                      ▼                      ▼
     ┌─────────────┐        ┌─────────────┐        ┌─────────────┐        ┌─────────────┐
┌───▶│ struct proc │   ┌───▶│   struct    │   ┌───▶│struct ucred │   ┌───▶│   struct    │
│    │    (BSD)    │   │    │   proc_ro   │   │    │             │   │    │  ucred_rw   │
│    │┌───────────┐│   │    │┌───────────┐│   │    │┌───────────┐│   │    │             │
│    ││ p_proc_ro │├───┤    ││  p_ucred  │├───┘    ││  cred_rw  │├───┘    │             │
│    │└───────────┘│   │    │└───────────┘│ *SMR*  │└───────────┘│        │             │
│    ├─────────────┤   │    │┌───────────┐│        │┌───────────┐│        └─────────────┘
│ ┌─▶│ struct task │   │ ┌──┤│  pr_task  ││        ││  struct   ││
│ │  │   (Mach)    │   │ │  │└───────────┘│        ││posix_cred ││
│ │  │┌───────────┐│   │ │  │┌───────────┐│        │└───────────┘│
│ │  ││bsd_info_ro│├───┘ │ ┌┤│  pr_proc  ││        └─────────────┘
│ │  │└───────────┘│     │ ││└───────────┘│
│ │  └─────────────┘     │ │└─────────────┘
│ └──────────────────────┘ │
└──────────────────────────┘
```

Every process has a `struct proc` and `struct task` (allocated right next to each other).
The BSD world uses the `struct proc` and the Mach world uses the `struct task`.
Both of these structures point to a matching `struct proc_ro`, which points back to both of them.

The `proc_ro` is a read-only object allocated from the `proc_ro` zone.
It is used for storing the sensitive data for this process, such as the SMR-protected pointer to a `struct ucred`.
The `ucred` also has a matching `struct ucred_rw` structure it points to.

**The important takeaway here is that every `proc` has a `proc_ro`, which is a read-only object that holds an SMR pointer to its credential.**
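The pointer relationships described above can be modeled in a few lines of user-space C. This is only a simplified sketch, not the XNU definitions: the `model_` types and the `model_pair` helper are hypothetical, and nothing here is actually read-only or SMR-protected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the XNU structures. */
struct model_ucred;                 /* credential, contents not modeled here */
struct model_proc;
struct model_task;

struct model_proc_ro {              /* would live in the read-only zone */
    struct model_proc  *pr_proc;    /* back-pointer to the BSD half */
    struct model_task  *pr_task;    /* back-pointer to the Mach half */
    struct model_ucred *p_ucred;    /* SMR-protected in the real kernel */
};

struct model_proc { struct model_proc_ro *p_proc_ro; };
struct model_task { struct model_proc_ro *bsd_info_ro; };

/* Wire up the three objects so every pointer has a matching back-pointer. */
static void model_pair(struct model_proc *p, struct model_task *t,
                       struct model_proc_ro *ro)
{
    p->p_proc_ro   = ro;
    t->bsd_info_ro = ro;
    ro->pr_proc    = p;
    ro->pr_task    = t;
    ro->p_ucred    = NULL;          /* no credential attached in this model */
}
```

After `model_pair`, following `p_proc_ro` and then `pr_proc` from a proc leads back to the same proc, which is the round trip the diagram shows.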

This means that if a process’s credential needs to be changed, the change must be made through the `zalloc_ro` API.
I sure hope there isn’t a place where they use `zalloc_ro_mut` to write to `p_ucred` instead of an atomic API, because `p_ucred` is an SMR pointer and therefore **must** be written to atomically (hint: foreshadowing).

# 3. Per-Thread Credentials

A credential (`struct ucred`) in XNU is a data structure that tracks a number of security-related fields, such as the thread’s user ID.
Here is the definition of a ucred:

```
struct ucred {
    struct ucred_rw    *cr_rw;
    void               *cr_unused;
    u_long              cr_ref;             /* reference count */

    struct posix_cred {
        /*
         * The credential hash depends on everything from this point on
         * (see kauth_cred_get_hashkey)
         */
        uid_t   cr_uid;                     /* effective user id */
        uid_t   cr_ruid;                    /* real user id */
        uid_t   cr_svuid;                   /* saved user id */
        u_short cr_ngroups;                 /* number of groups in advisory list */
        u_short __cr_padding;
        gid_t   cr_groups[NGROUPS];         /* advisory group list */
        gid_t   cr_rgid;                    /* real group id */
        gid_t   cr_svgid;                   /* saved group id */
        uid_t   cr_gmuid;                   /* UID for group membership purposes */
        int     cr_flags;                   /* flags on credential */
    } cr_posix;
    …
};
```

The `posix_cred` part of a credential is used for tracking the privileges of the current thread.
Most threads in the system will have identical permissions: whatever permissions the current user has.
Storing a copy of these identical credentials for every thread would cost quite a bit of memory.
Instead, XNU makes use of an SMR hash table to hash these cred structs to allow threads to share the same credential object.
Credential objects use a reference count (`cr_ref`) to track when they can be freed.

The hash is calculated using the second half of the cred (i.e., `cr_posix` and onward).
This allows threads with identical permissions to share the same credential object, saving memory.
This will be important later.
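The sharing scheme can be illustrated with a tiny single-threaded interning table. This is a sketch under loud assumptions: `model_cred`, `model_hash`, and `model_cred_intern` are hypothetical names, the hash function is arbitrary, and the real table is an SMR hash table with concurrent readers, which this model ignores entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified credential: header fields first, hashed part last. */
struct model_posix { unsigned uid, gid; };
struct model_cred {
    struct model_cred *next;        /* bucket chain */
    unsigned long      ref;         /* reference count, not part of the hash */
    struct model_posix posix;       /* only this part is hashed and compared */
};

#define MODEL_BUCKETS 16
static struct model_cred *model_table[MODEL_BUCKETS];

/* Arbitrary illustrative hash over the POSIX part only. */
static size_t model_hash(const struct model_posix *px)
{
    return (px->uid * 31u + px->gid) % MODEL_BUCKETS;
}

/* Return the shared credential for `px`, creating it on first use. */
static struct model_cred *model_cred_intern(const struct model_posix *px)
{
    size_t b = model_hash(px);
    struct model_cred *c;

    for (c = model_table[b]; c != NULL; c = c->next) {
        if (c->posix.uid == px->uid && c->posix.gid == px->gid) {
            c->ref++;               /* share the existing object */
            return c;
        }
    }
    c = calloc(1, sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->ref = 1;
    c->posix = *px;
    c->next = model_table[b];
    model_table[b] = c;
    return c;
}
```

Two lookups with equal POSIX fields return the same pointer, so in this model comparing pointers is enough to tell whether two credentials are equivalent.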

### Managing Credentials

In XNU, threads belonging to the same process can have different credentials.
This is managed using `current_cached_proc_cred_update`, which is called during every syscall.
During every syscall entry, the kernel retrieves the current process credential pointer and compares it to the per-thread credential to see if any changes need to be made.

```
void
current_cached_proc_cred_update(void)
{
    thread_ro_t tro = current_thread_ro();
    proc_t proc = tro->tro_proc;

    if (__improbable(tro->tro_task != kernel_task &&
        tro->tro_realcred != proc_ucred_unsafe(proc))) {
        kauth_cred_thread_update_slow(tro, proc);
    }
}
```

This method compares the `p_ucred` pointer from the `proc_ro` with the `tro_realcred` pointer from the current `thread_ro`.
If the credentials are equivalent (i.e., they have the same `cr_posix` values), these pointers will point to the same cred in the hash table.

If the pointers don’t match, `kauth_cred_thread_update_slow` is called to update things, which eventually enters an SMR region and dereferences `p_ucred`.
Note that in `current_cached_proc_cred_update`, the usage of `proc_ucred_unsafe` (reading `p_ucred` without being in an SMR region) is ok since we only read the credential pointer’s value and do not dereference it yet.

The important thing here is that every syscall can cause the kernel to enter an SMR region and dereference `p_ucred`, the SMR-protected pointer to the process’s credential structure.
This happens whenever the thread’s credentials change even slightly.
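The fast-path/slow-path split can be modeled in user space as follows. This is a sketch, not kernel code: every `model_` name is hypothetical, and the slow path here simply copies the pointer, whereas the real one enters an SMR region and takes references.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cred { int gid; };

struct model_proc   { struct model_cred *p_ucred; };
struct model_thread { struct model_proc *proc; struct model_cred *cached; };

static int model_slow_path_calls;   /* counts slow-path entries for illustration */

/* Stand-in for the slow path: re-sync the thread's cached pointer. */
static void model_update_slow(struct model_thread *t)
{
    model_slow_path_calls++;
    t->cached = t->proc->p_ucred;
}

/* Called on every (modeled) syscall entry: a pointer compare, nothing more. */
static bool model_syscall_entry(struct model_thread *t)
{
    if (t->cached != t->proc->p_ucred) {
        model_update_slow(t);
        return true;                /* slow path taken */
    }
    return false;                   /* fast path: pointers already match */
}
```

In this model, a thread that keeps making syscalls while another thread keeps changing `p_ucred` takes the slow path again and again, which is the behavior the race relies on.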

# 4. The Race Condition

Now that we have all the pieces, let’s put them together.

To recap:

- `proc_ro` is a read-only object used for managing a process’s sensitive data (such as its credentials), and can only be modified via the `zalloc_ro_mut*` family of functions.
- `proc_ro.p_ucred` is an SMR-protected pointer to a process’s credential structure.
- Since `p_ucred` is an SMR pointer, writers must synchronize with one another via a lock (specifically `p_ucred_mlock`), and must use atomic operations when writing `p_ucred`.
- `zalloc_ro_mut`, a function used for modifying read-only objects, is non-atomic and is therefore unsuitable for modifying `p_ucred`.

You might be able to see where this is going.

The bug is that there is a spot in the code that updates `proc_ro.p_ucred` non-atomically using `zalloc_ro_mut`.
I found a way to race this update call against an SMR dereference of `p_ucred`, which will load without locking.
If you do this enough times, eventually you will observe a partially written value of `p_ucred` that points to a **different credential**!

### The Buggy Function

The bug lives in `kauth_cred_proc_update`, which is the function responsible for updating a `proc_ro`’s `p_ucred` pointer.
The buggy line is marked with a comment.

```
bool
kauth_cred_proc_update(
    proc_t                  p,
    proc_settoken_t         action,
    kauth_cred_derive_t     derive_fn)
{
    kauth_cred_t cur_cred, free_cred, new_cred;

    cur_cred = kauth_cred_proc_ref(p);

    for (;;) {
        new_cred = kauth_cred_derive(cur_cred, derive_fn);
        if (new_cred == cur_cred) {
            …
            kauth_cred_unref(&new_cred);
            kauth_cred_unref(&cur_cred);
            return false;
        }

        proc_ucred_lock(p);
        if (__probable(proc_ucred_locked(p) == cur_cred)) {
            kauth_cred_ref(new_cred);
            kauth_cred_hold(new_cred);

            // This is the bug:
            zalloc_ro_mut(ZONE_ID_PROC_RO, proc_get_ro(p),
                offsetof(struct proc_ro, p_ucred),
                &new_cred, sizeof(struct ucred *));

            kauth_cred_drop(cur_cred);
            ucred_rw_unref_live(cur_cred->cr_rw);

            proc_update_creds_onproc(p, new_cred);
            proc_ucred_unlock(p);
            …
            kauth_cred_unref(&new_cred);
            kauth_cred_unref(&cur_cred);
            return true;
        }
        …
    }
}
```

To trigger the bug, we need to be able to cause frequent credential updates.
I tried several ways to do this and found the best way is to use `setgid` to adjust the group ID over and over again.
Each time the group ID changes, `kauth_cred_proc_update` will need to be called to adjust `p_ucred` to point to the correct credential object in the hash table.

Let’s take a closer look at how `p_ucred` is read and written.

### Reading p_ucred

Readers of `p_ucred` use `proc_ucred_smr` to fetch the `p_proc_ro->p_ucred` field from a given `proc_t`.

```
kauth_cred_t
proc_ucred_unsafe(proc_t p)
{
    kauth_cred_t cred = smr_serialized_load(&proc_get_ro(p)->p_ucred);

    return kauth_cred_require(cred);
}

kauth_cred_t
proc_ucred_smr(proc_t p)
{
    assert(smr_entered(&smr_proc_task));
    return proc_ucred_unsafe(p);
}
```

After ensuring we are in an SMR region, `smr_serialized_load` just returns the value of `p_ucred` from memory without locking.
Whatever is currently in memory is what we get, even if it is an in-progress write from a non-atomic writer thread.

### Writing p_ucred

The XNU SMR API requires writers to be serialized by an external mechanism. In this case, it’s the `p_ucred_mlock` (taken via the `proc_ucred_lock` API).
This lock serializes writers so that, in theory, a correct pointer is always present in memory, allowing readers to read without locking.
However, as we’ve seen, even though `kauth_cred_proc_update` correctly acquires the writer lock, it violates the SMR requirements due to the use of the non-atomic `zalloc_ro_mut` API.
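For contrast, here is a user-space sketch of what the writer-side contract looks like when it is honored: writers serialize on a lock and publish the pointer with one atomic store, and readers do one atomic load with no lock. It uses C11 atomics and a pthread mutex as stand-ins; the `model_` names are hypothetical, and the deferred reclamation that real SMR provides is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct model_cred { int gid; };

/* The shared pointer: readers load it without taking any lock. */
static _Atomic(struct model_cred *) model_ucred;
static pthread_mutex_t model_ucred_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: serialized by the mutex, publishes with a single atomic store. */
static void model_publish(struct model_cred *new_cred)
{
    pthread_mutex_lock(&model_ucred_lock);
    atomic_store_explicit(&model_ucred, new_cred, memory_order_release);
    pthread_mutex_unlock(&model_ucred_lock);
}

/* Reader: lock-free; sees either the old or the new pointer, never a mix. */
static struct model_cred *model_read(void)
{
    return atomic_load_explicit(&model_ucred, memory_order_acquire);
}
```

The lock only keeps writers from interfering with each other. It does nothing for readers, so the store itself has to be indivisible, and that is the half of the contract a byte-wise copy breaks.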

### Triggering the Bug

Every time `kauth_cred_proc_update` changes `p_ucred`, the bug is triggered.
However, most of the time, this will not cause problems, because normal workflows only update their credentials rarely, if at all.
To hit the bug we need to read `p_ucred` while a write is occurring.

We don’t need to trigger any allocations or frees, all we need is to cause `p_ucred` to be changed via `zalloc_ro_mut`.
Specifically, this happens when a `kauth_cred_derive_t` closure returns `true`.
Many paths in the kernel can cause this (e.g., `setuid`, `setgid`, `setgroups`).

To hit the bug we need two threads: one to trigger frequent `p_ucred` changes, and one to read `p_ucred`.

### Writer Thread

To allow for an unprivileged local attacker to cause credential changes, I use a binary with the `setgid` bit.
This allows us to switch the effective group ID back and forth between the saved and real group ID for the caller without requiring root.
Each time the effective group ID changes, `p_ucred` will need to be updated as well.
Specifically, two credentials will be allocated in the hash table (one for each possible GID), and `kauth_cred_proc_update` will switch between them.

Here is what that thread looks like:

```
while (true) {
    setgid(rg); // real gid
    setgid(eg); // eff. gid
}
```

Each time we call `setgid` in this manner, `setgid` will use `kauth_cred_proc_update` to update the credential pointer in our proc’s `p_ucred`.
Unprivileged users are allowed to swap between the saved GID and real GID without root privileges, so this is a practical way to trigger many credential changes.

Each time `p_ucred` is changed with `zalloc_ro_mut`, there is a chance that a concurrent reader will observe an intermediate value.

### Reader Thread

`unix_syscall64` takes a reference to the current proc cred during every syscall to support maintaining different credentials across threads.
As we have seen, `current_cached_proc_cred_update` will attempt to verify and dereference `p_ucred` on credential changes to read the `cr_rw` field.

Any syscall running concurrently with group ID changes will trigger this read.
My reader thread just calls `getgid()` in a loop.

```
volatile gid_t tmp;
while (true)
    tmp = getgid();
```

At some point, one of these reads will observe a `p_ucred` value that is halfway written, which will cause a crash if you are lucky, or silent corruption of the credential if you are unlucky!

### Running the Proof-of-Concept

The binary needs to be a `setgid` binary run as a different group than the real GID of the current user so that we have a different group to switch to.
The default group on macOS is `staff`, so I use `everyone` as this second group.
This just gives us a convenient way of getting `kauth_cred_proc_update` to switch credentials without needing root.
Other ways of triggering this are also possible.

```
chgrp everyone poc
chmod g+s poc
./poc
```

After running the proof of concept for a while, eventually your process’s credential pointer will become corrupted.
This could cause a kernel panic, or maybe it could cause your credentials to silently be changed to point to some other credential object in the `kauth_cred` zone.

You can find a proof of concept that uses two threads to race `kauth_cred_proc_update` against `current_cached_proc_cred_update` here.

# 5. Conclusion

First off, I should note that this race is quite hard to win. Since the two credentials we are switching between are in the same zone, many of the bytes in their addresses will be identical. This means that even when this race is triggered it may not cause visible issues.
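To make this concrete, the sketch below enumerates what a reader could see if an 8-byte pointer were overwritten one byte at a time in memory order. It assumes a little-endian machine (as on x86-64), the helper names are hypothetical, and it models only byte-granular tearing, not what `rep movsb` actually does in microcode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Value a reader would see if the first `k` bytes of `new_val` had been
 * copied over `old_val` in memory order, as a byte-wise copy would do.
 */
static uint64_t model_torn_value(uint64_t old_val, uint64_t new_val, size_t k)
{
    unsigned char buf[sizeof(uint64_t)];
    uint64_t out;

    if (k > sizeof(buf)) {
        k = sizeof(buf);
    }
    memcpy(buf, &old_val, sizeof(buf));
    memcpy(buf, &new_val, k);       /* partial overwrite */
    memcpy(&out, buf, sizeof(out));
    return out;
}

/* Count intermediate states that differ from both the old and new value. */
static int model_count_torn_states(uint64_t old_val, uint64_t new_val)
{
    int n = 0;

    for (size_t k = 1; k < sizeof(uint64_t); k++) {
        uint64_t v = model_torn_value(old_val, new_val, k);
        if (v != old_val && v != new_val) {
            n++;
        }
    }
    return n;
}
```

With two made-up addresses that differ only in their low two bytes, such as `0xffffff90aabb1100` and `0xffffff90aabb2280`, only one of the seven intermediate states differs from both endpoints. Every other partial copy looks like a valid old or new pointer, which is consistent with the race being hard to observe.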

When investigating this bug I would commonly set up a 2013 Mac Pro running OCLP (I got one for a few hundred bucks from eBay), turn on my proof of concept code, and leave it running overnight with a debugger attached, hoping that it would be stopped in a panic condition when I woke up the next day.

Second, I have only observed this issue affect Intel systems.
I currently believe this to be due to the fact that the version of memcpy used in ARM64 is optimized to copy larger blocks of bytes at a time, which gives some degree of atomicity in practice.
While the code is still not strictly correct on ARM systems (because `zalloc_ro_mut` does not guarantee atomicity), I was not able to cause any `kauth_cred_t` corruption there.
Maybe someone reading this can get it to work on an ARM system? If you do, let me know on X @0xjprx.

### Suggested Fix

When I reported this bug to Apple, I provided the following suggested fix.

```
@@ -3947,9 +3947,9 @@ kauth_cred_proc_update(
 			kauth_cred_ref(new_cred);
 			kauth_cred_hold(new_cred);
-			zalloc_ro_mut(ZONE_ID_PROC_RO, proc_get_ro(p),
+			zalloc_ro_mut_atomic(ZONE_ID_PROC_RO, proc_get_ro(p),
 			    offsetof(struct proc_ro, p_ucred),
-			    &new_cred, sizeof(struct ucred *));
+			    ZRO_ATOMIC_XCHG_LONG, (uint64_t)new_cred);
 			kauth_cred_drop(cur_cred);
 			ucred_rw_unref_live(cur_cred->cr_rw);
```

Running the kernel with this patch applied completely eliminated the bug from my setup.
I used `zalloc_ro_mut_atomic` with `ZRO_ATOMIC_XCHG_LONG` to atomically swap the old credential pointer for the new one.

A better function to use here is probably something like `zalloc_ro_update_field_atomic`, but I found there were non-trivial incompatibilities between the implicit structs declared via the SMR pointer macro and the macros used by update_field_atomic, so I just called `zalloc_ro_mut_atomic` directly.

### Winning the Race

When you win the race, if the invalid pointer is not properly aligned for an element of the zone, you’ll get a panic like this:

```
panic: zone_require_ro failed: element improperly aligned (addr: 0xffffff86c79e8350) @zalloc.c:7376
Panicked task 0xffffff952d31db88: 3 threads: pid 1110: poc
Backtrace (CPU 8), panicked thread: 0xffffff90610770c8, Frame : Return Address
0xfffffff4078abac0 : 0xffffff8007becc41 mach_kernel : _handle_debugger_trap + 0x4c1
0xfffffff4078abb10 : 0xffffff8007d598ec mach_kernel : _kdp_i386_trap + 0x11c
0xfffffff4078abb50 : 0xffffff8007d48f6b mach_kernel : _kernel_trap + 0x48b
0xfffffff4078abc10 : 0xffffff8007b82971 mach_kernel : _return_from_trap + 0xc1
0xfffffff4078abc30 : 0xffffff8007becf37 mach_kernel : _DebuggerTrapWithState + 0x67
0xfffffff4078abd30 : 0xffffff8007bec5d2 mach_kernel : _panic_trap_to_debugger + 0x1e2
0xfffffff4078abda0 : 0xffffff80083d4938 mach_kernel : _panic + 0x81
0xfffffff4078abe90 : 0xffffff80083dab9f mach_kernel : ___smr_stail_invalid + 0x2ce9
0xfffffff4078abed0 : 0xffffff80080c6757 mach_kernel : _kauth_cred_proc_ref + 0x167
0xfffffff4078abf00 : 0xffffff80080c64c8 mach_kernel : _kauth_cred_ref + 0xc8
0xfffffff4078abf40 : 0xffffff800823b4eb mach_kernel : _unix_syscall64 + 0x39b
0xfffffff4078abfa0 : 0xffffff8007b82db6 mach_kernel : _hndl_unix_scall64 + 0x16

Process name corresponding to current thread: poc
Mac OS version: 24A335
Kernel version: Darwin Kernel Version 24.0.0: Mon Aug 12 20:54:30 PDT 2024; root:xnu-11215.1.10~2/RELEASE_X86_64
Kernel UUID: 5DD51D41-0315-3DDD-BD5D-50E782643BDB
roots installed: 0
KernelCache slide: 0x0000000007800000
KernelCache base: 0xffffff8007a00000
Kernel slide: 0x00000000078e4000
Kernel text base: 0xffffff8007ae4000
__HIB text base: 0xffffff8007900000
```

It might also be possible that the creds align just right such that combining them will give you a pointer to a correctly aligned credential, effectively changing your process’s credentials.
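The condition a torn pointer would have to satisfy can be written down as a small predicate. This is a hypothetical model of an element-alignment check, not the logic of `zone_require_ro` itself: the function name, the flat `zone_base`/`zone_size` layout, and the parameters are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: an address is a plausible element pointer only if a
 * whole element fits inside the zone's range at that address and the
 * address sits at a multiple of the element size from the zone's base.
 */
static bool model_is_zone_element(uint64_t addr, uint64_t zone_base,
                                  uint64_t zone_size, uint64_t elem_size)
{
    uint64_t off;

    if (elem_size == 0 || addr < zone_base) {
        return false;
    }
    off = addr - zone_base;
    if (off >= zone_size || zone_size - off < elem_size) {
        return false;               /* element would not fit in the zone */
    }
    return off % elem_size == 0;
}
```

Under this model, a torn pointer that fails the predicate leads to a panic like the one above, while one that happens to pass it would be accepted as a different credential.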

Without some other kind of mechanism for deterministically controlling where in the kernel your credential objects are allocated, there isn’t a lot of control over how the invalid pointer gets formed, so this may be hard to achieve in practice. Maybe you could try to get them lined up via spraying creds in a particular pattern? I’ll leave that as an “exercise for the reader.”

Even so, I consider this bug to be extraordinarily fascinating and a great learning example for some really cool features of XNU. What do you think? Feel free to reach out on X @0xjprx.

This bug was fixed in macOS 15.3, released on January 27, 2025.

# References

A few cool links about concurrency and lock-free data structures.

[1] Paul E. McKenney. _Is Parallel Programming Hard, And, If So, What Can You Do About It?_.
This whole book is amazing. Chapter 9 on Deferred Processing is of particular relevance.

[2] Keir Fraser. _Practical Lock-Freedom_.

[3] Locks in the Linux Kernel.

[4] SMR Discussion in the FreeBSD Mailing List.

-ravi

January 30, 2025