Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kernelCTF CVE-2024-41009_lts_cos #118

Merged
merged 26 commits into from
Dec 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
186 changes: 186 additions & 0 deletions pocs/linux/kernelctf/CVE-2024-41009_lts_cos/docs/exploit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
# Background
Taken from [commit message](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=cfa1a2329a691ffd991fcf7248a57d752e712881):

> The BPF ring buffer internally is implemented as a power-of-2 sized circular buffer, with two logical and ever-increasing counters: consumer_pos is the consumer counter to show which logical position the consumer consumed the data, and producer_pos which is the producer counter denoting the amount of data reserved by all producers.<br><br>
Each time a record is reserved, the producer that "owns" the record will successfully advance producer counter. In user space each time a record is read, the consumer of the data advanced the consumer counter once it finished processing. Both counters are stored in separate pages so that from user space, the producer counter is __read-only__ and the consumer counter is __read-write__.

This is structure layout of bpf_ringbuf:
```
struct bpf_ringbuf {
wait_queue_head_t waitq;
struct irq_work work;
u64 mask;
struct page **pages;
int nr_pages;
spinlock_t spinlock ____cacheline_aligned_in_smp;
atomic_t busy ____cacheline_aligned_in_smp;
unsigned long consumer_pos __aligned(PAGE_SIZE); // read-write from user space
unsigned long producer_pos __aligned(PAGE_SIZE); // read-only from user space
unsigned long pending_pos;
char data[] __aligned(PAGE_SIZE);
};
```

`BPF_FUNC_ringbuf_reserve` is used to allocate a memory chunk from `BPF_MAP_TYPE_RINGBUF`. It reserve 8 bytes space to record header structure.
```C
/* 8-byte ring buffer record header structure */
struct bpf_ringbuf_hdr {
u32 len;
u32 pg_off;
};
```
And return `(void *)hdr + BPF_RINGBUF_HDR_SZ` for eBPF program to use. eBPF program is unable to modify `bpf_ringbuf_hdr` due to it is outside of memory chunk.

But with malformed `&rb->consumer_pos`, it's possible to make second allocated memory chunk overlapping with first chunk.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you specifically add that consumer_pos and producer_pos are editable because you can mmap() them in userspace? This isn't entirely clear from the write-up.

As the result, eBPF program is able to edit first chunk's hdr. This is how we do it:

1. First, we create a `BPF_MAP_TYPE_RINGBUF` with size is 0x4000. Modify `consumer_pos` to 0x3000 before call `BPF_FUNC_ringbuf_reserve`.
2. Allocate chunk A, it will be in `[0x0,0x3008]`, and eBPF program is able to edit `[0x8,0x3008]`.
3. Now allocate chunk B with size 0x3000, it will sucess because we edit consumer_pos ahead to pass the check.
4. Chunk B will be in `[0x3008,0x6010]`, and eBPF program is able to edit `[0x3010,0x6010]`.

In kernel code side, this is how they do the check.
```C
static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
{
...
len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
...
prod_pos = rb->producer_pos;
new_prod_pos = prod_pos + len;
/* check for out of ringbuf space by ensuring producer position
* doesn't advance more than (ringbuf_size - 1) ahead
*/
if (new_prod_pos - cons_pos > rb->mask) {
// failed path
spin_unlock_irqrestore(&rb->spinlock, flags);
return NULL;
}
// success path
}
```
It can pass the checked, because `cons_pos` had a value 0x3000 (edited via userspace), `new_prod_pos` (0x6010), and `rb->mask` (0x4000 - 1) will satisfy the condition and return buffer allocated in `[0x3008,0x6010]` for the eBPF program.

Due to ringbuf memory layout is allocated in the following way:
```C
static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
{
int nr_meta_pages = RINGBUF_NR_META_PAGES;
int nr_data_pages = data_sz >> PAGE_SHIFT;
int nr_pages = nr_meta_pages + nr_data_pages;
...
/* Each data page is mapped twice to allow "virtual"
* continuous read of samples wrapping around the end of ring
* buffer area:
* ------------------------------------------------------
* | meta pages | real data pages | same data pages |
* ------------------------------------------------------
* | | 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9 |
* ------------------------------------------------------
* | | TA DA | TA DA |
* ------------------------------------------------------
* ^^^^^^^
* |
* Here, no need to worry about special handling of wrapped-around
* data due to double-mapped data pages. This works both in kernel and
* when mmap()'ed in user-space, simplifying both kernel and
* user-space implementations significantly.
*/
array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
pages = bpf_map_area_alloc(array_size, numa_node);
if (!pages)
return NULL;

for (i = 0; i < nr_pages; i++) {
page = alloc_pages_node(numa_node, flags, 0);
if (!page) {
nr_pages = i;
goto err_free_pages;
}
pages[i] = page;
if (i >= nr_meta_pages)
pages[nr_data_pages + i] = page;
}

rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
VM_MAP | VM_USERMAP, PAGE_KERNEL);
...
}
```

`[0x0,0x4000]` and `[0x4000,0x8000]` points to same data pages. It means that we can access chunk B at `[0x4000,0x4008]` that will point to chunk A's hdr.

# Exploit
`BPF_FUNC_ringbuf_submit`/`BPF_FUNC_ringbuf_discard` use hdr's pg_off to locate the meta pages.

```C
bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr)
{
unsigned long addr = (unsigned long)(void *)hdr;
unsigned long off = (unsigned long)hdr->pg_off << PAGE_SHIFT;

return (void*)((addr & PAGE_MASK) - off);
}
static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
{
unsigned long rec_pos, cons_pos;
struct bpf_ringbuf_hdr *hdr;
struct bpf_ringbuf *rb;
u32 new_len;

hdr = sample - BPF_RINGBUF_HDR_SZ;
rb = bpf_ringbuf_restore_from_rec(hdr);
```

`pg_off` in `bpf_ringbuf_hdr` is the chunks's page offset from `bpf_ringbuf` structure, so `bpf_ringbuf_restore_from_rec` will substract the ringbuf chunk address with `pg_off` to locate `bpf_ringbuf` object. We can see `bpf_ringbuf_hdr` structure again:
```C
struct bpf_ringbuf {
...
unsigned long consumer_pos __aligned(PAGE_SIZE); // read-write from user space
unsigned long producer_pos __aligned(PAGE_SIZE); // read-only from user space
unsigned long pending_pos;
char data[] __aligned(PAGE_SIZE);
}
```
Suppose chunk A located at the first page of `rb->data`, distance chunk A address with `rb->consumer_pos` is `2`, using bug's primitive we modify `pg_off` of chunk A to `2`, then the meta pages that calculated from `bpf_ringbuf_restore_from_rec` will point to the `rb->consumer_pos`. We can mmap `rb->consumer_pos` in user space and control its content.

By crafting `work` field inside `bpf_ringbuf` and call `bpf_ringbuf_commit` with `BPF_RB_FORCE_WAKEUP` it will call our crafted `irq_work` object to `irq_work_queue`.
```C
static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
{
...
rb = bpf_ringbuf_restore_from_rec(hdr);
...

if (flags & BPF_RB_FORCE_WAKEUP)
irq_work_queue(&rb->work);\
...
```
Crafted irq_work will processed at `irq_work_single` and will execute our controlled function pointer.
```C
void irq_work_single(void *arg)
{
struct irq_work *work = arg;
int flags;

flags = atomic_read(&work->node.a_flags);
flags &= ~IRQ_WORK_PENDING;
atomic_set(&work->node.a_flags, flags);

...
lockdep_irq_work_enter(flags);
work->func(work); // [1]
lockdep_irq_work_exit(flags);
...
}
```

# KASLR Bypass
To bypass kASLR we refer to this [technique](https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-6817_mitigation/docs/exploit.md#kaslr-bypass).

# ROP Chain
By observation we see RBX/RDI will contain the address of `work` field and we can control the ROP data started at `RDI + 0x18`. Then, we use this ROP gadget for stack pivot to our controlled data.
```
0x00000000004b78b1 : push rbx ; or byte ptr [rbx + 0x41], bl ; pop rsp ; pop r13 ; pop rbp ; ret
```
Then we continue to execute ROP payload that will overwrite `core_pattern` to our exploit. By trigger crash it will execute our exploit as high privileged.
12 changes: 12 additions & 0 deletions pocs/linux/kernelctf/CVE-2024-41009_lts_cos/docs/vulnerability.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
- Requirements:
- Capabilites: NA
- Kernel configuration: CONFIG_BPF_SYSCALL=y
- User namespaces required: No
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=457f44363a8894135c85b7a9afd2bd8196db24ab
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=cfa1a2329a691ffd991fcf7248a57d752e712881
- Affected Version: v5.8 - v6.9
- Affected Component: bpf
- Syscall to disable: bpf
- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-41009
- Cause: Buffer overlapping
- Description: A buffer overlapping vulnerability in the Linux kernel's bpf ringbuf. It is possible to make a second allocated memory chunk overlapping with the firstchunk and as a result, the BPF program is able to edit the first chunk's header. Once first chunk's header is modified, then bpf_ringbuf_commit() refers to the wrong page and could cause a crash. We recommend upgrading past commit cfa1a2329a691ffd991fcf7248a57d752e712881
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
exploit: exploit.c
$(CC) -O3 -ggdb -static -Wall -lpthread -o $@ $^

real_exploit: exploit.c
$(CC) -O3 -ggdb -static -Wall -lpthread -DKASLR_BYPASS_INTEL=1 -o exploit $^
Binary file not shown.
Loading
Loading