google · JordyZomer · Dec 6, 2024 · Jul 23, 2024 · Jul 23, 2024 · Jul 23, 2024
diff --git a/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/docs/exploit.md
@@ -0,0 +1,186 @@
+# Background
+Taken from the [fix commit message](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=cfa1a2329a691ffd991fcf7248a57d752e712881):
+
+> The BPF ring buffer internally is implemented as a power-of-2 sized circular buffer, with two logical and ever-increasing counters: consumer_pos is the consumer counter to show which logical position the consumer consumed the data, and producer_pos which is the producer counter denoting the amount of data reserved by all producers.<br><br>
+Each time a record is reserved, the producer that "owns" the record will successfully advance producer counter. In user space each time a record is read, the consumer of the data advanced the consumer counter once it finished processing. Both counters are stored in separate pages so that from user space, the producer counter is __read-only__ and the consumer counter is __read-write__.
+
+This is the layout of `struct bpf_ringbuf`:
+```
+struct bpf_ringbuf {
+	wait_queue_head_t waitq;
+	struct irq_work work;
+	u64 mask;
+	struct page **pages;
+	int nr_pages;
+	spinlock_t spinlock ____cacheline_aligned_in_smp;
+	atomic_t busy ____cacheline_aligned_in_smp;
+	unsigned long consumer_pos __aligned(PAGE_SIZE); // read-write from user space
+	unsigned long producer_pos __aligned(PAGE_SIZE); // read-only from user space
+	unsigned long pending_pos;
+	char data[] __aligned(PAGE_SIZE);
+};
+```
+
+`BPF_FUNC_ringbuf_reserve` allocates a memory chunk from a `BPF_MAP_TYPE_RINGBUF` map. It reserves an extra 8 bytes in front of the chunk for a record header:
+```C
+/* 8-byte ring buffer record header structure */
+struct bpf_ringbuf_hdr {
+	u32 len;
+	u32 pg_off;
+};
+```
+It then returns `(void *)hdr + BPF_RINGBUF_HDR_SZ` for the eBPF program to use. Normally the eBPF program cannot modify `bpf_ringbuf_hdr`, because the header lies outside the returned chunk.  
+
+However, with a malformed `rb->consumer_pos`, it is possible to make a second allocated chunk overlap the first one.  
+As a result, the eBPF program can edit the first chunk's hdr. This is how we do it: 
+
+1. First, we create a `BPF_MAP_TYPE_RINGBUF` of size 0x4000 and set `consumer_pos` to 0x3000 from user space before calling `BPF_FUNC_ringbuf_reserve`.
+2. Allocate chunk A with size 0x3000. It occupies `[0x0,0x3008]`, and the eBPF program can edit `[0x8,0x3008]`.
+3. Now allocate chunk B with size 0x3000. This succeeds because `consumer_pos` was moved ahead, which lets the request pass the space check.
+4. Chunk B occupies `[0x3008,0x6010]`, and the eBPF program can edit `[0x3010,0x6010]`.  
+
+On the kernel side, this is how the check is done:
+```C
+ static void *__bpf_ringbuf_reserve(struct bpf_ringbuf *rb, u64 size)
+ {
+	...
+	len = round_up(size + BPF_RINGBUF_HDR_SZ, 8);
+	...
+ 	prod_pos = rb->producer_pos;
+ 	new_prod_pos = prod_pos + len;
+/* check for out of ringbuf space by ensuring producer position
+* doesn't advance more than (ringbuf_size - 1) ahead
+*/
+	if (new_prod_pos - cons_pos > rb->mask) {
+		// failed path
+		spin_unlock_irqrestore(&rb->spinlock, flags);
+		return NULL;
+	}
+	// success path
+}
+```
+The check passes: `cons_pos` is 0x3000 (set from user space), `new_prod_pos` is 0x6010, and `rb->mask` is 0x4000 - 1, so `new_prod_pos - cons_pos` is 0x3010, which does not exceed `rb->mask`. The kernel therefore returns a buffer at `[0x3008,0x6010]` to the eBPF program.
+
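The arithmetic of the check can be reproduced in user space. The following is a minimal sketch that models only the space check shown above, not the kernel's locking or its other size limits; `round_up8`, `reserve_passes`, and `HDR_SZ` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_SZ 8u /* hypothetical stand-in for BPF_RINGBUF_HDR_SZ */

/* Round x up to the next multiple of 8, like round_up(x, 8). */
static inline uint64_t round_up8(uint64_t x)
{
	return (x + 7u) & ~(uint64_t)7u;
}

/*
 * Model of the space check only. Returns true if the reservation would
 * be accepted and stores the resulting producer position in
 * *new_prod_pos.
 */
static inline bool reserve_passes(uint64_t prod_pos, uint64_t cons_pos,
				  uint64_t size, uint64_t mask,
				  uint64_t *new_prod_pos)
{
	uint64_t len = round_up8(size + HDR_SZ);

	assert(new_prod_pos != NULL);
	*new_prod_pos = prod_pos + len;
	/* Unsigned subtraction, as in the kernel check. */
	return *new_prod_pos - cons_pos <= mask;
}
```

With `mask = 0x3fff`, calling `reserve_passes(0x3008, 0x3000, 0x3000, 0x3fff, &np)` models chunk B and is accepted, while the same call with an honest `cons_pos` of 0 is rejected.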
+The ringbuf memory is laid out as follows:  
+```C
+static struct bpf_ringbuf *bpf_ringbuf_area_alloc(size_t data_sz, int numa_node)
+{
+	int nr_meta_pages = RINGBUF_NR_META_PAGES;
+	int nr_data_pages = data_sz >> PAGE_SHIFT;
+	int nr_pages = nr_meta_pages + nr_data_pages;
+	...
+	/* Each data page is mapped twice to allow "virtual"
+	 * continuous read of samples wrapping around the end of ring
+	 * buffer area:
+	 * ------------------------------------------------------
+	 * | meta pages |  real data pages  |  same data pages  |
+	 * ------------------------------------------------------
+	 * |            | 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9 |
+	 * ------------------------------------------------------
+	 * |            | TA             DA | TA             DA |
+	 * ------------------------------------------------------
+	 *                               ^^^^^^^
+	 *                                  |
+	 * Here, no need to worry about special handling of wrapped-around
+	 * data due to double-mapped data pages. This works both in kernel and
+	 * when mmap()'ed in user-space, simplifying both kernel and
+	 * user-space implementations significantly.
+	 */
+	array_size = (nr_meta_pages + 2 * nr_data_pages) * sizeof(*pages);
+	pages = bpf_map_area_alloc(array_size, numa_node);
+	if (!pages)
+		return NULL;
+
+	for (i = 0; i < nr_pages; i++) {
+		page = alloc_pages_node(numa_node, flags, 0);
+		if (!page) {
+			nr_pages = i;
+			goto err_free_pages;
+		}
+		pages[i] = page;
+		if (i >= nr_meta_pages)
+			pages[nr_data_pages + i] = page;
+	}
+
+	rb = vmap(pages, nr_meta_pages + 2 * nr_data_pages,
+		  VM_MAP | VM_USERMAP, PAGE_KERNEL);
+	...
+}
+```
+
+`[0x0,0x4000]` and `[0x4000,0x8000]` point to the same data pages. This means that the part of chunk B at `[0x4000,0x4008]` aliases chunk A's hdr.
+
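The aliasing can be expressed as a mask operation on offsets relative to `rb->data`. This is a small sketch assuming a power-of-two ring size and the 8-byte header described earlier; `backing_offset` and `offset_in_sample` are hypothetical helper names.

```c
#include <stdint.h>

/*
 * Offsets are relative to rb->data. Because every data page is mapped
 * twice, a virtual offset and its value masked by (ring_size - 1) refer
 * to the same backing byte. ring_size must be a power of two.
 */
static inline uint64_t backing_offset(uint64_t virt_off, uint64_t ring_size)
{
	return virt_off & (ring_size - 1);
}

/*
 * Offset inside a sample (the pointer handed to the eBPF program) at
 * which the virtual data offset `target` is reached, given that the
 * sample's 8-byte header sits at hdr_off.
 */
static inline uint64_t offset_in_sample(uint64_t hdr_off, uint64_t target)
{
	return target - (hdr_off + 8);
}
```

For the layout in this write-up, `backing_offset(0x4000, 0x4000)` is 0, which is where chunk A's hdr lives, and `offset_in_sample(0x3008, 0x4000)` gives 0xff0, the position inside chunk B's sample that lands on it.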
+# Exploit
+`BPF_FUNC_ringbuf_submit`/`BPF_FUNC_ringbuf_discard` use hdr's pg_off to locate the meta pages.  
+
+```C
+bpf_ringbuf_restore_from_rec(struct bpf_ringbuf_hdr *hdr)
+{
+	unsigned long addr = (unsigned long)(void *)hdr;
+	unsigned long off = (unsigned long)hdr->pg_off << PAGE_SHIFT;
+
+	return (void*)((addr & PAGE_MASK) - off);
+}
+static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
+{
+	unsigned long rec_pos, cons_pos;
+	struct bpf_ringbuf_hdr *hdr;
+	struct bpf_ringbuf *rb;
+	u32 new_len;
+
+	hdr = sample - BPF_RINGBUF_HDR_SZ;
+	rb = bpf_ringbuf_restore_from_rec(hdr);
+```
+
+`pg_off` in `bpf_ringbuf_hdr` is the chunk's page offset from the start of the `bpf_ringbuf` structure, so `bpf_ringbuf_restore_from_rec` subtracts `pg_off` pages from the chunk's page address to locate the `bpf_ringbuf` object. Recall the `bpf_ringbuf` structure:
+```C
+struct bpf_ringbuf {
+	...
+	unsigned long consumer_pos __aligned(PAGE_SIZE); // read-write from user space
+	unsigned long producer_pos __aligned(PAGE_SIZE); // read-only from user space
+	unsigned long pending_pos;
+	char data[] __aligned(PAGE_SIZE);
+};
+```
+Suppose chunk A is located in the first page of `rb->data`. The distance from chunk A's page to the `rb->consumer_pos` page is `2` pages, so using the bug's primitive we set chunk A's `pg_off` to `2`. The `bpf_ringbuf` pointer computed by `bpf_ringbuf_restore_from_rec` then points at the `rb->consumer_pos` page. We can mmap `rb->consumer_pos` in user space and control its content.
+
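The effect of changing `pg_off` can be checked with plain address arithmetic. The sketch below assumes 4 KiB pages and the meta layout implied by the structure above (one page for the leading fields, then the `consumer_pos` page, then the `producer_pos` page, then `data`); the `SK_` macros and `restore_from_rec` are hypothetical names, and addresses are treated as plain integers.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define SK_PAGE_SIZE  (1ull << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/*
 * Same arithmetic as bpf_ringbuf_restore_from_rec(): round the header
 * address down to its page and step back pg_off pages.
 */
static inline uint64_t restore_from_rec(uint64_t hdr_addr, uint32_t pg_off)
{
	uint64_t off = (uint64_t)pg_off << SK_PAGE_SHIFT;

	return (hdr_addr & SK_PAGE_MASK) - off;
}
```

If `base` is the address of the ring buffer and chunk A's hdr is at `base + 3 * SK_PAGE_SIZE`, then a `pg_off` of 3 recovers `base`, while a `pg_off` of 2 yields `base + SK_PAGE_SIZE`, the page that holds `consumer_pos`.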
+By crafting the `work` field of this fake `bpf_ringbuf` and calling `bpf_ringbuf_commit` with `BPF_RB_FORCE_WAKEUP`, we cause our crafted `irq_work` object to be passed to `irq_work_queue`.
+```C
+static void bpf_ringbuf_commit(void *sample, u64 flags, bool discard)
+{
+	...
+	rb = bpf_ringbuf_restore_from_rec(hdr);
+	...
+
+	if (flags & BPF_RB_FORCE_WAKEUP)
+		irq_work_queue(&rb->work);
+	...
+```
+The crafted `irq_work` is processed in `irq_work_single`, which calls our controlled function pointer at `[1]`.
+```C
+void irq_work_single(void *arg)
+{
+    struct irq_work *work = arg;
+    int flags;
+
+    flags = atomic_read(&work->node.a_flags);
+    flags &= ~IRQ_WORK_PENDING;
+    atomic_set(&work->node.a_flags, flags);
+
+    ...
+    lockdep_irq_work_enter(flags);
+    work->func(work); // [1]
+    lockdep_irq_work_exit(flags);
+    ...
+}
+```
+
+# KASLR Bypass
+To bypass KASLR, we use this [technique](https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-6817_mitigation/docs/exploit.md#kaslr-bypass).
+
+# ROP Chain
+By observation, RBX and RDI contain the address of the `work` field, and we control the data starting at `RDI + 0x18`. We then use the following ROP gadget to pivot the stack to our controlled data.
+```
+0x00000000004b78b1 : push rbx ; or byte ptr [rbx + 0x41], bl ; pop rsp ; pop r13 ; pop rbp ; ret
+```
+We then continue with a ROP payload that overwrites `core_pattern` to point at our exploit. Triggering a crash then runs our exploit with high privileges.
diff --git a/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/docs/vulnerability.md
@@ -0,0 +1,12 @@
+- Requirements:
+    - Capabilities: NA
+    - Kernel configuration: CONFIG_BPF_SYSCALL=y
+    - User namespaces required: No
+- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=457f44363a8894135c85b7a9afd2bd8196db24ab
+- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=cfa1a2329a691ffd991fcf7248a57d752e712881
+- Affected Version: v5.8 - v6.9
+- Affected Component: bpf
+- Syscall to disable: bpf
+- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-41009
+- Cause: Buffer overlapping
+- Description: A buffer overlap vulnerability in the Linux kernel's bpf ringbuf. It is possible to make a second allocated memory chunk overlap the first chunk; as a result, the BPF program is able to edit the first chunk's header. Once the first chunk's header is modified, bpf_ringbuf_commit() refers to the wrong page and could cause a crash. We recommend upgrading past commit cfa1a2329a691ffd991fcf7248a57d752e712881.
diff --git a/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/exploit/cos-105-17412.370.23/Makefile b/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/exploit/cos-105-17412.370.23/Makefile
@@ -0,0 +1,5 @@
+exploit: exploit.c
+	$(CC) -O3 -ggdb -static -Wall -lpthread -o $@ $^
+
+real_exploit: exploit.c
+	$(CC) -O3 -ggdb -static -Wall -lpthread -DKASLR_BYPASS_INTEL=1 -o exploit $^
diff --git a/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/exploit/cos-105-17412.370.23/exploit b/pocs/linux/kernelctf/CVE-2024-41009_lts_cos/exploit/cos-105-17412.370.23/exploit