
iobuf: update the allocator schedule #24475

Merged

Conversation

travisdowns
Member

iobuf uses a growing schedule of allocations (growth factor 1.5) for each fragment. This schedule was not aware of the seastar allocation sizes, so it would request, for example, a 66K allocation, which seastar internally rounds up to 128K; for that allocation we therefore waste ~50% of the memory, and the waste is invisible to iobuf.

In this change we make the schedule seastar-allocator aware: we keep the same ~1.5x growth factor but round each step up to the next seastar allocator boundary. At 16K or below, these boundaries are log-linear: every 2^n size, plus 3 evenly spread sizes between consecutive powers of 2. Above 16K, sizes are powers of 2.

This change slightly reduces the total number of steps needed to reach the max size of 128K, since from 32K to 128K we now use doubling steps instead of 1.5x. For large iobufs this results in 1 fewer fragment in total, which is why the expected counts in some tests that assert the exact number of fragments needed to be decreased by 1.

Fixes CORE-8478.
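(For illustration only: a minimal standalone sketch of the boundary rule described above. The function name, the cap, and the rounding helper are assumptions for this sketch, not the actual iobuf implementation.)

// Sketch of the seastar allocator boundaries used by the new schedule:
// below 16K, boundaries are log-linear (every power of two plus 3 evenly
// spaced sizes in between); at 16K and above they are powers of two,
// capped here at the 128K max fragment size.
#include <cstddef>

constexpr std::size_t max_frag = 128 * 1024;

constexpr std::size_t round_up_to_allocator_boundary(std::size_t want) {
    if (want >= 16 * 1024) {
        std::size_t p = 16 * 1024;
        while (p < want && p < max_frag) {
            p *= 2;
        }
        return p;
    }
    // find the power of two at or below `want`...
    std::size_t base = 1;
    while (base * 2 <= want) {
        base *= 2;
    }
    // ...then step up in quarters of that power of two
    std::size_t step = base / 4 ? base / 4 : 1;
    std::size_t boundary = base;
    while (boundary < want) {
        boundary += step;
    }
    return boundary;
}

// e.g. a ~1.5x step that lands on 66K snaps to 128K, while 12K is already a
// small-pool boundary and stays 12K
static_assert(round_up_to_allocator_boundary(66 * 1024) == 128 * 1024);
static_assert(round_up_to_allocator_boundary(12 * 1024) == 12 * 1024);

The patch itself uses a fixed table of sizes (as seen in the snippets discussed below); the sketch above only shows where boundary values such as 12288 or 131072 come from.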

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

12288,
16384,
32768,
65536,
131072});
Member Author

The table is similar to the old one but "zero waste". At <= 16K, the differences are very small, i.e., the entries can be matched up 1:1 with only slight differences to avoid waste.

Above 16K the new table doubles, since the allocator doubles, so sizes like 90K are just waste compared to 128K (i.e., the underlying alloc in both cases is 128K).

Member

I guess we could cut the waste even further by only using small-pool sizes that divide pages cleanly, but it's probably not worth it as the wastage percentage goes to zero once you seriously start using that size.

Member Author

@StephanDollberg that's only 2^n sizes, I think? Small pool always allocates spans, spans are always 2^n, and only powers of 2 can exactly divide other powers of 2 (because the prime factorization of a power of 2 contains only 2s, and so must the factorization of anything that divides it evenly).

So we couldn't do that and preserve 1.5x growth factor.

Member Author

The waste for non-power-of-two sizes can still be very small, however, and it does vary across sizes.
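(To make "varies across sizes" concrete: a rough, illustrative estimate. The 64K span size and the simple "unused tail" model below are assumptions for the sketch, not seastar's actual small-pool layout.)

#include <cstdio>

int main() {
    const unsigned span = 64 * 1024; // assumed span size, for illustration
    const unsigned sizes[] = {10240, 12288, 14336, 24576, 49152};
    for (unsigned s : sizes) {
        // count objects that fit whole in the span; the remainder is tail waste
        unsigned used = (span / s) * s;
        double waste_pct = 100.0 * (span - used) / span;
        std::printf("object size %6u: %4.1f%% of a %uK span unused\n",
                    s, waste_pct, span / 1024);
    }
    return 0;
}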

Member

that's only 2^n sizes

Yeah, don't think it's worth addressing as the wastage should be negligible.

16384,
32768,
65536,
131072,
131072,
Member Author

This test just checks that passing the size at position N in the table results in the size at position N+1 when calling next alloc size. So the first table is a copy/paste from the impl, and the second table is the same as the first with its first entry deleted (shifted up by 1).

I also include an additional row in the table testing the case that 128K maps to 128K unchanged.
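A minimal sketch of that shape (the size values are taken from the snippets above; next_alloc_size and the loop are placeholders rather than the actual test code):

#include <array>
#include <cstdint>

static constexpr auto in = std::to_array<uint32_t>(
  {12288, 16384, 32768, 65536, 131072});
// the same table shifted up by one, plus the final 128K -> 128K row
static constexpr auto out = std::to_array<uint32_t>(
  {16384, 32768, 65536, 131072, 131072});

// for (size_t i = 0; i < in.size(); ++i) {
//     BOOST_REQUIRE_EQUAL(next_alloc_size(in[i]), out[i]);
// }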

@@ -379,38 +379,36 @@ SEASTAR_THREAD_TEST_CASE(iobuf_as_ostream) {
}

SEASTAR_THREAD_TEST_CASE(alloctor_forward_progress) {
-static constexpr std::array<uint32_t, 14> src = {{
+static constexpr auto src = std::to_array<uint32_t>({
Member Author

No counting needed in C++20.
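For reference, the difference in a tiny standalone example (the values here are placeholders; only the declaration style matters):

#include <array>
#include <cstdint>

// Before: the element count (14 in the real test) had to be written out and
// kept in sync with the initializer by hand.
static constexpr std::array<uint32_t, 3> a = {{512, 768, 1024}};

// After (C++20): std::to_array deduces the count from the initializer.
static constexpr auto b = std::to_array<uint32_t>({512, 768, 1024});

static_assert(a.size() == b.size());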

@@ -429,7 +427,7 @@ SEASTAR_THREAD_TEST_CASE(test_next_chunk_allocation_append_temp_buf) {
}
// slow but tha'ts life.
auto distance = std::distance(buf.begin(), buf.end());
-BOOST_REQUIRE_EQUAL(distance, 324);
+BOOST_REQUIRE_EQUAL(distance, 323);
Member Author

For very large iobufs the new schedule results in ~1 fewer fragment since it grows slightly faster in the 32K -> 128K range, and these tests assert the exact fragment count.

travisdowns force-pushed the td-better-iobuf-alloc-schedule branch from 9c0c302 to 7deb38b on December 6, 2024 15:48

@vbotbuildovich
Collaborator

the below tests from https://buildkite.com/redpanda/redpanda/builds/59386#01939ca9-8d6e-41b5-8e82-6f1c9027fafd have failed and will be retried

storage_e2e_single_thread_rpunit


travisdowns merged commit 8d90667 into redpanda-data:dev on Dec 9, 2024 (16 checks passed)