iobuf: update the allocator schedule #24475
Conversation
12288,
16384,
32768,
65536,
131072});
The table is similar to the old one but "zero waste". At <= 16K, the differences are very small, i.e., the entries can be matched up 1:1 with only slight differences to avoid waste.
Above 16K the new table doubles, since the allocator doubles, so intermediate sizes like 90K would just waste memory compared to 128K (i.e., the underlying allocation in both cases is 128K).
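For illustration, a minimal sketch of the rounding effect being described (the helper below is an assumption based on this discussion, not the actual seastar code):

```cpp
#include <bit>
#include <cstdint>
#include <iostream>

// Above 16K the allocator hands out power-of-two sized spans, so a request
// is effectively rounded up to the next power of two (assumption based on
// the discussion above; not the real seastar implementation).
uint64_t underlying_alloc_size(uint64_t request) {
    return request <= 16384 ? request : std::bit_ceil(request);
}

int main() {
    // A 90K request and a 128K request both consume 128K underneath,
    // so asking for 90K just hides ~30% waste from iobuf.
    std::cout << underlying_alloc_size(90 * 1024) << '\n';  // 131072
    std::cout << underlying_alloc_size(128 * 1024) << '\n'; // 131072
}
```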
I guess we could cut waste even further by only using small pool sizes that divide pages cleanly, but it's probably not worth it as the wastage percentage goes to zero once you seriously start using that size.
@StephanDollberg that's only 2^n sizes, I think? The small pool always allocates spans, spans are always 2^n, and only a power of 2 can exactly divide another power of 2 (the prime factorization of a power of 2 contains only 2s, and so must anything that divides evenly into it).
So we couldn't do that and still preserve the 1.5x growth factor.
The waste for non-power-of-two sizes can still be very small, however, and it does vary across sizes.
> that's only 2^n sizes

Yeah, I don't think it's worth addressing as the wastage should be negligible.
16384,
32768,
65536,
131072,
131072});
This test just checks that passing the size at position N in the table yields the size at position N+1 when calling next alloc size. So the first table is copied from the implementation, and the second table is the same as the first with its first entry deleted (i.e., shifted up by one).
I also include an additional row in the table testing that 128K maps to 128K unchanged.
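In sketch form, the check looks roughly like this (the `next_allocation_size` stub and its behaviour are placeholders for illustration, not the exact code under test):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the schedule lookup under test: above 16K the
// schedule doubles and caps at 128K, matching the table tail shown above.
uint32_t next_allocation_size(uint32_t current) {
    return std::min<uint32_t>(current * 2, 131072);
}

int main() {
    // `src` mirrors the tail of the schedule table from the implementation;
    // `expected` is the same table shifted up by one entry, with a final
    // row checking that the 128K maximum maps to itself.
    constexpr auto src = std::to_array<uint32_t>({16384, 32768, 65536, 131072});
    constexpr auto expected
      = std::to_array<uint32_t>({32768, 65536, 131072, 131072});

    for (std::size_t i = 0; i < src.size(); ++i) {
        assert(next_allocation_size(src[i]) == expected[i]);
    }
}
```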
@@ -379,38 +379,36 @@ SEASTAR_THREAD_TEST_CASE(iobuf_as_ostream) {
}

SEASTAR_THREAD_TEST_CASE(alloctor_forward_progress) {
static constexpr std::array<uint32_t, 14> src = {{
static constexpr auto src = std::to_array<uint32_t>({
No counting needed in C++20.
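For context, a minimal illustration of the difference (the values here are arbitrary):

```cpp
#include <array>
#include <cstdint>

// C++17 and earlier: the element count has to be spelled out and kept in
// sync with the initializer list by hand.
static constexpr std::array<uint32_t, 3> counted = {{512, 768, 1024}};

// C++20: std::to_array deduces the count from the initializer, so entries
// can be added or removed without touching a size parameter.
static constexpr auto deduced = std::to_array<uint32_t>({512, 768, 1024});

static_assert(counted.size() == deduced.size());
```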
@@ -429,7 +427,7 @@ SEASTAR_THREAD_TEST_CASE(test_next_chunk_allocation_append_temp_buf) {
}
// slow but tha'ts life.
auto distance = std::distance(buf.begin(), buf.end());
BOOST_REQUIRE_EQUAL(distance, 324);
BOOST_REQUIRE_EQUAL(distance, 323);
For very large iobufs the new schedule results in ~1 fewer fragment, since it grows slightly faster in the 32K -> 128K range, and these tests assert the exact fragment count.
the below tests from https://buildkite.com/redpanda/redpanda/builds/59386#01939ca9-8d6e-41b5-8e82-6f1c9027fafd have failed and will be retried
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/59386#01939d01-4cd5-49d7-85ab-ff3c5112ba55
iobuf uses a growing schedule of allocations (growth factor 1.5) for each fragment. This schedule was not aware of the seastar allocation sizes, so it would request, for example, a 66K allocation, but seastar internally rounds that up to 128K, so for that allocation we waste ~50% of the memory, which is invisible to iobuf.
In this change we update the schedule to be seastar-allocator aware, i.e., using the same ~1.5x growth factor but rounding up to the next seastar allocator boundary. At 16K or below, these boundaries are log-linear: every 2^n size, plus 3 evenly spread sizes between each power of 2. Above 16K, sizes are 2^n.
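As a rough sketch of that boundary rule (a hypothetical helper for illustration only; the actual change uses a precomputed table, as seen in the diff):

```cpp
#include <algorithm>
#include <bit>
#include <cassert>
#include <cstdint>

// Round a requested size up to the next seastar allocator boundary as
// described above: at or below 16K boundaries are log-linear (every power
// of two plus 3 evenly spread sizes in between), above 16K they are powers
// of two. Hypothetical helper, not the code in this PR.
uint64_t round_to_allocator_boundary(uint64_t request) {
    if (request > 16384) {
        return std::bit_ceil(request); // doubling region
    }
    // Log-linear region: 4 evenly spaced steps of 2^(n-2) per power of two.
    uint64_t base = std::bit_floor(request);
    uint64_t step = std::max<uint64_t>(base / 4, 1);
    return (request + step - 1) / step * step;
}

int main() {
    assert(round_to_allocator_boundary(66 * 1024) == 128 * 1024); // the 66K example above
    assert(round_to_allocator_boundary(12288) == 12288);          // already a boundary
    assert(round_to_allocator_boundary(9000) == 10240);           // 8K + one 2K step
}
```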
This change slightly reduces the total number of steps until we reach the max size of 128K, since from 32K to 128K we use doubling steps instead of 1.5x. For large iobufs this results in 1 fewer fragment in total, which is why some tests that asserted the exact number of fragments had to be decreased by 1.
Fixes CORE-8478.
Backports Required
Release Notes