iobuf: update the allocator schedule #24475

Merged
56 changes: 37 additions & 19 deletions src/v/bytes/details/io_allocation_size.h
@@ -29,28 +29,46 @@ class io_allocation_size {
static constexpr size_t ss_max_small_allocation = 16384;

public:
// >>> x=512
// >>> while x < int((1024*128)):
// ... print(x)
// ... x=int(((x*3)+1)/2)
// ... x=int(min(1024*128,x))
// print(1024*128)
// The script below computes the table that follows it: "ideal"
// allocation sizes, rounded up to the next true size supported by
// the seastar allocator. At <= 16K we apply the small-pool indexing
// logic; above that we use powers of 2, since we are in the buddy
// allocator, which only supports power-of-two sizes.
//
// We scale the target size by 1.47, i.e., 1.5 tweaked slightly to ensure
// we hit 16K at the small<->big pool boundary.
//
// def lg2(size: int):
// return size.bit_length() - 1
// def p(v: object):
// print(f"{v},")
// s = 512
// fb = 2 # idx_frac_bits
// while s <= 2**14:
// # The size calculation below is doing idx_to_size(size_to_idx(s)),
// # i.e., figuring out which small pool index the allocation falls in
// # then seeing what the size of that small pool is, i.e., the size of
// # the smallest small pool that can fit this allocation.
// # See the corresponding routines in src/core/memory.cc:
// #https://github.com/scylladb/seastar/blob/f840b860432e7e716e3cfc004690897b50dc122c/src/core/memory.cc#L478-L499
// idx = ((lg2(s) << fb) - ((1 << fb) - 1)) + ((s - 1) >> (lg2(s) - fb))
// p((((1 << fb) | (idx & ((1 << fb) - 1))) << (idx >> fb)) >> fb)
// s = int(s * 1.47)
// for e in [15, 16, 17]:
// p(2**e)
static constexpr auto alloc_table = std::to_array<uint32_t>(
// computed from a python script above
{512,
768,
1152,
1728,
2592,
3888,
5832,
8748,
13122,
19683,
29525,
44288,
66432,
99648,
1280,
1792,
2560,
3584,
6144,
8192,
12288,
16384,
32768,
65536,
131072});
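
For convenience, here is a self-contained, runnable C++ transcription of the generator script above (an editor's sketch, not part of the change; it reproduces the same arithmetic so the table can be regenerated without Python):

```cpp
#include <bit>
#include <cstdio>

// lg2(s): index of the highest set bit, mirroring the Python helper.
static unsigned lg2(unsigned s) { return std::bit_width(s) - 1; }

int main() {
    constexpr unsigned fb = 2; // idx_frac_bits
    unsigned s = 512;
    while (s <= (1u << 14)) {
        // idx_to_size(size_to_idx(s)), as in the comment above.
        unsigned idx = ((lg2(s) << fb) - ((1u << fb) - 1))
                       + ((s - 1) >> (lg2(s) - fb));
        unsigned size
          = (((1u << fb) | (idx & ((1u << fb) - 1))) << (idx >> fb)) >> fb;
        std::printf("%u,\n", size);
        s = static_cast<unsigned>(s * 1.47);
    }
    for (unsigned e : {15u, 16u, 17u}) {
        std::printf("%u,\n", 1u << e); // buddy-allocator range: powers of two
    }
}
```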
Member Author:

The table is similar to the old one but "zero waste". At <= 16K, the differences are very small, i.e., the entries can be matched up 1:1 with only slight differences to avoid waste.

Above 16K the new table doubles at each step, since the underlying allocator does, so sizes like 90K are pure waste compared to 128K (the underlying allocation is 128K in either case).
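
A quick way to see the point (editor's illustration, not from the PR): the buddy allocator backs every allocation above the small-pool limit with a power-of-two span, so the old non-power-of-two entries were silently rounded up anyway.

```cpp
#include <bit>
#include <cstdio>

int main() {
    // Entries above 16K from the old schedule; std::bit_ceil shows the
    // power-of-two span the buddy allocator would actually back them with.
    constexpr unsigned old_entries[] = {19683, 29525, 44288, 66432, 99648};
    for (unsigned request : old_entries) {
        unsigned span = std::bit_ceil(request);
        std::printf(
          "request %6u -> %6u-byte span, %6u bytes of slack\n",
          request, span, span - request);
    }
}
```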

Member:

I guess we could cut waste even further by only using small-pool sizes that divide pages cleanly, but it's probably not worth it, as the wastage percentage goes to zero once you seriously start using that size.

Member Author:

@StephanDollberg that's only 2^n sizes, I think? The small pool always allocates spans, spans are always 2^n, and only powers of 2 can divide other powers of two exactly (the prime factorization of a power of 2 contains only 2s, and so must that of anything that divides evenly into it).

So we couldn't do that and preserve the 1.5x growth factor.

Member Author:

The waste for non-power-of-two sizes can still be very small, however, and it varies across sizes.
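
A rough sketch of that waste argument (editor's illustration; it assumes the pool simply leaves the tail of a power-of-two span unused, which glosses over Seastar's actual span-size selection): for a non-power-of-two object size such as 3584 from the new table, the tail waste per span shrinks as spans grow but never reaches zero.

```cpp
#include <cstdio>

int main() {
    constexpr unsigned object_size = 3584; // non-power-of-two small-pool size
    // Walk span sizes from one 4 KiB page up to 1 MiB.
    for (unsigned span = 1u << 12; span <= 1u << 20; span <<= 1) {
        unsigned objects = span / object_size;
        unsigned tail = span % object_size; // bytes left unused at span end
        std::printf(
          "span %7u: %3u objects, %4u-byte tail (%4.1f%% waste)\n",
          span, objects, tail, 100.0 * tail / span);
    }
}
```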

Member:

that's only 2^n sizes

Yeah, don't think it's worth addressing as the wastage should be negligible.

static size_t next_allocation_size(size_t data_size);
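
For readers skimming the header, a minimal sketch of the behaviour the updated unit test (below) pins down for this function: the next allocation size is the first table entry strictly larger than the current size, capped at 128 KiB. This is an editor's illustration, not the actual definition (which lives elsewhere in the tree), and how it treats sizes that fall between table entries is an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Editor's sketch of a table-driven next_allocation_size; the values are
// copied from the new alloc_table above.
inline size_t next_allocation_size_sketch(size_t data_size) {
    constexpr auto table = std::to_array<uint32_t>(
      {512, 768, 1280, 1792, 2560, 3584, 6144, 8192, 12288, 16384, 32768,
       65536, 131072});
    auto it = std::upper_bound(table.begin(), table.end(), data_size);
    return it == table.end() ? table.back() : *it; // 128K stays at 128K
}
```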

58 changes: 28 additions & 30 deletions src/v/bytes/tests/iobuf_tests.cc
@@ -379,38 +379,36 @@ SEASTAR_THREAD_TEST_CASE(iobuf_as_ostream) {
}

SEASTAR_THREAD_TEST_CASE(alloctor_forward_progress) {
static constexpr std::array<uint32_t, 14> src = {{
static constexpr auto src = std::to_array<uint32_t>({
Member Author:

No counting needed in C++20.

512,
768,
1152,
1728,
2592,
3888,
5832,
8748,
13122,
19683,
29525,
44288,
66432,
99648,
}};
static constexpr std::array<uint32_t, 14> expected = {{
1280,
1792,
2560,
3584,
6144,
8192,
12288,
16384,
32768,
65536,
131072,
});
static constexpr auto expected = std::to_array<uint32_t>({
768,
1152,
1728,
2592,
3888,
5832,
8748,
13122,
19683,
29525,
44288,
66432,
99648,
1280,
1792,
2560,
3584,
6144,
8192,
12288,
16384,
32768,
65536,
131072,
131072,
Member Author:

This test just checks that passing the size at position N in the table results in the size at position N+1 when calling next_allocation_size. So the first table is a copy/paste from the impl, and the second table is the same as the first with the first entry deleted (shifted up by 1).

I also include an additional row testing the case that 128K maps to 128K unchanged.

}};
});
BOOST_REQUIRE_EQUAL(src.size(), expected.size());
for (size_t i = 0; i < src.size(); ++i) {
BOOST_REQUIRE_EQUAL(
@@ -429,7 +429,7 @@ SEASTAR_THREAD_TEST_CASE(test_next_chunk_allocation_append_temp_buf) {
}
// slow but that's life.
auto distance = std::distance(buf.begin(), buf.end());
BOOST_REQUIRE_EQUAL(distance, 324);
BOOST_REQUIRE_EQUAL(distance, 323);
Member Author:

For very large iobufs the new schedule results in ~1 fewer fragment, since it grows slightly faster in the 32K -> 128K range, and these tests assert the exact fragment count.

constexpr size_t sz = 40000 * 1024;
auto msg = iobuf_as_scattered(std::move(buf));
BOOST_REQUIRE_EQUAL(msg.size(), sz);
@@ -445,7 +445,7 @@ SEASTAR_THREAD_TEST_CASE(test_next_chunk_allocation_append_iobuf) {
}
// slow but that's life.
auto distance = std::distance(buf.begin(), buf.end());
BOOST_REQUIRE_EQUAL(distance, 324);
BOOST_REQUIRE_EQUAL(distance, 323);
constexpr size_t sz = 40000 * 1024;
auto msg = iobuf_as_scattered(std::move(buf));
BOOST_REQUIRE_EQUAL(msg.size(), sz);