pageserver: reorder upload queue when possible #10218
Conversation
7330 tests run: 6959 passed, 0 failed, 371 skipped (full report).
Flaky tests (2): Postgres 17, Postgres 14.
Code coverage: collected from Rust tests only (full report).
This comment is automatically updated with the latest test results.
11a217e at 2025-01-14T14:45:52.163Z
Added a benchmark measuring this. It isn't clear that we'll hit this number of in-progress operations under realistic conditions, so this is probably fine.
Went over it a few times. Looks correct and the tests are convincing. Perhaps worth a second pair of eyes though.
@skyzh Want to give this another look-over?
Overall LGTM.
The patch is still bottlenecked by two things: flushing the deletion queue (as mentioned in #10283) and reordering index part uploads. These could be addressed in future patches.
The patch allows uploads to skip the queue if they are not yet in index_part, which means that we can upload more aggressively, and that we could end up with files in remote storage that are not referenced in index_part. I'll update the storage scrubber check at some point.
Moving forward, we can of course modify the bypass algorithm to allow more parallelism (i.e., batch index part uploads), but I wonder if things could get easier with the following ideas:
- Having a unique ID for each file (i.e., a monotonically increasing ID for each layer file) so that we don't need to deal with upload-delete races at all.
- Handling meta operations in the upload queue instead of atomic primitives like upload/delete. We only have a few operations in the layer file manager: L0 compaction, freeze/flush, GC. This means that the operations submitted to the upload queue are always of the form "upload layers x N, deletion, update index part". Hence, uploads and deletes never need to be reordered across index_part uploads, because deletes always get appended to the queue after the uploads of each meta operation.
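The first idea above (a unique, monotonically increasing ID per layer file) could look roughly like this. This is an illustrative sketch, not the pageserver's actual naming scheme; the counter and path format are assumptions, and in practice the counter would need to be persisted across restarts:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical process-wide monotonic counter for layer file IDs.
static NEXT_LAYER_ID: AtomicU64 = AtomicU64::new(1);

/// Build a remote path that embeds a fresh unique ID, so a new layer file
/// can never collide with (or race against the deletion of) an old file
/// covering the same key range.
fn new_layer_path(key_range: &str) -> String {
    let id = NEXT_LAYER_ID.fetch_add(1, Ordering::Relaxed);
    format!("{key_range}-{id:016x}.layer")
}
```

With unique paths, an upload and a delete can never target the same object, so the upload-delete ordering constraint disappears entirely.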
Can we also apply the env variable in CI and see if it passes all tests?
Finally getting back to this. TFTRs!
This is already the case today -- we always have to upload the file before we upload the index. This won't become more common with this patch, since we don't delay index uploads more than we did before. If anything it might become less common, because index uploads no longer necessarily have to wait for all in-progress operations to complete first.
It's not really bottlenecked on the deletion queue race -- that race already exists today, but we can more easily fix that race now. And I think we'll have to do something similar to the current deletion queue flushing even if we do address it. See #10248 for index upload coalescing.
Yeah, immutable files are generally a much better idea. I'd be +1 on doing that instead at some point, and making path conflicts an error condition.
I think I'd prefer smaller atomic primitives -- it's a smaller set of cases to deal with, and it's easier to reason about the interactions. That scheme also wouldn't address the motivating case here: layer flushes currently upload layer/index/layer/index, and we want to upload the layers in parallel.
Good idea, I'll open a draft PR for a CI run.
Split this out to a follow-up PR: #10384.
All tests passed on #10385, let's see if this thing flies.
## Problem

With upload queue reordering in #10218, we can easily get into a situation where multiple index uploads are queued back to back, which can't be parallelized. This will happen e.g. when multiple layer flushes enqueue layer/index/layer/index/... and the layers skip the queue and are uploaded in parallel. These index uploads will incur serial S3 roundtrip latencies, and may block later operations.

Touches #10096.

## Summary of changes

When multiple back-to-back index uploads are ready to upload, only upload the most recent index and drop the rest.
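A minimal sketch of that coalescing rule, using illustrative types rather than the actual pageserver queue structures: within any run of adjacent index uploads, each index supersedes the previous one, so only the last needs to reach remote storage.

```rust
#[derive(Clone, Debug, PartialEq)]
enum UploadOp {
    Layer(&'static str),
    Index(u64), // sequence number of the index; illustrative only
}

/// Drop all but the last of any run of adjacent index uploads: a newer
/// index fully supersedes an older one, so uploading the older one is
/// wasted work.
fn coalesce_index_uploads(queue: &mut Vec<UploadOp>) {
    let mut i = 0;
    while i + 1 < queue.len() {
        if matches!(
            (&queue[i], &queue[i + 1]),
            (UploadOp::Index(_), UploadOp::Index(_))
        ) {
            queue.remove(i); // the older index is superseded
        } else {
            i += 1;
        }
    }
}
```

For example, a queue of index(1), index(2), layer, index(3) would collapse to index(2), layer, index(3): index(1) is dropped, while the indexes separated by the layer upload are kept.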
Problem
The upload queue currently sees significant head-of-line blocking. For example, index uploads act as upload barriers, and for every layer flush we schedule a layer and index upload, which effectively serializes layer uploads.
Resolves #10096.
Summary of changes
Allow upload queue operations to bypass the queue if they don't conflict with preceding operations, increasing parallelism.
NB: the upload queue currently schedules an explicit barrier after every layer flush as well (see #8550). This must be removed to enable parallelism. This will require a better mechanism for compaction backpressure, see e.g. #8390 or #5415.
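In outline, the bypass rule could look like the following sketch. The types and the exact conflict rules here are hypothetical simplifications (the real implementation lives in the pageserver's upload queue and also tracks in-progress operations); in particular, this sketch conservatively keeps deletes ordered against all index uploads:

```rust
#[derive(Debug)]
enum QueuedOp {
    UploadLayer(&'static str),
    DeleteLayer(&'static str),
    UploadIndex(Vec<&'static str>), // layer files this index references
}

/// Whether two operations must retain their relative order.
fn conflicts(a: &QueuedOp, b: &QueuedOp) -> bool {
    use QueuedOp::*;
    match (a, b) {
        // Operations on the same file must stay ordered.
        (UploadLayer(x) | DeleteLayer(x), UploadLayer(y) | DeleteLayer(y)) => x == y,
        // A layer upload only orders against an index that references it,
        // mirroring "skip the queue if not yet in index_part".
        (UploadLayer(f), UploadIndex(refs)) | (UploadIndex(refs), UploadLayer(f)) => {
            refs.contains(f)
        }
        // Deletes and index uploads stay ordered in this sketch, as do
        // index uploads among themselves.
        (DeleteLayer(_), UploadIndex(_)) | (UploadIndex(_), DeleteLayer(_)) => true,
        (UploadIndex(_), UploadIndex(_)) => true,
    }
}

/// An operation may bypass the queue if it conflicts with none of the
/// operations ahead of it.
fn can_bypass(op: &QueuedOp, ahead: &[QueuedOp]) -> bool {
    ahead.iter().all(|q| !conflicts(op, q))
}
```

Under these rules, a freshly flushed layer can overtake a queued index upload that doesn't reference it, which is exactly what lets multiple layer/index pairs upload their layers in parallel.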