
feat(storage): wake up memory acquire request in order #15921

Merged
merged 17 commits into main from wallace/memory-limit-barrier on May 15, 2024

Conversation

Little-Wallace
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Close #15786.
See design details in the issue.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features. See Sqlsmith: SQL feature generation #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Little-Wallace <[email protected]>
@Little-Wallace Little-Wallace marked this pull request as ready for review March 29, 2024 13:11
@wenym1 wenym1 requested review from hzxa21, Li0k and wenym1 May 6, 2024 10:06
Contributor

@wenym1 wenym1 left a comment

The current memory limiter is very similar to tokio::sync::Semaphore. Maybe we can try borrowing some implementation details from it.
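
For illustration only, here is a minimal sketch (not this PR's code) of how a FIFO-fair limiter could be built directly on tokio::sync::Semaphore, whose acquire_many_owned queues requests in arrival order; quota is expressed in u32 permits here, whereas the real limiter tracks u64 bytes and its own capacity:

use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

struct SemaphoreMemoryLimiter {
    quota: Arc<Semaphore>,
}

struct SemaphoreMemoryTracker {
    // Dropping the permit returns the quota and wakes the next waiter in FIFO order.
    _permit: OwnedSemaphorePermit,
}

impl SemaphoreMemoryLimiter {
    fn new(capacity: u32) -> Self {
        Self { quota: Arc::new(Semaphore::new(capacity as usize)) }
    }

    async fn require_memory(&self, quota: u32) -> SemaphoreMemoryTracker {
        // `acquire_many_owned` queues acquire requests fairly, so a large
        // request cannot be starved by later small ones.
        let permit = Arc::clone(&self.quota)
            .acquire_many_owned(quota)
            .await
            .expect("semaphore is never closed");
        SemaphoreMemoryTracker { _permit: permit }
    }
}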

// When this request is the first waiter but the previous `MemoryTracker` has just released a large quota, the release may skip notifying this waiter because it checked `inflight_barrier` and found it was zero. So we must set it to one and retry `try_require_memory_in_capacity` to avoid deadlock.
self.pending_request_count.store(1, AtomicOrdering::Release);
if self.try_require_memory_in_capacity(quota, self.fast_quota) {
self.pending_request_count.store(0, AtomicOrdering::Release);
Contributor

Can we move this fast path code self.try_require_memory_in_capacity(quota, self.fast_quota) to before acquiring the lock? In this way, if we successfully acquire the quota via CAS, we can avoid locking and writing the atomic variable.
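
For reference, the lock-free fast path being discussed might look roughly like this (a sketch; the `total_size` field and the memory orderings are assumptions, not necessarily this PR's exact code):

use std::sync::atomic::{AtomicU64, Ordering as AtomicOrdering};

struct MemoryLimiterInner {
    total_size: AtomicU64,
}

impl MemoryLimiterInner {
    fn try_require_memory_in_capacity(&self, quota: u64, capacity: u64) -> bool {
        let mut current = self.total_size.load(AtomicOrdering::Acquire);
        loop {
            if current + quota > capacity {
                // Over capacity: the caller falls back to the slow path that
                // takes the lock and queues the request.
                return false;
            }
            match self.total_size.compare_exchange(
                current,
                current + quota,
                AtomicOrdering::AcqRel,
                AtomicOrdering::Acquire,
            ) {
                Ok(_) => return true,
                Err(actual) => current = actual,
            }
        }
    }
}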

Contributor Author

`try_require_memory` is always called before this method, which means a CAS attempt without the lock has already failed.

Contributor Author

We must retry acquiring while holding the lock, because if no tracker is held by other threads, this waiter would never be notified.

src/storage/src/hummock/utils.rs: 5 resolved review threads (outdated)
self.inner.limiter.release_quota(quota);
}
// Check `inflight_barrier` to avoid acquiring the lock every time a `MemoryTracker` is dropped.
if self
Contributor

Maybe we can optimize the case where there are concurrent drops in multiple threads.

A rough idea: each thread tries to set `pending_request_count` to 0 via CAS in a loop, and only the one that successfully sets it to 0 performs the notification.
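
A sketch of that idea (field and method names follow the snippets above; this is not the merged code):

use std::sync::atomic::{AtomicU64, Ordering as AtomicOrdering};

struct LimiterState {
    pending_request_count: AtomicU64,
}

impl LimiterState {
    fn on_tracker_drop(&self) {
        let mut pending = self.pending_request_count.load(AtomicOrdering::Acquire);
        while pending > 0 {
            match self.pending_request_count.compare_exchange(
                pending,
                0,
                AtomicOrdering::AcqRel,
                AtomicOrdering::Acquire,
            ) {
                // Only the thread whose CAS actually brings the counter to 0
                // takes the lock and wakes the queued requests.
                Ok(_) => {
                    self.may_notify_waiters();
                    return;
                }
                // Another thread changed the counter; re-read and retry.
                Err(current) => pending = current,
            }
        }
    }

    fn may_notify_waiters(&self) {
        // Elided: pop waiters from the queue and hand out quota, as in the PR.
    }
}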

src/common/src/config.rs: resolved review thread (outdated)
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
@Little-Wallace Little-Wallace changed the title from "feat(storage): allocate quota in barrier order" to "feat(storage): wake up memory acquire request in order" on May 8, 2024
@Little-Wallace Little-Wallace force-pushed the wallace/memory-limit-barrier branch from 616e2aa to ce102f4 on May 8, 2024 06:27
Contributor

@wenym1 wenym1 left a comment

Can we have some unit tests to check the correctness and performance? In a unit test we can launch a multi-threaded tokio runtime and spawn many tasks that keep calling require_memory, sleep for a while, and then drop the tracker; this way we may uncover some corner cases.
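
A rough sketch of such a stress test (hypothetical names: it assumes the limiter is constructed with `MemoryLimiter::new` and exposes a usage accessor like `get_memory_usage`; adjust to the actual API in this PR):

#[tokio::test(flavor = "multi_thread", worker_threads = 8)]
async fn test_concurrent_require_memory() {
    use std::sync::Arc;
    use std::time::Duration;

    let limiter = Arc::new(MemoryLimiter::new(1024));
    let mut handles = Vec::new();
    for i in 0..64u64 {
        let limiter = limiter.clone();
        handles.push(tokio::spawn(async move {
            for _ in 0..100 {
                // Hold a quota for a short while, then drop the tracker so
                // that pending requests must be woken up in order.
                let tracker = limiter.require_memory(1 + i % 16).await;
                tokio::time::sleep(Duration::from_millis(1)).await;
                drop(tracker);
            }
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }
    // All quota should be returned once every tracker is dropped.
    assert_eq!(limiter.get_memory_usage(), 0);
}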

src/storage/src/hummock/utils.rs: 2 resolved review threads (outdated)
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
Signed-off-by: Little-Wallace <[email protected]>
@wenym1 wenym1 requested a review from MrCroxx May 9, 2024 08:58
Contributor

@wenym1 wenym1 left a comment

Rest LGTM

src/storage/src/hummock/utils.rs: 6 resolved review threads (outdated)
Signed-off-by: Little-Wallace <[email protected]>
let first_req = waiters.is_empty();
if first_req {
// When this request is the first waiter but the previous `MemoryTracker` has just released a large quota, the release may skip notifying this waiter because it checked `inflight_barrier` and found it was zero. So we must set it to one and retry `try_require_memory` to avoid deadlock.
self.pending_request_count.store(1, AtomicOrdering::Release);
Collaborator

I think it is simpler and easier to understand if we make this an atomic boolean and rename it to has_waiters

Contributor Author

ok

src/storage/src/hummock/utils.rs: resolved review thread (outdated)
Signed-off-by: Little-Wallace <[email protected]>
Contributor

@MrCroxx MrCroxx left a comment

LGTM

src/storage/src/hummock/utils.rs: resolved review thread (outdated)
Contributor

@wenym1 wenym1 left a comment

LGTM

src/storage/src/hummock/utils.rs: resolved review thread (outdated)
let mut waiters = self.controller.lock();
let first_req = waiters.is_empty();
if first_req {
// When this request is the first waiter but the previous `MemoryTracker` has just released a large quota, it may skip notifying this waiter because it has checked `has_waiter` and found it was false. So we must set it one and retry `try_require_memory` again to avoid deadlock.
Contributor

typo: “set it one”

break;
}
let (tx, quota) = waiters.pop_front().unwrap();
let _ = tx.send(MemoryTrackerImpl::new(self.clone(), quota));
Collaborator

@hzxa21 hzxa21 May 11, 2024

The MemoryTrackerImpl and the newly added PendingRequestCancelGuard seem to be an over-design for the corner case mentioned above. Let's analyze all the cases when quota is acquired:

  1. When the quota is acquired via MemoryRequest::Ready (L287), we need to release the quota and notify waiters.
  2. When the quota is acquired here in may_notify_waiters in L243, there are two sub-cases:
    a. tx.send here succeeds: regardless of whether the rx is dropped or not, we need to release the quota and notify waiters.
    b. tx.send here fails (because the rx was dropped before tx.send): we only need to release the quota but not notify waiters (otherwise, it will deadlock).

Therefore, the implementation can be greatly simplified without introducing MemoryTrackerImpl and PendingRequestCancelGuard.

// MemoryTrackerImpl is not needed.
struct MemoryTracker {
    limiter: Arc<MemoryLimiterInner>,
    quota: Option<u64>,
}

// MemoryTracker::drop will release the quota and notify waiters if the quota is not None (1 + 2.a)
impl Drop for MemoryTracker {
    fn drop(&mut self) {
        if let Some(quota) = self.quota.take() {
            self.limiter.release_quota(quota);
            self.limiter.may_notify_waiters();
        }
    }
}

// Add a new method to release quota without notifying waiters.
impl MemoryTracker {
    fn release_quota(mut self) {
        if let Some(quota) = self.quota.take() {
            self.limiter.release_quota(quota);
        }
    }
}

...

// MemoryRequest takes MemoryTracker
enum MemoryRequest {
    Ready(MemoryTracker),
    Pending(Receiver<MemoryTracker>),
}

...

impl MemoryLimiterInner {
    ...
    fn may_notify_waiters(self: &Arc<Self>) {
        ...
        let mut waiters = self.controller.lock();
        while let Some((_, quota)) = waiters.front() {
            if !self.try_require_memory(*quota) {
                break;
            }
            ...
            if let Err(SendError(tracker)) = tx.send(MemoryTracker::new(self.clone(), quota)) {
                // 2.b
                tracker.release_quota();
            }
        }
    }
    ...
}

...

impl MemoryLimiter {
    pub async fn require_memory(&self, quota: u64) -> MemoryTracker {
        // PendingRequestCancelGuard is not needed
        match self.inner.require_memory(quota) {
            MemoryRequest::Ready(tracker) => tracker,
            MemoryRequest::Pending(rx) => rx.await,
        }
    }
}

Contributor Author

We cannot send MemoryTracker while holding the controller lock, because if it is dropped inside this method, it may call may_notify_waiters again, which would acquire the same lock twice.
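
To illustrate the hazard, continuing the types from the sketch above: if the receiver has already been dropped, the failed `tx.send` hands the `MemoryTracker` back while `self.controller` is still locked, and dropping it would re-enter `may_notify_waiters` on the same non-reentrant lock. One conceivable workaround (not what this PR does, which keeps a separate `MemoryTrackerImpl`) is to defer those drops until the lock guard is released, assuming a oneshot-style sender whose failed send returns the value:

fn may_notify_waiters(self: &Arc<Self>) {
    // Trackers whose receiver was dropped before the send; they must only be
    // dropped after the lock guard goes out of scope.
    let mut rejected = Vec::new();
    {
        let mut waiters = self.controller.lock();
        while let Some((_, quota)) = waiters.front() {
            if !self.try_require_memory(*quota) {
                break;
            }
            let (tx, quota) = waiters.pop_front().unwrap();
            // tokio::sync::oneshot::Sender::send returns the value on failure.
            if let Err(tracker) = tx.send(MemoryTracker::new(self.clone(), quota)) {
                rejected.push(tracker);
            }
        }
    }
    // Dropping the rejected trackers here may call may_notify_waiters again,
    // but the controller lock is no longer held, so there is no double-lock.
    drop(rejected);
}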

Contributor Author

We met this bug in #6634

Contributor

If tokio-rs/tokio#6558 looks good from the tokio community, we can have a patch on the tokio version we are using, and then we can adopt the code in this comment.

Signed-off-by: Little-Wallace <[email protected]>
@Little-Wallace Little-Wallace force-pushed the wallace/memory-limit-barrier branch from 57d2a3f to 22188ed on May 14, 2024 03:34
@wenym1
Contributor

wenym1 commented May 15, 2024

I found loom, an interesting framework for testing concurrent utilities. It provides mocked atomic variables and mutexes, plus block_on to support future.await, which covers all the functionality we use to synchronize between parallel tasks in our memory limiter. Shall we write some tests with the loom framework? It would be really helpful for proving the correctness of our implementation. You may refer to the loom tests written for tokio::sync.
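
A minimal loom model might look like this (hypothetical; the real limiter's fields differ, and this only models the atomic/lock interaction, not futures). It captures the register-under-lock-then-retry pattern: a concurrent release must either see the newly registered waiter or the waiter must see the released quota on its retry, and loom exhaustively checks all interleavings:

#[cfg(all(test, loom))]
mod loom_tests {
    use loom::sync::atomic::{AtomicBool, AtomicU64, Ordering};
    use loom::sync::{Arc, Mutex};
    use loom::thread;

    #[test]
    fn waiter_is_woken_or_acquires() {
        loom::model(|| {
            let held = Arc::new(AtomicU64::new(8)); // quota currently held by thread 1
            let waiters = Arc::new(Mutex::new(Vec::<u64>::new()));
            let notified = Arc::new(AtomicBool::new(false));

            let t = {
                let (held, waiters, notified) = (held.clone(), waiters.clone(), notified.clone());
                thread::spawn(move || {
                    // Thread 1: release the quota, then notify any waiter that
                    // is already registered (checked under the lock).
                    held.store(0, Ordering::Release);
                    if !waiters.lock().unwrap().is_empty() {
                        notified.store(true, Ordering::Release);
                    }
                })
            };

            // Thread 2 (the waiter): the fast path failed, so register under
            // the lock and retry the acquisition while still holding it.
            let acquired = {
                let mut guard = waiters.lock().unwrap();
                guard.push(8);
                held.load(Ordering::Acquire) == 0
            };

            t.join().unwrap();
            // In every interleaving the waiter either acquires on retry or is
            // notified; otherwise the limiter could deadlock.
            assert!(acquired || notified.load(Ordering::Acquire));
        });
    }
}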

Collaborator

@hzxa21 hzxa21 left a comment

LGTM

@Little-Wallace Little-Wallace added this pull request to the merge queue May 15, 2024
Contributor

@Li0k Li0k left a comment

Rest LGTM

true
} else {
false
}
}
}

// We must notify waiters outside `MemoryTracker` to avoid dead-lock and loop-owner.
Contributor

typo: doc

Merged via the queue into main with commit 44e711c May 15, 2024
27 of 28 checks passed
@Little-Wallace Little-Wallace deleted the wallace/memory-limit-barrier branch May 15, 2024 08:32

Successfully merging this pull request may close these issues.

feat: allocate memory quota for SharedBufferBatch in barrier order
5 participants