feat(streaming): memory-size-based back-pressure in exchange #13775

Merged 1 commit into main on Dec 5, 2023

Conversation

@wangrunji0408 (Contributor) commented Dec 4, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR changes the back-pressure mechanism in exchange from row-based to memory-size-based, in order to avoid OOM.

In implementation, it simply changes the permit unit from the number of rows to the estimated size of the chunk in bytes (see the sketch below). The related parameters are renamed to reflect this change:

  • exchange_initial_permits -> exchange_max_bytes
  • exchange_batched_permits -> exchange_ack_bytes

The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.
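For intuition, here is a minimal sketch of byte-denominated permits built on a tokio semaphore. The `Chunk` type, `estimated_size`, and the wiring are illustrative assumptions, not the actual `permit.rs` implementation (where, for remote exchanges, the ack also crosses the network):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

/// Illustrative message type; the real exchange sends stream chunks.
pub struct Chunk {
    bytes: usize,
}

impl Chunk {
    /// Stand-in for the chunk's estimated heap size.
    pub fn estimated_size(&self) -> usize {
        self.bytes
    }
}

pub struct Sender {
    /// Permits denominated in bytes (capped at `exchange_max_bytes`), not rows.
    permits: Arc<Semaphore>,
    tx: mpsc::UnboundedSender<Chunk>,
}

impl Sender {
    pub async fn send(&self, chunk: Chunk) {
        // Back-pressure: wait until enough byte permits are available.
        // (Real code would also clamp chunks larger than the total cap.)
        let bytes = chunk.estimated_size() as u32;
        self.permits
            .acquire_many(bytes)
            .await
            .expect("semaphore closed")
            .forget(); // permits are returned by the receiver after consumption
        let _ = self.tx.send(chunk);
    }
}
```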

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see Sqlsmith: SQL feature generation #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

@BugenZhao (Member) commented Dec 4, 2023

> The default size limit is arbitrarily set to 1MB.

Previously it was 2048 rows, so we're basically assuming one row to be ~500 bytes (1 MiB / 2048 rows = 512 bytes), which seems more than enough. However, considering that these values were already decreased in #10419, which indeed hurt performance, I guess we can give it a try.

@BugenZhao (Member) left a comment

cc @fuyufjh @lmatz Do we have a test that can verify these changes help to relieve the OOM problem?

@fuyufjh (Member) commented Dec 4, 2023

> The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.

Recommend trying this: a @Nightly run of all nexmark (16 sets of nexmark queries) with 1000 throughput: https://buildkite.com/risingwave-test/longevity-test/builds/832. It currently always fails, and the metrics show it's not the fault of our LRU watermark manager, so it's somewhat likely to be caused by this (too much data in the exchange channels).

@fuyufjh (Member) left a comment

The code LGTM.

```rust
// before:
pub fn stream_exchange_initial_permits() -> usize {
    2048
}

// after:
pub fn stream_exchange_max_bytes() -> usize {
    1 << 20 // 1MB
}
```
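As a companion sketch (continuing the illustrative `Chunk` type from the sketch in the PR description; an assumption-laden illustration, not the actual `permit.rs` code), the receiver returns byte permits only once the consumed amount crosses the `exchange_ack_bytes` threshold, so acks are batched rather than sent per chunk:

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

pub struct Receiver {
    /// Shared byte-denominated permits, capped at `exchange_max_bytes`.
    permits: Arc<Semaphore>,
    rx: mpsc::UnboundedReceiver<Chunk>,
    /// Bytes consumed since the last batched ack.
    unacked_bytes: usize,
    /// Ack threshold, i.e. the renamed `exchange_ack_bytes`.
    ack_bytes: usize,
}

impl Receiver {
    pub async fn recv(&mut self) -> Option<Chunk> {
        let chunk = self.rx.recv().await?;
        self.unacked_bytes += chunk.estimated_size();
        // Return permits in batches to amortize ack overhead.
        if self.unacked_bytes >= self.ack_bytes {
            self.permits.add_permits(self.unacked_bytes);
            self.unacked_bytes = 0;
        }
        Some(chunk)
    }
}
```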
Member:

> The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.

Agree. This may have a great impact on performance. Let's run some performance tests first.

Member:

@wangrunji0408 Have we run any performance benches yet? Not longevity, but something like nexmark-benchmark.

@wangrunji0408 (Contributor, Author):

Sorry, I forgot to run the perf test before merging. It may be the reason for #13821. Should we revert this PR first, or increase the memory limit?

Member:

Yeah, perhaps, unless we can fix it in 1 day. I guess we will need to run multiple rounds to find an optimal setting, and that may take some time.

@wcy-fdu (Contributor) commented Dec 4, 2023

> The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.

> Recommend trying this: a @Nightly run of all nexmark (16 sets of nexmark queries) with 1000 throughput: https://buildkite.com/risingwave-test/longevity-test/builds/832. It currently always fails, and the metrics show it's not the fault of our LRU watermark manager, so it's somewhat likely to be caused by this (too much data in the exchange channels).

If it still OOMs with this PR, we may need further discussion on how to control these small memory allocations in our system.

codecov bot commented Dec 4, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (da79ff5) 68.15% compared to head (92bf7af) 68.13%.
Report is 11 commits behind head on main.

Files                                         Patch %   Missing lines
src/common/src/array/arrow.rs                 69.23%    8 ⚠️
src/stream/src/executor/exchange/input.rs      0.00%    1 ⚠️
src/stream/src/executor/exchange/permit.rs    90.00%    1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13775      +/-   ##
==========================================
- Coverage   68.15%   68.13%   -0.03%     
==========================================
  Files        1524     1524              
  Lines      262331   262310      -21     
==========================================
- Hits       178793   178721      -72     
- Misses      83538    83589      +51     
Flag Coverage Δ
rust 68.13% <77.77%> (-0.03%) ⬇️


@wcy-fdu (Contributor) commented Dec 4, 2023

Longevity test failed. Still OOM. 😕

https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=usrlngvty-20231204-070444&from=1701673743000&to=1701682250000 (dashboard screenshot attached)

Thus, I believe the OOM is due to the accumulation of various small memory allocations. In this longevity test, 400 MVs are created, which is to some extent an unreasonable workload, so I think it's not urgent. We need to think about how to manage these small allocations (tracked here).

@wangrunji0408 added this pull request to the merge queue on Dec 5, 2023.
Merged via the queue into main with commit 51c76aa on Dec 5, 2023; 38 of 39 checks passed.
@wangrunji0408 deleted the wrj/memory-based-channel-permit branch on December 5, 2023 at 05:48.
@st1page (Contributor) commented Dec 6, 2023

Started a bench to check whether it is related to #13821, using the same image as in the longevity test: https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2609

@wangrunji0408 (Contributor, Author):

I also started a bench to see the performance with a larger memory limit (1MB -> 10MB):
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2610

@BugenZhao (Member):

Although I'm not sure whether 10 MB performs well, it seems too large. 😕 An exchange between 16 upstream actors and 16 downstream ones will consume up to 2.5 GiB (16 × 16 channels × 10 MiB each; see the arithmetic below). I'm quite skeptical about whether this can truly help with OOM.
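For reference, the arithmetic behind the 2.5 GiB figure, assuming one fully buffered 10 MiB channel per upstream/downstream actor pair (an assumption for illustration):

```rust
fn main() {
    let channels: u64 = 16 * 16; // 256 channels in a 16x16 exchange
    let per_channel: u64 = 10 * (1 << 20); // 10 MiB buffered per channel at the cap
    let total = channels * per_channel;
    assert_eq!(total, 2_684_354_560); // exactly 2.5 GiB
    println!("{:.1} GiB", total as f64 / (1u64 << 30) as f64);
}
```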

@BugenZhao (Member) commented Dec 6, 2023

FYI, when first introducing the permit-based back-pressure, I proposed a conjecture that the "width" of a chunk does not impact performance very much (as most columns are "details" and do not involve computation) but does significantly contribute to its memory usage. That is why I chose the row count as the unit.

However, it's true that this is essentially a trade-off between stability and performance. Perhaps the ultimate solution should be more dynamic and flexible. (See the toy numbers below.)
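A toy illustration of that trade-off, with made-up numbers: two chunks with the same row count can differ by orders of magnitude in bytes, which row-based permits cannot see:

```rust
fn main() {
    // Both chunks hold 1024 rows, so row-based permits charge them equally.
    let narrow_bytes = 1024 * 8; // one i64 column: ~8 KiB
    let wide_bytes = 1024 * 50 * 100; // 50 string columns of ~100 bytes each: ~4.9 MiB
    // Byte-based permits see the ~625x difference and bound actual memory,
    // at the cost of back-pressuring wide-but-cheap chunks sooner.
    println!("narrow: {narrow_bytes} B, wide: {wide_bytes} B");
}
```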

@st1page (Contributor) commented Dec 6, 2023

I think it is related to #13821. Comparison charts (screenshots attached): this PR branch vs. nightly-1205 vs. nightly-1204.

@st1page (Contributor) commented Dec 6, 2023

Could it be that the buffers consume too much memory and make other modules inefficient? 🤔 Although I checked the operators' cache miss rates, and they do not change.


@wangrunji0408 (Contributor, Author):

> I also started a bench to see the performance with a larger memory limit (1MB -> 10MB): https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2610

It seems even more unstable with the 10MB limit. (Screenshot attached.)

This needs more investigation, such as the distribution of chunk sizes in the exchange channel. Until we figure that out, let's revert this PR.
