feat(streaming): memory-size-based back-pressure in exchange #13775

Merged 1 commit into main on Dec 5, 2023

Conversation

@wangrunji0408 (Contributor) commented Dec 4, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR changes the back-pressure mechanism in exchange from row-based to memory-size-based, in order to avoid OOM.

In implementation, it simply changes the permit unit from the number of rows to the estimated size of the chunk in bytes (see the sketch below). The related parameters are renamed to reflect this change:

  • exchange_initial_permits -> exchange_max_bytes
  • exchange_batched_permits -> exchange_ack_bytes

The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.
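For intuition, here is a minimal sketch of byte-denominated permits built on a tokio semaphore. The `Chunk` type, `estimated_size`, and the wiring are illustrative assumptions, not the actual `permit.rs` implementation (where, for remote exchanges, the ack also crosses the network):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

/// Illustrative message type; the real exchange sends stream chunks.
pub struct Chunk {
    bytes: usize,
}

impl Chunk {
    /// Stand-in for the chunk's estimated heap size.
    pub fn estimated_size(&self) -> usize {
        self.bytes
    }
}

pub struct Sender {
    /// Permits denominated in bytes (capped at `exchange_max_bytes`), not rows.
    permits: Arc<Semaphore>,
    tx: mpsc::UnboundedSender<Chunk>,
}

impl Sender {
    pub async fn send(&self, chunk: Chunk) {
        // Back-pressure: wait until enough byte permits are available.
        // (Real code would also clamp chunks larger than the total cap.)
        let bytes = chunk.estimated_size() as u32;
        self.permits
            .acquire_many(bytes)
            .await
            .expect("semaphore closed")
            .forget(); // permits are returned by the receiver after consumption
        let _ = self.tx.send(chunk);
    }
}
```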

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see Sqlsmith: SQL feature generation #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

@BugenZhao (Member) commented Dec 4, 2023

> The default size limit is arbitrarily set to 1MB.

Previously it was 2048 rows, so we're basically assuming one row to be ~500 bytes (1 MiB / 2048 rows = 512 bytes), which seems more than enough. However, considering that these values were already decreased in #10419, which indeed hurt performance, I guess we can give it a try.

@BugenZhao (Member) left a comment

cc @fuyufjh @lmatz Do we have a test that can verify these changes help to relieve the OOM problem?

@fuyufjh (Member) commented Dec 4, 2023

> The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.

Recommend trying this: a @Nightly run of all nexmark (16 sets of nexmark queries) with 1000 throughput: https://buildkite.com/risingwave-test/longevity-test/builds/832. It currently always fails, and the metrics show it's not the fault of our LRU watermark manager, so it's somewhat likely to be caused by this (too much data in the exchange channels).

@fuyufjh (Member) left a comment

The code LGTM.

```rust
// before:
pub fn stream_exchange_initial_permits() -> usize {
    2048
}

// after:
pub fn stream_exchange_max_bytes() -> usize {
    1 << 20 // 1MB
}
```
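As a companion sketch (continuing the illustrative `Chunk` type from the sketch in the PR description; an assumption-laden illustration, not the actual `permit.rs` code), the receiver returns byte permits only once the consumed amount crosses the `exchange_ack_bytes` threshold, so acks are batched rather than sent per chunk:

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

pub struct Receiver {
    /// Shared byte-denominated permits, capped at `exchange_max_bytes`.
    permits: Arc<Semaphore>,
    rx: mpsc::UnboundedReceiver<Chunk>,
    /// Bytes consumed since the last batched ack.
    unacked_bytes: usize,
    /// Ack threshold, i.e. the renamed `exchange_ack_bytes`.
    ack_bytes: usize,
}

impl Receiver {
    pub async fn recv(&mut self) -> Option<Chunk> {
        let chunk = self.rx.recv().await?;
        self.unacked_bytes += chunk.estimated_size();
        // Return permits in batches to amortize ack overhead.
        if self.unacked_bytes >= self.ack_bytes {
            self.permits.add_permits(self.unacked_bytes);
            self.unacked_bytes = 0;
        }
        Some(chunk)
    }
}
```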
Member:

> The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.

Agree. This may have a great impact on performance. Let's run some performance tests first.

Member:

@wangrunji0408 Have we run any performance benches yet? Not longevity, but something like nexmark-benchmark.

@wangrunji0408 (Contributor, Author):

Sorry, I forgot to run the perf test before merging. It may be the reason for #13821. Should we revert this PR first, or increase the memory limit?

Member:

Yeah, perhaps, unless we can fix it in 1 day. I guess we will need to run multiple rounds to find an optimal setting, and that may take some time.

@wcy-fdu (Contributor) commented Dec 4, 2023

> The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.

> Recommend trying this: a @Nightly run of all nexmark (16 sets of nexmark queries) with 1000 throughput: https://buildkite.com/risingwave-test/longevity-test/builds/832. It currently always fails, and the metrics show it's not the fault of our LRU watermark manager, so it's somewhat likely to be caused by this (too much data in the exchange channels).

If it still OOMs with this PR, we may need further discussion on how to control these small memory allocations in our system.

codecov bot commented Dec 4, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (da79ff5) 68.15% compared to head (92bf7af) 68.13%.
Report is 11 commits behind head on main.

Files                                         Patch %   Missing lines
src/common/src/array/arrow.rs                 69.23%    8 ⚠️
src/stream/src/executor/exchange/input.rs      0.00%    1 ⚠️
src/stream/src/executor/exchange/permit.rs    90.00%    1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13775      +/-   ##
==========================================
- Coverage   68.15%   68.13%   -0.03%     
==========================================
  Files        1524     1524              
  Lines      262331   262310      -21     
==========================================
- Hits       178793   178721      -72     
- Misses      83538    83589      +51     
Flag Coverage Δ
rust 68.13% <77.77%> (-0.03%) ⬇️


@wcy-fdu (Contributor) commented Dec 4, 2023

Longevity test failed. Still OOM. 😕

https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=usrlngvty-20231204-070444&from=1701673743000&to=1701682250000 (dashboard screenshot attached)

Thus, I believe the OOM is due to the accumulation of various small memory allocations. In this longevity test, 400 MVs are created, which is to some extent an unreasonable workload, so I think it's not urgent. We need to think about how to manage these small allocations (tracked here).

@wangrunji0408 added this pull request to the merge queue on Dec 5, 2023.
Merged via the queue into main with commit 51c76aa on Dec 5, 2023; 38 of 39 checks passed.
@wangrunji0408 deleted the wrj/memory-based-channel-permit branch on December 5, 2023 at 05:48.
@st1page (Contributor) commented Dec 6, 2023

Started a bench to check whether it is related to #13821, using the same image as in the longevity test: https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2609

@wangrunji0408 (Contributor, Author):

I also started a bench to see the performance with a larger memory limit (1MB -> 10MB):
https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2610

@BugenZhao (Member):

Although I'm not sure whether 10 MB performs well, it seems too large. 😕 An exchange between 16 upstream actors and 16 downstream ones will consume up to 2.5 GiB (16 × 16 channels × 10 MiB each; see the arithmetic below). I'm quite skeptical about whether this can truly help with OOM.
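For reference, the arithmetic behind the 2.5 GiB figure, assuming one fully buffered 10 MiB channel per upstream/downstream actor pair (an assumption for illustration):

```rust
fn main() {
    let channels: u64 = 16 * 16; // 256 channels in a 16x16 exchange
    let per_channel: u64 = 10 * (1 << 20); // 10 MiB buffered per channel at the cap
    let total = channels * per_channel;
    assert_eq!(total, 2_684_354_560); // exactly 2.5 GiB
    println!("{:.1} GiB", total as f64 / (1u64 << 30) as f64);
}
```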

@BugenZhao (Member) commented Dec 6, 2023

FYI, when first introducing the permit-based back-pressure, I proposed a conjecture that the "width" of a chunk does not impact performance very much (as most columns are "details" and do not involve computation) but does significantly contribute to its memory usage. That is why I chose the row count as the unit.

However, it's true that this is essentially a trade-off between stability and performance. Perhaps the ultimate solution should be more dynamic and flexible. (See the toy numbers below.)
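A toy illustration of that trade-off, with made-up numbers: two chunks with the same row count can differ by orders of magnitude in bytes, which row-based permits cannot see:

```rust
fn main() {
    // Both chunks hold 1024 rows, so row-based permits charge them equally.
    let narrow_bytes = 1024 * 8; // one i64 column: ~8 KiB
    let wide_bytes = 1024 * 50 * 100; // 50 string columns of ~100 bytes each: ~4.9 MiB
    // Byte-based permits see the ~625x difference and bound actual memory,
    // at the cost of back-pressuring wide-but-cheap chunks sooner.
    println!("narrow: {narrow_bytes} B, wide: {wide_bytes} B");
}
```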

@st1page (Contributor) commented Dec 6, 2023

I think it is related to #13821. Comparison charts (screenshots attached): this PR branch vs. nightly-1205 vs. nightly-1204.

@st1page (Contributor) commented Dec 6, 2023

Could it be that the buffers consume too much memory and make other modules inefficient? 🤔 Although I checked the operators' cache miss rates, and they do not change.


@wangrunji0408 (Contributor, Author):

> I also started a bench to see the performance with a larger memory limit (1MB -> 10MB): https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2610

It seems even more unstable with the 10MB limit. (Screenshot attached.)

This needs more investigation, such as the distribution of chunk sizes in the exchange channel. Until we figure that out, let's revert this PR.
