feat(streaming): memory-size-based back-pressure in exchange #13775
Conversation
Signed-off-by: Runji Wang <[email protected]>
Previously it was 2048 rows, so with a 1 MB budget we're basically assuming one row to be about 500 bytes (1 MiB ÷ 2048 ≈ 512 bytes), which seems to be more than enough. However, considering that these values are already the decreased results from #10419, which indeed hurt performance, I guess we can give it a try.
Recommend trying this:
The code LGTM
```diff
-pub fn stream_exchange_initial_permits() -> usize {
-    2048
+pub fn stream_exchange_max_bytes() -> usize {
+    1 << 20 // 1MB
```
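For context, a tiny back-of-the-envelope sketch (not project code) relating the old row budget to the new byte budget, using the ~500-bytes-per-row assumption from the discussion above:

```rust
fn main() {
    // Old budget: 2048 row permits per exchange channel.
    let old_budget_rows: usize = 2048;
    // New budget: 1 MiB (1 << 20 bytes) per exchange channel.
    let new_budget_bytes: usize = 1 << 20;
    // Assumed average row width (~500 bytes, rounded to 512 here); this is an
    // assumption from the discussion, not a measured value.
    let assumed_row_bytes: usize = 512;

    // At ~512 bytes per row, the byte budget admits about the same number of
    // rows as before; wider rows get back-pressured sooner, narrower rows later.
    println!("rows admitted under byte budget: {}", new_budget_bytes / assumed_row_bytes); // 2048
    println!("rows admitted under old row budget: {}", old_budget_rows);
}
```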
The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.
Agree. This may have a great impact on performance. Let's run some performance testing first.
@wangrunji0408 Have we run any performance benchmarks yet? Not longevity, but something like nexmark-benchmark.
Sorry, I forgot to run a perf test before merging. It may be the reason for #13821. Should we revert this PR first, or increase the memory limit?
Yeah, perhaps, unless we can fix it in 1 day. I guess we will need to run several rounds to find an optimal setting, and this may take some time.
If we still OOM with this PR, we may need further discussion on how to control small memory allocations in our system.
Codecov Report
Attention:
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main   #13775      +/-   ##
==========================================
- Coverage   68.15%   68.13%   -0.03%
==========================================
  Files        1524     1524
  Lines      262331   262310      -21
==========================================
- Hits       178793   178721      -72
- Misses      83538    83589      +51
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Longevity test: https://buildkite.com/risingwave-test/longevity-test/builds/833#018c33a4-9615-441d-93a9-4fedd2ca3248
Thus, I believe the OOM is due to the accumulation of various small memory usages. Currently in this longevity test, 400 MVs are created, which is to some extent an unreasonable workload, so I think it's not urgent. We need to think about how to manage these small memory usages. Tracked here
Started a bench to check whether it is related to #13821, using the same image as in the longevity test. https://buildkite.com/risingwave-test/nexmark-benchmark/builds/2609
I also started a bench to see the performance with a larger memory limit (1MB -> 10MB).
Although I'm not sure if 10 MB performs well, it seems too large. 😕 An exchange between 16 upstream actors and 16 downstream ones could consume up to 16 × 16 × 10 MB ≈ 2.5 GiB. I'm quite skeptical about whether this can truly help with OOM.
FYI, when first introducing the permit-based back-pressure, I proposed a conjecture that the "width" of a chunk does not impact performance very much (as most columns are "details" and do not involve computation), but does contribute significantly to its memory usage. That is why I chose the row count as the unit. However, it's true that this is essentially a trade-off between stability and performance. Perhaps the ultimate solution should be more dynamic and flexible.
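To illustrate the trade-off being described, here is a minimal sketch (hypothetical types, not the actual chunk representation) of how row-based accounting ignores chunk width while byte-based accounting approximates memory usage:

```rust
// Hypothetical chunk shape for illustration only.
struct ChunkShape {
    rows: usize,
    cols: usize,
    avg_cell_bytes: usize,
}

// Row-based permits: cost is independent of how wide the chunk is.
fn cost_by_rows(c: &ChunkShape) -> usize {
    c.rows
}

// Byte-based permits: cost grows with width, approximating memory usage.
fn cost_by_bytes(c: &ChunkShape) -> usize {
    c.rows * c.cols * c.avg_cell_bytes
}

fn main() {
    let narrow = ChunkShape { rows: 256, cols: 4, avg_cell_bytes: 16 };
    let wide = ChunkShape { rows: 256, cols: 64, avg_cell_bytes: 16 };
    // Same row cost (256 vs 256)...
    println!("{} vs {}", cost_by_rows(&narrow), cost_by_rows(&wide));
    // ...but a 16x difference in estimated memory (16 KiB vs 256 KiB).
    println!("{} vs {}", cost_by_bytes(&narrow), cost_by_bytes(&wide));
}
```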
I think it is related to #13821.
Could it be because the buffers consume too much memory and make other modules inefficient? 🤔 Although I checked the operators' cache miss rates and they do not change.
It seems even more unstable with the 10MB limit. This needs more investigation, such as what the distribution of chunk sizes in the exchange channel is. Before figuring that out, let's revert this PR.
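If it helps the investigation, one way to get a rough chunk-size distribution would be to bucket estimated sizes at the exchange sender. This is only a sketch with made-up bucket boundaries, not existing instrumentation:

```rust
use std::collections::HashMap;

// Hypothetical bucketing of a chunk's estimated size in bytes.
fn size_bucket(bytes: usize) -> &'static str {
    match bytes {
        0..=1024 => "<=1KB",
        1025..=16_384 => "<=16KB",
        16_385..=262_144 => "<=256KB",
        _ => ">256KB",
    }
}

fn main() {
    // Stand-in for estimated sizes observed on an exchange channel.
    let observed = [512usize, 8_000, 300_000, 900, 20_000];
    let mut histogram: HashMap<&str, usize> = HashMap::new();
    for &bytes in &observed {
        *histogram.entry(size_bucket(bytes)).or_insert(0) += 1;
    }
    println!("{histogram:?}");
}
```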
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR changes the back-pressure mechanism from row-based to memory-size-based, in order to avoid OOM in exchange.
In implementation, it simply changes the permit unit from the number of rows to the estimated number of bytes of the chunk. The related parameters are also renamed to reflect this change (a rough sketch of the resulting flow follows the list below):
- exchange_initial_permits -> exchange_max_bytes
- exchange_batched_permits -> exchange_ack_bytes
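For readers unfamiliar with the mechanism, here is a rough, self-contained sketch of the intended flow using a Tokio semaphore as a stand-in for the permit channel. The function names and the ack batch size are illustrative assumptions, not the actual implementation:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Default budget per channel from this PR (1 << 20 bytes = 1 MB).
const EXCHANGE_MAX_BYTES: usize = 1 << 20;
// Assumed ack batch size for illustration only; not the real default.
const EXCHANGE_ACK_BYTES: usize = 64 * 1024;

async fn send_chunk(budget: &Semaphore, estimated_bytes: usize) {
    // Back-pressure: wait until the chunk's estimated size fits in the budget.
    // (Clamped so a single oversized chunk cannot exceed the total budget.)
    let permits = estimated_bytes.min(EXCHANGE_MAX_BYTES) as u32;
    let permit = budget.acquire_many(permits).await.expect("semaphore closed");
    // Keep the budget "spent" until the receiver acks it back.
    permit.forget();
    // ... serialize and send the chunk over the exchange channel ...
}

fn ack_processed(budget: &Semaphore, pending_ack_bytes: &mut usize, processed_bytes: usize) {
    // The receiver batches acks: return budget only once enough bytes accumulate.
    *pending_ack_bytes += processed_bytes;
    if *pending_ack_bytes >= EXCHANGE_ACK_BYTES {
        budget.add_permits(*pending_ack_bytes);
        *pending_ack_bytes = 0;
    }
}

#[tokio::main]
async fn main() {
    let budget = Arc::new(Semaphore::new(EXCHANGE_MAX_BYTES));
    let mut pending = 0;
    send_chunk(&budget, 64 * 1024).await;
    ack_processed(&budget, &mut pending, 64 * 1024);
}
```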
The default size limit is arbitrarily set to 1MB. I'm not sure whether it is a proper value; it may need to be tuned later.
Checklist
- ./risedev check (or alias, ./risedev c)
- Documentation