
feat(streaming): add memory size limit to exchange #16195

Status: Closed (wants to merge 16 commits)

Conversation

wangrunji0408 (Contributor)

Signed-off-by: Runji Wang <[email protected]>

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR adds a memory size limit to exchange channels to prevent potential OOM issues. It is a successor of #13775, which replaced the row limit entirely with a memory size limit; that change proved to hurt performance. This PR therefore keeps the row limit and adds a memory limit alongside it. In the implementation, we add a new bytes permit that works in parallel with the other two permits. The default size limit is arbitrarily set to 1MB.
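The dual-budget idea can be sketched roughly as follows. This is a simplified, synchronous sketch under assumptions of my own: the name `ExchangePermits` and its methods are illustrative, not RisingWave's actual API, which uses async permit semaphores.

```rust
// Sketch of dual-budget flow control: a chunk must fit within BOTH the
// remaining row budget and the remaining byte budget, so whichever limit
// is hit first throttles the sender. (Illustrative names, not the real API.)
pub struct ExchangePermits {
    rows_left: usize,
    bytes_left: usize,
}

impl ExchangePermits {
    pub fn new(max_rows: usize, max_bytes: usize) -> Self {
        Self { rows_left: max_rows, bytes_left: max_bytes }
    }

    /// Try to admit a chunk of `rows` rows with an estimated size of `bytes`.
    /// Returns false if either the row or the byte budget is exhausted.
    pub fn try_acquire(&mut self, rows: usize, bytes: usize) -> bool {
        if rows > self.rows_left || bytes > self.bytes_left {
            return false;
        }
        self.rows_left -= rows;
        self.bytes_left -= bytes;
        true
    }

    /// Called when the receiver ACKs processed messages, returning budget.
    pub fn release(&mut self, rows: usize, bytes: usize) {
        self.rows_left += rows;
        self.bytes_left += bytes;
    }
}
```

In the real executor the acquire side would await an async permit rather than fail, but the invariant is the same: both budgets must be available before a message is admitted.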

Besides, this PR removes the explicit threshold for sending permits back. In my opinion, an ACK may be delayed, but it should not be delayed forever. So I changed the receiver to accept all ready messages in a batch and then ACK them at once. This should keep ACKs timely without sending them too frequently. 🤔
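The batched-ACK behavior can be sketched like this: drain every message that is already ready, without waiting for more, then report the accumulated counts in a single ACK. Again a hedged sketch with hypothetical names (`AckBatch`, `drain_and_ack`), not the actual receiver code:

```rust
use std::collections::VecDeque;

/// Accumulated row/byte counts to return to the sender in one ACK.
pub struct AckBatch {
    pub rows: usize,
    pub bytes: usize,
}

/// Take everything that is ready *now* (no waiting), process it, and
/// build one ACK for the whole batch instead of one ACK per threshold.
pub fn drain_and_ack(queue: &mut VecDeque<(usize, usize)>) -> AckBatch {
    let mut batch = AckBatch { rows: 0, bytes: 0 };
    while let Some((rows, bytes)) = queue.pop_front() {
        batch.rows += rows;
        batch.bytes += bytes;
    }
    // A single ACK carrying `batch` would be sent back to the sender here.
    batch
}
```

Because the batch boundary is "whatever is ready at this moment", an ACK is never deferred indefinitely, yet bursts of messages still collapse into one reply.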

Checklist

  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Signed-off-by: Runji Wang <[email protected]>
@fuyufjh (Member) left a comment:

LGTM. Please run some benchmark and longevity tests before merging.

Review threads on src/stream/src/executor/exchange/input.rs (outdated, resolved)
@BugenZhao (Member) left a comment:

So I changed the receiver behavior to accept all ready messages in a batch and then ACK them once. This should allow ACKs to be replied timely but not too frequently. 🤔

I'm not sure if it works as well as before, but it sounds promising. Let's give it a try.

@wangrunji0408 wangrunji0408 requested a review from a team as a code owner April 15, 2024 03:44
@wangrunji0408 (Contributor, Author):
Performance degraded on nexmark q0, q5 (not shown, about -20%), q15, q17, and q20. 😕
Improved: q8.

(Screenshot, 2024-04-24 10:38:05: nexmark benchmark results)

github-actions bot (Contributor) commented on Jul 3, 2024:

This PR has been open for 60 days with no activity.
If it's blocked by code review, feel free to ping a reviewer or ask someone else to review it.
If you think it is still relevant today, and have time to work on it in the near future, you can comment to update the status, or just manually remove the no-pr-activity label.
You can also close this PR with confidence to keep our backlog clean. (If no further action is taken, the PR will be automatically closed after 7 days. Sorry! 🙏) Don't worry if you think the PR is still worth continuing in the future: it's searchable and can be reopened when the time comes. 😄

Labels: ci/run-cpu-flamegraph-cron (runs the cron workflow for generating a CPU flamegraph), no-pr-activity, type/feature

3 participants