feat: reduce stream barrier body and only send one copy of it to each CN #14533

Closed
yezizp2012 opened this issue Jan 12, 2024 · 2 comments · Fixed by #15644

@yezizp2012
Member

yezizp2012 commented Jan 12, 2024

When cluster parallelism is relatively high and there are many multi-way join streaming jobs, the barrier body (BarrierMutation) carried on the stream is duplicated many times. As these copies flow between compute nodes, decoding the prost messages consumes a significant amount of memory and may result in OOM. A possible fix is described below:

Some thoughts from a discussion with @st1page: one feasible optimization is to change how the barrier is processed.
Before the barrier is sent, its body is first delivered to the local barrier manager on each compute node. When injecting the barrier, we then provide only the id (epoch) and let the actors read the specific mutation information from the local barrier manager if necessary. This way, BarrierMutation only needs to be decoded once on each compute node.

Originally posted by @yezizp2012 in #13060 (comment)
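
The proposal above can be illustrated with a minimal sketch. It is not the actual RisingWave implementation: `Mutation`, `SlimBarrier`, `LocalBarrierRegistry`, and all method names are hypothetical and chosen only for illustration. The assumption is that the mutation is decoded once per compute node and stored by epoch, while the barrier passed between actors carries only the epoch.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the (potentially large) decoded barrier mutation.
#[derive(Debug, PartialEq)]
pub enum Mutation {
    Stop(Vec<u32>),
    Pause,
    Resume,
}

/// Hypothetical lightweight barrier passed between actors: epoch only, no body.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlimBarrier {
    pub epoch: u64,
}

/// Hypothetical per-compute-node store mapping an epoch to its mutation.
#[derive(Default)]
pub struct LocalBarrierRegistry {
    mutations: RwLock<HashMap<u64, Option<Arc<Mutation>>>>,
}

impl LocalBarrierRegistry {
    /// Store the mutation once for this node, before the barrier is injected,
    /// and return the slim barrier that actors will forward.
    pub fn register(&self, epoch: u64, mutation: Option<Mutation>) -> SlimBarrier {
        self.mutations
            .write()
            .unwrap()
            .insert(epoch, mutation.map(Arc::new));
        SlimBarrier { epoch }
    }

    /// Look up the mutation for a barrier. Only an `Arc` is cloned, so every
    /// actor on the node shares the same decoded value.
    pub fn mutation_of(&self, barrier: SlimBarrier) -> Option<Arc<Mutation>> {
        self.mutations
            .read()
            .unwrap()
            .get(&barrier.epoch)
            .cloned()
            .flatten()
    }

    /// Drop the entry once all local actors have collected the epoch.
    pub fn complete(&self, epoch: u64) {
        self.mutations.write().unwrap().remove(&epoch);
    }
}
```

In this sketch an actor that does not care about the mutation never touches the registry, and an actor that does care pays only for a map lookup and a reference-count increment instead of a full message decode.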

@github-actions github-actions bot added this to the release-1.7 milestone Jan 12, 2024
@BugenZhao
Member

+1 for this. In this way the Stashed state can also be removed. 😄

/// Barriers from some actors have been collected and stashed, however no `send_barrier`
/// request from the meta service is issued.
Stashed {
    /// Actor ids we've collected and stashed.
    collected_actors: HashSet<ActorId>,
},

@yezizp2012 yezizp2012 changed the title feat: reduce stream barrier body and send one copy of barrier body to each CN feat: reduce stream barrier body and only send one copy of it to each CN Jan 12, 2024
@yezizp2012 yezizp2012 self-assigned this Jan 12, 2024
@yezizp2012 yezizp2012 modified the milestones: release-1.7, release-1.8 Mar 6, 2024
@kwannoel
Contributor

Maybe related @fuyufjh
