Tracking: online scaling in compute node #3750
Comments
Isn't this already covered by […]?
Exactly. However, they're only used for creating and dropping materialized views. We should reuse them to support creating and dropping parallel units.
What should the interface be for this? I guess you are referring to having a high-level interface in meta like:

```
// in meta/..
fragment.add_parallel_units(par_unit_ids: &[usize])    // -> create_actors, delete_actors, add outputs to upstream, stop prev outputs from upstream
fragment.remove_parallel_units(par_unit_ids: &[usize])
```

Currently, we need to wait until new actors are created before we can send […].

I guess for the time being, whenever new compute nodes are added, we can scale up to the new parallel units. We can have more fine-grained per-fragment control of parallel units in the future once we decide on a scaling and placement policy.

Furthermore, what should the behaviour be for stateful operators? Should it always replace all of the current actors, or should we allow existing actors to continue? In the latter case, we need an […]. Some of the keys (for scale-up) may no longer be relevant to the node. We could rely on LRU to evict these unused keys, or explicitly evict them by iterating through all the keys in the application cache and evicting the ones whose vnodes are in the removed_vnodes set. After the resume barrier, we can make use of the new vnodes in the state table.

I guess for the time being we can stop all actors and create new ones for the entire fragment. It's unproven whether it's necessary to reuse existing actors. Actor startup should be fast, and the cache can be populated easily from the block cache.
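For illustration only, here is a minimal sketch of the explicit-eviction idea above, assuming a hypothetical per-actor `ApplicationCache` keyed by state-table keys and a stand-in `vnode_of` helper; neither of these names is the actual RisingWave API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-actor cache keyed by state-table keys (illustrative only).
struct ApplicationCache {
    entries: HashMap<Vec<u8>, Vec<u8>>,
}

/// Stand-in for the real vnode computation (hash of the distribution key
/// modulo the virtual-node count); the actual hash function differs.
fn vnode_of(key: &[u8], vnode_count: usize) -> usize {
    let hash = key
        .iter()
        .fold(0usize, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as usize));
    hash % vnode_count
}

impl ApplicationCache {
    /// Explicitly drop cached entries whose vnodes were moved away from this
    /// actor during scaling, instead of waiting for LRU to evict them.
    fn evict_removed_vnodes(&mut self, removed_vnodes: &HashSet<usize>, vnode_count: usize) {
        self.entries
            .retain(|key, _| !removed_vnodes.contains(&vnode_of(key, vnode_count)));
    }
}
```

Whether this explicit pass beats plain LRU eviction depends on how much of the cache belongs to the moved vnodes, which is exactly the trade-off discussed in the comment.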
Exactly.
Yep. This is very similar to creating materialized views: currently, we first build the actors on the worker nodes and then issue a command to the global barrier manager, so consistency is maintained.
Yes. This will be included in the step "Utilize parallel units of newly joined compute nodes."
In the original design by @fuyufjh, we would drop all of the current actors for simplicity. However, after we've unified the state interface with […], reusing the existing actors may become an option.
Not sure how much it will cost; keeping the cache in the original actors would always be better, though.
Agree. Actually, my initial design is to reuse the existing actor and ignore the data that it no longer owns, which will not affect anything and will be evicted sooner or later in theory. But please feel free to simplify the design.
To support scaling in our system, we've decided to generally follow the design in Re-Introduce Configuration Change based on Pause Barrier. Now that consistent hash has been utilized in most of the critical places in our system (#3543), it's high time we started doing this!
This task can be roughly divided into several steps below.
Support creating and dropping actors in existing fragments.
The minimal scaling granularity will be a single fragment. This lets us support finer-grained scaling requests and makes the implementation easier: any complex scaling can be composed of a series of fragment scalings.
Create and drop actors in existing fragments with the UpdateOutputs mutation. #3752
Improve parallel unit management in meta service.
Based on the step above, the compute node is able to scale the number of actors in each fragment. However, whether a fragment can scale or where the new actor will be scheduled should be determined by the parallel unit management in meta service. Currently, we use a fixed number of parallel units for each compute node and always utilize all of them for parallelized fragments, which should be tweaked to be more flexible.
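As a rough sketch of the bookkeeping this step implies (all names are hypothetical, not the actual meta-service code), the manager could record which parallel units each worker owns and hand out only a subset for a given fragment, instead of unconditionally using every unit in the cluster:

```rust
use std::collections::HashMap;

type WorkerId = u32;
type ParallelUnitId = u32;

/// Hypothetical registry of parallel units, keyed by the worker that hosts them.
#[derive(Default)]
struct ParallelUnitManager {
    units: HashMap<WorkerId, Vec<ParallelUnitId>>,
}

impl ParallelUnitManager {
    /// Register the parallel units of a newly joined compute node.
    fn add_worker(&mut self, worker: WorkerId, units: Vec<ParallelUnitId>) {
        self.units.insert(worker, units);
    }

    /// Pick at most `parallelism` parallel units, spread round-robin across
    /// workers, rather than always using every unit in the cluster.
    fn pick_units(&self, parallelism: usize) -> Vec<ParallelUnitId> {
        let mut picked = Vec::new();
        let mut iters: Vec<_> = self.units.values().map(|v| v.iter()).collect();
        while picked.len() < parallelism {
            let mut progressed = false;
            for it in &mut iters {
                if let Some(&unit) = it.next() {
                    picked.push(unit);
                    progressed = true;
                    if picked.len() == parallelism {
                        break;
                    }
                }
            }
            if !progressed {
                break; // fewer units available than requested
            }
        }
        picked
    }
}
```

A real implementation would also need to persist this mapping and coordinate it with fragment scheduling, which is what this step is about.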
Really take advantage of scaled actors with correct data distribution.
Needless to say, the ultimate goal of scaling is to harness the computing power of a distributed system to fit the user-required throughput. So it's not enough to be able to create or drop actors independently; we also need to distribute data correctly and evenly under the new configuration. Thanks to consistent hashing and the unified storage access layer of the state table, this is not that hard to implement. However, due to Hummock's special transaction design and the introduction of concurrent checkpoints, we still have to handle consistency carefully.
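For intuition only (this is not the actual rebalancing algorithm), redistributing a vnode-to-actor mapping after a scale-out could work as follows: keep as many vnodes as possible on their current owners and hand the surplus to the least-loaded actors, so that each actor ends up with roughly `VIRTUAL_NODE_COUNT / actor_count` vnodes.

```rust
use std::collections::HashMap;

// Fixed virtual-node count; the value here is illustrative.
const VIRTUAL_NODE_COUNT: usize = 2048;

type ActorId = u32;

/// Recompute the vnode -> actor mapping for a fragment after scaling,
/// moving as few vnodes as possible (a sketch, not the real algorithm).
fn rebalance(old: &[ActorId], actors: &[ActorId]) -> Vec<ActorId> {
    assert_eq!(old.len(), VIRTUAL_NODE_COUNT);
    assert!(!actors.is_empty());
    let target = VIRTUAL_NODE_COUNT / actors.len(); // ignore the remainder for brevity
    let mut load: HashMap<ActorId, usize> = actors.iter().map(|&a| (a, 0)).collect();
    let mut mapping = Vec::with_capacity(VIRTUAL_NODE_COUNT);

    // First pass: keep a vnode on its current owner if that owner still
    // exists and is not yet above its target share.
    for &owner in old {
        if load.get(&owner).map_or(false, |&n| n < target) {
            *load.get_mut(&owner).unwrap() += 1;
            mapping.push(Some(owner));
        } else {
            mapping.push(None);
        }
    }

    // Second pass: hand the remaining vnodes to the least-loaded actors.
    mapping
        .into_iter()
        .map(|slot| {
            slot.unwrap_or_else(|| {
                let (&actor, count) = load.iter_mut().min_by_key(|(_, n)| **n).unwrap();
                *count += 1;
                actor
            })
        })
        .collect()
}
```

The design choice worth noting is minimizing vnode movement: the fewer vnodes that change owners, the less state and cache locality is lost during the configuration change.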
Unify other configuration changes with the scaling pipeline.
We may add fail-over and actor migration support before this task is done. However, some of these configuration changes can be treated as a specific form of scaling. For example, migrating an actor from one node to another is simply a scale-in and scale-out within a single barrier, where the partitions of other actors are untouched.
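As a toy illustration (these types are made up for the sketch, not the real meta structures), expressing migration in terms of the same per-fragment primitive keeps the command surface small: a migration is one reschedule that removes a parallel unit on the source worker and adds one on the target, applied at a single barrier.

```rust
type ParallelUnitId = u32;
type FragmentId = u32;

/// Hypothetical per-fragment reschedule request: the only primitive needed.
struct Reschedule {
    fragment_id: FragmentId,
    added_parallel_units: Vec<ParallelUnitId>,
    removed_parallel_units: Vec<ParallelUnitId>,
}

/// Actor migration is just a scale-in plus a scale-out bundled into one
/// reschedule, so it rides on the same pause-barrier pipeline as plain scaling.
fn migrate(fragment_id: FragmentId, from: ParallelUnitId, to: ParallelUnitId) -> Reschedule {
    Reschedule {
        fragment_id,
        added_parallel_units: vec![to],
        removed_parallel_units: vec![from],
    }
}
```

This also matches the observation above that other actors' partitions stay untouched: their parallel units never appear in either list.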
Manual debugging tools.
Online scaling is intended to be invoked by the cloud manager. However, there are likely some edge cases in scaling, so manual tests are necessary. We may integrate these tools with risectl.
feat(ctl): show cluster info / parallel unit matrix in risectl #4252
[…] risectl. #5163
Improvements and fixes.
Here we record some significant improvements and fixes that need to be done, some of which should be discussed first.
Interface with cloud manager.
After online scaling is fully implemented, we can interface with the cloud manager and automate the whole process with scaling policies.
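Purely as a sketch of what such a policy could look like (the real policy belongs to the cloud manager and is not specified in this issue), a threshold-based rule might map observed utilization to a desired parallelism:

```rust
/// Toy scaling policy (illustrative only): compare observed utilization
/// against thresholds and emit a desired parallelism for the cluster.
fn desired_parallelism(current: usize, cpu_utilization: f64) -> usize {
    const SCALE_OUT_THRESHOLD: f64 = 0.8;
    const SCALE_IN_THRESHOLD: f64 = 0.3;
    if cpu_utilization > SCALE_OUT_THRESHOLD {
        current * 2
    } else if cpu_utilization < SCALE_IN_THRESHOLD && current > 1 {
        current / 2
    } else {
        current
    }
}
```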