feat(meta): deprecate parallel unit #17523

4 / 4 tasks completed

All tasks have been completed

Required Tasks

Task	Status
Removed the dependency on persistent global `ParallelUnit`, replacing them with dynamic temporary `WorkerSlots`. Previously, `ParallelUnit` had two semantics: Worker and Worker Parallelism. So for each Worker with parallelism of Parallelism, we generated (WorkerId, 0), (WorkerId, 1) and so on to replace it. The vnode mapping in the communication between meta and the frontend also uses worker slots, which should be simplified to WorkerMapping in the future.	Incomplete
Removed the `VnodeMapping` of `Fragment`. Since `VnodeMapping` was implemented through `ParallelUnitMapping`, we can derive `VnodeMapping` from `ActorBitmaps` and `ActorStatus`, so a persistent `VnodeMapping` is not needed. Moreover, binding Fragments to Workers is actually quite strange.	Incomplete
Modified the Reschedule interface. The previous syntax was `{fragment_id}-[parallel_unit]+[parallel_unit]`, which has now been changed to ~~`{fragment_id}-[worker_id:count]+[worker_id:count]`~~ `{fragment_id}:[worker_id:diff]`. Reschedule is now defined by modifying the number of allocations on workers. Simulation tests have all been updated to adapt.	Incomplete
~~Considering compatibility, `rw_parallel_units` is still retained, changed to the form of `(slot_id, worker_id)`.~~	Incomplete
The ParallelUnit in ActorStatus has not been removed because I haven't figured out how to handle backward compatibility elegantly yet. 🥵 We will only use the WorkerId field. If it's a new ActorStatus, we will fill ParallelUnit with u32::MAX as the id.	Incomplete
The original interface for generating stable resize plans has been removed in this PR. Due to the introduction of alter syntax and auto scale, this feature is rarely used. If we were to modify it, this PR would continue to grow larger. Let's discuss this later.	Incomplete
The `parallel_units` field in the `WorkerNode` message has been removed; its field number is now reserved.	Incomplete
A new field, `uint32 parallelism`, has been introduced to the `WorkerNode` message, enhancing its descriptive power.	Incomplete
Reserved `vnode_mapping` in `TableFragments`.	Incomplete
Replaced and reserved `parallel_unit_migration_plan` in `MigrationPlan` with a new `worker_slot_migration_plan` field.	Incomplete
Reserved `parallel_unit_id` in `ListActorStatesFoResponse`, substituting it with a new `worker_id` field.	Incomplete
The `Reschedule` message is now more appropriately named `WorkerReschedule`, with changes to the map types for `added_parallel_units` and `removed_parallel_units`.	Incomplete
The `RescheduleRequest` message has been simplified by removing the `reschedules` field and including `worker_reschedules`.	Incomplete
Deletion of `GetReschedulePlanRequest` and `GetReschedulePlanResponse` messages alongside their corresponding RPC method in `ScaleService` (i.e., `GetReschedulePlan`) to reflect the new design.	Incomplete
Reflected the Protocol Buffers changes with a switch from using `worker.parallel_units.len()` to `worker.parallelism as usize`.	Incomplete
Redesigned the worker slot to worker node mapping system following the removal of `parallel_units`, utilizing a new structure based on the `id` and `parallelism`.	Incomplete
Modified the `total_parallelism` function to compute using the new `parallelism` field rather than the length of `parallel_units`.	Incomplete
`WorkerSlotId` now uses a custom implementation of `fmt` for its `Debug	Incomplete
I have written necessary rustdoc comments	Completed
I have added necessary unit tests and integration tests	Completed
I have added test labels as necessary. See [details]	Completed
All checks passed in `./risedev check` (or alias, `./risedev c`)	Completed
During scheduling, we use WorkerSlot to ensure this correspondence, meaning the WorkerSlots of NoShuffle's upstream and downstream are consistent.	Incomplete
During scaling, which is the rescheduling process, the original 1-to-1 relationship is retained in the actor's dispatcher. We can rebuild based on this relationship without needing WorkerSlot alignment. For upstream and downstream in cascade no shuffle expansion, we use temporary WorkerSlots to solve it (same as scheduling).	Incomplete
We can have a "code deletion" PR first (e.g., `GetReschedulePlan`). I guess it's trivial to merge, and can reduce a lot of LoC of this PR.	Incomplete
Another idea is to do the "concept mapping" (or do abstraction) refactoring first: e.g., previously we have `worker.parallel_units.len()`. We can first refactor that into a method `worker.parallelism()` (without changing the implementation). And in the new PR, we just need to change the implementation. (But I'm not sure how much similar things we can do for this PR, and how much it would help)	Incomplete
Graphite can help a lot to work on the splitted PRs simultaneously! (specifically, automatically rebase later PRs when you make changes to prior ones)	Incomplete
Do we persist the mapping?	Incomplete
Do we directly use worker id?	Incomplete
Is worker id reusable?	Incomplete
Do we persist the mapping?	Incomplete
Do we directly use worker id?	Incomplete
Is worker id reusable?	Incomplete

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(meta): deprecate parallel unit #17523

feat(meta): deprecate parallel unit #17523

4 / 4 tasks completed

Details

Required Tasks

Re-running checks...

feat(meta): deprecate parallel unit #17523

Update assert in sink recovery test to allow equal counts

feat(meta): deprecate parallel unit #17523

4 / 4 tasks completed

Details

Required Tasks

Re-running checks...