[FEATURE] Auto-Parallelization for Graphs with Predesignated Parameter Shardings #550
Labels: enhancement (New feature)
Currently, Alpa only supports auto-parallelizing a graph without any existing annotations. However, there are cases where the shardings of some parameters are provided in advance. For example, when performing inference on variable-length inputs, the graphs for the different input lengths should share the same parameter shardings so that these graphs can be served concurrently and efficiently. This doc outlines the steps towards fully supporting auto-parallelization with user-provided shardings:
Manual inter-operator parallelism and sharding-propagation based intra-operator parallelism
This should be our first step and should be the easiest to implement (@comaniac). We can follow the logic of compile_create_state_executable in:
- `alpa/create_state_parallel.py`, lines 84-86 (commit 8d69815)
- `alpa/create_state_parallel.py`, lines 128-132 (commit 8d69815)
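To make the idea concrete, here is a toy sketch of what sharding propagation does (all names here are hypothetical, and the single propagation rule is a gross simplification of XLA's per-op rules — this is not Alpa's or XLA's actual pass): user-provided shardings seed the annotations, and they flow forward through the graph in topological order.

```python
# Toy sharding propagation: user-provided shardings seed the annotations,
# and elementwise-style ops copy a known input sharding to their output.
# Real propagation passes use per-op rules; this only illustrates the flow.

def propagate_shardings(ops, fixed):
    """ops: list of (output, inputs) pairs in topological order.
    fixed: dict mapping tensor name -> user-provided sharding spec.
    Returns a dict with a sharding for every produced tensor."""
    sharding = dict(fixed)  # start from the user annotations
    for out, inputs in ops:
        if out in sharding:
            continue  # a user annotation always wins
        # Toy rule: adopt the first input sharding that is already known.
        for name in inputs:
            if name in sharding:
                sharding[out] = sharding[name]
                break
        else:
            sharding[out] = "replicated"  # fallback when nothing is known
    return sharding

graph = [("h", ["w", "x"]), ("y", ["h"]), ("loss", ["y", "label"])]
print(propagate_shardings(graph, {"w": "shard(axis=0)"}))
# -> {'w': 'shard(axis=0)', 'h': 'shard(axis=0)', 'y': 'shard(axis=0)', 'loss': 'shard(axis=0)'}
```

Note how pinning only `w` is enough to determine the shardings of everything downstream, which is exactly why a propagation-based first step is attractive before touching the ILP.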
Manual inter-operator parallelism and Alpa intra-operator parallelism
In this step, we need to modify the auto-sharding ILP in Alpa to support user-provided annotations. The basic idea is to force some variables in the ILP to take the user-provided choices. This part has already been implemented within Google's internal auto-sharding pass. We are in the process of open-sourcing that part into the official tensorflow/XLA codebase, which should be out within two weeks. I will update this part after the code is open-sourced.
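For intuition, here is a toy version of the idea (not the real auto-sharding ILP, and it uses brute-force enumeration instead of an ILP solver): each op picks one sharding strategy, the objective sums compute and resharding costs, and a `forced` map pins the choice for annotated ops — the analogue of fixing ILP variables to the user-provided values.

```python
# Toy stand-in for "force some ILP variables to the user-provided choice":
# minimize compute cost + resharding cost over per-op strategy choices,
# skipping any assignment that violates a user-pinned choice.
from itertools import product

def solve(compute_cost, reshard_cost, forced):
    """compute_cost[i][s]: cost of op i under strategy s.
    reshard_cost[(s, t)]: cost of switching strategy s -> t between ops.
    forced: dict op index -> strategy index pinned by the user."""
    n = len(compute_cost)
    best, best_choice = float("inf"), None
    for choice in product(range(len(compute_cost[0])), repeat=n):
        # A pinned variable must take the user-provided value.
        if any(choice[i] != s for i, s in forced.items()):
            continue
        cost = sum(compute_cost[i][choice[i]] for i in range(n))
        cost += sum(reshard_cost.get((choice[i], choice[i + 1]), 0)
                    for i in range(n - 1))
        if cost < best:
            best, best_choice = cost, choice
    return best_choice, best

compute = [[1, 3], [2, 2], [1, 3]]        # 3 ops, 2 strategies each
reshard = {(0, 1): 5, (1, 0): 5}          # switching strategies is expensive
print(solve(compute, reshard, forced={}))      # -> ((0, 0, 0), 4)
print(solve(compute, reshard, forced={0: 1}))  # -> ((1, 1, 1), 8)
```

Pinning op 0 to strategy 1 pulls the whole chain onto strategy 1 because resharding is more expensive than the extra compute — the same mechanism by which fixed variables in the real ILP shape the surrounding solution.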
Alpa inter- and intra-operator parallelism
In this step, we need to modify the stage-slicing DP algorithm in Alpa to account for the predesignated shardings. I can follow up with a more detailed design for the algorithmic changes here.