
update en/train_vit_with_hybrid_parallel.py
flybird11111 committed Oct 9, 2023
1 parent c033711 commit 9dcbf3a
Showing 6 changed files with 244 additions and 611 deletions.
@@ -3,7 +3,7 @@
Author: Hongxin Liu, Yongbin Li, Mingyan Jiang

**Prerequisite:**
- [parallellism plugin](../basics/booster_plugins.md)
- [parallelism plugin](../basics/booster_plugins.md)
- [booster API](../basics/booster_api.md)

**Example Code**
@@ -138,7 +138,7 @@ def _criterion(outputs, inputs):
loss = criterion(outputs)
return loss
```
## Boost GPT-2 Model
## Boost the GPT-2 Model
Define a booster with `HybridParallelPlugin`. Based on the configured plugin parameters, the booster will inject one or more parallel strategies into the model. In this example, pipeline parallelism, zero1, and mixed-precision training optimizations are utilized.
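For context on the mixed-precision part: fp16 training typically relies on dynamic loss scaling to keep small gradients from underflowing. Below is a minimal, hypothetical sketch of that idea in plain Python; it is not ColossalAI's implementation, and every name in it is made up for illustration.

```python
import math

class ToyGradScaler:
    """Sketch of dynamic loss scaling: scale the loss up before backward so
    small fp16 gradients don't underflow, then unscale gradients (and skip
    the step on overflow) before the optimizer update."""

    def __init__(self, init_scale=2.0**16, backoff=0.5, growth=2.0,
                 growth_interval=2000):
        self.scale = init_scale
        self.backoff = backoff
        self.growth = growth
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # multiply the loss so backward produces proportionally larger grads
        return loss * self.scale

    def unscale_and_check(self, grads):
        """Return unscaled grads, or None if any gradient overflowed."""
        unscaled = [g / self.scale for g in grads]
        if any(math.isinf(g) or math.isnan(g) for g in unscaled):
            self.scale *= self.backoff   # overflow: skip this step, back off
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth    # stable for a while: grow the scale
        return unscaled
```

The plugin handles this (and much more) internally when `mixed_precision='fp16'` is set; the sketch only shows why a scaling factor appears in fp16 training at all.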
```python
booster_kwargs=dict(mixed_precision='fp16')
```
@@ -3,7 +3,7 @@
Author: Hongxin Liu, Yongbin Li

**Prerequisite:**
- [parallellism plugin](../basics/booster_plugins.md)
- [parallelism plugin](../basics/booster_plugins.md)
- [booster API](../basics/booster_api.md)

**Example Code**
@@ -110,8 +110,8 @@ def _criterion(outputs, inputs):
loss = criterion(outputs)
return loss
```
## Boost VIT Model
We begin by enhancing the model with colossalai's pipeline parallelism strategy. First, we define a `HybridParallelPlugin` object. `HybridParallelPlugin` encapsulates various parallelism strategies in colossalai. You can specify the use of pipeline parallelism by setting three parameters: pp_size, num_microbatches, and microbatch_size. For specific parameter settings, refer to the plugin-related documentation. Then, we initialize the booster with the `HybridParallelPlugin` object.
## Boost the VIT Model
We begin by enhancing the model with colossalai's pipeline parallelism strategy. First, we define a `HybridParallelPlugin` object. `HybridParallelPlugin` encapsulates various parallelism strategies in colossalai. You can specify the use of pipeline parallelism by setting three parameters: `pp_size`, `num_microbatches`, and `microbatch_size`. For specific parameter settings, refer to the plugin-related documentation. Then, we initialize the booster with the `HybridParallelPlugin` object.
```python
plugin = HybridParallelPlugin(
tp_size=TP_SIZE,
@@ -125,14 +125,14 @@ plugin = HybridParallelPlugin(
booster_kwargs=dict(mixed_precision='fp16')
booster = Booster(plugin=plugin, **booster_kwargs)
```
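As a side note on the arithmetic these parameters imply: the batch handed to each pipeline step is cut into `num_microbatches` chunks of `microbatch_size` samples each. A small, hypothetical helper (not part of colossalai) makes the relationship concrete.

```python
def split_into_microbatches(batch, microbatch_size):
    """Split a list of samples into consecutive, equally sized microbatches."""
    if len(batch) % microbatch_size != 0:
        raise ValueError("batch size must be divisible by microbatch_size")
    return [batch[i:i + microbatch_size]
            for i in range(0, len(batch), microbatch_size)]
```

With a batch of 8 samples and `microbatch_size=2`, this yields 4 chunks, i.e. `num_microbatches * microbatch_size` must equal the per-step batch size.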
Next, we use `booster.boost` to inject the features encapsulated by the plugin into the model training components.
Then, we use `booster.boost` to inject the features encapsulated by the plugin into the model training components.
```python
model, optimizer, _criterion, train_dataloader, lr_scheduler = booster.boost(
model=model, optimizer=optimizer, criterion=criterion, dataloader=train_dataloader, lr_scheduler=lr_scheduler
)
```
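The returned objects are wrapped versions of what was passed in. As a loose, hypothetical analogy for this "components in, wrapped components out" pattern (toy code, not ColossalAI's implementation):

```python
def toy_boost(step_fn, features):
    """Wrap a training-step function with each configured feature in turn."""
    for feature in features:
        step_fn = feature(step_fn)
    return step_fn

def clip_feature(step_fn):
    """Example feature: clamp the step's output to [-1, 1]."""
    def wrapped(x):
        return max(min(step_fn(x), 1.0), -1.0)
    return wrapped
```

Calling `toy_boost(lambda x: 2 * x, [clip_feature])` returns a step function whose outputs are clipped; `booster.boost` analogously hands back boosted versions of the model, optimizer, criterion, dataloader, and scheduler.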
## Training ViT with Pipeline Parallelism
Finally, we can train the model using pipeline parallelism. First, we define a training function that describes the training process. It's important to note that when using pipeline parallelism, you need to call `booster.execute_pipeline` to perform the model training. This function will invoke the scheduler to manage the model's forward and backward operations.
Finally, we can train the model using pipeline parallelism. Now, we define a training function that describes the training process. It's important to note that when using pipeline parallelism, you need to call `booster.execute_pipeline` to perform the model training. This function will invoke the scheduler to manage the model's forward and backward operations.
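To build intuition for what such a scheduler does, here is a toy GPipe-style sketch: all microbatch forwards first, then backwards in reverse order. This is illustrative only; colossalai's actual scheduler is more sophisticated (e.g. 1F1B-style interleaving of forward and backward passes).

```python
def run_pipeline_step(microbatches, forward, backward):
    """Return the mean loss after running forward/backward per microbatch."""
    losses = []
    for mb in microbatches:
        losses.append(forward(mb))   # forward each microbatch, keep its loss
    for loss in reversed(losses):
        backward(loss)               # backward in reverse microbatch order
    return sum(losses) / len(losses)
```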
```python
def run_forward_backward(
model: nn.Module,
@@ -142,10 +142,10 @@ def run_forward_backward(
booster: Booster,
):
# run pipeline forward backward when enabling pp in hybrid parallel plugin
output_dict = booster.execute_pipeline(
output_dict = booster.execute_pipeline(
data_iter, model, criterion, optimizer, return_loss=True, return_outputs=True
)
loss, outputs = output_dict["loss"], output_dict["outputs"]
loss, outputs = output_dict["loss"], output_dict["outputs"]


def train_epoch(
```
