fix initial typos (#2150)
kashif authored Nov 14, 2023
1 parent 2b53a90 commit b55855a
Showing 7 changed files with 11 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docs/source/concept_guides/big_model_inference.md
@@ -154,7 +154,7 @@ By passing `device_map="auto"`, we tell 🤗 Accelerate to determine automatical
#### `no_split_module_classes`

This parameter will indicate that some of the modules with the name `"Block"` should not be split across different devices. You should set here all blocks that
-include a residutal connection of some kind.
+include a residual connection of some kind.


#### The `device_map`
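To illustrate the `no_split_module_classes` argument discussed above, here is a minimal sketch of dispatching a large checkpoint; the use of `transformers`, the model name, and the checkpoint path are placeholders rather than anything taken from this commit:

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton without allocating real weights.
# "some/model" and "path/to/checkpoint" are placeholders.
config = AutoConfig.from_pretrained("some/model")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Dispatch the weights across the available devices, keeping every module
# named "Block" (anything containing a residual connection) on one device.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint="path/to/checkpoint",
    device_map="auto",
    no_split_module_classes=["Block"],
)
```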
4 changes: 2 additions & 2 deletions docs/source/concept_guides/gradient_synchronization.md
@@ -55,8 +55,8 @@ their gradients computed, collated, and updated before moving on to the next
batch of data.
When performing gradient accumulation, you accumulate `n` loss gradients and
skip `optimizer.step()` until `n` batches have been reached. As all training
-processes only need to sychronize by the time `optimizer.step()` is called,
-without any modification to your training step, this neededless inter-process
+processes only need to synchronize by the time `optimizer.step()` is called,
+without any modification to your training step, this needless inter-process
communication can cause a significant slowdown.

How can you avoid this overhead?
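As a hedged illustration of avoiding that overhead, the sketch below uses Accelerate's `accumulate` context manager with a toy model and dataset (all shapes and hyperparameters are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy model and data purely for illustration.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=8)

accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # Inside `accumulate`, inter-process gradient synchronization is skipped on
    # the accumulation steps and only happens when `optimizer.step()` will
    # actually update the weights.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```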
2 changes: 1 addition & 1 deletion docs/source/usage_guides/distributed_inference.md
@@ -51,7 +51,7 @@ def run_inference(rank, world_size):
One will notice how we have to check the rank to know what prompt to send, which can be a bit tedious.

A user might then also think that with 🤗 Accelerate, using the `Accelerator` to prepare a dataloader for such a task might also be
-a simple way to manage this. (To learn more, check out the relvent section in the [Quick Tour](../quicktour#distributed-evaluation))
+a simple way to manage this. (To learn more, check out the relevant section in the [Quick Tour](../quicktour#distributed-evaluation))

Can it manage it? Yes. Does it add unneeded extra code however: also yes.

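By contrast, newer Accelerate releases expose a helper that hands each process its own slice of the inputs without manual rank checks; a minimal sketch, assuming `PartialState.split_between_processes` is available and using placeholder prompts:

```python
from accelerate import PartialState

# Placeholder inputs; in practice these would be prompts fed to a model.
prompts = ["a dog", "a cat", "a frog", "a fish"]

state = PartialState()

# Each process automatically receives its own subset of `prompts`.
with state.split_between_processes(prompts) as subset:
    for prompt in subset:
        print(f"process {state.process_index} handles: {prompt}")
```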
2 changes: 1 addition & 1 deletion docs/source/usage_guides/explore.md
@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
# Learning how to incorporate 🤗 Accelerate features quickly!

Please use the interactive tool below to help you get started with learning about a particular
-feature of 🤗 Accelerate and how to utilize it! It will provide you with a code diff, an explaination
+feature of 🤗 Accelerate and how to utilize it! It will provide you with a code diff, an explanation
towards what is going on, as well as provide you with some useful links to explore more within
the documentation!

8 changes: 4 additions & 4 deletions docs/source/usage_guides/megatron_lm.md
@@ -128,7 +128,7 @@ Do you want to enable Sequence Parallelism? [YES/no]:
What is the Pipeline Parallelism degree/size? [1]:2
What is the number of micro-batches? [1]:2
Do you want to enable selective activation recomputation? [YES/no]:
-Do you want to use distributed optimizer which shards optimizer state and gradients across data pralellel ranks? [YES/no]:
+Do you want to use distributed optimizer which shards optimizer state and gradients across data parallel ranks? [YES/no]:
What is the gradient clipping value based on global L2 Norm (0 to disable)? [1.0]:
How many GPU(s) should be used for distributed training? [1]:4
Do you wish to use FP16 or BF16 (mixed precision)? [NO/fp16/bf16]: bf16
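The same choices can also be expressed programmatically through the Megatron-LM plugin rather than the interactive prompts; the sketch below is only illustrative, and the keyword argument names are assumptions to verify against the installed Accelerate version:

```python
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

# Illustrative values mirroring the answers above; the keyword names
# (tp_degree, pp_degree, ...) are assumptions, not taken from this commit.
megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,            # tensor parallelism degree
    pp_degree=2,            # pipeline parallelism degree
    num_micro_batches=2,    # micro-batches for pipeline parallelism
    gradient_clipping=1.0,  # global L2-norm clipping value
)

accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```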
@@ -355,8 +355,8 @@ def main():

2. For using the Megatron-LM datasets, a few more changes are required. Dataloaders for these datasets
are available only on rank 0 of each tensor parallel group. As such, there are rank where dataloader won't be
-avaiable and this requires tweaks to the training loop. Being able to do all this shows how
-felixble and extensible 🤗 Accelerate is. The changes required are as follows.
+available and this requires tweaks to the training loop. Being able to do all this shows how
+flexible and extensible 🤗 Accelerate is. The changes required are as follows.

a. For Megatron-LM indexed datasets, we need to use `MegatronLMDummyDataLoader`
and pass the required dataset args to it such as `data_path`, `seq_length` etc.
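A rough sketch of step (a); aside from `data_path` and `seq_length`, which the text above names explicitly, the keys and all values below are placeholders:

```python
from accelerate.utils import MegatronLMDummyDataLoader

# Required Megatron-LM dataset arguments; values here are placeholders.
megatron_dataloader_config = {
    "data_path": ["path/to/indexed_dataset_prefix"],
    "splits_string": "949,50,1",
    "seq_length": 512,
    "micro_batch_size": 4,
}
megatron_dataloader = MegatronLMDummyDataLoader(**megatron_dataloader_config)
```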
@@ -547,7 +547,7 @@ The `model(**batch_data)` call return loss(es) averaged across the data parallel
This is fine for most cases wherein pre-training jobs are run using Megatron-LM features and
you can easily compute the `perplexity` using the loss.
For GPT model, returning logits in addition to loss(es) is supported.
-These logits aren't gathered across data prallel ranks. Use `accelerator.utils.gather_across_data_parallel_groups`
+These logits aren't gathered across data parallel ranks. Use `accelerator.utils.gather_across_data_parallel_groups`
to gather logits across data parallel ranks. These logits along with labels can be used for computing various
performance metrics.

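A hedged sketch of that gathering step, with toy tensors standing in for the real per-rank outputs (the gather only does meaningful work inside an actual Megatron-LM data parallel run):

```python
import torch
from accelerate.utils import gather_across_data_parallel_groups

# Toy stand-ins for the logits/labels produced on each data parallel rank.
logits = torch.randn(4, 128, 50257)
labels = torch.randint(0, 50257, (4, 128))

# Collect the tensors from every data parallel rank before computing metrics
# such as perplexity or accuracy.
logits = gather_across_data_parallel_groups(logits)
labels = gather_across_data_parallel_groups(labels)
```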
2 changes: 1 addition & 1 deletion docs/source/usage_guides/training_zoo.md
@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.

# Example Zoo

-Below contains a non-exhuastive list of tutorials and scripts showcasing 🤗 Accelerate
+Below contains a non-exhaustive list of tutorials and scripts showcasing 🤗 Accelerate

## Official Accelerate Examples:

2 changes: 1 addition & 1 deletion src/accelerate/commands/config/cluster.py
@@ -451,7 +451,7 @@ def get_cluster_input():

megatron_lm_config[prefix + "use_distributed_optimizer"] = _ask_field(
"Do you want to use distributed optimizer "
"which shards optimizer state and gradients across data pralellel ranks? [YES/no]: ",
"which shards optimizer state and gradients across data parallel ranks? [YES/no]: ",
_convert_yes_no_to_bool,
default=True,
error_message="Please enter yes or no.",
