[docs] Doc sprint (#3099)
* docs sprint

* youtube id

* feedback
stevhliu authored Sep 11, 2024
1 parent 3a670bd commit fc52fa9
Showing 48 changed files with 344 additions and 254 deletions.
26 changes: 13 additions & 13 deletions docs/source/_toctree.yml
@@ -16,7 +16,7 @@
 - local: basic_tutorials/tpu
   title: TPU training
 - local: basic_tutorials/launch
-  title: Launching distributed code
+  title: Launching Accelerate scripts
 - local: basic_tutorials/notebook
   title: Launching distributed training from Jupyter Notebooks
 title: Tutorials
@@ -34,7 +34,7 @@
 - local: usage_guides/profiler
   title: Profiler
 - local: usage_guides/checkpoint
-  title: Save and load training states
+  title: Checkpointing
 - local: basic_tutorials/troubleshooting
   title: Troubleshoot
 - local: usage_guides/training_zoo
@@ -53,7 +53,7 @@
 - local: usage_guides/ddp_comm_hook
   title: DDP Communication Hooks
 - local: usage_guides/fsdp
-  title: Fully Sharded Data Parallelism
+  title: Fully Sharded Data Parallel
 - local: usage_guides/megatron_lm
   title: Megatron-LM
 - local: usage_guides/sagemaker
@@ -73,7 +73,7 @@
   title: How to guides
 - sections:
   - local: concept_guides/internal_mechanism
-    title: 🤗 Accelerate's internal mechanism
+    title: Accelerate's internal mechanism
   - local: concept_guides/big_model_inference
     title: Loading big models into memory
   - local: concept_guides/performance
@@ -85,39 +85,39 @@
 - local: concept_guides/fsdp_and_deepspeed
   title: FSDP vs DeepSpeed
 - local: concept_guides/low_precision_training
-  title: How training in low-precision environments is possible (FP8)
+  title: Low precision training methods
 - local: concept_guides/training_tpu
-  title: TPU best practices
+  title: Training on TPUs
 title: Concepts and fundamentals
 - sections:
   - local: package_reference/accelerator
     title: Accelerator
   - local: package_reference/state
-    title: Stateful configuration classes
+    title: Stateful classes
   - local: package_reference/cli
     title: The Command Line
   - local: package_reference/torch_wrappers
-    title: Torch wrapper classes
+    title: DataLoaders, Optimizers, Schedulers
   - local: package_reference/tracking
     title: Experiment trackers
   - local: package_reference/launchers
-    title: Distributed launchers
+    title: Launchers
   - local: package_reference/deepspeed
     title: DeepSpeed utilities
   - local: package_reference/logging
     title: Logging
   - local: package_reference/big_modeling
     title: Working with large models
   - local: package_reference/inference
-    title: Distributed inference with big models
+    title: Pipeline parallelism
   - local: package_reference/kwargs
     title: Kwargs handlers
   - local: package_reference/fp8
-    title: FP8 Functionality
+    title: FP8
   - local: package_reference/utilities
     title: Utility functions and classes
   - local: package_reference/megatron_lm
-    title: Megatron-LM Utilities
+    title: Megatron-LM utilities
   - local: package_reference/fsdp
-    title: Fully Sharded Data Parallelism Utilities
+    title: Fully Sharded Data Parallel utilities
 title: "Reference"
27 changes: 13 additions & 14 deletions docs/source/basic_tutorials/install.md
@@ -13,31 +13,29 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.
 -->
 
-# Installation and Configuration
+# Installation
 
-Before you start, you will need to setup your environment, install the appropriate packages, and configure 🤗 Accelerate. 🤗 Accelerate is tested on **Python 3.8+**.
+Before you start, you will need to setup your environment, install the appropriate packages, and configure Accelerate. Accelerate is tested on **Python 3.8+**.
 
-## Installing 🤗 Accelerate
+Accelerate is available on pypi and conda, as well as on GitHub. Details to install from each are below:
 
-🤗 Accelerate is available on pypi and conda, as well as on GitHub. Details to install from each are below:
+## pip
 
-### pip
-
-To install 🤗 Accelerate from pypi, perform:
+To install Accelerate from pypi, perform:
 
 ```bash
 pip install accelerate
 ```
 
-### conda
+## conda
 
-🤗 Accelerate can also be installed with conda with:
+Accelerate can also be installed with conda with:
 
 ```bash
 conda install -c conda-forge accelerate
 ```
 
-### Source
+## Source
 
 New features are added every day that haven't been released yet. To try them out yourself, install
 from the GitHub repository:
@@ -56,9 +54,9 @@ cd accelerate
 pip install -e .
 ```
 
-## Configuring 🤗 Accelerate
+## Configuration
 
-After installing, you need to configure 🤗 Accelerate for how the current system is setup for training.
+After installing, you need to configure Accelerate for how the current system is setup for training.
 To do so run the following and answer the questions prompted to you:
 
 ```bash
@@ -70,7 +68,8 @@ To write a barebones configuration that doesn't include options such as DeepSpee
 ```bash
 python -c "from accelerate.utils import write_basic_config; write_basic_config(mixed_precision='fp16')"
 ```
-🤗 Accelerate will automatically utilize the maximum number of GPUs available and set the mixed precision mode.
+
+Accelerate will automatically utilize the maximum number of GPUs available and set the mixed precision mode.
 
 To check that your configuration looks fine, run:
 
@@ -99,4 +98,4 @@ An example output is shown below, which describes two GPUs on a single machine w
 - main_training_function: main
 - deepspeed_config: {}
 - fsdp_config: {}
-```
+```
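A minimal sketch of the `write_basic_config` call from the hunk above, written as a script rather than a `python -c` one-liner (assuming the default save location; the fp16 choice mirrors the diff):

```python
from accelerate.utils import write_basic_config

# Write a barebones Accelerate config file (no DeepSpeed/FSDP prompts),
# equivalent to the one-liner shown in the hunk above.
write_basic_config(mixed_precision="fp16")
```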
16 changes: 8 additions & 8 deletions docs/source/basic_tutorials/launch.md
@@ -13,9 +13,9 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.
 -->
 
-# Launching your 🤗 Accelerate scripts
+# Launching Accelerate scripts
 
-In the previous tutorial, you were introduced to how to modify your current training script to use 🤗 Accelerate.
+In the previous tutorial, you were introduced to how to modify your current training script to use Accelerate.
 The final version of that code is shown below:
 
 ```python
@@ -69,14 +69,14 @@ Next, you need to launch it with `accelerate launch`.
 <Tip warning={true}>
 
 It's recommended you run `accelerate config` before using `accelerate launch` to configure your environment to your liking.
-Otherwise 🤗 Accelerate will use very basic defaults depending on your system setup.
+Otherwise Accelerate will use very basic defaults depending on your system setup.
 
 </Tip>
 
 
 ## Using accelerate launch
 
-🤗 Accelerate has a special CLI command to help you launch your code in your system through `accelerate launch`.
+Accelerate has a special CLI command to help you launch your code in your system through `accelerate launch`.
 This command wraps around all of the different commands needed to launch your script on various platforms, without you having to remember what each of them is.
 
 <Tip>
@@ -101,7 +101,7 @@ CUDA_VISIBLE_DEVICES="0" accelerate launch {script_name.py} --arg1 --arg2 ...
 ```
 
 You can also use `accelerate launch` without performing `accelerate config` first, but you may need to manually pass in the right configuration parameters.
-In this case, 🤗 Accelerate will make some hyperparameter decisions for you, e.g., if GPUs are available, it will use all of them by default without the mixed precision.
+In this case, Accelerate will make some hyperparameter decisions for you, e.g., if GPUs are available, it will use all of them by default without the mixed precision.
 Here is how you would use all GPUs and train with mixed precision disabled:
 
 ```bash
@@ -129,7 +129,7 @@ accelerate launch -h
 
 <Tip>
 
-Even if you are not using 🤗 Accelerate in your code, you can still use the launcher for starting your scripts!
+Even if you are not using Accelerate in your code, you can still use the launcher for starting your scripts!
 
 </Tip>
 
@@ -178,7 +178,7 @@ accelerate launch {script_name.py} {--arg1} {--arg2} ...
 ## Custom Configurations
 
 As briefly mentioned earlier, `accelerate launch` should be mostly used through combining set configurations
-made with the `accelerate config` command. These configs are saved to a `default_config.yaml` file in your cache folder for 🤗 Accelerate.
+made with the `accelerate config` command. These configs are saved to a `default_config.yaml` file in your cache folder for Accelerate.
 This cache folder is located at (with decreasing order of priority):
 
 - The content of your environment variable `HF_HOME` suffixed with `accelerate`.
@@ -211,7 +211,7 @@ accelerate launch --config_file {path/to/config/my_config_file.yaml} {script_nam
 ```
 
 ## Multi-node training
-Multi-node training with 🤗Accelerate is similar to [multi-node training with torchrun](https://pytorch.org/tutorials/intermediate/ddp_series_multinode.html). The simplest way to launch a multi-node training run is to do the following:
+Multi-node training with Accelerate is similar to [multi-node training with torchrun](https://pytorch.org/tutorials/intermediate/ddp_series_multinode.html). The simplest way to launch a multi-node training run is to do the following:
 
 - Copy your codebase and data to all nodes. (or place them on a shared filesystem)
 - Setup your python packages on all nodes.
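For the multi-node steps above, a hedged sketch of the per-node launch command (the IP, port, process counts, and `train.py` are hypothetical; run the same command on every node, changing only `--machine_rank`):

```bash
# Node 0 (the main node); on node 1, set --machine_rank 1.
# --num_processes is the total worker count across all machines.
accelerate launch \
    --num_machines 2 \
    --machine_rank 0 \
    --main_process_ip 192.168.1.2 \
    --main_process_port 29500 \
    --num_processes 16 \
    train.py
```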
7 changes: 2 additions & 5 deletions docs/source/basic_tutorials/migration.md
@@ -220,8 +220,5 @@ To further customize where and how states are saved through [`~Accelerator.save_
 
 Any other stateful items to be stored should be registered with the [`~Accelerator.register_for_checkpointing`] method so they can be saved and loaded. Every object passed to this method to be stored must have a `load_state_dict` and `state_dict` function.
 
-<Note>
-
-If you have [`torchdata>=0.8.0`](https://github.com/pytorch/data/tree/main) installed, you can additionally pass `use_stateful_dataloader=True` into your [`~utils.DataLoaderConfiguration`]. This extends Accelerate's DataLoader classes with a `load_state_dict` and `state_dict` function, and makes it so `Accelerator.save_state` and `Accelerator.load_state` also track how far into the training dataset it has read when persisting the model.
-
-</Note>
+> [!TIP]
+> If you have [`torchdata>=0.8.0`](https://github.com/pytorch/data/tree/main) installed, you can additionally pass `use_stateful_dataloader=True` into your [`~utils.DataLoaderConfiguration`]. This extends Accelerate's DataLoader classes with a `load_state_dict` and `state_dict` function, and makes it so `Accelerator.save_state` and `Accelerator.load_state` also track how far into the training dataset it has read when persisting the model.
8 changes: 4 additions & 4 deletions docs/source/basic_tutorials/notebook.md
@@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License.
 rendered properly in your Markdown viewer.
 -->
 
-# Launching Multi-GPU Training from a Jupyter Environment
+# Launching distributed training from Jupyter Notebooks
 
 This tutorial teaches you how to fine tune a computer vision model with 🤗 Accelerate from a Jupyter Notebook on a distributed system.
 You will also learn how to setup a few requirements needed for ensuring your environment is configured properly, your data has been prepared properly, and finally how to launch training.
@@ -26,13 +26,13 @@ You will also learn how to setup a few requirements needed for ensuring your env
 
 ## Configuring the Environment
 
-Before any training can be performed, a 🤗 Accelerate config file must exist in the system. Usually this can be done by running the following in a terminal and answering the prompts:
+Before any training can be performed, a Accelerate config file must exist in the system. Usually this can be done by running the following in a terminal and answering the prompts:
 
 ```bash
 accelerate config
 ```
 
-However, if general defaults are fine and you are *not* running on a TPU, 🤗Accelerate has a utility to quickly write your GPU configuration into a config file via [`utils.write_basic_config`].
+However, if general defaults are fine and you are *not* running on a TPU, Accelerate has a utility to quickly write your GPU configuration into a config file via [`utils.write_basic_config`].
 
 The following code will restart Jupyter after writing the configuration, as CUDA code was called to perform this.
 
@@ -454,7 +454,7 @@ epoch 4: 94.71
 
 And that's it!
 
-Please note that [`notebook_launcher`] ignores the 🤗 Accelerate config file, to launch based on the config use:
+Please note that [`notebook_launcher`] ignores the Accelerate config file, to launch based on the config use:
 
 ```bash
 accelerate launch
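As a counterpart, a minimal sketch of launching from inside a notebook cell (`training_loop` here is a hypothetical stand-in for the training function built earlier in the tutorial):

```python
from accelerate import Accelerator, notebook_launcher

def training_loop():
    # Each spawned process runs this function.
    accelerator = Accelerator()
    accelerator.print(f"Process {accelerator.process_index} of {accelerator.num_processes}")

# Spawn `training_loop` on 2 processes (e.g. 2 GPUs) from within Jupyter.
# As noted above, notebook_launcher ignores the Accelerate config file.
notebook_launcher(training_loop, num_processes=2)
```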
4 changes: 2 additions & 2 deletions docs/source/basic_tutorials/overview.md
@@ -15,10 +15,10 @@ rendered properly in your Markdown viewer.
 
 # Overview
 
-Welcome to the 🤗 Accelerate tutorials! These introductory guides will help catch you up to speed on working with 🤗 Accelerate.
+Welcome to the Accelerate tutorials! These introductory guides will help catch you up to speed on working with Accelerate.
 You'll learn how to modify your code to have it work with the API seamlessly, how to launch your script properly,
 and more!
 
 These tutorials assume some basic knowledge of Python and familiarity with the PyTorch framework.
 
-If you have any questions about 🤗 Accelerate, feel free to join and ask the community on our [forum](https://discuss.huggingface.co/c/accelerate/18).
+If you have any questions about Accelerate, feel free to join and ask the community on our [forum](https://discuss.huggingface.co/c/accelerate/18).
4 changes: 2 additions & 2 deletions docs/source/basic_tutorials/troubleshooting.md
@@ -204,8 +204,8 @@ Vastly different GPUs within the same setup can lead to performance bottlenecks.
 
 If none of the solutions and advice here helped resolve your issue, you can always reach out to the community and Accelerate team for help.
 
-- Ask for help on the Hugging Face forums by posting your question in the [🤗 Accelerate category](https://discuss.huggingface.co/c/accelerate/18). Make sure to write a descriptive post with relevant context about your setup and reproducible code to maximize the likelihood that your problem is solved!
+- Ask for help on the Hugging Face forums by posting your question in the [Accelerate category](https://discuss.huggingface.co/c/accelerate/18). Make sure to write a descriptive post with relevant context about your setup and reproducible code to maximize the likelihood that your problem is solved!
 
 - Post a question on [Discord](http://hf.co/join/discord), and let the team and the community help you.
 
-- Create an Issue on the 🤗 Accelerate [GitHub repository](https://github.com/huggingface/accelerate/issues) if you think you've found a bug related to the library. Include context regarding the bug and details about your distributed setup to help us better figure out what's wrong and how we can fix it.
+- Create an Issue on the Accelerate [GitHub repository](https://github.com/huggingface/accelerate/issues) if you think you've found a bug related to the library. Include context regarding the bug and details about your distributed setup to help us better figure out what's wrong and how we can fix it.