From 65544d8fe9db999fe9e3d6453f10d20f081b3b96 Mon Sep 17 00:00:00 2001 From: omahs <73983677+omahs@users.noreply.github.com> Date: Fri, 1 Mar 2024 18:19:05 +0100 Subject: [PATCH] [docs] Fix typos (#2490) * fix typos * fix typos * fix typo * fix typos * fix typos * fix typos * fix typo * fix typo --------- Co-authored-by: Zach Mueller --- docs/source/concept_guides/low_precision_training.md | 4 ++-- docs/source/quicktour.md | 3 ++- docs/source/usage_guides/deepspeed.md | 6 +++--- docs/source/usage_guides/local_sgd.md | 2 +- docs/source/usage_guides/low_precision_training.md | 8 ++++---- docs/source/usage_guides/megatron_lm.md | 6 +++--- docs/source/usage_guides/model_size_estimator.md | 4 ++-- src/accelerate/commands/config/config_args.py | 2 +- 8 files changed, 18 insertions(+), 17 deletions(-) diff --git a/docs/source/concept_guides/low_precision_training.md b/docs/source/concept_guides/low_precision_training.md index 79e252d0b95..467b3c40065 100644 --- a/docs/source/concept_guides/low_precision_training.md +++ b/docs/source/concept_guides/low_precision_training.md @@ -34,7 +34,7 @@ MS-AMP O3 | FP8 | FP8 | FP8 | FP16 | FP8 | FP8+FP16 ## `TransformersEngine` -`TransformersEngine` is the first solution to trying to train in 8-bit floating point. It works by using drop-in replacement layers for certain ones in a model that utilize their FP8-engine to reduce the number of bits (such as 32 to 8) without degrading the final accuracy of the model. +`TransformersEngine` is the first solution to trying to train in 8-bit floating point. It works by using drop-in replacement layers for certain ones in a model that utilizes their FP8-engine to reduce the number of bits (such as 32 to 8) without degrading the final accuracy of the model. Specifically, 🤗 Accelerate will find and replace the following layers with `TransformersEngine` versions: @@ -71,4 +71,4 @@ MS-AMP takes a different approach to `TransformersEngine` by providing three dif ## Combining the two -More experiments need to be performed but it's been noted that combining both MS-AMP and TransformersEngine can lead to the highest throughput by relying on NVIDIA's optimized FP8 operators and utilizing how MS-AMP reduces the memory overhead. \ No newline at end of file +More experiments need to be performed but it's been noted that combining both MS-AMP and TransformersEngine can lead to the highest throughput by relying on NVIDIA's optimized FP8 operators and utilizing how MS-AMP reduces the memory overhead. diff --git a/docs/source/quicktour.md b/docs/source/quicktour.md index ac053c5c887..0fb07d1004d 100644 --- a/docs/source/quicktour.md +++ b/docs/source/quicktour.md @@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. --> @@ -27,6 +27,7 @@ This quicktour introduces the three main features of Accelerate: Accelerate automatically selects the appropriate configuration values for any given distributed training framework (DeepSpeed, FSDP, etc.) 
through a unified configuration file generated from the [`accelerate config`](../../docs/source/package_reference/cli#accelerate-config) command. You could also pass the configuration values explicitly to the command line which is helpful in certain situations like if you're using SLURM. + But in most cases, you should always run [`accelerate config`](../../docs/source/package_reference/cli#accelerate-config) first to help Accelerate learn about your training setup. ```bash diff --git a/docs/source/usage_guides/deepspeed.md b/docs/source/usage_guides/deepspeed.md index 2d5f7fede15..9c9888cfb48 100644 --- a/docs/source/usage_guides/deepspeed.md +++ b/docs/source/usage_guides/deepspeed.md @@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. --> @@ -353,7 +353,7 @@ accelerate launch examples/by_feature/deepspeed_with_config_support.py \ ``` **ZeRO++ Config Example** -You can use the the features of ZeRO++ by using the appropriate config parameters. Note that ZeRO++ is an extension for ZeRO Stage 3. Here is how the config file can be modified, from [DeepSpeed's ZeRO++ tutorial](https://www.deepspeed.ai/tutorials/zeropp/): +You can use the features of ZeRO++ by using the appropriate config parameters. Note that ZeRO++ is an extension for ZeRO Stage 3. Here is how the config file can be modified, from [DeepSpeed's ZeRO++ tutorial](https://www.deepspeed.ai/tutorials/zeropp/): ```json { @@ -519,7 +519,7 @@ ValueError: When using `deepspeed_config_file`, the following accelerate config ['gradient_accumulation_steps', 'gradient_clipping', 'zero_stage', 'offload_optimizer_device', 'offload_param_device', 'zero3_save_16bit_model', 'mixed_precision']. Please specify them appropriately in the DeepSpeed config file. -If you are using an accelerate config file, remove others config variables mentioned in the above specified list. +If you are using an accelerate config file, remove other config variables mentioned in the above specified list. The easiest method is to create a new config following the questionnaire via `accelerate config`. It will only ask for the necessary config variables when using `deepspeed_config_file`. ``` diff --git a/docs/source/usage_guides/local_sgd.md b/docs/source/usage_guides/local_sgd.md index 11971519e01..5bee411433d 100644 --- a/docs/source/usage_guides/local_sgd.md +++ b/docs/source/usage_guides/local_sgd.md @@ -88,7 +88,7 @@ achieved by adding one `with LocalSGD` statement and one call `local_sgd.step()` + local_sgd.step() ``` -Under the hood, the Local SGD code **disables** automatic gradient synchornization (but accumulation still works as expected!). Instead it averages model parameters every `local_sgd_steps` steps (as well as in the end of the training loop). +Under the hood, the Local SGD code **disables** automatic gradient synchronization (but accumulation still works as expected!). Instead it averages model parameters every `local_sgd_steps` steps (as well as at the end of the training loop). 
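For readers skimming this hunk outside the full guide, here is a minimal, self-contained sketch of the Local SGD loop the paragraph above refers to. The toy model, data, and hyperparameters are illustrative placeholders and are not part of the patched file:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.local_sgd import LocalSGD

accelerator = Accelerator()

# Toy model and data purely for illustration.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
dataloader = DataLoader(dataset, batch_size=32)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Parameters are averaged across workers every `local_sgd_steps` steps,
# and once more when the `with` block exits.
with LocalSGD(accelerator=accelerator, model=model, local_sgd_steps=8, enabled=True) as local_sgd:
    for inputs, targets in dataloader:
        outputs = model(inputs)
        loss = torch.nn.functional.cross_entropy(outputs, targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        local_sgd.step()  # advances the local step counter and triggers the periodic averaging
```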
## Limitations diff --git a/docs/source/usage_guides/low_precision_training.md b/docs/source/usage_guides/low_precision_training.md index 258cab79bae..c68445892cd 100644 --- a/docs/source/usage_guides/low_precision_training.md +++ b/docs/source/usage_guides/low_precision_training.md @@ -57,7 +57,7 @@ Of the two, `MS-AMP` is traditionally the easier one to configure as there is on Currently two levels of optimization are supported in the 🤗 Accelerate integration, `"O1"` and `"O2"` (using the letter 'o', not zero). * `"O1"` will cast the weight gradients and `all_reduce` communications to happen in 8-bit, while the rest are done in 16 bit. This reduces the general GPU memory usage and speeds up communication bandwidths. -* `"O2"` will also cast first-order optimizer states into 8 bit, while the second order states are in FP16. (Currently just the `Adam` optimizer is supported). This tries it's best to minimize final accuracy degradation and will save the highest potential memory. +* `"O2"` will also cast first-order optimizer states into 8 bit, while the second order states are in FP16. (Currently just the `Adam` optimizer is supported). This tries its best to minimize final accuracy degradation and will save the highest potential memory. To specify an optimization level, pass it to the `FP8KwargsHandler` by setting the `optimization_level` argument: @@ -70,7 +70,7 @@ accelerator = Accelerator(mixed_precision="fp8", kwarg_handlers=kwargs) ## Configuring TransformersEngine -TransformersEngine has much more available for customizing how and what FP8 calculations are performed. A full list of supported arguments and what they mean are available in [NVIDIA's documentation](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html), however they are restated as part of [`FP8KwargsHandler`]'s docstring for your convience. +TransformersEngine has much more available for customizing how and what FP8 calculations are performed. A full list of supported arguments and what they mean are available in [NVIDIA's documentation](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html), however they are restated as part of [`FP8KwargsHandler`]'s docstring for your convenience. 🤗 Accelerate tries to set sensible defaults, but exploring and tweaking the various parameters yourself can lead to better performance potentially. @@ -83,10 +83,10 @@ kwargs = [FP8RecipeKwargs(backend="te", ...)] accelerator = Accelerator(mixed_precision="fp8", kwarg_handlers=kwargs) ``` -## Futher Reading +## Further Reading To learn more about training in FP8 please check out the following resources: * [Our concept guide](../concept_guides/low_precision_training.md) detailing into more about both TransformersEngine and MS-AMP * [The `transformers-engine` documentation](https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html) -* [The `MS-AMP` documentation](https://azure.github.io/MS-AMP/docs/) \ No newline at end of file +* [The `MS-AMP` documentation](https://azure.github.io/MS-AMP/docs/) diff --git a/docs/source/usage_guides/megatron_lm.md b/docs/source/usage_guides/megatron_lm.md index 487edc723e9..eb74a522260 100644 --- a/docs/source/usage_guides/megatron_lm.md +++ b/docs/source/usage_guides/megatron_lm.md @@ -9,7 +9,7 @@ Unless required by applicable law or agreed to in writing, software distributed an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License. -⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer. --> @@ -542,7 +542,7 @@ megatron_lm_plugin = MegatronLMPlugin(other_megatron_args=other_megatron_args) This covers Decoder only, Encode only and Encoder-Decoder model classes. 2. Only loss is returned from model forward pass as -there is quite complex interplay of pipeline, tensor and data parallelsim behind the scenes. +there is quite complex interplay of pipeline, tensor and data parallelism behind the scenes. The `model(**batch_data)` call return loss(es) averaged across the data parallel ranks. This is fine for most cases wherein pre-training jobs are run using Megatron-LM features and you can easily compute the `perplexity` using the loss. @@ -580,4 +580,4 @@ b. Megatron-LM [GPTModel](https://github.com/NVIDIA/Megatron-LM/blob/main/megatr c. Megatron-LM [T5Model](https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/t5_model.py) : 🤗 transformers models with `t5` in config's model type, e.g., [T5](https://huggingface.co/docs/transformers/model_doc/t5) and -[MT5](https://huggingface.co/docs/transformers/model_doc/mt5) \ No newline at end of file +[MT5](https://huggingface.co/docs/transformers/model_doc/mt5) diff --git a/docs/source/usage_guides/model_size_estimator.md b/docs/source/usage_guides/model_size_estimator.md index 70bef1ea54d..4e95b19875e 100644 --- a/docs/source/usage_guides/model_size_estimator.md +++ b/docs/source/usage_guides/model_size_estimator.md @@ -51,7 +51,7 @@ Below are a few gradio demos related to what was described above. The first is t > -A community member has taken the idea and expended it further, allowing you to filter models directly and see if you can run a particular LLM given GPU constraints and LoRA configurations. To play with it, see [here](https://huggingface.co/spaces/Vokturz/can-it-run-llm) for more details. +A community member has taken the idea and expanded it further, allowing you to filter models directly and see if you can run a particular LLM given GPU constraints and LoRA configurations. To play with it, see [here](https://huggingface.co/spaces/Vokturz/can-it-run-llm) for more details. ## The Command @@ -134,4 +134,4 @@ This calculator will tell you how much memory is needed to purely load the model This calculation is accurate within a few % of the actual value, so it is a very good view of just how much memory it will take. For instance loading `bert-base-cased` actually takes `413.68 MB` when loaded on CUDA in full precision, and the calculator estimates `413.18 MB`. When performing inference you can expect to add up to an additional 20% as found by [EleutherAI](https://blog.eleuther.ai/transformer-math/). We'll be conducting research into finding a more accurate estimate to these values, and will update -this calculator once done. \ No newline at end of file +this calculator once done. 
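The `413.18 MB` estimate quoted in the hunk above follows directly from the parameter count times the bytes per parameter. A small sketch of that back-of-envelope arithmetic, shown here as an illustration of the math rather than the estimator's actual implementation:

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

# Build the model on the meta device so only the config is fetched, no weights.
config = AutoConfig.from_pretrained("bert-base-cased")
with init_empty_weights():
    model = AutoModel.from_config(config)

num_params = sum(p.numel() for p in model.parameters())
bytes_per_param = 4  # full precision (float32)

load_mb = num_params * bytes_per_param / 1024**2
print(f"~{load_mb:.2f} MB to load in full precision")             # close to the 413.68 MB measured on CUDA
print(f"~{load_mb * 1.2:.2f} MB with the ~20% inference margin")  # EleutherAI rule of thumb cited above
```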
diff --git a/src/accelerate/commands/config/config_args.py b/src/accelerate/commands/config/config_args.py index 82b9832d4ae..4c52e7c707f 100644 --- a/src/accelerate/commands/config/config_args.py +++ b/src/accelerate/commands/config/config_args.py @@ -45,7 +45,7 @@ def load_config_from_file(config_file): if not os.path.isfile(config_file): raise FileNotFoundError( f"The passed configuration file `{config_file}` does not exist. " - "Please pass an existing file to `accelerate launch`, or use the the default one " + "Please pass an existing file to `accelerate launch`, or use the default one " "created through `accelerate config` and run `accelerate launch` " "without the `--config_file` argument." )
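The error message touched in this last hunk describes a simple fallback workflow: either pass an existing file via `--config_file`, or rely on the default file written by `accelerate config`. A hedged sketch of that decision from the user's side; the default path below is an assumption based on the usual Hugging Face cache layout (it moves if `HF_HOME` is set), and `train.py` is a placeholder script name:

```python
import os

explicit_config = "my_accelerate_config.yaml"  # hypothetical user-supplied path
default_config = os.path.expanduser("~/.cache/huggingface/accelerate/default_config.yaml")

if os.path.isfile(explicit_config):
    # Pass the file explicitly, mirroring `accelerate launch --config_file ...`
    cmd = ["accelerate", "launch", "--config_file", explicit_config, "train.py"]
elif os.path.isfile(default_config):
    # No --config_file needed; the default created by `accelerate config` is picked up
    cmd = ["accelerate", "launch", "train.py"]
else:
    raise FileNotFoundError(
        "No Accelerate config found. Run `accelerate config` first, then `accelerate launch`."
    )
print(" ".join(cmd))
```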