From b2b04e86e71159259333de2f8da85c08a712880d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=B9=9B=E9=9C=B2=E5=85=88=E7=94=9F?=
Date: Thu, 2 Jan 2025 18:29:46 +0800
Subject: [PATCH] Fix docs typos. (#35465)

Signed-off-by: zhanluxianshen
---
 docs/source/en/fsdp.md | 2 +-
 docs/source/zh/fsdp.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/fsdp.md b/docs/source/en/fsdp.md
index 6b90ab5ad6d..2c4f114dec8 100644
--- a/docs/source/en/fsdp.md
+++ b/docs/source/en/fsdp.md
@@ -58,7 +58,7 @@ Otherwise, you can choose a size-based wrapping policy where FSDP is applied to
 
 ### Checkpointing
 
-Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`]` method.
+Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`] method.
 
 ```py
 # directory containing checkpoints
diff --git a/docs/source/zh/fsdp.md b/docs/source/zh/fsdp.md
index a322ec81e52..4688b021f74 100644
--- a/docs/source/zh/fsdp.md
+++ b/docs/source/zh/fsdp.md
@@ -74,7 +74,7 @@ FSDP 是通过包装网络中的每个层来应用的。通常,包装是以嵌
 
 应该使用 `fsdp_state_dict_type: SHARDED_STATE_DICT` 来保存中间检查点,
 因为在排名 0 上保存完整状态字典需要很长时间,通常会导致 `NCCL Timeout` 错误,因为在广播过程中会无限期挂起。
-您可以使用 [`~accelerate.Accelerator.load_state`]` 方法加载分片状态字典以恢复训练。
+您可以使用 [`~accelerate.Accelerator.load_state`] 方法加载分片状态字典以恢复训练。
 
 ```py
 # 包含检查点的目录