Fix docs typos. (#35465)
Signed-off-by: zhanluxianshen <[email protected]>
zhanluxianshen authored Jan 2, 2025
1 parent 6b1e86f commit b2b04e8
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/source/en/fsdp.md
@@ -58,7 +58,7 @@ Otherwise, you can choose a size-based wrapping policy where FSDP is applied to

### Checkpointing

-Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`]` method.
+Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`] method.

```py
# directory containing checkpoints
2 changes: 1 addition & 1 deletion docs/source/zh/fsdp.md
@@ -74,7 +74,7 @@ FSDP is applied by wrapping each layer in the network. Typically, wrapping is applied in a nested

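Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT`,
because saving the full state dict on rank 0 takes a long time and often causes `NCCL Timeout` errors due to indefinite hanging during broadcasting.
-You can use the [`~accelerate.Accelerator.load_state`]` method to load the sharded state dicts and resume training.
+You can use the [`~accelerate.Accelerator.load_state`] method to load the sharded state dicts and resume training.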

```py
# directory containing checkpoints
