address comments
pacman100 committed Oct 12, 2023
1 parent 6618083 commit 8afc304
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/usage_guides/fsdp.md
@@ -96,7 +96,7 @@ all-gather while executing in the forward pass. only use with Static graphs.
Useful in cases such as parameter-efficient fine-tuning.
Please refer to this [blog](https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019)
- `CPU RAM Efficient Model loading`: If True, only the first process loads the pretrained model checkpoint while all other processes have empty weights. Only applicable for 🤗 Transformers models. When using this, `Sync Module States` needs to be True, else all processes except the main process would have random empty weights, leading to unexpected behaviour during training.
+ `CPU RAM Efficient Model loading`: If True, only the first process loads the pretrained model checkpoint while all other processes have empty weights. Only applicable for 🤗 Transformers models. This should be set to False if you experience errors when loading the pretrained model via `from_pretrained`. When using this, `Sync Module States` needs to be True, else all processes except the main process would have random empty weights, leading to unexpected behaviour during training.
`Sync Module States`: If True, each individually wrapped FSDP unit will broadcast module parameters from rank 0
```
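For readers of the guide, here is a minimal sketch of how the two options described in this hunk could be enabled when building the FSDP plugin in code rather than through `accelerate config`. The keyword name `cpu_ram_efficient_loading` is an assumption inferred from the option name in the doc and may differ between Accelerate versions; `sync_module_states` mirrors the PyTorch FSDP argument of the same name.

```python
# Minimal sketch (not part of this commit): enabling the two options above in code.
# `cpu_ram_efficient_loading` is an assumed keyword name for this plugin;
# `sync_module_states` mirrors the PyTorch FSDP argument of the same name.
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    cpu_ram_efficient_loading=True,  # only rank 0 loads the pretrained checkpoint; others start with empty weights
    sync_module_states=True,         # rank 0 broadcasts its weights to all other ranks at init
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```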
