v0.32.0: Profilers, new hooks, speedups, and more!
Core
- Utilize shard saving from `huggingface_hub` rather than our own implementation (#2795)
- Refactor logging to use a logger in `dispatch_model` (#2855)
- The `Accelerator.step` number is now restored when using `save_state` and `load_state` (#2765) (see the first sketch after this list)
- A new profiler has been added, allowing users to collect performance metrics during model training and inference, including detailed analysis of execution time and memory consumption. The resulting traces can then be viewed in Chrome's tracing tool. Read more about it here (#2883) (see the second sketch after this list)
- Reduced import times for `import accelerate` and any other major core import by 68%; it should now be only slightly longer than `import torch` (#2845)
- Fixed a bug in `get_backend` and added a `clear_device_cache` utility (#2857)
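A minimal sketch of the restored step counter (the checkpoint directory name below is only illustrative):

```python
from accelerate import Accelerator

accelerator = Accelerator()
# ... prepare objects and run some training steps ...

# Saves model/optimizer/RNG state and, as of this release, the current `Accelerator.step`
accelerator.save_state("my_checkpoint")

# When resuming, the step counter comes back with the rest of the state
accelerator.load_state("my_checkpoint")
print(accelerator.step)
```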
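And a rough sketch of the new profiler; the `ProfileKwargs` handler and `accelerator.profile()` context manager shown here are assumptions based on #2883, so check the linked docs for the exact API:

```python
import torch
from accelerate import Accelerator
from accelerate.utils import ProfileKwargs  # assumed import path

# Assumed API from #2883: configure profiling through a kwargs handler
profile_kwargs = ProfileKwargs(
    activities=["cpu"],        # assumed option: which activities to record
    profile_memory=True,       # assumed option: track memory consumption
    output_trace_dir="trace",  # assumed option: where Chrome trace files are written
)
accelerator = Accelerator(kwargs_handlers=[profile_kwargs])

model = torch.nn.Linear(16, 2).to(accelerator.device)

with accelerator.profile() as prof:
    with torch.no_grad():
        model(torch.randn(8, 16, device=accelerator.device))

# Execution-time summary; the trace written to `trace/` can be opened in chrome://tracing
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```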
Distributed Data Parallelism
- Introduce DDP communication hooks for more flexibility in how gradients are communicated across workers, overriding the standard `allreduce` (#2841) (see the sketch after this list)
- Make `log_line_prefix_template` optional in the `notebook_launcher` (#2888)
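As a sketch of the new hooks (the `comm_hook` argument and the `DDPCommunicationHookType` enum are assumptions based on #2841):

```python
from accelerate import Accelerator
from accelerate.utils import DDPCommunicationHookType, DistributedDataParallelKwargs

# Assumed API from #2841: select e.g. FP16 gradient compression instead of the
# default full-precision allreduce when the model gets wrapped in DDP.
ddp_kwargs = DistributedDataParallelKwargs(comm_hook=DDPCommunicationHookType.FP16)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

# model = accelerator.prepare(model)  # the hook is registered on the DDP-wrapped model
```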
FSDP
- If the output directory doesn't exist when using `accelerate merge-weights`, one will be automatically created (#2854)
- When merging weights, the default is now `.safetensors` (#2853)
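A hedged sketch of the merging behavior; the `merge_fsdp_weights` helper and its argument order are assumptions, the paths are only examples, and the `accelerate merge-weights` CLI mentioned above presumably exposes the same functionality:

```python
from accelerate.utils import merge_fsdp_weights  # assumed helper behind `accelerate merge-weights`

# Merge sharded FSDP checkpoint files into a single checkpoint. The output directory
# is created if it does not exist, and the merged file is saved as `.safetensors` by default.
merge_fsdp_weights("outputs/pytorch_model_fsdp_0", "outputs/merged")  # hypothetical paths
```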
XPU
- Migrate to PyTorch's native XPU backend on `torch>=2.4` (#2825)
- Add a `@require_triton` test decorator and enable `test_dynamo` to work on XPU (#2878)
- Fixed `load_state_dict` not working on `xpu` and refined the XPU `safetensors` version check (#2879)
XLA
- Added support for XLA Dynamo backends for both training and inference (#2892)
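As a minimal sketch, assuming the XLA backend is selected through Accelerate's existing `dynamo_backend` option (the exact backend name may differ depending on your `torch_xla` version):

```python
from accelerate import Accelerator

# Assumed usage: compile prepared modules with the XLA Dynamo backend
accelerator = Accelerator(dynamo_backend="openxla")

# model = accelerator.prepare(model)  # prepared modules are compiled via torch.compile with XLA
```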
Examples
- Added a new multi-CPU SLURM example using `accelerate launch` (#2902)
Full Changelog
- Use shard saving from huggingface_hub by @SunMarc in #2795
- doc: fix link by @imba-tjd in #2844
- Revert "Slight rename" by @SunMarc in #2850
- remove warning hook added during `dispatch_model` by @SunMarc in #2843
- Remove underlines between badges by @novialriptide in #2851
- Auto create dir when merging FSDP weights by @helloworld1 in #2854
- Add DDP Communication Hooks by @yhna940 in #2841
- Refactor logging to use logger in `dispatch_model` by @panjd123 in #2855
- xpu: support xpu backend from stock pytorch (>=2.4) by @dvrogozh in #2825
- Drop torch re-imports in npu and mlu paths by @dvrogozh in #2856
- Default FSDP weights merge to safetensors by @helloworld1 in #2853
- [tests] fix bug in `test_tracking.ClearMLTest` by @faaany in #2863
- [tests] use `torch_device` instead of `0` for device check by @faaany in #2861
- [tests] skip bnb-related tests instead of failing on xpu by @faaany in #2860
- Potentially fix tests by @muellerzr in #2862
- [tests] enable XPU backend for `test_zero3_integration` by @faaany in #2864
- Support saving and loading of step while saving and loading state by @bipinKrishnan in #2765
- Add Profiler Support for Performance Analysis by @yhna940 in #2883
- Speed up imports and add a CI by @muellerzr in #2845
- Make `log_line_prefix_template` Optional in Elastic Launcher for Backward Compatibility by @yhna940 in #2888
- Add XLA Dynamo backends for training and inference by @johnsutor in #2892
- Added a MultiCPU SLURM example using Accelerate Launch and MPIRun by @okhleif-IL in #2902
- make more cuda-only tests device-agnostic by @faaany in #2876
- fix mlu device longTensor bugs by @huismiling in #2887
- add `require_triton` and enable `test_dynamo` work on xpu by @faaany in #2878
- fix `load_state_dict` for xpu and refine xpu safetensor version check by @faaany in #2879
- Fix get_backend bug and add clear_device_cache function by @NurmaU in #2857
New Contributors
- @McPatate made their first contribution in #2836
- @imba-tjd made their first contribution in #2844
- @novialriptide made their first contribution in #2851
- @panjd123 made their first contribution in #2855
- @dvrogozh made their first contribution in #2825
- @johnsutor made their first contribution in #2892
- @okhleif-IL made their first contribution in #2902
- @NurmaU made their first contribution in #2857
Full Changelog: v0.31.0...v0.32.0