[core / DDP] Fix RM trainer + DDP + quantization + propagate gradient_checkpointing_kwargs in SFT & DPO
#211