[core / DDP] Fix RM trainer + DDP + quantization + propagate `gradient_checkpointing_kwargs` in SFT & DPO #912
Conversation
The documentation is not available anymore as the PR was closed or merged.
LG! One minor comment.
examples/scripts/reward_modeling.py (outdated)

```diff
@@ -103,7 +103,7 @@ class ScriptArguments:

 # Step 2: Load the dataset and pre-process it
 tokenizer = AutoTokenizer.from_pretrained(args.model_name)
-train_dataset = load_dataset(args.dataset_name, split="train")
+train_dataset = load_dataset(args.dataset_name, split="train[:50]")
```
Could this be not hardcoded? Maybe something like `split=args.split`?
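The reviewer's suggestion can be sketched as follows. This is a hedged illustration, not the script's actual code: the field names mirror the example `ScriptArguments` dataclass, but the defaults shown here are placeholders.

```python
# Sketch of exposing the dataset split as a script argument instead of
# hardcoding "train[:50]". Defaults are illustrative, not the real ones.
from dataclasses import dataclass, field

@dataclass
class ScriptArguments:
    dataset_name: str = field(default="Anthropic/hh-rlhf")
    split: str = field(default="train")  # e.g. "train" or "train[:50]"

args = ScriptArguments()
# The script would then load the dataset with the user-chosen split:
# train_dataset = load_dataset(args.dataset_name, split=args.split)
print(args.split)
```

This keeps the quick-test behavior available (`--split "train[:50]"`) without baking it into the example.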
Looks good!
* fix bug: `generate_args`-`do_sample`
* fix `gradient_checkpointing_kwargs` bug; see: huggingface/trl#912 and huggingface/transformers#26969
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…dient_checkpointing_kwargs` in SFT & DPO (huggingface#912)

* make use of forward hooks
* correctly delete attributes
* fix RM DPP issues
* revert unneeded changes
* more fixes
* fix diff
* fix
* propagate to SFT
* Update examples/scripts/reward_modeling.py
* propagate the fix on DPO trainer
* add to example scripts
* trigger CI
Needs: huggingface/peft#1036 / huggingface/transformers#27020
Fixes: #891
Fixes: #835
To avoid issues with PEFT + DDP, we need to call the gradient checkpointing method with `use_reentrant=False`, which you can pass via the `gradient_checkpointing_kwargs` argument directly in the trainer once huggingface/transformers#27020 lands. Users who do not have a recent enough transformers version need to update transformers to get that feature; otherwise `gradient_checkpointing_kwargs` will be ignored.

cc @lvwerra
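With a transformers version that includes the PR above, the fix described here reduces to a configuration change on the user side. A minimal sketch, assuming a recent transformers release (the `output_dir` value is illustrative):

```python
# Config fragment: enable non-reentrant gradient checkpointing so that
# PEFT + DDP training does not break. `gradient_checkpointing_kwargs`
# is forwarded to the model's gradient_checkpointing_enable() call and
# requires a transformers version with huggingface/transformers#27020.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./rm-output",  # illustrative path
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```

On older transformers versions the `gradient_checkpointing_kwargs` field does not exist and the kwargs are silently ignored, which is why the PR description asks users to upgrade.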