Enable gradient checkpointing to be disabled for reward modelling #725
Conversation
```diff
@@ -149,7 +150,8 @@ def collator(data):
     task_type="CAUSAL_LM",
 )
 ref_model = None
-device_map = {"": 0}
+# Copy the model to each device
+device_map = {"": Accelerator().local_process_index}
```
I noticed none of the example scripts actually work in multi-GPU settings unless we add this trick
Makes sense!
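For context, here is a minimal sketch of how this `device_map` trick is typically applied when loading the model, so that each process in a multi-GPU run gets its own full copy; the model checkpoint and the surrounding loading code are illustrative, not taken from the diff:

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

# Map the whole model ("") onto the GPU owned by this process, indexed by its
# local rank, instead of pinning every process to GPU 0.
device_map = {"": Accelerator().local_process_index}

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint
    device_map=device_map,
)
```

With `device_map = {"": 0}`, every process spawned by `accelerate launch` would try to place its model on GPU 0; using the local process index spreads the copies across the available devices.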
The documentation is not available anymore as the PR was closed or merged.
Thanks for adding this. Just a few small nits, looks generally good to me.
examples/scripts/reward_trainer.py (outdated)
```diff
 train_dataset = train_dataset.map(
     preprocess_function,
     batched=True,
+    num_proc=4,
 )
 train_dataset = train_dataset.filter(
     lambda x: len(x["input_ids_chosen"]) <= script_args.seq_length
-    and len(x["input_ids_rejected"]) <= script_args.seq_length
+    and len(x["input_ids_rejected"]) <= script_args.seq_length,
+    num_proc=4,
```
Why did you change this to four?
Because the HHH dataset is quite large and I was bored of waiting :) Will revert it back to 1
Co-authored-by: Leandro von Werra <[email protected]>
Thanks a lot for the workaround and for adding this! LGTM!
```diff
@@ -149,7 +150,8 @@ def collator(data):
     task_type="CAUSAL_LM",
 )
 ref_model = None
-device_map = {"": 0}
+# Copy the model to each device
+device_map = {"": Accelerator().local_process_index}
```
Makes sense!
Co-authored-by: Younes Belkada <[email protected]>
```diff
@@ -6,29 +6,27 @@ Check out a complete flexible example inside [`examples/scripts`](https://github

 ## Expected dataset format

-The reward trainer expects a very specific format for the dataset. Since the model will be trained to predict which sentence is the most relevant, given two sentences. We provide an example from the [`Anthropic/hh-rlhf`](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset below:
+The [`RewardTrainer`] expects a very specific format for the dataset since the model will be trained on pairs of examples to predict which of the two is preferred. We provide an example from the [`Anthropic/hh-rlhf`](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset below:
```
@lvwerra @younesbelkada FYI I made a few minor tweaks to the docs since your last review
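For reference, a minimal sketch of the pairwise format the documentation describes, turning raw `chosen`/`rejected` text pairs into the tokenized columns the example script filters on; the toy rows and the tokenizer checkpoint are invented for illustration:

```python
from datasets import Dataset
from transformers import AutoTokenizer

# Toy preference data: each row pairs a preferred ("chosen") completion with a
# dispreferred ("rejected") completion for the same prompt.
raw_dataset = Dataset.from_dict(
    {
        "chosen": ["Human: What is 2+2? Assistant: 2+2 equals 4."],
        "rejected": ["Human: What is 2+2? Assistant: 2+2 equals 5."],
    }
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer

def preprocess_function(examples):
    chosen = tokenizer(examples["chosen"])
    rejected = tokenizer(examples["rejected"])
    # These four columns are what the reward modelling collator consumes.
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

train_dataset = raw_dataset.map(preprocess_function, batched=True)
```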
```diff
@@ -39,3 +41,9 @@ class RewardConfig(TrainingArguments):
             "help": "The maximum length of the sequences in the batch. This argument is required if you want to use the default data collator."
         },
     )
+    gradient_checkpointing: Optional[bool] = field(
+        default=True,
```
Here we override the default value from `transformers.TrainingArguments` to be `True` for backwards compatibility with how PEFT prepares k-bit models.
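For illustration, a minimal sketch of how a user could opt out of gradient checkpointing through this config; the argument values are placeholders rather than part of the diff:

```python
from trl import RewardConfig

# RewardConfig flips the gradient_checkpointing default to True (unlike
# transformers.TrainingArguments, which defaults to False). Setting it to
# False trades extra vRAM for avoiding the DDP
# "Expected to mark a variable ready only once" error.
reward_config = RewardConfig(
    output_dir="reward_model",      # placeholder output directory
    per_device_train_batch_size=2,  # placeholder value
    gradient_checkpointing=False,
)
```

The resulting config is then passed to the reward trainer as its training arguments, just like a regular `TrainingArguments` object.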
Thanks a lot @lewtun !!
…ggingface#725)

* Enable gradient checkpointing to be disabled for reward modelling
* Update examples/scripts/reward_trainer.py
  Co-authored-by: Leandro von Werra <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Younes Belkada <[email protected]>
* Tidy docs
* Remove commas

---------

Co-authored-by: Leandro von Werra <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Edit: to be merged after #726
This PR allows users to disable gradient checkpointing for PEFT models in the `RewardTrainer`. The motivation for this is to bypass the `RuntimeError: Expected to mark a variable ready only once` issue noted in #480, at the expense of using more vRAM. More generally, I think it's good when a library allows users to configure settings like these, since there are trade-offs involved (i.e. computation time for memory).

With this change, it is now possible to train PEFT reward models in multi-GPU contexts as follows:
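A plausible invocation, assuming the example script exposes the new `gradient_checkpointing` option through its argument parser and that two GPUs are available:

```bash
# Hypothetical multi-GPU launch; flag names beyond `accelerate launch` itself
# are assumptions about the example script's argument parser.
accelerate launch --multi_gpu --num_processes 2 \
    examples/scripts/reward_trainer.py \
    --gradient_checkpointing False
```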
When gradient checkpointing is activated, the script still fails due to #480, which requires a deeper-level fix.
Question: this change overrides the `RewardTrainer`'s default for gradient checkpointing from `true` to `false`, which may lead to unexpected behaviour with user code. Should I add a warning in the logger?

Update: I think I can solve the question above with my proposal to have dedicated training arguments per trainer. I'll open a separate PR for that.