
Remove graph breaks for torch.compile() in padding free branch in DataCollatorForCompletionOnlyLM #2158

Open · wants to merge 25 commits into base: main
Conversation


@Abhishek-TAMU Abhishek-TAMU commented Oct 3, 2024

What does this PR do?

This PR adds cu_seq_lens_q, cu_seq_lens_k, max_length_k, max_length_q to the batch in DataCollatorForCompletionOnlyLM. Together with a PR in transformers (link to be added), this removes graph breaks in padding-free tuning, allowing maximum performance to be obtained.
Specifically, these parameters should be generated here (this PR change), outside the transformers forward loop, because they incur an unavoidable CPU-GPU sync. Otherwise, that CPU-GPU sync happens here, inside the attention call, which causes graph breaks; the companion transformers PR therefore removes that call so that all graph breaks are eliminated when the torch_compile flag is enabled in the training arguments used with SFTTrainer.
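For context, a minimal sketch of the kind of computation this moves into the collator. The function name and the position_ids-based logic below are illustrative only (not the exact code in this PR), assuming a padding-free batch whose position_ids restart at 0 for every packed sequence:

import torch

def add_flash_attn_kwargs(batch):
    # Illustrative helper: compute the flash-attention metadata once, in the
    # collator (on CPU), so the CPU-GPU sync never happens inside the
    # compiled forward pass.
    position_ids = batch["position_ids"].flatten()
    # Each packed sequence starts where position_ids resets to 0.
    starts = torch.nonzero(position_ids == 0).flatten()
    total = torch.tensor([position_ids.numel()], device=position_ids.device)
    seq_lens = torch.diff(starts, append=total)
    cu_seq_lens = torch.nn.functional.pad(torch.cumsum(seq_lens, dim=0), (1, 0)).to(torch.int32)
    batch["cu_seq_lens_q"] = batch["cu_seq_lens_k"] = cu_seq_lens
    batch["max_length_q"] = batch["max_length_k"] = int(seq_lens.max())
    return batch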

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Abhishek-TAMU Abhishek-TAMU changed the title Add Sequence Lengths to Batch in DataCollatorForCompletionOnlyLM Remove graph breaks for torch.compile() in padding free branch in DataCollatorForCompletionOnlyLM Oct 3, 2024
@Abhishek-TAMU Abhishek-TAMU marked this pull request as ready for review October 3, 2024 15:42
@Abhishek-TAMU
Author

CC: @kashif @qgallouedec

@kashif kashif added ✨ enhancement New feature or request 🏋 SFT Related to SFT labels Oct 6, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Member

qgallouedec commented Oct 10, 2024

Hi, thanks for the PR.
Can you provide the link of the PR in transformers? Is it huggingface/transformers#33932?

@qgallouedec
Member

Could you provide a simple test to:

  1. Confirm that the failure actually occurs.
  2. Verify that this addition resolves it.

It might also be helpful to add a few comments, as these lines are unclear without context.

@qgallouedec qgallouedec added 🐛 bug Something isn't working and removed ✨ enhancement New feature or request labels Oct 10, 2024
@Abhishek-TAMU
Author

Thank you @qgallouedec for the review. This is the related transformers PR, which has been approved and merged.

I added two test cases: one where tuning fails with padding, and another where it doesn't fail without padding.

@Abhishek-TAMU
Author

@kashif @qgallouedec Could you possibly review this PR? Thank you!

@Abhishek-TAMU
Author

Abhishek-TAMU commented Nov 12, 2024

Hi @kashif @qgallouedec, could you please take another look at this PR when you get the chance? The changes in this PR are urgent for making the torch_compile flag in SFTTrainer work for Llama models (LlamaForCausalLM). This is important for users who need to compile the Llama model using SFTTrainer (in padding_free mode) without any graph breaks. Thank you!

Comment on lines 664 to 666
formatted_dataset = lambda example: {
"output": f"### prompt:\n{example['prompt'].strip()}\n\n### completion:\n{example['completion'].strip()}{tokenizer.eos_token}"
}
Member

is dataset formatting required here, or can we drop it?

Author

Dataset formatting is required because SFTTrainer and DataCollatorForCompletionOnlyLM expect the dataset to have a specific format: a single text field that combines both the prompt and the completion in a way the model can understand. This function includes both the prompt and the completion, ensuring the data collator can correctly identify where the completion starts using the response_template.
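For illustration, roughly what the formatted field looks like and how the collator uses it (the record below is made up, not from the test data):

# A made-up record, formatted the same way as in the test:
example = {"prompt": "What is the capital of France?", "completion": "Paris."}
text = (
    f"### prompt:\n{example['prompt'].strip()}\n\n"
    f"### completion:\n{example['completion'].strip()}</s>"
)
# DataCollatorForCompletionOnlyLM tokenizes this single text field, finds the
# tokenized response_template ("### completion:\n"), and sets the labels of all
# tokens before it to -100, so the loss is computed on the completion only.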

@@ -654,6 +654,50 @@ def test_data_collator_completion_lm_with_multiple_text(self):
result_text = tokenizer.decode(batch["input_ids"][i, last_pad_idx + 1 :])
self.assertEqual(result_text, "I have not been masked correctly.")

def test_data_collator_completion_lm_without_padding(self):
os.environ["CUDA_VISIBLE_DEVICES"]="0"
Member

Does the issue only occur with a CUDA device? In other words, can we reproduce it on CPU?

Author

Due to the usage of flash_attention_2, it would work only on GPU.
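For CPU-only CI, one option (not included in this PR) would be to guard the test with transformers' testing decorators, e.g.:

import unittest

from transformers.testing_utils import require_flash_attn, require_torch_gpu


class PaddingFreeCompileTester(unittest.TestCase):
    @require_flash_attn
    @require_torch_gpu
    def test_data_collator_completion_lm_without_padding(self):
        # Same body as in the PR; the decorators skip the test automatically
        # on machines without a GPU or without flash-attn installed.
        ...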

@qgallouedec
Member

Hey @Abhishek-TAMU, to keep you posted on the current status of the PR: I am struggling to reproduce the initial error. Do you have an MRE by any chance? The code from the unit test gives

...
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 134, in __init__
    assert isinstance(
AssertionError: expected FunctionType found _lru_cache_wrapper <functools._lru_cache_wrapper object at 0x7f000e67fb60>

from user code:
   File "/fsx/qgallouedec/transformers/src/transformers/models/llama/modeling_llama.py", line 1224, in torch_dynamo_resume_in_forward_at_1199
    loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

  0%|          | 0/2 [00:07<?, ?it/s]  

and it doesn't seem related

@Abhishek-TAMU
Author

Thank you @qgallouedec for looking into this. Here is the code that reproduces the graph break, using the latest release of transformers (which doesn't include the huggingface/transformers#33932 changes) and the latest trl including the changes from this PR.

If huggingface/transformers#33932 is applied in transformers, the graph break is avoided.

import os, tempfile, torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer)
from trl import SFTConfig, SFTTrainer
from trl.trainer import DataCollatorForCompletionOnlyLM
from datasets import load_dataset

standard_prompt_completion_dataset = load_dataset(
    "trl-internal-testing/zen", "standard_prompt_completion"
)

os.environ["CUDA_VISIBLE_DEVICES"]="0"
os.environ["CUDA_HOME"]="/home/tuning/.local/cuda-12.1"
model_id = "trl-internal-testing/tiny-random-LlamaForCausalLM"
torch_dtype = getattr(torch, "bfloat16", None)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch_dtype, attn_implementation="flash_attention_2")
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

formatted_dataset = lambda example: {
    "output": f"### prompt:\n{example['prompt'].strip()}\n\n### completion:\n{example['completion'].strip()}{tokenizer.eos_token}"
}

train_dataset = standard_prompt_completion_dataset["train"].map(formatted_dataset)

# padding_free=True is the branch that needs the cu_seq_lens metadata in the batch
response_template = "### completion:\n"
data_collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer, padding_free=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    training_args = SFTConfig(
        output_dir=tmp_dir,
        dataloader_drop_last=True,
        max_steps=2,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=1,
        save_steps=2,
        learning_rate=1e-5,
        dataset_text_field="output",
        torch_compile=True,
        torch_compile_backend="inductor",
        torch_compile_mode="default"
    )

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=train_dataset,
        data_collator=data_collator,
        args=training_args,
    )

    # with assertRaises(Exception):
    trainer.train()
    del os.environ["CUDA_VISIBLE_DEVICES"]
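For more detail on where compilation breaks, the environment variables suggested by the traceback above can be set at the very top of the script, before importing torch:

# Optional: verbose dynamo logging (as suggested in the traceback above).
import os

os.environ["TORCH_LOGS"] = "+dynamo"
os.environ["TORCHDYNAMO_VERBOSE"] = "1"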

@Abhishek-TAMU
Copy link
Author

Hi @qgallouedec, were you able to reproduce the initial error with this MRE ?

Labels
🐛 bug Something isn't working 🏋 SFT Related to SFT