deepcopy fails after accelerate==0.23.0 #2248
Comments
Thanks for reporting. I can confirm that this fails with the current accelerate but works with accelerate==0.23.0. (Btw, I tried removing all the PEFT-related code and got exactly the same error, so the snippet could be simplified for debugging purposes.)
Thanks @BenjaminBossan, I simplified the repro as you suggested. Please let me know if there is anything else I can do.
@BenjaminBossan I found this related issue with an accelerate patch to work around the issue. Does this help in identifying the root cause?
Another related issue: huggingface/transformers#26801
Hi @prathikr, thanks for reporting. The breaking change is due to #1971. Before this PR, the hooks were not copied properly, meaning that they were still referencing the forward of the original model. Hence, it only looked like the model was copied properly. To make it work, we need to implement deepcopy support on the bitsandbytes side (see the illustrative sketch after the script below). Reproduction script:

```python
import copy
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification, BitsAndBytesConfig
import torch
model_name_or_path = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(
model_name_or_path,
max_seq_length=512,
pad_to_max_length=True,
)
tokenizer.pad_token = tokenizer.eos_token
config = AutoConfig.from_pretrained(
model_name_or_path,
num_labels=1,
finetuning_task="text-classification",
)
config.pad_token_id = config.eos_token_id
# 4-bit (fp4) quantization config used to load the model QLoRA-style.
nf4_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="fp4",
bnb_4bit_compute_dtype=torch.float16
)
# Load the model with 4-bit quantized weights; each quantized weight carries a quant_state.
model = AutoModelForSequenceClassification.from_pretrained(
model_name_or_path,
config=config,
load_in_4bit=True,
quantization_config=nf4_config,
torch_dtype=torch.float16,
)
# Before the copy, the quantized weight carries its quant_state.
print(model.model.layers[0].self_attn.q_proj.weight.quant_state)

# After copy.deepcopy, the quant_state is not carried over on accelerate versions beyond 0.23.0.
copied_module = copy.deepcopy(model.model.layers[0].self_attn.q_proj.weight)
print(copied_module.quant_state)
```
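For reference, here is a minimal sketch of what deepcopy support for a quantized parameter could look like. The `QuantizedParam` class and its `quant_state` handling are hypothetical illustrations, not the actual bitsandbytes `Params4bit` implementation:

```python
import copy

import torch


class QuantizedParam(torch.nn.Parameter):
    """Hypothetical parameter subclass that carries quantization metadata."""

    def __new__(cls, data, quant_state=None, requires_grad=False):
        self = torch.Tensor._make_subclass(cls, data, requires_grad)
        self.quant_state = quant_state
        return self

    def __deepcopy__(self, memo):
        # Copy the tensor data and the quantization metadata together, so the
        # copy does not silently end up with quant_state == None.
        new_param = type(self)(
            copy.deepcopy(self.data, memo),
            quant_state=copy.deepcopy(self.quant_state, memo),
            requires_grad=self.requires_grad,
        )
        memo[id(self)] = new_param
        return new_param


# Quick self-check of the sketch:
p = QuantizedParam(torch.randn(4, 4), quant_state={"blocksize": 64})
assert copy.deepcopy(p).quant_state == {"blocksize": 64}
```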
Hey! Thanks for ccing me on this. Sure, I can put implementing deepcopy support on the bnb side on my list. How critical is this fix?
I had a quick chat with @SunMarc. Since this is the first issue about this in 2 months, we agreed that it's not critical (let me know if anyone disagrees). I'll put it on my list and try to add deepcopy support on the bnb side in the next few weeks.
@Titus-von-Koeller within the next few weeks sounds good, I will check back in 2-3 weeks.
@Titus-von-Koeller any updates on this bug?
Hi @prathikr, I was mostly off the last two weeks, partly due to illness. This was among the things that got delayed; right now I'm catching up and doing some high-impact work around FSDP that takes priority. I have this on my todo for next week. It's on my list, so I won't miss it.
@Titus-von-Koeller no problem, thank you for the update.
@Titus-von-Koeller a few others on my team have encountered this issue, any updates on a resolution?
@SunMarc @Titus-von-Koeller any updates on this bug?
System Info
Information
Tasks
no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
Expected behavior
For accelerate versions beyond 0.23.0, model copies via copy.deepcopy() fail to copy over the quant_state of the quantized weight parameters during QLoRA-enabled training. The copy.deepcopy() operation is used for ONNX export when fine-tuning meta-llama/Llama-2-7b-hf with ONNX Runtime Training. A stand-alone script to reproduce the error is provided above.
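As a minimal check of the expected behavior, assuming the `model` loaded by the script above, the quant_state should survive the copy:

```python
import copy

# With accelerate 0.23.0 this passes; with later versions the copied weight
# loses its quant_state during QLoRA-enabled training.
original = model.model.layers[0].self_attn.q_proj.weight
copied = copy.deepcopy(original)

assert original.quant_state is not None
assert copied.quant_state is not None, "quant_state was not copied by deepcopy"
```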