
Adalora: query_key_value.lora_B.default has been marked as ready twice #663

Closed

ryzn0518 opened this issue Jul 5, 2023 · 5 comments
ryzn0518 commented Jul 5, 2023

System Info

multiple GPUs: 4 × RTX 3090

transformers == 4.30.2
peft == 0.4.0.dev

  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 82 with name base_model.model.transformer.encoder.layers.27.self_attention.query_key_value.lora_B.default has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.
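The error message itself points at a possible workaround: marking the DDP graph as static so that parameters whose autograd hooks fire more than once per iteration (the error lists reentrant gradient checkpointing as a typical cause) are tolerated. Below is a minimal, self-contained sketch of that workaround, assuming the model is wrapped in DistributedDataParallel manually; the toy Linear model and process-group setup are placeholders, not the reporter's script.

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for each worker.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the PEFT-wrapped model from the reproduction below.
    model = torch.nn.Linear(16, 16).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=False)

    # Private API on older PyTorch releases; newer releases also accept
    # DDP(..., static_graph=True) in the constructor.
    ddp_model._set_static_graph()

    x = torch.randn(4, 16, device=f"cuda:{local_rank}")
    loss = ddp_model(x).sum()
    loss.backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Note that the HF Trainer builds the DDP wrapper internally, so when training through the Trainer the equivalent setting would have to be applied to the wrapper it creates; the sketch above only illustrates the manual-DDP case.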

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

The error occurs when running WORLD_SIZE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 test.py --ddp_find_unused_parameters=False xxxx, even though ddp_find_unused_parameters=False is already being passed.

from transformers import AutoModel
from peft import AdaLoraConfig, TaskType, get_peft_model

# base_model, config, torch_dtype, device_map, lora_r, lora_alpha and lora_dropout
# are defined elsewhere in the script.
model = AutoModel.from_pretrained(
    base_model,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch_dtype,
    device_map=device_map,
)

lora_config = AdaLoraConfig(
    init_r=6,
    target_r=4,
    tinit=50,
    tfinal=100,
    deltaT=5,
    beta1=0.3,
    beta2=0.3,
    orth_reg_weight=0.2,
    # lora_alpha=32,
    # lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],
    inference_mode=False,
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)

lora_model = get_peft_model(model, lora_config)
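
For context, here is a sketch (not the reporter's test.py) of how an AdaLoRA model like the one above is typically driven with the HF Trainer: peft's AdaLoraModel exposes update_and_allocate(), which is meant to be called after each optimizer step so the rank budget follows the tinit/tfinal/deltaT schedule configured above. The dataset and output directory are placeholders, and ddp_find_unused_parameters=False mirrors the flag from the reproduction command.

from transformers import Trainer, TrainerCallback, TrainingArguments


class AdaLoraAllocateCallback(TrainerCallback):
    """Call AdaLoraModel.update_and_allocate() after every optimizer step."""

    def __init__(self, peft_model):
        self.peft_model = peft_model

    def on_step_end(self, args, state, control, **kwargs):
        # peft_model.base_model is the AdaLoraModel created by get_peft_model().
        self.peft_model.base_model.update_and_allocate(state.global_step)
        return control


training_args = TrainingArguments(
    output_dir="adalora-out",              # placeholder path
    per_device_train_batch_size=1,
    num_train_epochs=1,
    ddp_find_unused_parameters=False,      # same flag as in the torchrun command
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,           # placeholder dataset
    callbacks=[AdaLoraAllocateCallback(lora_model)],
)
trainer.train()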

Expected behavior

The training run should start and complete successfully.

@younesbelkada
Contributor

Sounds like a similar issue to huggingface/trl#480
Do you use the HF trainer to train your model?

@ryzn0518
Author

ryzn0518 commented Jul 8, 2023

Sounds like a similar issue to lvwerra/trl#480 Do you use the HF trainer to train your model?

Yes, I use the HF model and HF transformers.

@younesbelkada
Contributor

Perfect, I will dig into it properly at the beginning of the week if all goes well.

@github-actions

github-actions bot commented Aug 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@Vincent-Li-9701

Hi, I have encountered the same issue. Do we have any update / workaround on this? @younesbelkada
