
Adalora: query_key_value.lora_B.default has been marked as ready twice #663

Closed

ryzn0518 opened this issue Jul 5, 2023 · 5 comments
ryzn0518 commented Jul 5, 2023

System Info

multiple GPUs: 4 × RTX 3090

transformers == 4.30.2
peft == 0.4.0.dev

  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations.
Parameter at index 82 with name base_model.model.transformer.encoder.layers.27.self_attention.query_key_value.lora_B.default has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.
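The error message itself points at a possible workaround: marking the DDP graph as static so that parameters whose autograd hooks fire more than once per iteration (the error lists reentrant gradient checkpointing as a typical cause) are tolerated. Below is a minimal, self-contained sketch of that workaround, assuming the model is wrapped in DistributedDataParallel manually; the toy Linear model and process-group setup are placeholders, not the reporter's script.

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main() -> None:
    # torchrun sets RANK, WORLD_SIZE and LOCAL_RANK for each worker.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for the PEFT-wrapped model from the reproduction below.
    model = torch.nn.Linear(16, 16).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank], find_unused_parameters=False)

    # Private API on older PyTorch releases; newer releases also accept
    # DDP(..., static_graph=True) in the constructor.
    ddp_model._set_static_graph()

    x = torch.randn(4, 16, device=f"cuda:{local_rank}")
    loss = ddp_model(x).sum()
    loss.backward()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Note that the HF Trainer builds the DDP wrapper internally, so when training through the Trainer the equivalent setting would have to be applied to the wrapper it creates; the sketch above only illustrates the manual-DDP case.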

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

The error occurs when running WORLD_SIZE=1 CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 test.py --ddp_find_unused_parameters=False xxxx, even though ddp_find_unused_parameters=False is already being passed.

from transformers import AutoModel
from peft import AdaLoraConfig, TaskType, get_peft_model

# base_model, config, torch_dtype, device_map, lora_r, lora_alpha and lora_dropout
# are defined elsewhere in the script.
model = AutoModel.from_pretrained(
    base_model,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch_dtype,
    device_map=device_map,
)

lora_config = AdaLoraConfig(
    init_r=6,
    target_r=4,
    tinit=50,
    tfinal=100,
    deltaT=5,
    beta1=0.3,
    beta2=0.3,
    orth_reg_weight=0.2,
    # lora_alpha=32,
    # lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],
    inference_mode=False,
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
)

lora_model = get_peft_model(model, lora_config)
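
For context, here is a sketch (not the reporter's test.py) of how an AdaLoRA model like the one above is typically driven with the HF Trainer: peft's AdaLoraModel exposes update_and_allocate(), which is meant to be called after each optimizer step so the rank budget follows the tinit/tfinal/deltaT schedule configured above. The dataset and output directory are placeholders, and ddp_find_unused_parameters=False mirrors the flag from the reproduction command.

from transformers import Trainer, TrainerCallback, TrainingArguments


class AdaLoraAllocateCallback(TrainerCallback):
    """Call AdaLoraModel.update_and_allocate() after every optimizer step."""

    def __init__(self, peft_model):
        self.peft_model = peft_model

    def on_step_end(self, args, state, control, **kwargs):
        # peft_model.base_model is the AdaLoraModel created by get_peft_model().
        self.peft_model.base_model.update_and_allocate(state.global_step)
        return control


training_args = TrainingArguments(
    output_dir="adalora-out",              # placeholder path
    per_device_train_batch_size=1,
    num_train_epochs=1,
    ddp_find_unused_parameters=False,      # same flag as in the torchrun command
)

trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,           # placeholder dataset
    callbacks=[AdaLoraAllocateCallback(lora_model)],
)
trainer.train()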

Expected behavior

The training run should start and complete successfully.

@younesbelkada
Contributor

Sounds like a similar issue to huggingface/trl#480
Do you use the HF trainer to train your model?

@ryzn0518
Author

ryzn0518 commented Jul 8, 2023

Sounds like a similar issue to lvwerra/trl#480 Do you use the HF trainer to train your model?

Yes, I use the HF model and HF transformers.

@younesbelkada
Contributor

Perfect, I will dig into it properly at the beginning of the week if all goes well.

@github-actions

github-actions bot commented Aug 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@Vincent-Li-9701

Hi, I have encountered the same issue. Do we have any update / workaround on this? @younesbelkada
