Deepspeed zero3 + LoRA: RuntimeError: Only Tensors of floating point and complex dtype can require gradients #2068

Open
bursteratom opened this issue Nov 16, 2024 · 1 comment
Labels: bug, waiting on upstream, wip

Comments

@bursteratom (Collaborator)

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I expect deepspeed zero3 and 8-bit LoRA to be compatible and to run without error.

Current behaviour

When loading a model with deepspeed zero3 and 8-bit LoRA enabled, I ran into the following error: RuntimeError: Only Tensors of floating point and complex dtype can require gradients

[Screenshot: traceback ending in RuntimeError: Only Tensors of floating point and complex dtype can require gradients]

However, if you use zero3 in tandem with 4-bit QLoRA, or just do full fine-tuning with zero3 enabled, it works fine.
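
For context, the error itself is raised by PyTorch whenever requires_grad is set on a non-floating-point tensor, which suggests something in the zero3 + 8-bit path is trying to mark an int8-quantized parameter as trainable. A minimal standalone reproduction of just that underlying PyTorch behaviour (not axolotl's actual code path):

import torch

t = torch.zeros(8, dtype=torch.int8)  # int8, as produced by 8-bit quantization
t.requires_grad_(True)                # RuntimeError: Only Tensors of floating point and complex dtype can require gradients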

Steps to reproduce

  1. Set up 8-bit LoRA (load_in_8bit: true, adapter: lora)
  2. Enable deepspeed zero3
  3. Launch training; it fails with RuntimeError: Only Tensors of floating point and complex dtype can require gradients (an example launch command is sketched below)
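
For reference, training was launched through the usual accelerate entry point (the config filename here is illustrative; point it at the yaml given in the next section):

accelerate launch -m axolotl.cli.train lora-zero3.yaml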

Config yaml

base_model: NousResearch/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/lora-out

sequence_len: 4096
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
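
The deepspeed file referenced above is the zero3_bf16.json shipped with axolotl. For context, a zero3 + bf16 DeepSpeed config has roughly this shape (a minimal sketch, not the exact contents of that file):

{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": {
    "enabled": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "train_batch_size": "auto"
}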

Possible solution

(Putting on a tinfoil hat) I think it's a bug within the axolotl code base, as opposed to some deeper issue with deepspeed zero3, seeing as it works with 4-bit QLoRA.
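
If that's right, the fix is likely a dtype guard wherever parameters are marked trainable during adapter preparation. A hypothetical sketch of such a guard (prepare_modules_to_save is an illustrative name, not axolotl's actual function):

import torch
from torch import nn

def prepare_modules_to_save(model: nn.Module, module_names: list[str]) -> None:
    """Mark extra modules (e.g. embed_tokens, lm_head) as trainable.

    Hypothetical sketch: skip non-floating-point parameters, since int8-quantized
    weights under zero3 + load_in_8bit would otherwise raise
    RuntimeError: Only Tensors of floating point and complex dtype can require gradients.
    """
    for name, param in model.named_parameters():
        if any(name.startswith(m) or f".{m}." in name for m in module_names):
            if param.dtype.is_floating_point or param.dtype.is_complex:
                param.requires_grad_(True)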

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11.10

axolotl branch-commit

main

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
bursteratom added the bug and wip labels on Nov 16, 2024
@bursteratom (Collaborator, Author)

This is being worked on via PR #1852 and huggingface/transformers#32943.
