semi-weekly 8bit lora zero3 check #1852
base: main
Conversation
Force-pushed from ea0d058 to 4e64549.
This also needs the upstream PR huggingface/transformers#32943.
Force-pushed from 958d25c to 97b4529.
Force-pushed from 97b4529 to 613a217.
This one needs a deeper dive into why the train loss is larger by an order of magnitude.
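A rough sketch of one way to probe this, assuming the discrepancy comes from weights being corrupted at quantized load/shard time; the attribute access pattern below is illustrative and may need adjusting for the bitsandbytes version in use:

```python
# Hedged diagnostic sketch: a train loss ~10x too high often means the
# quantized weights were mangled during loading or ZeRO-3 sharding. One quick
# check is whether dequantized 8-bit weights roughly match an fp16 reference
# load of the same checkpoint.
import torch
import bitsandbytes as bnb


def dequant_error(layer_8bit: bnb.nn.Linear8bitLt, ref_weight: torch.Tensor) -> float:
    """Max abs error between the dequantized 8-bit weight and an fp16 reference."""
    # After a forward pass the int8 data and row scales live on layer.state;
    # before that they may still sit on the Int8Params weight (version dependent).
    state = getattr(layer_8bit, "state", None)
    CB = getattr(state, "CB", None) if state is not None else None
    SCB = getattr(state, "SCB", None) if state is not None else None
    if CB is None:
        CB, SCB = layer_8bit.weight.CB, layer_8bit.weight.SCB
    # LLM.int8 stores int8 values with per-row absmax scales: W ~= CB * SCB / 127
    dequant = CB.float() * SCB.float().unsqueeze(1) / 127.0
    return (dequant - ref_weight.float().to(dequant.device)).abs().max().item()


# Errors on the order of the quantization step are expected; errors comparable
# to the weight magnitudes themselves point at corrupted parameters, which
# would explain an order-of-magnitude jump in the loss.
```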
I also tried deepspeed 0.16.1, but that is blocked too by gradient accumulation issues, see #2154. Trying deepspeed==0.16.1 with grad_accum=1 also results in a train/loss of ~13 on L3-3B.
Upstream issue opened at bitsandbytes-foundation/bitsandbytes#1451.
Adds a check to our semi-weekly GHA run for 8-bit LoRA with DeepSpeed ZeRO-3. Depends on huggingface/transformers#32943 being merged.
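For reference, a minimal sketch of the kind of check this adds, assuming axolotl's accelerate-based CLI entrypoint; the stand-in model, dataset, step count, and DeepSpeed config path are illustrative, not the exact values in the workflow:

```python
# Minimal sketch of a CI smoke test for 8-bit LoRA under DeepSpeed ZeRO-3.
# Everything here (model, dataset, paths, step count) is illustrative.
import subprocess
import yaml

cfg = {
    "base_model": "HuggingFaceTB/SmolLM-135M",  # assumed small stand-in model
    "load_in_8bit": True,
    "adapter": "lora",
    "lora_r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "lora_target_linear": True,
    "sequence_len": 1024,
    "micro_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "max_steps": 10,
    "learning_rate": 2.0e-4,
    "bf16": True,
    "deepspeed": "deepspeed_configs/zero3_bf16.json",  # assumed path in the repo
    "datasets": [{"path": "mhenrichsen/alpaca_2k_test", "type": "alpaca"}],
}

with open("lora_8bit_zero3.yaml", "w") as f:
    yaml.safe_dump(cfg, f)

# Launch a short multi-GPU run; a non-zero exit code fails the CI job. A real
# check would also assert the final train loss lands in a sane range, which is
# exactly what the order-of-magnitude regression discussed above would trip.
subprocess.run(
    ["accelerate", "launch", "--num-processes", "2",
     "-m", "axolotl.cli.train", "lora_8bit_zero3.yaml"],
    check=True,
)
```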