Initial kernel changes to support GaLore #1137

matthewdouglas · 2024-03-18T23:46:45Z

This is a draft containing some of the initial changes to support GaLore. So far this covers 2-state optimizers.

Optimizer2State.update_step() now contains an additional argument return_updates. When provided a tensor to hold the updates, they're returned here and p is not changed. Additionally, no weight decay is applied.

Needs tests, feedback welcome.

cc: @TimDettmers @jiaweizzhao @Titus-von-Koeller

github-actions · 2024-03-18T23:50:20Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

matthewdouglas · 2024-03-19T00:28:10Z

bitsandbytes/optim/adamw.py

+                self.prefetch_state(p)
+
+                if "rank" in group:
+                    self.update_step(group, p, gindex, pindex, return_updates=lor_update)


The main addition in this PR is the new return_updates kwarg. This will give us the update from AdamW in lor_update and p.data will not be changed.

Corresponds to this step in Algorithm 1 from the paper:
lor_update = update(lor_grad)

matthewdouglas · 2024-03-19T00:29:41Z

bitsandbytes/optim/adamw.py

+                    self.update_step(group, p, gindex, pindex, return_updates=lor_update)
+
+                    # GaLore Projection Back
+                    p.data.add_(state["projector"].project_back(lor_update))


From Algorithm 1 in the paper:

update = project_back(lor_update) weight.data += update

Titus-von-Koeller · 2024-04-05T10:20:02Z

@matthewdouglas Tim said he could review your work this weekend.

matthewdouglas · 2024-04-07T13:20:58Z

Updated with changes added for 1-state optimizers (Momentum, RMSProp, Adagrad, Lion).

TimDettmers

This looks like a solid straightforward implementation. Good work, @matthewdouglas!

I wanted to overhaul the optimizers since changing or adding implementations is a pain when everything is separated into 1state and 2state optimizers. You probably encountered this too, Matthew. However, probably it is okay to keep it like this for the time being and refactor if we add another change.

Separating the update computation and the actual update in general could also have benefits to implement some new optimizers more easily. But I think we can leave that to future work and favor getting Galore out quickly together with the QLoRA fix.

The last remaining thing would be testing. The original tests are all green, but what would be good is a galore test. The best would be to use the original repo code and test it against the bitsandbytes implementation.

Steps for that would be to add galore-torch to the dev dependencies and only execute the tests when the dependencies are met to prevent other devs from needing to run this if they do not have galore-torch installed.

Otherwise, the tests can mirror the other tests that already exist and check if the gradients are close to each other. For that you can probably just add the original galore and your galore to the dictionary of optimizers in test_optim.py and see if the errors are approximately similar compared to other optimizer comparison.

Let me know if you have any other concerns with this, but I think with a test this is all ready to go.

jiaweizzhao · 2024-09-27T23:46:25Z

Hi @matthewdouglas, thanks for your great effort! I would like to follow up this PR and finalize our integration as soon as possible.

To test galore benchmark easily, I created a new branch: https://github.com/jiaweizzhao/GaLore/tree/bitsandbytes, where I integrated your GaLore implementation into the most recent GaLoreAdamW8bit: https://github.com/jiaweizzhao/GaLore/blob/bitsandbytes/galore_torch/adamw8bit.py

For environments, I installed both your bitsandbytes and modified GaLore locally. However, I tried to test a baseline but the following error comes up:

175 [rank0]: File "/data/home/jwzhao/.conda/envs/galore_new/lib/python3.8/site-packages/torch/optim/optimizer.py", line 484, in wrapper
176 [rank0]: out = func(*args, **kwargs)
177 [rank0]: File "/data/home/jwzhao/.conda/envs/galore_new/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
178 [rank0]: return func(*args, **kwargs)
179 [rank0]: File "/opt/hpcaas/.mounts/fs-0565f60d669b6a2d3/home/jwzhao/projects/bitsandbytes/bitsandbytes/optim/optimizer.py", line 288, in step
180 [rank0]: self.update_step(group, p, gindex, pindex)
181 [rank0]: File "/data/home/jwzhao/.conda/envs/galore_new/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
182 [rank0]: return func(*args, **kwargs)
183 [rank0]: File "/opt/hpcaas/.mounts/fs-0565f60d669b6a2d3/home/jwzhao/projects/bitsandbytes/bitsandbytes/optim/optimizer.py", line 552, in update_step
184 [rank0]: F.optimizer_update_8bit_blockwise(
185 [rank0]: File "/opt/hpcaas/.mounts/fs-0565f60d669b6a2d3/home/jwzhao/projects/bitsandbytes/bitsandbytes/functional.py", line 1789, in optimizer_update_8bit_blockwise
186 [rank0]: and len(str2optimizer8bit_blockwise[optimizer_name]) == 3
187 [rank0]: NameError: name 'str2optimizer8bit_blockwise' is not defined

Do you have any ideas why it occurred? Seems it only happen in this old PR. I also tried latest bitsandbytes with regular GaLore and it works.

matthewdouglas · 2024-10-01T14:13:43Z

Thanks @jiaweizzhao! This indicates there was a problem loading the CUDA library. Were you able to build this part on this branch?

pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=cuda -S .
cmake --build .
pip install -e .

If you try python -m bitsandbytes you may have a better idea of why it did not load.

I plan to follow up shortly and rebase with main!

jiaweizzhao · 2024-10-04T22:48:48Z

Not sure why I can not correctly load CUDA library using your scripts @matthewdouglas. Maybe it is due to the machine I am using. Could you actually try to run a simple galore benchmark on your end? I have packed everything in this branch: https://github.com/jiaweizzhao/GaLore/tree/bitsandbytes. Once installed, you can simply run sh scripts/verify_bitsandbytes/llama_60m_galore_adam8bit_new.sh to verify if the new galore_adam8bit works. Once it works I will made the changes across all optimizers.

jiaweizzhao · 2024-10-04T22:53:40Z

Seems the problem is I have no root access to compile from source (with pip install -e .) on my machine. Another way is if you could give me a complied galore version bitsandbytes package I can also try from my end @matthewdouglas

Titus-von-Koeller · 2024-10-28T15:04:23Z

Seems the problem is I have no root access to compile from source (with pip install -e .) on my machine. Another way is if you could give me a complied galore version bitsandbytes package I can also try from my end @matthewdouglas

@jiaweizzhao Which tool is throwing an error? With conda/mamba you can install things like gcc/cmake/make/ninja in userland and if you want to install them globally (not just in your dev env), you can use condax. Imo, you shouldn't need root with that approach. Let us know if that helped 🤗

Also, if you just want to install from main, without editing the code, just to run the tests, you may do so simply with:

# for Linux
pip install 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-0.44.2.dev0-py3-none-manylinux_2_24_x86_64.whl'

# or for Windows:
pip install 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-0.44.2.dev0-py3-none-win_amd64.whl'

matthewdouglas commented Mar 19, 2024

View reviewed changes

TimDettmers marked this pull request as ready for review July 17, 2024 23:35

TimDettmers reviewed Jul 18, 2024

View reviewed changes

Titus-von-Koeller force-pushed the main branch 2 times, most recently from 9b72679 to 7800734 Compare July 27, 2024 13:16

matthewdouglas added the enhancement New feature or request label Aug 15, 2024

matthewdouglas self-assigned this Aug 15, 2024

matthewdouglas force-pushed the galore branch from 61189fc to 91ea416 Compare October 30, 2024 18:02

matthewdouglas added 6 commits October 30, 2024 14:25

Initial kernel changes for 2-state optimizers to support GaLore

a5f9552

Experimental implementation for bnb.optim.GaLoreAdamW8bit

57de0d6

Cleanup

759c78d

Fix mistake

b2a01ec

One more time

31854da

Support eturn_outputs buffer option for 1-state optimizers

b1fb85b

matthewdouglas force-pushed the galore branch from 91ea416 to b1fb85b Compare October 30, 2024 18:26

matthewdouglas added 2 commits October 30, 2024 15:02

fix wheel build on windows

0c87b70

Introducte GaLoreWrappedParameter to decouple grad from param

7e3b5ff

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial kernel changes to support GaLore #1137

Initial kernel changes to support GaLore #1137

matthewdouglas commented Mar 18, 2024 •

edited

Loading

github-actions bot commented Mar 18, 2024

matthewdouglas Mar 19, 2024

matthewdouglas Mar 19, 2024

Titus-von-Koeller commented Apr 5, 2024

matthewdouglas commented Apr 7, 2024

TimDettmers left a comment

jiaweizzhao commented Sep 27, 2024 •

edited

Loading

matthewdouglas commented Oct 1, 2024

jiaweizzhao commented Oct 4, 2024

jiaweizzhao commented Oct 4, 2024

Titus-von-Koeller commented Oct 28, 2024 •

edited

Loading

Initial kernel changes to support GaLore #1137

Are you sure you want to change the base?

Initial kernel changes to support GaLore #1137

Conversation

matthewdouglas commented Mar 18, 2024 • edited Loading

github-actions bot commented Mar 18, 2024

matthewdouglas Mar 19, 2024

Choose a reason for hiding this comment

matthewdouglas Mar 19, 2024

Choose a reason for hiding this comment

Titus-von-Koeller commented Apr 5, 2024

matthewdouglas commented Apr 7, 2024

TimDettmers left a comment

Choose a reason for hiding this comment

jiaweizzhao commented Sep 27, 2024 • edited Loading

matthewdouglas commented Oct 1, 2024

jiaweizzhao commented Oct 4, 2024

jiaweizzhao commented Oct 4, 2024

Titus-von-Koeller commented Oct 28, 2024 • edited Loading

matthewdouglas commented Mar 18, 2024 •

edited

Loading

jiaweizzhao commented Sep 27, 2024 •

edited

Loading

Titus-von-Koeller commented Oct 28, 2024 •

edited

Loading