Low Bit Optim Instability. #1218
Regarding "per layer selection", I planned for this feature before but forgot about it. It should be easy to add. Do you have an idea of how you want this API to look? I'm thinking something like `optim = AdamW8bit(model.parameters(), exclude_low_bit_optim_params=[model.output.weight])`. Would you be interested in contributing a PR for the 1st feature?
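A quick sketch of how that knob could read in a training script; the `TinyLM` model is made up for illustration, and `exclude_low_bit_optim_params` is just the name floated here, not an existing torchao argument:

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit

class TinyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(1000, 64)
        self.output = torch.nn.Linear(64, 1000)

model = TinyLM()
# Proposed: keep the (often unstable) output projection's optimizer state
# in plain fp32 while everything else uses 8-bit state.
optim = AdamW8bit(
    model.parameters(),
    lr=1e-3,
    exclude_low_bit_optim_params=[model.output.weight],  # hypothetical kwarg
)
```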
Hi @gau-nernst, yes, I believe this is a decent API. Sure, I can take it up if you can give some pointers.
We currently have some checks to only apply the low-bit optim to certain params: see ao/torchao/prototype/low_bit_optim/adam.py, lines 42 to 57 at commit 8c07d22.
You can simply add an extra check that the param is not in `exclude_low_bit_optim_params`.
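As a minimal sketch (hypothetical helper name; the real adam.py internals may differ), the combined gating could look like this — the 4096-element floor mirrors the bitsandbytes convention of keeping small tensors in fp32:

```python
import torch
from torch import Tensor

def _use_low_bit_state(p: Tensor, block_size: int, exclude_params: list) -> bool:
    # Existing size-based gating: only quantize optimizer state for params
    # that are large enough and divide evenly into quantization blocks.
    big_enough = p.numel() >= 4096 and p.numel() % block_size == 0
    # Proposed extra check: skip params the user listed in
    # exclude_low_bit_optim_params (matched by identity, not by value).
    excluded = any(p is q for q in exclude_params)
    return big_enough and not excluded
```

Params that fail the check would just get the ordinary fp32 fallback state (`torch.zeros_like(p)`) instead of a quantized tensor subclass.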
Sounds good! I'll create a PR!
Hi, will the torchao low-bit optim allow per-layer selection for switching to 32-bit Adam for stability? Also, will it support StableEmbedding layers? (See the bitsandbytes sketch below.)

1. Reference: https://huggingface.co/docs/bitsandbytes/main/en/optimizers#optimize-unstable-parameters
2. StableEmbedding: https://huggingface.co/docs/bitsandbytes/main/en/reference/nn/embeddings#bitsandbytes.nn.StableEmbedding

I see some divergence with the torchao optimizer but not with bitsandbytes.
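For comparison, a minimal sketch of the bitsandbytes-side setup from the two links above, with a toy model standing in for a real one; `GlobalOptimManager.override_config` and `bnb.nn.StableEmbedding` are the documented bitsandbytes APIs:

```python
import torch
import bitsandbytes as bnb

# Toy model; StableEmbedding adds layer norm and requests 32-bit optimizer
# state for its own weight, which helps with training stability.
model = torch.nn.Sequential(
    bnb.nn.StableEmbedding(50_000, 512),
    torch.nn.Linear(512, 512),
)

# Per-layer selection: register params before moving to GPU, then override
# one specific param to use 32-bit Adam state while the rest stay 8-bit.
mng = bnb.optim.GlobalOptimManager.get_instance()
mng.register_parameters(model.parameters())
model = model.cuda()
optim = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
mng.override_config(model[1].weight, "optim_bits", 32)
```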