train_loss is 0.0 in 7.0 but works fine on 7.5 and 8.6 #343
Comments
Based on a survey of similar issues found on the internet and our own experiments, the root cause can be traced to the fact that the V100 does not support int8 tensor cores, so bitsandbytes (bnb) cannot apply native int8 matrix multiplication on the V100. However, bnb adopts a workaround in this version.
Compared to native int8 matrix multiplication, this workaround may accumulate larger errors as fine-tuning goes on, leading to an unstable loss that is either a very large value or 0. Currently, we have two methods to ease the issue; one is adjusting llm_int8_threshold, as in the snippet below:
import torch
from transformers import (
    AutoModel,
    BitsAndBytesConfig,
)

device_map = "auto"
llm_int8_threshold = 3.5  # lower outlier threshold than the default of 6.0

model = AutoModel.from_pretrained(
    base_model,
    cache_dir=cache_dir,
    # the 8-bit flag is carried by the quantization config
    quantization_config=BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=llm_int8_threshold),
    torch_dtype=torch.float16,
    device_map=device_map,
    trust_remote_code=True,
)
Bear in mind that neither of the above methods can solve the issue completely.
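For reference, here is a minimal sketch (not from the thread, assuming only that PyTorch is installed) that checks whether the current GPU has native int8 tensor core support; compute capability 7.5 (Turing) and above has it, while the V100 (7.0) does not:

import torch

def has_int8_tensor_cores(device_index: int = 0) -> bool:
    # Int8 tensor cores arrived with Turing (compute capability 7.5);
    # the V100 is 7.0, which is why bitsandbytes falls back to the
    # slower, less accurate non-cublasLt path on that card.
    major, minor = torch.cuda.get_device_capability(device_index)
    return (major, minor) >= (7, 5)

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0), "int8 tensor cores:", has_int8_tensor_cores(0))
else:
    print("No CUDA device detected")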
Unfortunately, what you suggested did not work; I am still getting NaN values.
You can also try changing the optimizer.
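As an illustration of the optimizer suggestion (a sketch only, not code from this thread; all values besides optim are placeholders), one way to keep optimizer states in full precision with the transformers Trainer is the optim field of TrainingArguments:

from transformers import TrainingArguments

# Sketch: use the standard PyTorch AdamW ("adamw_torch") instead of the
# bitsandbytes 8-bit variant ("adamw_bnb_8bit"), which can be more stable
# on GPUs without int8 tensor core support.
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
    optim="adamw_torch",
)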
Check the above solution. It should be fixed already.
For training, I get a loss value the first time it is logged, but from the second time onwards the loss is 0.
Hi @adibMosharrof, have you solved the problem? I am encountering a similar issue. I cannot even load the model (BLOOM) onto 8x V100.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hello,
I have the same Python environment on different machines, but when I run my code on the machine with a Tesla V100-SXM2-32GB GPU, which has compute capability 7.0, I get a train_loss of 0.0.
On machines with 7.5 (Nvidia Titan RTX) and 8.6 (RTX 3090) the train_loss is not 0.0.
I used pip install bitsandbytes==0.38.0. I had to manually apply the fix from #300 to bitsandbytes/cuda_setup/main.py. It is mentioned that as of v0.37.0, all GPUs are supported.
#240 also talks about the train loss becoming 0.0.
I am using peft to do some work, and that has led me here. I also have an open issue in peft: huggingface/peft#334. A code sample that shows what I am doing is in this notebook:
https://colab.research.google.com/drive/16qKy92cGoNPWrlQ4zlvntVGeSgjrknVF?usp=sharing
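For context, a minimal sketch of a typical 8-bit LoRA setup with peft (assumed, not copied from the notebook; the model name and target_modules are placeholders):

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "bigscience/bloom-560m"  # placeholder, not the model from the notebook

# Load the base model in 8-bit and prepare it for int8 training.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)

# Attach a LoRA adapter so only a small set of parameters is trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],  # placeholder module names
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()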
Here is the output of bitsandbytes:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /project/msi290_uksr/generative_tod/myenv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
CUDA SETUP: CUDA runtime path found: /project/msi290_uksr/generative_tod/myenv/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 116