RuntimeError: probability tensor contains either inf, nan or element < 0
#3337
Replies: 2 comments
-
Sounds to me like an 8-bit quantization problem. If you have sufficient GPU memory, please try loading in 16-bit. Could you also check whether the error occurs without any padding (for single-element batches you don't need to pad the input)? If you do pad, you need to use left padding!
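To illustrate the left-padding point above: for decoder-only models, generation continues from the last position of each row, so the pads must go on the left to keep the real tokens at the end. A minimal plain-Python sketch (no model involved; the pad id of 0 and the function name are my own, for illustration only):

```python
PAD_ID = 0  # hypothetical pad token id, for illustration only

def left_pad(batch, pad_id=PAD_ID):
    """Pad each sequence on the LEFT so all rows share the longest length."""
    width = max(len(seq) for seq in batch)
    padded, masks = [], []
    for seq in batch:
        n_pad = width - len(seq)
        padded.append([pad_id] * n_pad + seq)       # pads go BEFORE the tokens
        masks.append([0] * n_pad + [1] * len(seq))  # 0 = ignore, 1 = attend
    return padded, masks

ids, mask = left_pad([[5, 6, 7], [8, 9]])
# ids  -> [[5, 6, 7], [0, 8, 9]]
# mask -> [[1, 1, 1], [0, 1, 1]]
```

With right padding, the second row would end in a pad token, and the model would be asked to generate from a pad position, which is exactly what the advice above warns against.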
-
I assume 16-bit is the default quantization. If so, I'm loading in 8-bit because I otherwise run into CUDA out-of-memory errors under my current configuration. When I loaded this exact model on the exact same cloud GPU in 8-bit (splitting the model across my GPU and CPU) with text-generation-webui (a GUI for running LLMs like OpenAssistant), it worked fine, but I suppose it's different when building an API. If your other suggestions don't work out, I might have to switch GPUs after all.
-
Hey. So I'm trying to make an OpenAssistant API, in order to use OpenAssistant as a fallback for a chatbot I'm trying to make (I'm using IBM Watson for the chatbot for what it's worth). To do so, I'm trying to get the Pythia 12B model (OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5) up and running on a cloud GPU on Google Cloud. I'm using a NVIDIA L4 GPU, and the machine I'm using has 16 vCPUs and 64 GB memory.
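For context on why this hardware is tight for 16-bit (this arithmetic is mine, not from the thread): the NVIDIA L4 has 24 GB of VRAM, and a model with roughly 12B parameters needs about 2 bytes per parameter in fp16 for the weights alone, before activations, KV cache, and CUDA overhead.

```python
# Back-of-envelope memory estimate for the model weights alone.
# The 12B parameter count is approximate; activations, KV cache,
# and framework overhead add several more GB on top of this.
PARAMS = 12_000_000_000

def weight_gib(params, bytes_per_param):
    """Weight memory in GiB for a given precision."""
    return params * bytes_per_param / 2**30

fp16_gib = weight_gib(PARAMS, 2)  # ~22.4 GiB: barely fits a 24 GB L4
int8_gib = weight_gib(PARAMS, 1)  # ~11.2 GiB: leaves real headroom
```

This is why 8-bit loading (or CPU offload) is attractive here even though it can introduce its own numerical issues.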
Here's the current code I have for my API right now:
I also created a file to test the API; it can be seen below:
The logs I'm getting for the error can be found below:
I have tried to debug what's going on by printing the values of both my `input_ids` and `attention_mask` tensors, as shown in this snippet of my API code:
The output I get is:
Now, I don't think the mins and maxes of either tensor should be identical, nor should the values be strictly 0 or 1, so I'm led to believe something is going wrong when transferring my values to the GPU. If anyone can help me out, I would greatly appreciate it!
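On the error in the title itself: sampling raises this message when any entry of the probability tensor is NaN, infinite, or negative, which is often the result of numerical overflow under 8-bit or fp16 quantization rather than a bad `input_ids`/`attention_mask`. A pure-Python sketch of the same validity check (the function name is mine, for illustration only):

```python
import math

def check_probs(probs):
    """Return the reasons a probability vector would crash sampling."""
    problems = []
    if any(math.isnan(p) for p in probs):
        problems.append("nan")
    if any(math.isinf(p) for p in probs):
        problems.append("inf")
    if any(p < 0 for p in probs):
        problems.append("element < 0")
    return problems

check_probs([0.2, float("nan"), 0.8])  # -> ["nan"]
check_probs([0.5, 0.5])                # -> [] (safe to sample)
```

Printing an equivalent check on the model's output probabilities just before sampling would pinpoint whether quantization, rather than the GPU transfer, is producing the invalid values.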