Alternatives to BitsAndBytes for HF models #1337
Timelessprod asked this question in Q&A (unanswered)
Hello,
I'm loading and fine-tuning a model from HF to use it with Ollama afterwards, and so far I've relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with my quantization config, the safetensors were ultimately exported as uint8 (U8) instead of float16 (F16)*, making them impossible to use with Ollama, which only supports F16, BF16 and F32.
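Roughly, the config looks like this (a minimal sketch with placeholder values, assuming the standard `transformers` + `bitsandbytes` setup; `my-org/my-model` is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the 4-bit config; the exact values here are illustrative.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_storage=torch.float16,  # I expected this to control the saved dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# After fine-tuning, the exported safetensors still contain U8 tensors.
model.save_pretrained("./quantized-model")
```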
*I understood that `bnb_4bit_quant_storage` is supposed to be the dtype in which the weights are stored when saving the model; correct me if I'm wrong.

Thus, do you know any other library/framework that can quantize a model while (or after) loading it from HF, works similarly to BitsAndBytes, but exports tensors in one of the valid dtypes above?
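For reference, this is how I'm checking the dtypes in the exported file (a minimal sketch using the `safetensors` library; the path is a placeholder):

```python
from safetensors import safe_open

# Inspect the dtypes actually stored in the exported safetensors file.
with safe_open("./quantized-model/model.safetensors", framework="pt") as f:
    for name in f.keys():
        # Quantized weights show up as torch.uint8 (U8) instead of torch.float16.
        print(name, f.get_tensor(name).dtype)
```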
I looked around on the web but couldn't find anything fitting my needs.
Thank you very much.