Alternatives to BitsAndBytes for HF models #1337
Timelessprod asked this question in Q&A (unanswered)
Hello,
I'm loading and fine-tuning a model from HF to use it with Ollama afterwards, and so far I've relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with my quantization config, the safetensors were ultimately exported as uint8 (U8) instead of float16 (F16)*, making them impossible to use with Ollama, which only supports F16, BF16 and F32.
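Roughly, the config looks like this (a minimal sketch with placeholder values, assuming the standard `transformers` + `bitsandbytes` setup; `my-org/my-model` is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of the 4-bit config; the exact values here are illustrative.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_storage=torch.float16,  # I expected this to control the saved dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# After fine-tuning, the exported safetensors still contain U8 tensors.
model.save_pretrained("./quantized-model")
```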
*I understood that `bnb_4bit_quant_storage` is supposed to be the dtype in which the weights are stored when saving the model; correct me if I'm wrong.

Thus, do you know any other library/framework that can quantize a model while (or after) loading it from HF, works similarly to BitsAndBytes, but exports tensors in one of the valid dtypes above?
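For reference, this is how I'm checking the dtypes in the exported file (a minimal sketch using the `safetensors` library; the path is a placeholder):

```python
from safetensors import safe_open

# Inspect the dtypes actually stored in the exported safetensors file.
with safe_open("./quantized-model/model.safetensors", framework="pt") as f:
    for name in f.keys():
        # Quantized weights show up as torch.uint8 (U8) instead of torch.float16.
        print(name, f.get_tensor(name).dtype)
```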
I looked around on the web but couldn't find anything fitting my needs.
Thank you very much.