"torch.bfloat16 is not supported for quantization method awq. Supported dtypes: [torch.float16]" error even after trying dtypr=auto/half/float16/bfloat16 #9116
lavishasharma asked this question in Q&A
```python
from vllm import LLM, SamplingParams

# Loading an AWQ-quantized Llama-2 7B chat model on CPU with half precision
llm = LLM(model="TheBloke/Llama-2-7b-Chat-AWQ", device="cpu", dtype="half", quantization="AWQ")
```
I am also facing a CUDA error, "no NVIDIA drivers found". I would really like to resolve these two errors; I have tried everything I could. Please help.
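For reference, this is a minimal sketch of what I would expect to work on a machine with a CUDA-capable GPU, since as far as I understand vLLM's AWQ kernels require a GPU. The prompt and sampling settings are only illustrative, not part of my actual setup:

```python
# Minimal sketch, assuming a CUDA-capable GPU and a vLLM build with AWQ support.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7b-Chat-AWQ",
    dtype="float16",        # the error message says AWQ only supports torch.float16
    quantization="awq",
)

# Illustrative generation call to confirm the model loads and runs
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is AWQ quantization?"], sampling_params)
print(outputs[0].outputs[0].text)
```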