"torch.bfloat16 is not supported for quantization method awq. Supported dtypes: [torch.float16]" error even after trying dtypr=auto/half/float16/bfloat16 #9116
lavishasharma asked this question in Q&A
```python
from vllm import LLM, SamplingParams

# Loading an AWQ-quantized Llama-2 7B chat model on CPU with half precision
llm = LLM(model="TheBloke/Llama-2-7b-Chat-AWQ", device="cpu", dtype="half", quantization="AWQ")
```
I am also facing a CUDA error, "no NVIDIA drivers found". I would really like to resolve these two errors; I have tried everything I could. Please help.
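For reference, this is a minimal sketch of what I would expect to work on a machine with a CUDA-capable GPU, since as far as I understand vLLM's AWQ kernels require a GPU. The prompt and sampling settings are only illustrative, not part of my actual setup:

```python
# Minimal sketch, assuming a CUDA-capable GPU and a vLLM build with AWQ support.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-7b-Chat-AWQ",
    dtype="float16",        # the error message says AWQ only supports torch.float16
    quantization="awq",
)

# Illustrative generation call to confirm the model loads and runs
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is AWQ quantization?"], sampling_params)
print(outputs[0].outputs[0].text)
```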