Build vllm from source with CPU support #6188
Unanswered · parkesorgua asked this question in Q&A
Replies: 2 comments
- I think vLLM still does not support inference for quantized models on CPUs; I'm looking for the same thing. I was able to run the Llama 3 8B bf16 model by following the instructions in the docs: I set up a Docker instance and was able to get responses from the model. If you have any info about running quantized models on vLLM, please let me know! Thanks.
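For anyone landing here, a minimal sketch of that Docker-based CPU workflow; the image tag, the `--entrypoint` override, and the model name are illustrative assumptions rather than the exact commands from the reply or the docs:

```
# Build the CPU image from a checkout of the vLLM repo (Dockerfile.cpu ships in the source tree)
docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .

# Open a shell in the container; --network=host exposes the server port to the host
docker run -it --rm --network=host --entrypoint bash vllm-cpu-env

# Inside the container: serve a bf16 model through the OpenAI-compatible API (example model)
python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16

# From the host: query the server once it is up
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "Hello", "max_tokens": 32}'
```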
- Check the first line of https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html#
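For reference, a rough sketch of the build sequence that page walks through; the compiler versions, package list, and requirements file name are from memory and may not match the current guide:

```
# System packages the CPU backend build expects
sudo apt-get update && sudo apt-get install -y gcc-12 g++-12 libnuma-dev

# Python-side build dependencies plus the CPU-only PyTorch wheel
pip install --upgrade pip
pip install wheel packaging ninja setuptools numpy
pip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu

# Compile and install the CPU backend from source
VLLM_TARGET_DEVICE=cpu python setup.py install
```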
I am trying to run a small quantized model on my notebook. As I understand it, building from source is required to run on CPU, so I followed the instructions at https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html. After running

VLLM_TARGET_DEVICE=cpu python setup.py install

there is no binary in the build folder, only three other folders. How do I build vLLM from source to get the binary file? OS: Ubuntu 24
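One note on what that command actually produces, in case it helps: setup.py install compiles vLLM's C++ extensions and installs the whole package into the active environment's site-packages, so no standalone executable is left in build/. A minimal sketch of how to confirm the install and serve a model afterwards; the model name and the KV-cache size are illustrative assumptions:

```
# Confirm the package imports and report its version
python -c "import vllm; print(vllm.__version__)"

# Serve a small model on CPU through the OpenAI-compatible API (example model)
VLLM_CPU_KVCACHE_SPACE=8 python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2-1.5B-Instruct --dtype bfloat16
```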