Build vllm from source with CPU support #6188
Unanswered · parkesorgua asked this question in Q&A
Replies: 2 comments
- I think vLLM still does not support inference for quantized models on CPUs; I'm looking for the same thing. I was able to run the Llama 3 8B bf16 model by following the instructions in the docs: I set up a Docker instance and was able to get responses from the model. If you have any info about running quantized models on vLLM, please let me know! Thanks.
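For anyone landing here, a minimal sketch of that Docker-based CPU workflow; the image tag, the `--entrypoint` override, and the model name are illustrative assumptions rather than the exact commands from the reply or the docs:

```
# Build the CPU image from a checkout of the vLLM repo (Dockerfile.cpu ships in the source tree)
docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .

# Open a shell in the container; --network=host exposes the server port to the host
docker run -it --rm --network=host --entrypoint bash vllm-cpu-env

# Inside the container: serve a bf16 model through the OpenAI-compatible API (example model)
python3 -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct --dtype bfloat16

# From the host: query the server once it is up
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "Hello", "max_tokens": 32}'
```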
- Check the first line of https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html#
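For reference, a rough sketch of the build sequence that page walks through; the compiler versions, package list, and requirements file name are from memory and may not match the current guide:

```
# System packages the CPU backend build expects
sudo apt-get update && sudo apt-get install -y gcc-12 g++-12 libnuma-dev

# Python-side build dependencies plus the CPU-only PyTorch wheel
pip install --upgrade pip
pip install wheel packaging ninja setuptools numpy
pip install -v -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu

# Compile and install the CPU backend from source
VLLM_TARGET_DEVICE=cpu python setup.py install
```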
I am trying to run a small quantized model on my notebook. As I understand it, building from source is required to run on CPU, so I followed the instructions at https://docs.vllm.ai/en/latest/getting_started/cpu-installation.html. After running

VLLM_TARGET_DEVICE=cpu python setup.py install

there is no binary in the build folder, only three other folders. How do I build vLLM from source to get the binary file? OS: Ubuntu 24
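One note on what that command actually produces, in case it helps: setup.py install compiles vLLM's C++ extensions and installs the whole package into the active environment's site-packages, so no standalone executable is left in build/. A minimal sketch of how to confirm the install and serve a model afterwards; the model name and the KV-cache size are illustrative assumptions:

```
# Confirm the package imports and report its version
python -c "import vllm; print(vllm.__version__)"

# Serve a small model on CPU through the OpenAI-compatible API (example model)
VLLM_CPU_KVCACHE_SPACE=8 python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2-1.5B-Instruct --dtype bfloat16
```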