Can we use vLLM only on CPU, without a GPU machine?

Replies: 3 comments · 1 reply
-
I think the short answer is no, as vLLM's engine relies on custom kernels written in CUDA.
-
You can try ctranslate2 or llama.cpp.
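For local CPU-only testing, a rough sketch with llama.cpp (through the llama-cpp-python bindings) might look like the following; the model path is a placeholder for whatever GGUF file you have downloaded, and the thread/context settings are only illustrative defaults:

```python
# Minimal, hypothetical sketch of CPU-only inference via llama-cpp-python.
# The GGUF path below is a placeholder, not a real file in any repo.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/your-model.Q4_K_M.gguf",  # placeholder: any locally downloaded GGUF file
    n_ctx=2048,   # context window size
    n_threads=8,  # number of CPU threads to use
)

output = llm(
    "Q: Can vLLM run without a GPU? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```

ctranslate2 similarly supports `device="cpu"` for models converted to its own format.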
-
This was not clear to me either. Is there any way to highlight this in bold somewhere in the main docs? Sorry if I overlooked it. I am trying to do some local testing; that's my use case.