Expected speed up over HuggingFace #662
deepakdalakoti asked this question in Q&A · Unanswered
Hi,

As per the vLLM homepage, the results show a huge speed-up over the HuggingFace API. However, my benchmarking results show only a modest improvement over HuggingFace (~15%). I am using Llama-2-7b-hf for testing, with the code and prompt described here. Is anyone seeing similar results? Are there settings that should improve the results significantly? My environment:

CUDA: 11.8
vllm: 0.1.3
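For reference, a minimal sketch of this kind of latency comparison is below. The model name, prompt, and sampling settings are assumptions rather than the exact benchmark setup, and each backend should be run in its own process (e.g. `python bench.py vllm` / `python bench.py hf`), since vLLM pre-allocates most of the GPU memory at startup.

```python
# Sketch: single-prompt latency, vLLM vs. HuggingFace transformers.
# Model, prompt, and sampling settings here are illustrative assumptions.
import sys
import time

MODEL = "meta-llama/Llama-2-7b-hf"
PROMPT = "Explain the theory of relativity in simple terms."
MAX_TOKENS = 256

def bench_vllm():
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL)
    params = SamplingParams(temperature=0.8, max_tokens=MAX_TOKENS)
    start = time.perf_counter()  # time generation only, not model load
    llm.generate([PROMPT], params)
    print(f"vLLM: {time.perf_counter() - start:.2f}s")

def bench_hf():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    model.generate(**inputs, do_sample=True, temperature=0.8,
                   max_new_tokens=MAX_TOKENS)
    print(f"HuggingFace: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    bench_vllm() if "vllm" in sys.argv else bench_hf()
```

A single prompt mostly measures per-token decode speed, where an fp16 HuggingFace baseline is already reasonable; vLLM's headline numbers come from throughput with many requests in flight, which is consistent with the reply below.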
Replies: 1 comment

Same here, not seeing much improvement in latency, although batch inference is much faster with vLLM compared to HuggingFace.
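That matches where continuous batching helps most: many prompts in flight at once. A rough sketch of the batch case, with illustrative prompts and settings, might look like this:

```python
# Sketch: batch inference with vLLM. Passing all prompts in a single
# generate() call lets the engine schedule them together. Prompts,
# model, and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(temperature=0.8, max_tokens=256)

prompts = [f"Write a one-paragraph summary of topic {i}." for i in range(64)]
outputs = llm.generate(prompts, params)  # one call, batched internally

for out in outputs[:3]:
    print(out.prompt, "->", out.outputs[0].text[:80])
```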