Expected speed up over HuggingFace #662
deepakdalakoti asked this question in Q&A · Unanswered
Hi,

As per the vLLM homepage, the results show a huge speed-up over the HuggingFace API. However, my benchmarking results show only a modest improvement over HuggingFace (~15%). I am using Llama-2-7b-hf for testing, with the code and prompt described here. Is anyone seeing similar results? Are there settings that should improve the results significantly? My environment:

CUDA: 11.8
vllm: 0.1.3
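For reference, a minimal sketch of this kind of latency comparison is below. The model name, prompt, and sampling settings are assumptions rather than the exact benchmark setup, and each backend should be run in its own process (e.g. `python bench.py vllm` / `python bench.py hf`), since vLLM pre-allocates most of the GPU memory at startup.

```python
# Sketch: single-prompt latency, vLLM vs. HuggingFace transformers.
# Model, prompt, and sampling settings here are illustrative assumptions.
import sys
import time

MODEL = "meta-llama/Llama-2-7b-hf"
PROMPT = "Explain the theory of relativity in simple terms."
MAX_TOKENS = 256

def bench_vllm():
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL)
    params = SamplingParams(temperature=0.8, max_tokens=MAX_TOKENS)
    start = time.perf_counter()  # time generation only, not model load
    llm.generate([PROMPT], params)
    print(f"vLLM: {time.perf_counter() - start:.2f}s")

def bench_hf():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    model.generate(**inputs, do_sample=True, temperature=0.8,
                   max_new_tokens=MAX_TOKENS)
    print(f"HuggingFace: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":
    bench_vllm() if "vllm" in sys.argv else bench_hf()
```

A single prompt mostly measures per-token decode speed, where an fp16 HuggingFace baseline is already reasonable; vLLM's headline numbers come from throughput with many requests in flight, which is consistent with the reply below.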
Replies: 1 comment

Same here, not seeing much improvement in latency, although batch inference is much faster with vLLM compared to HuggingFace.
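That matches where continuous batching helps most: many prompts in flight at once. A rough sketch of the batch case, with illustrative prompts and settings, might look like this:

```python
# Sketch: batch inference with vLLM. Passing all prompts in a single
# generate() call lets the engine schedule them together. Prompts,
# model, and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(temperature=0.8, max_tokens=256)

prompts = [f"Write a one-paragraph summary of topic {i}." for i in range(64)]
outputs = llm.generate(prompts, params)  # one call, batched internally

for out in outputs[:3]:
    print(out.prompt, "->", out.outputs[0].text[:80])
```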