KV cache usage on CPU #7431
akhilreddy0703
Hi all, @tmm1, @zhouyuan
Can anyone please help me understand how the vLLM server uses memory for the KV cache?
I ran an inference test using the vLLM server on CPU (as a Docker container), started with --env "VLLM_CPU_KVCACHE_SPACE=40".
I observed the KV cache memory usage in the server logs.
The image below shows the vLLM server logs.
What does "GPU KV cache usage" mean here, given that I deployed the container on a CPU-only instance?
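For context on the VLLM_CPU_KVCACHE_SPACE=40 setting above, here is a rough sketch of the arithmetic I would expect behind a KV cache of that size. It assumes the value is interpreted as GiB, and the model dimensions (32 layers, 32 KV heads, head size 128, 16-bit cache entries) are hypothetical placeholders rather than my actual model. The helper names are my own and are not part of vLLM.

```python
GIB = 1024 ** 3  # bytes in one GiB


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes needed to cache one token: a key and a value vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def tokens_that_fit(cache_gib, num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Whole tokens that fit in a cache of `cache_gib` GiB (assumed unit)."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes)
    return (cache_gib * GIB) // per_token


# Hypothetical model dimensions, chosen only for illustration.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)
capacity = tokens_that_fit(40, num_layers=32, num_kv_heads=32, head_dim=128)

print(f"KV cache bytes per token: {per_token}")
print(f"Tokens that fit in 40 GiB: {capacity}")
```

If this picture is right, I would expect a usage percentage in the logs to be the share of that reserved space currently holding cached tokens, but I am not sure that is what the logged figure reports.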