vLLM CPU Phi 3 mini 128K instruct - OOM issues #5059
-
Hi y'all, I'm trying out vLLM on Phi 3 with no GPU, and I seem to be hitting OOM issues with the model. These are the configurations that I am running with:
I'm running in Docker with 32GB of memory available and 12 CPU cores. I've looked at the memory requirements for the model, and I can't quite fathom why it still manages to OOM on me. If I do not set the
It never seems to have enough memory. 🤔
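For what it's worth, here's a quick back-of-the-envelope calculation I did (my own numbers, assuming the published Phi-3-mini config: 32 layers, hidden size 3072, full multi-head attention, fp16 weights and cache). It suggests the KV cache for even a single full 128K-token sequence would far exceed the container's 32GB on its own:

```python
# Rough KV-cache sizing for Phi-3-mini at 128K context.
# Assumptions (mine, from the published config): 32 layers, hidden size 3072
# (num_heads * head_dim), fp16 = 2 bytes per element.
layers = 32
hidden = 3072
dtype_bytes = 2
seq_len = 128 * 1024   # 131,072-token context window

# K and V each store one hidden-size vector per layer, per token.
per_token = 2 * layers * hidden * dtype_bytes   # 393,216 bytes ≈ 384 KiB
total_gib = per_token * seq_len / 2**30

print(f"KV cache for one full-length sequence: {total_gib:.1f} GiB")  # 48.0 GiB
```

So unless the max sequence length is capped well below 128K, 32GB was never going to be enough for this variant.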
-
Oh, in case it helps, I am running vLLM from commit: 2ba80bed2732edf42b1014ea4e34757849fc93d0.
-
Wow, okay, so an interesting followup. I bumped this down to microsoft/Phi-3-mini-4k-instruct, and it still OOMs with 32GB of RAM available to it. 😂 😭
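Running the same back-of-the-envelope arithmetic for the 4K variant (same assumed config: 32 layers, hidden size 3072, fp16; only the context window shrinks to 4096 tokens), the worst-case per-sequence KV cache is tiny by comparison, which is why the continued OOM surprises me:

```python
# Same assumed per-token KV footprint as the 128K model; only the window shrinks.
layers, hidden, dtype_bytes = 32, 3072, 2   # assumed Phi-3-mini config, fp16
per_token = 2 * layers * hidden * dtype_bytes   # 393,216 bytes per token
seq_len = 4096

print(f"{per_token * seq_len / 2**30:.1f} GiB")  # 1.5 GiB per full sequence
```

With weights in fp16 adding only single-digit GiB on top of that, 32GB should be ample, so something else must be grabbing the memory.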
-
Okay, same issue with commit 8e192ff967b44b186ea02d30e49fddf656fdfe50. Backing off to v0.4.2 and trying again. |
-
Okay, same issue with version v0.4.2 of vLLM. Any ideas of what to try next?
-
I gave the container twice the amount of memory that is given to the KV cache with the
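Roughly, the budget I worked from looks like this (my own sketch, not anything vLLM computes for you: I assume ~3.8B parameters in fp16 for the weights, and the 4 GiB KV-cache figure below is a hypothetical allocation, not a measured value):

```python
# Rough container-memory budget sketch.
# Assumptions (mine): ~3.8B params stored as fp16 (2 bytes each);
# kv_cache_gib is a hypothetical KV-cache allocation for illustration.
params = 3.8e9
weights_gib = params * 2 / 2**30   # ~7.1 GiB of model weights
kv_cache_gib = 4                   # hypothetical KV-cache allocation

# Doubling the KV-cache allocation leaves headroom for activations and
# allocator overhead on top of the weights.
container_gib = weights_gib + 2 * kv_cache_gib
print(f"container memory budget: {container_gib:.1f} GiB")
```

Doubling the cache allocation is just a comfortable safety margin, not a hard rule.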
Sorry for the delay; this is what I ended up with in my jsonnet k8s template: