
vLLM CPU Phi 3 mini 128K instruct - OOM issues #5059

Closed Answered by thealmightygrant
thealmightygrant asked this question in Q&A

Sorry for the delay, this is what I ended up with in my jsonnet k8s template:

local kvCacheSpace = std.toString(std.round(totalMemory * 0.5)),
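For context, here is a minimal sketch of how that line might sit inside a container spec. It assumes `totalMemory` is the pod's memory in GiB and that the result is passed to vLLM's CPU backend through the `VLLM_CPU_KVCACHE_SPACE` environment variable (which is expressed in GiB). The value `64` and the surrounding field layout are illustrative and not taken from the original template.

```jsonnet
// Hypothetical value: total memory given to the pod, in GiB.
local totalMemory = 64;
// Reserve half of the pod's memory for the KV cache, as in the line above.
local kvCacheSpace = std.toString(std.round(totalMemory * 0.5));

{
  // Illustrative fragment of a Kubernetes container spec.
  resources: {
    requests: { memory: totalMemory + 'Gi' },
    limits: { memory: totalMemory + 'Gi' },
  },
  env: [
    // Assumption: the KV cache size is handed to vLLM via this variable.
    // Env var values must be strings, hence std.toString above.
    { name: 'VLLM_CPU_KVCACHE_SPACE', value: kvCacheSpace },
  ],
}
```

Deriving the cache size from the same `totalMemory` value that sets the memory limit keeps the two in step, so raising the pod's memory does not leave the KV cache at a stale size, and the remaining half stays available for model weights and runtime overhead.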

Replies: 5 comments 2 replies

Answer selected by thealmightygrant
Category: Q&A
Labels: None yet
Participants: 2