Why do we need more RAM over VRAM when we have multiple instances of GPUs? #679
Unanswered
amoghmishra-sl asked this question in Q&A
Replies: 0 comments
I was benchmarking inference for LLaMA-2-7B on an n1-standard-8 (8 vCPUs, 30 GB RAM) machine, first with 1xV100 and then with 2xV100 GPUs. The model ran smoothly on 1xV100, but it went OOM on 2xV100. How are the model weights shared when using multiple GPUs? I assumed the weights would be distributed across the 2 GPUs, so I don't understand why more host RAM is needed.

Note: 2xV100 ran smoothly after I upgraded the machine to 16 vCPUs, 60 GB RAM.
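One plausible explanation, sketched under assumptions rather than confirmed from this thread: with tensor parallelism, each GPU is driven by its own worker process, and each worker typically deserializes the full checkpoint into host RAM before copying only its shard to its GPU. The `load_shard` helper below is hypothetical (not the loader of any particular library), and the ~13 GB figure assumes fp16 LLaMA-2-7B weights; it only illustrates why peak host RAM can scale with the number of GPU workers rather than staying constant.

```python
# Hypothetical sketch: each tensor-parallel worker loads the whole
# checkpoint into host RAM, then keeps only its shard on the GPU.
import torch

def load_shard(checkpoint_path: str, rank: int, world_size: int, device: str) -> dict:
    # A full fp16 LLaMA-2-7B checkpoint is roughly 13 GB; torch.load
    # materializes all of it in host RAM for *each* worker process
    # before any sharding happens.
    state_dict = torch.load(checkpoint_path, map_location="cpu")

    shard = {}
    for name, tensor in state_dict.items():
        if tensor.dim() == 2:
            # Toy column-parallel split: each rank keeps one slice of the
            # weight matrix along the output dimension.
            shard[name] = tensor.chunk(world_size, dim=0)[rank].to(device)
        else:
            # Small tensors (norms, biases) are replicated on every rank.
            shard[name] = tensor.to(device)

    # The host copy is only released after the peak has already been hit.
    del state_dict
    return shard

# With world_size=2, two processes each hold the ~13 GB host copy at the
# same time (~26 GB peak), which can push a 30 GB machine into OOM once OS
# and CUDA allocator overheads are added; with one GPU the peak is ~13 GB.
```

If this is indeed the mechanism, the fix is either more host RAM (as you observed) or a loader that memory-maps or streams the checkpoint so each worker never holds the full state dict at once.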