Why do we need more RAM over VRAM when we have multiple instances of GPUs? #679
Unanswered
amoghmishra-sl asked this question in Q&A
Replies: 0 comments
I was benchmarking inference for LLaMA-2-7B on an n1-standard-8 (8 vCPUs, 30 GB RAM) machine, first with 1xV100 and then with 2xV100 GPUs. The model ran smoothly on 1xV100, but it went OOM on 2xV100. How are the model weights shared when using multiple GPUs? I assumed the weights would be distributed across the 2 GPUs, so I don't understand why more host RAM is needed.

Note: 2xV100 ran smoothly after I upgraded the machine to 16 vCPUs, 60 GB RAM.
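One plausible explanation, sketched under assumptions rather than confirmed from this thread: with tensor parallelism, each GPU is driven by its own worker process, and each worker typically deserializes the full checkpoint into host RAM before copying only its shard to its GPU. The `load_shard` helper below is hypothetical (not the loader of any particular library), and the ~13 GB figure assumes fp16 LLaMA-2-7B weights; it only illustrates why peak host RAM can scale with the number of GPU workers rather than staying constant.

```python
# Hypothetical sketch: each tensor-parallel worker loads the whole
# checkpoint into host RAM, then keeps only its shard on the GPU.
import torch

def load_shard(checkpoint_path: str, rank: int, world_size: int, device: str) -> dict:
    # A full fp16 LLaMA-2-7B checkpoint is roughly 13 GB; torch.load
    # materializes all of it in host RAM for *each* worker process
    # before any sharding happens.
    state_dict = torch.load(checkpoint_path, map_location="cpu")

    shard = {}
    for name, tensor in state_dict.items():
        if tensor.dim() == 2:
            # Toy column-parallel split: each rank keeps one slice of the
            # weight matrix along the output dimension.
            shard[name] = tensor.chunk(world_size, dim=0)[rank].to(device)
        else:
            # Small tensors (norms, biases) are replicated on every rank.
            shard[name] = tensor.to(device)

    # The host copy is only released after the peak has already been hit.
    del state_dict
    return shard

# With world_size=2, two processes each hold the ~13 GB host copy at the
# same time (~26 GB peak), which can push a 30 GB machine into OOM once OS
# and CUDA allocator overheads are added; with one GPU the peak is ~13 GB.
```

If this is indeed the mechanism, the fix is either more host RAM (as you observed) or a loader that memory-maps or streams the checkpoint so each worker never holds the full state dict at once.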