@WoosukKwon Can someone give me suggestions for running vLLM with multi-GPU? Thanks.
---
I attempted multi-GPU inference with Llama-13B (8 A100 GPUs). I first ran `ray start --head` and `ray start --address='30.152.83.253:6379'`, then modified the offline_inference.py code to use `llm = LLM(model="./models/open_llama_13b", tensor_parallel_size=4)`, and ran `python3 offline_inference.py`.
I got the following output:
```
INFO worker.py:1452 -- Connecting to existing Ray cluster at address: 30.152.83.253:6379...
INFO worker.py:1636 -- Connected to Ray cluster.
```
Then it blocked with no further log output. How can I tell whether my program is running successfully? I cannot find the generated output anywhere.
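For reference, here is a minimal sketch of what the modified script might look like (the prompts and model path are placeholders; `tensor_parallel_size=4` matches the setting above):

```python
# Minimal sketch of offline inference with tensor parallelism in vLLM.
# Assumes vLLM is installed and a Ray cluster is already running;
# vLLM attaches to it when tensor_parallel_size > 1.
from vllm import LLM, SamplingParams

# Placeholder prompts; replace with your own.
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Shard the model across 4 GPUs via tensor parallelism.
llm = LLM(model="./models/open_llama_13b", tensor_parallel_size=4)

# generate() blocks until all prompts finish, then returns the results.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```

If the script hangs right after "Connected to Ray cluster", one thing worth checking is whether the cluster actually has enough free GPUs for the requested `tensor_parallel_size` (e.g. with `ray status`); the engine can wait indefinitely for resources it cannot acquire.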