-
Hello,
-
See the performance results in #2192. In certain cases, the custom topology drastically boosts performance compared to NCCL's implementation. vLLM still uses NCCL in the majority of cases.
-
In a setup where 4 GPUs are connected by PCIe, but each pair of GPUs is connected by NVLink (112 GB/s bidirectional), is there a way to specify a reduction first within each NVLink-connected pair of GPUs before reducing across the slower PCIe link?
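To make the question concrete, here is a rough sketch of the two-stage reduction I have in mind. It is a plain-Python simulation only, not vLLM or NCCL code: the function name `hierarchical_all_reduce` and the pair layout `[(0, 1), (2, 3)]` are hypothetical, and each GPU's buffer is modeled as a simple list.

```python
from typing import List, Sequence, Tuple


def hierarchical_all_reduce(
    values: Sequence[Sequence[int]],
    pairs: Sequence[Tuple[int, int]],
) -> List[List[int]]:
    """Simulate a two-stage sum all-reduce (illustrative only).

    values[i] is the buffer held by GPU i; pairs lists the GPU ids
    that are assumed to share an NVLink connection.
    """
    # Stage 1: sum inside each NVLink pair (the fast link).
    pair_sums = [
        [x + y for x, y in zip(values[a], values[b])]
        for a, b in pairs
    ]

    # Stage 2: sum the per-pair partial results (the slow PCIe link).
    # Only one buffer per pair would cross PCIe in this scheme.
    total = [sum(column) for column in zip(*pair_sums)]

    # Stage 3: every GPU ends up with a copy of the full result.
    return [list(total) for _ in values]


if __name__ == "__main__":
    buffers = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(hierarchical_all_reduce(buffers, pairs=[(0, 1), (2, 3)]))
```

With the four example buffers above, every simulated GPU ends up holding `[16, 20]`. The point is that only the per-pair partial sums need to cross the PCIe link, rather than every GPU's full buffer.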