RDMA network setup for PyTorch applications
Zhaobo edited this page Jan 9, 2023 · 5 revisions
- Mellanox CX-5 NIC and driver (OFED 5.4)
- Nvidia GPU and driver (510)
- Linux kernel version (5.15)
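The versions listed above can be checked on a host with a short script. The sketch below is one possible way to do that, not part of the original setup: the helper name `report_version` is hypothetical, and any tool that is not installed is reported as missing instead of causing an error.

```shell
#!/bin/sh
# Print the version of each component listed above, skipping tools that are missing.

# report_version LABEL COMMAND [ARGS...]  (hypothetical helper, not from the original page)
# Runs COMMAND and prints the first line of its output,
# or "not found" when the command is not installed.
report_version() {
    label=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s: %s\n' "$label" "$("$@" 2>/dev/null | head -n 1)"
    else
        printf '%s: not found\n' "$label"
    fi
}

report_version "kernel" uname -r
report_version "OFED" ofed_info -s
report_version "GPU driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

Comparing the printed values with the list above is left to the reader, since the exact version strings differ between distributions and driver packages.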
- Install the Mellanox OFED driver, then check its version:
ofed_info
- Install the Nvidia GPU driver:
apt-get install nvidia-driver-510
- Install the Nvidia container runtime: add the Nvidia repository address and keyring, then install:
apt-get install nvidia-docker2
- Check the InfiniBand port state, then run a bandwidth test on the RDMA device:
ibstat
ib_send_bw -a -b -R -d mlx5_2
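`ib_send_bw` belongs to the perftest suite and runs as a server/client pair: the command is started first on one host without a peer address, then on a second host with the first host's address appended. The sketch below only assembles both command lines. The helper name `bw_cmd` and the host name `server-host` are placeholders, not values from this page.

```shell
#!/bin/sh
# bw_cmd DEVICE [SERVER]  (hypothetical helper)
# Prints the ib_send_bw command line used on this page:
#   -a  sweep all message sizes
#   -b  bidirectional test
#   -R  set up the connection through rdma_cm
#   -d  RDMA device to use
# With no SERVER argument the command waits for a peer (server side);
# with SERVER it connects to that host (client side).
bw_cmd() {
    device=$1
    server=${2:-}
    if [ -n "$server" ]; then
        echo "ib_send_bw -a -b -R -d $device $server"
    else
        echo "ib_send_bw -a -b -R -d $device"
    fi
}

bw_cmd mlx5_2               # run the printed command on the first host
bw_cmd mlx5_2 server-host   # run the printed command on the second host
```

Both sides need the same test flags, which is why the sketch builds them from one function.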
- Change Docker's default runtime to nvidia, then reload the daemon and restart Docker, if not already done.
- Mellanox tcpdump special container
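For the step that switches Docker's default runtime, the setting usually lives in `/etc/docker/daemon.json`. The sketch below writes a commonly used configuration to a path chosen by the caller, so it can be reviewed before being copied into place. The helper name `write_nvidia_daemon_json` is hypothetical, and the sketch assumes that `nvidia-container-runtime` is on the `PATH` (as installed by `nvidia-docker2`) and that Docker is managed by systemd.

```shell
#!/bin/sh
# write_nvidia_daemon_json PATH  (hypothetical helper)
# Writes a daemon.json that makes "nvidia" the default Docker runtime.
# Note: this overwrites PATH, so merge by hand if an existing daemon.json has other settings.
write_nvidia_daemon_json() {
    cat > "$1" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Write to a scratch file first and review it.
tmpfile=$(mktemp)
write_nvidia_daemon_json "$tmpfile"
cat "$tmpfile"

# After review, install it and restart Docker (left commented out on purpose):
#   sudo cp "$tmpfile" /etc/docker/daemon.json
#   sudo systemctl daemon-reload
#   sudo systemctl restart docker
```

Writing to a scratch file keeps the sketch from touching a live Docker configuration until the result has been checked.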