Bluefog Docker Test Notes
$ sudo docker run -it --gpus all bluefog:latest
root@Gyes:/examples# horovodrun -np 4 -H localhost:4 python ./blue-fog-examples/pytorch_cifar10_resnet.py {--no-bluefog}
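Add the optional --no-bluefog argument to disable Bluefog and use Horovod instead. Braces in these notes mark optional arguments and are not typed literally.
WARNING: You must run the container in privileged mode (add --privileged to the docker run command); otherwise the win_ops cannot execute correctly.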
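One way to confirm that a container really was started in privileged mode is to ask Docker for its configuration. The sketch below wraps docker inspect in a small helper; the name is_privileged and the DOCKER override variable are hypothetical and only serve this illustration.

```shell
# Sketch: succeed if the given container (name or ID) was started with --privileged.
# DOCKER may be overridden; it defaults to "sudo docker" as used in these notes.
is_privileged() {
    # The expansion is left unquoted on purpose so "sudo docker" splits into two words.
    # docker inspect prints "true" or "false" for HostConfig.Privileged.
    state=$(${DOCKER:-sudo docker} inspect --format '{{.HostConfig.Privileged}}' "$1") || return 1
    [ "$state" = "true" ]
}

# Usage (my_container is a placeholder for your container name or ID):
#   is_privileged my_container && echo "privileged mode is on"
```

If the helper reports that privileged mode is off, the container has to be recreated with --privileged, since the flag cannot be changed on an existing container.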
For easier testing in the Docker environment, mount the host directory into the container. First build the test Docker image (you only need to do this the first time):
$ sudo docker build -t bluefog_gpu:devel . -f dockerfile.gpu.test
Run the following command from the bluefog root folder to mount it into the container at /bluefog:
$ sudo docker run --privileged -it --gpus all --name bluefog_gpu_devtest \
--mount type=bind,source="$(pwd)",target=/bluefog bluefog_gpu:devel
Remember to remove the devtest container when you no longer need it:
$ sudo docker container rm bluefog_gpu_devtest
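Because the container is created with a fixed --name, running the docker run command again fails with a name conflict until the old container is removed, and docker container rm itself refuses to remove a container that is still running. A minimal sketch of a cleanup helper follows; remove_container and the DOCKER override variable are hypothetical names, and the -f flag force-removes a running container, so use it only when that is what you want.

```shell
# Sketch: remove a named container if it exists, even if it is still running.
# DOCKER may be overridden; it defaults to "sudo docker" as used in these notes.
remove_container() {
    # "docker container inspect" exits non-zero when no such container exists.
    if ${DOCKER:-sudo docker} container inspect "$1" >/dev/null 2>&1; then
        # -f stops the container first if it is running, then removes it.
        ${DOCKER:-sudo docker} container rm -f "$1"
    else
        echo "container $1 not found, nothing to remove"
    fi
}

# Usage:
#   remove_container bluefog_gpu_devtest
```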
The following error may pop up when running a docker container with GPUs.
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
To run Docker with GPUs, the NVIDIA container runtime must be installed. On Ubuntu, use the following commands:
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
$ sudo apt-get update
$ sudo apt-get install nvidia-container-runtime
More details can be found at https://github.com/NVIDIA/nvidia-container-runtime and https://nvidia.github.io/nvidia-container-runtime.
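The distribution variable in the commands above is built by sourcing /etc/os-release and joining its ID and VERSION_ID fields, and the result selects which repository list is downloaded. The sketch below isolates that step so the value can be checked before running curl; the function name distribution_id and its optional file argument are hypothetical.

```shell
# Sketch: print the distribution string used in the repository URL.
# The optional argument is the os-release file to read (default: /etc/os-release).
distribution_id() {
    # Run in a subshell so the sourced variables do not leak into the caller.
    ( . "${1:-/etc/os-release}" && echo "${ID}${VERSION_ID}" )
}

# Usage:
#   distribution_id        # on Ubuntu 18.04 this prints ubuntu18.04
```

If the printed value does not correspond to a list published under the URL above, the curl command returns an error page instead of a package list, so it is worth checking first.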
The CPU version takes a similar approach to the GPU version:
$ sudo docker build -t bluefog_cpu:devel . -f dockerfile.cpu.test
Run the following command from the bluefog root folder to mount it into the container at /bluefog:
$ sudo docker run --privileged -it --mount type=bind,source="$(pwd)",target=/bluefog \
--name bluefog_cpu_devtest bluefog_cpu:devel
Remember to remove the devtest container when you no longer need it:
$ sudo docker container rm bluefog_cpu_devtest
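To run across multiple machines, start the container on the slave machine and then launch sshd on port 40000 inside it: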
$ sudo docker run -it --gpus all --privileged --network=host -v /mnt/share/ssh:/root/.ssh \
--mount type=bind,source="$(pwd)",target=/bluefog bluefog_gpu:devel
$ /usr/sbin/sshd -p 40000
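On the master machine, start the container, check the SSH connection to the slave (labg in this example), and then launch the job: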
$ sudo docker run -it --gpus all --privileged --network=host -v /mnt/share/ssh:/root/.ssh \
--mount type=bind,source="$(pwd)",target=/bluefog bluefog_gpu:devel
$ ssh labg -p 40000 date
$ bfrun -np 8 -H localhost:4,labg:4 -p 40000 python examples/pytorch_mnist.py {--epochs=3} {--no-bluefog}
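In the bfrun command above, the numbers suggest that -np is the total process count and that each host:N entry in -H gives the number of slots on that host (4 + 4 = 8). Under that assumption, the sketch below computes the -np value from a host list so the two stay consistent; total_slots is a hypothetical helper, and it expects every entry to carry a :N suffix.

```shell
# Sketch: sum the slot counts in a comma-separated host list such as "localhost:4,labg:4".
# Assumes every entry has the form host:N.
total_slots() {
    total=0
    # Split on commas, then keep only the number after the last colon of each entry.
    for entry in $(echo "$1" | tr ',' ' '); do
        total=$((total + ${entry##*:}))
    done
    echo "$total"
}

# Usage:
#   hosts=localhost:4,labg:4
#   bfrun -np "$(total_slots "$hosts")" -H "$hosts" -p 40000 python examples/pytorch_mnist.py
```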