UCX 1.15 upgrade #9824

Merged 2 commits on Nov 22, 2023
@@ -17,23 +17,26 @@
#
# The parameters are:
# - CUDA_VER: 11.8.0 by default
-# - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-# CUDA runtime from the UCX github repo.
-# See: https://github.com/openucx/ucx/releases/
+# - UCX_VER, UCX_CUDA_VER, and UCX_ARCH:
+# Used to pick a package matching a specific UCX version and
+# CUDA runtime from the UCX github repo.
+# See: https://github.com/openucx/ucx/releases/
# - ROCKY_VER: Rocky Linux OS version

ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
ARG ROCKY_VER=8
FROM nvidia/cuda:${CUDA_VER}-runtime-rockylinux${ROCKY_VER}
ARG UCX_VER
ARG UCX_CUDA_VER
+ARG UCX_ARCH

RUN yum update -y && yum install -y wget bzip2 numactl-libs libgomp
RUN ls /usr/lib
RUN mkdir /tmp/ucx_install && cd /tmp/ucx_install && \
-wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
tar -xvf *.bz2 && \
rpm -i ucx-$UCX_VER*.rpm && \
rpm -i ucx-cuda-$UCX_VER*.rpm --nodeps && \
@@ -17,22 +17,25 @@
#
# The parameters are:
# - CUDA_VER: 11.8.0 by default
-# - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-# CUDA runtime from the UCX github repo.
-# See: https://github.com/openucx/ucx/releases/
+# - UCX_VER, UCX_CUDA_VER, and UCX_ARCH:
+# Used to pick a package matching a specific UCX version and
+# CUDA runtime from the UCX github repo.
+# See: https://github.com/openucx/ucx/releases/
# - ROCKY_VER: Rocky Linux OS version

ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
ARG ROCKY_VER=8
FROM nvidia/cuda:${CUDA_VER}-runtime-rockylinux${ROCKY_VER}
ARG UCX_VER
ARG UCX_CUDA_VER
+ARG UCX_ARCH

RUN yum update -y && yum install -y wget bzip2 rdma-core numactl-libs libgomp libibverbs librdmacm
RUN mkdir /tmp/ucx_install && cd /tmp/ucx_install && \
-wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-centos8-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
tar -xvf *.bz2 && \
rpm -i ucx-$UCX_VER*.rpm && \
rpm -i ucx-cuda-$UCX_VER*.rpm --nodeps && \
@@ -17,21 +17,24 @@
#
# The parameters are:
# - CUDA_VER: 11.8.0 by default
-# - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-# CUDA runtime from the UCX github repo.
-# See: https://github.com/openucx/ucx/releases/
+# - UCX_VER, UCX_CUDA_VER, and UCX_ARCH:
+# Used to pick a package matching a specific UCX version and
+# CUDA runtime from the UCX github repo.
+# See: https://github.com/openucx/ucx/releases/
# - UBUNTU_VER: 20.04 by default
#

ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
ARG UBUNTU_VER=20.04

FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
ARG UCX_VER
ARG UCX_CUDA_VER
ARG UBUNTU_VER
+ARG UCX_ARCH

RUN apt-get update && apt-get install -y gnupg2
# https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212771
@@ -41,7 +44,7 @@ RUN CUDA_UBUNTU_VER=`echo "$UBUNTU_VER"| sed -s 's/\.//'` && \
RUN apt update
RUN apt-get install -y wget
RUN mkdir /tmp/ucx_install && cd /tmp/ucx_install && \
-wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
-tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
+tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
apt install -y /tmp/ucx_install/*.deb && \
rm -rf /tmp/ucx_install
@@ -20,9 +20,10 @@
# - RDMA_CORE_VERSION: Set to 32.1 to match the rdma-core line in the latest
# released MLNX_OFED 5.x driver
# - CUDA_VER: 11.8.0 by default
-# - UCX_VER and UCX_CUDA_VER: these are used to pick a package matching a specific UCX version and
-# CUDA runtime from the UCX github repo.
-# See: https://github.com/openucx/ucx/releases/
+# - UCX_VER, UCX_CUDA_VER, and UCX_ARCH:
+# Used to pick a package matching a specific UCX version and
+# CUDA runtime from the UCX github repo.
+# See: https://github.com/openucx/ucx/releases/
# - UBUNTU_VER: 20.04 by default
#
# The Dockerfile first fetches and builds `rdma-core` to satisfy requirements for
@@ -34,15 +35,17 @@

ARG RDMA_CORE_VERSION=32.1
ARG CUDA_VER=11.8.0
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
ARG UCX_CUDA_VER=11
+ARG UCX_ARCH=x86_64
ARG UBUNTU_VER=20.04

# Throw away image to build rdma_core
FROM ubuntu:${UBUNTU_VER} as rdma_core
ARG RDMA_CORE_VERSION
ARG UBUNTU_VER
ARG CUDA_VER
+ARG UCX_ARCH

RUN apt-get update && apt-get install -y gnupg2
# https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212771
@@ -61,6 +64,7 @@ RUN tar -xvf *.tar.gz && cd rdma-core*/ && dpkg-buildpackage -b -d
FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
ARG UCX_VER
ARG UCX_CUDA_VER
+ARG UCX_ARCH
ARG UBUNTU_VER

RUN mkdir /tmp/ucx_install
@@ -70,7 +74,7 @@ COPY --from=rdma_core /*.deb /tmp/ucx_install/
RUN apt update
RUN apt-get install -y wget
RUN cd /tmp/ucx_install && \
-wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
-tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER.tar.bz2 && \
+wget https://github.com/openucx/ucx/releases/download/v$UCX_VER/ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
+tar -xvf ucx-$UCX_VER-ubuntu$UBUNTU_VER-mofed5-cuda$UCX_CUDA_VER-$UCX_ARCH.tar.bz2 && \
apt install -y /tmp/ucx_install/*.deb && \
rm -rf /tmp/ucx_install
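
The tarball naming that these Dockerfiles switch to (an architecture suffix before `.tar.bz2`) can be sketched as a small shell helper. This is an illustration only and not part of the PR: the function name `ucx_tarball_url` and its argument order are hypothetical, and only the URL layout is taken from the `wget` lines in the diff.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): compose the UCX release tarball URL
# using the same layout as the wget lines in the Dockerfiles.
ucx_tarball_url() {
  ucx_ver="$1"    # UCX_VER, e.g. 1.15.0
  os_tag="$2"     # distro tag in the file name, e.g. ubuntu20.04 or centos8
  cuda_ver="$3"   # UCX_CUDA_VER, e.g. 11
  arch="$4"       # UCX_ARCH, e.g. x86_64
  base="https://github.com/openucx/ucx/releases/download"
  printf '%s/v%s/ucx-%s-%s-mofed5-cuda%s-%s.tar.bz2\n' \
    "$base" "$ucx_ver" "$ucx_ver" "$os_tag" "$cuda_ver" "$arch"
}

# Print the URL for the defaults used in the Ubuntu Dockerfiles.
ucx_tarball_url 1.15.0 ubuntu20.04 11 x86_64
```

Printing the URL before a build makes it easy to confirm that the asset exists on the UCX releases page for a given combination of `UCX_VER`, `UCX_CUDA_VER`, and `UCX_ARCH`.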
2 changes: 1 addition & 1 deletion jenkins/Dockerfile-blossom.multi
@@ -26,7 +26,7 @@

ARG CUDA_VER=11.8.0
ARG UBUNTU_VER=20.04
-ARG UCX_VER=1.15.0-rc6
+ARG UCX_VER=1.15.0
# multi-platform build with: docker buildx build --platform linux/arm64,linux/amd64 <ARGS> on either amd64 or arm64 host
# check available official arm-based docker images at https://hub.docker.com/r/nvidia/cuda/tags (OS/ARCH)
FROM --platform=$TARGETPLATFORM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
5 changes: 3 additions & 2 deletions jenkins/Dockerfile-blossom.ubuntu
@@ -27,13 +27,14 @@

ARG CUDA_VER=11.0.3
ARG UBUNTU_VER=20.04
-ARG UCX_VER=1.14.0
+ARG UCX_VER=1.15.0
ARG UCX_CUDA_VER=11
FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VER}
ARG CUDA_VER
ARG UBUNTU_VER
ARG UCX_VER
ARG UCX_CUDA_VER
+ARG UCX_ARCH=x86_64

# https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212771
RUN UB_VER=$(echo ${UBUNTU_VER} | tr -d '.') && \
@@ -65,7 +66,7 @@ RUN apt install -y inetutils-ping expect wget libnuma1 libgomp1

RUN mkdir -p /tmp/ucx && \
cd /tmp/ucx && \
-wget https://github.com/openucx/ucx/releases/download/v${UCX_VER}/ucx-${UCX_VER}-ubuntu${UBUNTU_VER}-mofed5-cuda${UCX_CUDA_VER}.tar.bz2 && \
+wget https://github.com/openucx/ucx/releases/download/v${UCX_VER}/ucx-${UCX_VER}-ubuntu${UBUNTU_VER}-mofed5-cuda${UCX_CUDA_VER}-${UCX_ARCH}.tar.bz2 && \
tar -xvf *.bz2 && \
dpkg -i *.deb && \
rm -rf /tmp/ucx
2 changes: 1 addition & 1 deletion pom.xml
@@ -648,7 +648,7 @@
https://github.com/openjdk/jdk17/blob/4afbcaf55383ec2f5da53282a1547bac3d099e9d/src/jdk.compiler/share/classes/com/sun/tools/javac/resources/compiler.properties#L1993-L1994
-->
<scala.javac.args>-Xlint:all,-serial,-path,-try,-processing|-Werror</scala.javac.args>
-<ucx.version>1.14</ucx.version>
+<ucx.version>1.15.0</ucx.version>
<rapids.compressed.artifact>true</rapids.compressed.artifact>
<rapids.default.jar.excludePattern/>
<rapids.default.jar.phase>package</rapids.default.jar.phase>
2 changes: 1 addition & 1 deletion scala2.13/pom.xml
@@ -648,7 +648,7 @@
https://github.com/openjdk/jdk17/blob/4afbcaf55383ec2f5da53282a1547bac3d099e9d/src/jdk.compiler/share/classes/com/sun/tools/javac/resources/compiler.properties#L1993-L1994
-->
<scala.javac.args>-Xlint:all,-serial,-path,-try,-processing|-Werror</scala.javac.args>
-<ucx.version>1.14</ucx.version>
+<ucx.version>1.15.0</ucx.version>
<rapids.compressed.artifact>true</rapids.compressed.artifact>
<rapids.default.jar.excludePattern/>
<rapids.default.jar.phase>package</rapids.default.jar.phase>