Add torch 2.1.0 #2602

Merged 9 commits on Oct 5, 2023
15 changes: 8 additions & 7 deletions composer/optim/decoupled_weight_decay.py
@@ -48,13 +48,14 @@ class DecoupledSGDW(SGD):
 nesterov (bool, optional): Enables Nesterov momentum updates. Default: ``False``.
 """

-def __init__(self,
-params: Union[Iterable[torch.Tensor], Iterable[dict]],
-lr: float = required,
-momentum: float = 0,
-dampening: float = 0,
-weight_decay: float = 0,
-nesterov: bool = False):
+def __init__(
+self,
+params: Union[Iterable[torch.Tensor], Iterable[dict]],
+lr: float = required, # type: ignore
+momentum: float = 0,
+dampening: float = 0,
+weight_decay: float = 0,
+nesterov: bool = False):
 if weight_decay >= 1e-3:
 log.warning(
 f'You are using a high value of `weight_decay={weight_decay}` for the `DecoupledSGDW` optimizer. Are you sure you want to do this? '
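The `# type: ignore` on `lr` is presumably needed because `required` is a sentinel object rather than a `float`, so a static type checker rejects it as a default for a `float` parameter. The following is a minimal sketch of that pattern, not the Composer or PyTorch code: `_RequiredParameter` here is a local stand-in, and `sgd_defaults` is a hypothetical helper used only for illustration.

```python
class _RequiredParameter:
    """Local stand-in for a 'this argument must be supplied' sentinel."""

    def __repr__(self) -> str:
        return '<required parameter>'


# A single shared sentinel instance; it is deliberately not a float.
required = _RequiredParameter()


def sgd_defaults(
        lr: float = required,  # type: ignore  (sentinel default is not a float)
        weight_decay: float = 0.0) -> dict:
    """Hypothetical helper: build a defaults dict, rejecting a missing lr."""
    if lr is required:
        raise ValueError('lr must be specified')
    return {'lr': lr, 'weight_decay': weight_decay}


print(sgd_defaults(lr=0.1))
```

Without the comment, a checker would flag the default value as incompatible with the `float` annotation, even though the runtime behaviour is unchanged.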
2 changes: 1 addition & 1 deletion composer/trainer/dist_strategy.py
@@ -493,7 +493,7 @@ def _auto_wrap_policy_old(module: torch.nn.Module, recurse: bool, unwrapped_para
 fsdp_obj = FullyShardedDataParallel(
 obj,
 sharding_strategy=sharding_strategy,
-auto_wrap_policy=_auto_wrap_policy,
+auto_wrap_policy=_auto_wrap_policy, # type: ignore FSDP type bug
 cpu_offload=cpu_offload,
 mixed_precision=mixed_precision,
 backward_prefetch=backward_prefetch,
81 changes: 6 additions & 75 deletions docker/Dockerfile
@@ -25,11 +25,6 @@ ARG PYTORCH_VERSION=1.13.1
 # version that corresponds to the PyTorch version
 ARG TORCHVISION_VERSION=0.14.1

-# The Torchtext version to install.
-# Reference https://github.com/pytorch/text#installation to determine the Torchtext
-# version that corresponds to the PyTorch version
-ARG TORCHTEXT_VERSION=0.14.1
-
 # In the Dockerimage, Pillow-SIMD is installed instead of Pillow. To trick pip into thinking that
 # Pillow is also installed (so it won't override it with a future pip install), a Pillow stub is included
 # PILLOW_PSEUDOVERSION is the Pillow version that pip thinks is installed
@@ -54,26 +49,12 @@ ARG IPYTHON_VERSION='>=8.10.0'
 # Upgrade urllib to resolve CVE-2021-33503
 ARG URLLIB3_VERSION='>=1.26.5,<2'

-########################
-# Vision Image Arguments
-########################
-
-# Build the vision image on the pytorch stage
-ARG VISION_BASE=pytorch_stage
-
-# Pip version strings of dependencies to install
-ARG MMCV_VERSION='==1.4.8'
-ARG OPENCV_VERSION='>=4.5.5.64,<4.6'
-ARG NUMBA_VERSION='>=0.55.0,<0.56'
-ARG MMSEGMENTATION_VERSION='>=0.22.0,<0.23'
-ARG CUPY_VERSION='>=10.2.0'
-
 ##########################
 # Composer Image Arguments
 ##########################

-# Build the composer image on the vision image
-ARG COMPOSER_BASE=vision_stage
+# Build the composer image on the pytorch image
+ARG COMPOSER_BASE=pytorch_stage

 # The command that is passed to `pip install` -- e.g. `pip install "${COMPOSER_INSTALL_COMMAND}"`
 ARG COMPOSER_INSTALL_COMMAND='mosaicml[all]'
@@ -213,9 +194,8 @@ ARG PYTORCH_VERSION
 ARG PYTORCH_NIGHTLY_URL
 ARG PYTORCH_NIGHTLY_VERSION
 ARG TORCHVISION_VERSION
-ARG TORCHTEXT_VERSION

-# Set so supporting PyTorch packages such as Torchvision, Torchaudio, Torchtext pin PyTorch version
+# Set so supporting PyTorch packages such as Torchvision pin PyTorch version
 ENV PYTORCH_VERSION=${PYTORCH_VERSION}
 ENV PYTORCH_NIGHTLY_URL=${PYTORCH_NIGHTLY_URL}
 ENV PYTORCH_NIGHTLY_VERSION=${PYTORCH_NIGHTLY_VERSION}
@@ -224,13 +204,11 @@ RUN if [ -z "$PYTORCH_NIGHTLY_URL" ] ; then \
 CUDA_VERSION_TAG=$(python${PYTHON_VERSION} -c "print('cu' + ''.join('${CUDA_VERSION}'.split('.')[:2]) if '${CUDA_VERSION}' else 'cpu')") && \
 pip${PYTHON_VERSION} install --no-cache-dir --find-links https://download.pytorch.org/whl/torch_stable.html \
 torch==${PYTORCH_VERSION}+${CUDA_VERSION_TAG} \
-torchvision==${TORCHVISION_VERSION}+${CUDA_VERSION_TAG} \
-torchtext==${TORCHTEXT_VERSION} ; \
+torchvision==${TORCHVISION_VERSION}+${CUDA_VERSION_TAG} ; \
 else \
 pip${PYTHON_VERSION} install --no-cache-dir --pre --index-url ${PYTORCH_NIGHTLY_URL} \
 torch==${PYTORCH_VERSION}.${PYTORCH_NIGHTLY_VERSION} \
-torchvision==${TORCHVISION_VERSION}.${PYTORCH_NIGHTLY_VERSION} \
-torchtext ; \
+torchvision==${TORCHVISION_VERSION}.${PYTORCH_NIGHTLY_VERSION} ; \
 fi
RUN
#####################################
@@ -316,10 +294,7 @@ RUN pip${PYTHON_VERSION} install --no-cache-dir cmake==3.26.3
 ###########################
 # Install Pandoc Dependency
 ###########################
-# Pandoc is needed for the documentation build and is a nuisance to install via pip so just installing via dpkg
-RUN wget https://github.com/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-1-amd64.deb && \
-dpkg -i pandoc-2.19.2-1-amd64.deb && \
-rm pandoc-2.19.2-1-amd64.deb
+RUN pip${PYTHON_VERSION} install --no-cache-dir pandoc==2.3

 ################################
 # Use the correct python version
@@ -370,50 +345,6 @@ RUN pip install --no-cache-dir --upgrade \
 urllib3${URLLIB3_VERSION}


-######################
-# PyTorch Vision Image
-######################
-
-FROM ${VISION_BASE} AS vision_stage
-ARG DEBIAN_FRONTEND=noninteractive
-
-RUN sudo apt-get update && \
-sudo apt-get install -y --no-install-recommends \
-# For FFCV:
-pkg-config \
-libturbojpeg-dev \
-libopencv-dev \
-# For deeplabv3:
-ffmpeg \
-libsm6 \
-libxext6 && \
-sudo apt-get autoclean && \
-sudo apt-get clean && \
-sudo rm -rf /var/lib/apt/lists/*
-
-ARG MMCV_VERSION
-ARG OPENCV_VERSION
-ARG NUMBA_VERSION
-ARG MMSEGMENTATION_VERSION
-ARG PYTHON_VERSION
-ARG CUPY_VERSION
-ARG CUDA_VERSION
-
-RUN CUDA_VERSION_TAG=$(python${PYTHON_VERSION} -c "print('cu' + ''.join('${CUDA_VERSION}'.split('.')[:2]) if '${CUDA_VERSION}' else 'cpu')") && \
-MMCV_TORCH_VERSION=$(python -c "print('torch' + ''.join('${PYTORCH_VERSION}'.split('.')[:2]) + '.0')") && \
-sudo pip${PYTHON_VERSION} install --no-cache-dir \
-"ffcv<1.0.3" \
-"opencv-python${OPENCV_VERSION}" \
-"numba${NUMBA_VERSION}" \
-"mmsegmentation${MMSEGMENTATION_VERSION}" && \
-sudo pip${PYTHON_VERSION} install --no-cache-dir \
-"mmcv-full${MMCV_VERSION}" \
--f https://download.openmmlab.com/mmcv/dist/${CUDA_VERSION_TAG}/${MMCV_TORCH_VERSION}/index.html && \
-if [ -n "$CUDA_VERSION" ] ; then \
-sudo pip${PYTHON_VERSION} install --no-cache-dir cupy-cuda11x${CUPY_VERSION}; \
-fi
-
-
 ################
 # Composer Image
 ################
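The Dockerfile derives the wheel suffix (`cu121`, `cu118`, `cpu`) with an inline `python -c` one-liner. As a rough illustration of what that expression computes, here is the same logic as a standalone sketch; the function names `cuda_version_tag` and `stable_requirements` are hypothetical and do not appear in the repository.

```python
from typing import List


def cuda_version_tag(cuda_version: str) -> str:
    """Mirror the Dockerfile one-liner: '12.1.0' -> 'cu121', '' -> 'cpu'."""
    # The conditional expression binds loosest, so this reads as
    # ('cu' + major + minor) if cuda_version else 'cpu'.
    return 'cu' + ''.join(cuda_version.split('.')[:2]) if cuda_version else 'cpu'


def stable_requirements(pytorch_version: str, torchvision_version: str, cuda_version: str) -> List[str]:
    """Hypothetical helper: the pins the non-nightly branch passes to pip after this PR."""
    tag = cuda_version_tag(cuda_version)
    return [
        f'torch=={pytorch_version}+{tag}',
        f'torchvision=={torchvision_version}+{tag}',
    ]


for cuda in ('12.1.0', '11.8.0', ''):
    print(repr(cuda), '->', cuda_version_tag(cuda))
print(stable_requirements('2.1.0', '0.16.0', '12.1.0'))
```

Only the first two version components feed the tag, so a patch-level CUDA bump would not change which wheels are requested.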
24 changes: 11 additions & 13 deletions docker/README.md
@@ -29,19 +29,17 @@ The base flavor contains PyTorch pre-installed; the vision flavor also includes
 To install composer, once inside the image, run `pip install mosaicml`.

 <!-- BEGIN_PYTORCH_BUILD_MATRIX -->
-| Linux Distro | Flavor | PyTorch Version | CUDA Version | Python Version | Docker Tags |
-|----------------|----------|-------------------|---------------------|------------------|---------------------------------------------------------------------------------------------------|
-| Ubuntu 20.04 | Base | 2.1.0 | 12.1.0 (Infiniband) | 3.10 | `mosaicml/pytorch:2.1.0_cu121-nightly20230827-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Base | 2.0.1 | 11.8.0 (Infiniband) | 3.10 | `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Base | 2.0.1 | 11.8.0 (EFA) | 3.10 | `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04-aws` |
-| Ubuntu 20.04 | Base | 2.0.1 | cpu | 3.10 | `mosaicml/pytorch:2.0.1_cpu-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Base | 1.13.1 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch:latest`, `mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Base | 1.13.1 | 11.7.1 (EFA) | 3.10 | `mosaicml/pytorch:latest-aws`, `mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04-aws` |
-| Ubuntu 20.04 | Base | 1.13.1 | cpu | 3.10 | `mosaicml/pytorch:latest_cpu`, `mosaicml/pytorch:1.13.1_cpu-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Vision | 2.0.1 | 11.8.0 (Infiniband) | 3.10 | `mosaicml/pytorch_vision:2.0.1_cu118-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Vision | 2.0.1 | cpu | 3.10 | `mosaicml/pytorch_vision:2.0.1_cpu-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Vision | 1.13.1 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch_vision:latest`, `mosaicml/pytorch_vision:1.13.1_cu117-python3.10-ubuntu20.04` |
-| Ubuntu 20.04 | Vision | 1.13.1 | cpu | 3.10 | `mosaicml/pytorch_vision:latest_cpu`, `mosaicml/pytorch_vision:1.13.1_cpu-python3.10-ubuntu20.04` |
+| Linux Distro | Flavor | PyTorch Version | CUDA Version | Python Version | Docker Tags |
+|----------------|----------|-------------------|---------------------|------------------|-------------------------------------------------------------------------------------------|
+| Ubuntu 20.04 | Base | 2.1.0 | 12.1.0 (Infiniband) | 3.10 | `mosaicml/pytorch:2.1.0_cu121-python3.10-ubuntu20.04` |
+| Ubuntu 20.04 | Base | 2.1.0 | 12.1.0 (EFA) | 3.10 | `mosaicml/pytorch:2.1.0_cu121-python3.10-ubuntu20.04-aws` |
+| Ubuntu 20.04 | Base | 2.1.0 | cpu | 3.10 | `mosaicml/pytorch:2.1.0_cpu-python3.10-ubuntu20.04` |
+| Ubuntu 20.04 | Base | 2.0.1 | 11.8.0 (Infiniband) | 3.10 | `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04` |
+| Ubuntu 20.04 | Base | 2.0.1 | 11.8.0 (EFA) | 3.10 | `mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04-aws` |
+| Ubuntu 20.04 | Base | 2.0.1 | cpu | 3.10 | `mosaicml/pytorch:2.0.1_cpu-python3.10-ubuntu20.04` |
+| Ubuntu 20.04 | Base | 1.13.1 | 11.7.1 (Infiniband) | 3.10 | `mosaicml/pytorch:latest`, `mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04` |
+| Ubuntu 20.04 | Base | 1.13.1 | 11.7.1 (EFA) | 3.10 | `mosaicml/pytorch:latest-aws`, `mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04-aws` |
+| Ubuntu 20.04 | Base | 1.13.1 | cpu | 3.10 | `mosaicml/pytorch:latest_cpu`, `mosaicml/pytorch:1.13.1_cpu-python3.10-ubuntu20.04` |
 <!-- END_PYTORCH_BUILD_MATRIX -->

 **Note**: The `mosaicml/pytorch:latest`, `mosaicml/pytorch:latest_cpu`,`mosaicml/pytorch_vision:latest` and `mosaicml/pytorch_vision:latest_cpu`
113 changes: 36 additions & 77 deletions docker/build_matrix.yaml
@@ -1,73 +1,81 @@
 # This file is automatically generated by generate_build_matrix.py. DO NOT EDIT!
 - AWS_OFI_NCCL_VERSION: ''
-BASE_IMAGE: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
-CUDA_VERSION: 11.8.0
-IMAGE_NAME: torch-2-0-1-cu118
+BASE_IMAGE: nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
+CUDA_VERSION: 12.1.0
+IMAGE_NAME: torch-2-1-0-cu121
 MOFED_VERSION: 5.5-1.0.3.2
 PYTHON_VERSION: '3.10'
 PYTORCH_NIGHTLY_URL: ''
 PYTORCH_NIGHTLY_VERSION: ''
-PYTORCH_VERSION: 2.0.1
+PYTORCH_VERSION: 2.1.0
 TAGS:
-- mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04
+- mosaicml/pytorch:2.1.0_cu121-python3.10-ubuntu20.04
 TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.15.2
-TORCHVISION_VERSION: 0.15.2
+TORCHVISION_VERSION: 0.16.0
 - AWS_OFI_NCCL_VERSION: v1.5.0-aws
-BASE_IMAGE: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
-CUDA_VERSION: 11.8.0
-IMAGE_NAME: torch-2-0-1-cu118-aws
+BASE_IMAGE: nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
+CUDA_VERSION: 12.1.0
+IMAGE_NAME: torch-2-1-0-cu121-aws
 MOFED_VERSION: ''
 PYTHON_VERSION: '3.10'
 PYTORCH_NIGHTLY_URL: ''
 PYTORCH_NIGHTLY_VERSION: ''
-PYTORCH_VERSION: 2.0.1
+PYTORCH_VERSION: 2.1.0
 TAGS:
-- mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04-aws
+- mosaicml/pytorch:2.1.0_cu121-python3.10-ubuntu20.04-aws
 TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.15.2
-TORCHVISION_VERSION: 0.15.2
+TORCHVISION_VERSION: 0.16.0
 - AWS_OFI_NCCL_VERSION: ''
+BASE_IMAGE: ubuntu:20.04
+CUDA_VERSION: ''
+IMAGE_NAME: torch-2-1-0-cpu
+MOFED_VERSION: ''
+PYTHON_VERSION: '3.10'
+PYTORCH_NIGHTLY_URL: ''
+PYTORCH_NIGHTLY_VERSION: ''
+PYTORCH_VERSION: 2.1.0
+TAGS:
+- mosaicml/pytorch:2.1.0_cpu-python3.10-ubuntu20.04
+TARGET: pytorch_stage
+TORCHVISION_VERSION: 0.16.0
+- AWS_OFI_NCCL_VERSION: ''
 BASE_IMAGE: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
 CUDA_VERSION: 11.8.0
-IMAGE_NAME: torch-vision-2-0-1-cu118
+IMAGE_NAME: torch-2-0-1-cu118
 MOFED_VERSION: 5.5-1.0.3.2
 PYTHON_VERSION: '3.10'
 PYTORCH_NIGHTLY_URL: ''
 PYTORCH_NIGHTLY_VERSION: ''
 PYTORCH_VERSION: 2.0.1
 TAGS:
-- mosaicml/pytorch_vision:2.0.1_cu118-python3.10-ubuntu20.04
-TARGET: vision_stage
-TORCHTEXT_VERSION: 0.15.2
+- mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04
+TARGET: pytorch_stage
 TORCHVISION_VERSION: 0.15.2
-- AWS_OFI_NCCL_VERSION: ''
-BASE_IMAGE: ubuntu:20.04
-CUDA_VERSION: ''
-IMAGE_NAME: torch-2-0-1-cpu
+- AWS_OFI_NCCL_VERSION: v1.5.0-aws
+BASE_IMAGE: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
+CUDA_VERSION: 11.8.0
+IMAGE_NAME: torch-2-0-1-cu118-aws
 MOFED_VERSION: ''
 PYTHON_VERSION: '3.10'
 PYTORCH_NIGHTLY_URL: ''
 PYTORCH_NIGHTLY_VERSION: ''
 PYTORCH_VERSION: 2.0.1
 TAGS:
-- mosaicml/pytorch:2.0.1_cpu-python3.10-ubuntu20.04
+- mosaicml/pytorch:2.0.1_cu118-python3.10-ubuntu20.04-aws
 TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.15.2
 TORCHVISION_VERSION: 0.15.2
 - AWS_OFI_NCCL_VERSION: ''
 BASE_IMAGE: ubuntu:20.04
 CUDA_VERSION: ''
-IMAGE_NAME: torch-vision-2-0-1-cpu
+IMAGE_NAME: torch-2-0-1-cpu
 MOFED_VERSION: ''
 PYTHON_VERSION: '3.10'
 PYTORCH_NIGHTLY_URL: ''
 PYTORCH_NIGHTLY_VERSION: ''
 PYTORCH_VERSION: 2.0.1
 TAGS:
-- mosaicml/pytorch_vision:2.0.1_cpu-python3.10-ubuntu20.04
-TARGET: vision_stage
-TORCHTEXT_VERSION: 0.15.2
+- mosaicml/pytorch:2.0.1_cpu-python3.10-ubuntu20.04
+TARGET: pytorch_stage
 TORCHVISION_VERSION: 0.15.2
 - AWS_OFI_NCCL_VERSION: ''
 BASE_IMAGE: nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
@@ -82,7 +90,6 @@
 - mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04
 - mosaicml/pytorch:latest
 TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.14.1
 TORCHVISION_VERSION: 0.14.1
 - AWS_OFI_NCCL_VERSION: v1.5.0-aws
 BASE_IMAGE: nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
@@ -97,22 +104,6 @@
 - mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04-aws
 - mosaicml/pytorch:latest-aws
 TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.14.1
 TORCHVISION_VERSION: 0.14.1
-- AWS_OFI_NCCL_VERSION: ''
-BASE_IMAGE: nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
-CUDA_VERSION: 11.7.1
-IMAGE_NAME: torch-vision-1-13-1-cu117
-MOFED_VERSION: 5.5-1.0.3.2
-PYTHON_VERSION: '3.10'
-PYTORCH_NIGHTLY_URL: ''
-PYTORCH_NIGHTLY_VERSION: ''
-PYTORCH_VERSION: 1.13.1
-TAGS:
-- mosaicml/pytorch_vision:1.13.1_cu117-python3.10-ubuntu20.04
-- mosaicml/pytorch_vision:latest
-TARGET: vision_stage
-TORCHTEXT_VERSION: 0.14.1
-TORCHVISION_VERSION: 0.14.1
 - AWS_OFI_NCCL_VERSION: ''
 BASE_IMAGE: ubuntu:20.04
@@ -127,37 +118,7 @@
 - mosaicml/pytorch:1.13.1_cpu-python3.10-ubuntu20.04
 - mosaicml/pytorch:latest_cpu
 TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.14.1
 TORCHVISION_VERSION: 0.14.1
-- AWS_OFI_NCCL_VERSION: ''
-BASE_IMAGE: ubuntu:20.04
-CUDA_VERSION: ''
-IMAGE_NAME: torch-vision-1-13-1-cpu
-MOFED_VERSION: ''
-PYTHON_VERSION: '3.10'
-PYTORCH_NIGHTLY_URL: ''
-PYTORCH_NIGHTLY_VERSION: ''
-PYTORCH_VERSION: 1.13.1
-TAGS:
-- mosaicml/pytorch_vision:1.13.1_cpu-python3.10-ubuntu20.04
-- mosaicml/pytorch_vision:latest_cpu
-TARGET: vision_stage
-TORCHTEXT_VERSION: 0.14.1
-TORCHVISION_VERSION: 0.14.1
-- AWS_OFI_NCCL_VERSION: ''
-BASE_IMAGE: nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04
-CUDA_VERSION: 12.1.0
-IMAGE_NAME: torch-nightly-2-1-0-20230827-cu121
-MOFED_VERSION: 5.5-1.0.3.2
-PYTHON_VERSION: '3.10'
-PYTORCH_NIGHTLY_URL: https://download.pytorch.org/whl/nightly/cu121
-PYTORCH_NIGHTLY_VERSION: dev20230827+cu121
-PYTORCH_VERSION: 2.1.0
-TAGS:
-- mosaicml/pytorch:2.1.0_cu121-nightly20230827-python3.10-ubuntu20.04
-TARGET: pytorch_stage
-TORCHTEXT_VERSION: 0.16.0
-TORCHVISION_VERSION: 0.16.0
 - AWS_OFI_NCCL_VERSION: ''
 BASE_IMAGE: nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04
 COMPOSER_INSTALL_COMMAND: mosaicml[all]==0.16.3
@@ -172,7 +133,6 @@
 - mosaicml/composer:0.16.3
 - mosaicml/composer:latest
 TARGET: composer_stage
-TORCHTEXT_VERSION: 0.14.1
 TORCHVISION_VERSION: 0.14.1
 - AWS_OFI_NCCL_VERSION: ''
 BASE_IMAGE: ubuntu:20.04
@@ -188,5 +148,4 @@
 - mosaicml/composer:0.16.3_cpu
 - mosaicml/composer:latest_cpu
 TARGET: composer_stage
-TORCHTEXT_VERSION: 0.14.1
 TORCHVISION_VERSION: 0.14.1
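The `IMAGE_NAME` and `TAGS` values in `build_matrix.yaml` follow a regular pattern in the PyTorch, CUDA, and Python versions. The sketch below reproduces that pattern for the base (`pytorch_stage`) entries only. It is an illustration inferred from the values shown in this diff, not the contents of `generate_build_matrix.py`; the helper names and the hard-coded `ubuntu20.04` suffix are assumptions.

```python
def cuda_tag(cuda_version: str) -> str:
    """'12.1.0' -> 'cu121'; an empty string means a CPU-only image."""
    return 'cu' + ''.join(cuda_version.split('.')[:2]) if cuda_version else 'cpu'


def image_name(pytorch_version: str, cuda_version: str, aws: bool = False) -> str:
    """Hypothetical helper reproducing names such as 'torch-2-1-0-cu121-aws'."""
    name = f"torch-{pytorch_version.replace('.', '-')}-{cuda_tag(cuda_version)}"
    return name + '-aws' if aws else name


def docker_tag(pytorch_version: str, cuda_version: str, python_version: str = '3.10', aws: bool = False) -> str:
    """Hypothetical helper reproducing the versioned mosaicml/pytorch tags."""
    tag = (f'mosaicml/pytorch:{pytorch_version}_{cuda_tag(cuda_version)}'
           f'-python{python_version}-ubuntu20.04')  # distro suffix assumed fixed
    return tag + '-aws' if aws else tag


# The three torch 2.1.0 entries added by this PR: Infiniband, EFA (aws), and CPU.
for cuda, aws in (('12.1.0', False), ('12.1.0', True), ('', False)):
    print(image_name('2.1.0', cuda, aws), '|', docker_tag('2.1.0', cuda, aws=aws))
```

The floating tags (`latest`, `latest-aws`, `latest_cpu`) are not covered here; in the matrix shown they remain attached to the 1.13.1 images.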