Bump docker images to torch 2.3.1 (#3366)
* Revert "Autoresume Validation with Max Duration (#3358)"

This reverts commit f0eae8a.

* bump images to 2.3.1

* bump torchvision
mvpatel2000 authored Jun 5, 2024
1 parent 74a5f78 commit 86dfaf7
Showing 3 changed files with 24 additions and 24 deletions.
6 changes: 3 additions & 3 deletions docker/README.md
@@ -30,9 +30,9 @@ To install composer, once inside the image, run `pip install mosaicml`.
<!-- BEGIN_PYTORCH_BUILD_MATRIX -->
| Linux Distro | Flavor | PyTorch Version | CUDA Version | Python Version | Docker Tags |
|----------------|----------|-------------------|---------------------|------------------|------------------------------------------------------------------------------------------|
| Ubuntu 20.04 | Base | 2.3.0 | 12.1.1 (Infiniband) | 3.11 | `mosaicml/pytorch:latest`, `mosaicml/pytorch:2.3.0_cu121-python3.11-ubuntu20.04` |
| Ubuntu 20.04 | Base | 2.3.0 | 12.1.1 (EFA) | 3.11 | `mosaicml/pytorch:latest-aws`, `mosaicml/pytorch:2.3.0_cu121-python3.11-ubuntu20.04-aws` |
| Ubuntu 20.04 | Base | 2.3.0 | cpu | 3.11 | `mosaicml/pytorch:latest_cpu`, `mosaicml/pytorch:2.3.0_cpu-python3.11-ubuntu20.04` |
| Ubuntu 20.04 | Base | 2.3.1 | 12.1.1 (Infiniband) | 3.11 | `mosaicml/pytorch:latest`, `mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04` |
| Ubuntu 20.04 | Base | 2.3.1 | 12.1.1 (EFA) | 3.11 | `mosaicml/pytorch:latest-aws`, `mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04-aws` |
| Ubuntu 20.04 | Base | 2.3.1 | cpu | 3.11 | `mosaicml/pytorch:latest_cpu`, `mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04` |
| Ubuntu 20.04 | Base | 2.2.2 | 12.1.1 (Infiniband) | 3.11 | `mosaicml/pytorch:2.2.2_cu121-python3.11-ubuntu20.04` |
| Ubuntu 20.04 | Base | 2.2.2 | 12.1.1 (EFA) | 3.11 | `mosaicml/pytorch:2.2.2_cu121-python3.11-ubuntu20.04-aws` |
| Ubuntu 20.04 | Base | 2.2.2 | cpu | 3.11 | `mosaicml/pytorch:2.2.2_cpu-python3.11-ubuntu20.04` |
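
The tags in the rows above follow a visible pattern: PyTorch version, `cu` plus the CUDA major and minor digits (or `cpu`), Python version, distro, and an `-aws` suffix for EFA images. A minimal sketch of that pattern, assuming a hypothetical helper `build_tag` (this is not the function `generate_build_matrix.py` actually uses, and the parameter names are illustrative):

```python
def build_tag(
    pytorch_version: str,
    cuda_version: str,
    python_version: str,
    interconnect: str = 'mellanox',
    distro: str = 'ubuntu20.04',
) -> str:
    """Compose a versioned image tag in the style shown in the build matrix.

    Hypothetical helper for illustration only; an empty cuda_version means a CPU image.
    """
    if cuda_version:
        # '12.1.1' -> 'cu121': keep only the major and minor components.
        major, minor = cuda_version.split('.')[:2]
        device = f'cu{major}{minor}'
    else:
        device = 'cpu'

    tag = f'mosaicml/pytorch:{pytorch_version}_{device}-python{python_version}-{distro}'
    # EFA images (needed for AWS) carry an '-aws' suffix in the table.
    if interconnect == 'EFA':
        tag += '-aws'
    return tag


if __name__ == '__main__':
    print(build_tag('2.3.1', '12.1.1', '3.11'))
    print(build_tag('2.3.1', '12.1.1', '3.11', interconnect='EFA'))
    print(build_tag('2.3.1', '', '3.11'))
```

The floating tags (`latest`, `latest-aws`, `latest_cpu`) are not derived from versions, so the sketch leaves them out.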
32 changes: 16 additions & 16 deletions docker/build_matrix.yaml
@@ -2,7 +2,7 @@
- AWS_OFI_NCCL_VERSION: ''
BASE_IMAGE: nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
CUDA_VERSION: 12.1.1
IMAGE_NAME: torch-2-3-0-cu121
IMAGE_NAME: torch-2-3-1-cu121
MOFED_VERSION: latest-23.10
NVIDIA_REQUIRE_CUDA_OVERRIDE: cuda>=12.1 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471
brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471
@@ -21,16 +21,16 @@
PYTHON_VERSION: '3.11'
PYTORCH_NIGHTLY_URL: ''
PYTORCH_NIGHTLY_VERSION: ''
PYTORCH_VERSION: 2.3.0
PYTORCH_VERSION: 2.3.1
TAGS:
- mosaicml/pytorch:2.3.0_cu121-python3.11-ubuntu20.04
- mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04
- mosaicml/pytorch:latest
TARGET: pytorch_stage
TORCHVISION_VERSION: 0.18.0
TORCHVISION_VERSION: 0.18.1
- AWS_OFI_NCCL_VERSION: v1.9.1-aws
BASE_IMAGE: nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
CUDA_VERSION: 12.1.1
IMAGE_NAME: torch-2-3-0-cu121-aws
IMAGE_NAME: torch-2-3-1-cu121-aws
MOFED_VERSION: ''
NVIDIA_REQUIRE_CUDA_OVERRIDE: cuda>=12.1 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471
brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471
@@ -49,27 +49,27 @@
PYTHON_VERSION: '3.11'
PYTORCH_NIGHTLY_URL: ''
PYTORCH_NIGHTLY_VERSION: ''
PYTORCH_VERSION: 2.3.0
PYTORCH_VERSION: 2.3.1
TAGS:
- mosaicml/pytorch:2.3.0_cu121-python3.11-ubuntu20.04-aws
- mosaicml/pytorch:2.3.1_cu121-python3.11-ubuntu20.04-aws
- mosaicml/pytorch:latest-aws
TARGET: pytorch_stage
TORCHVISION_VERSION: 0.18.0
TORCHVISION_VERSION: 0.18.1
- AWS_OFI_NCCL_VERSION: ''
BASE_IMAGE: ubuntu:20.04
CUDA_VERSION: ''
IMAGE_NAME: torch-2-3-0-cpu
IMAGE_NAME: torch-2-3-1-cpu
MOFED_VERSION: ''
NVIDIA_REQUIRE_CUDA_OVERRIDE: ''
PYTHON_VERSION: '3.11'
PYTORCH_NIGHTLY_URL: ''
PYTORCH_NIGHTLY_VERSION: ''
PYTORCH_VERSION: 2.3.0
PYTORCH_VERSION: 2.3.1
TAGS:
- mosaicml/pytorch:2.3.0_cpu-python3.11-ubuntu20.04
- mosaicml/pytorch:2.3.1_cpu-python3.11-ubuntu20.04
- mosaicml/pytorch:latest_cpu
TARGET: pytorch_stage
TORCHVISION_VERSION: 0.18.0
TORCHVISION_VERSION: 0.18.1
- AWS_OFI_NCCL_VERSION: ''
BASE_IMAGE: nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
CUDA_VERSION: 12.1.1
@@ -229,12 +229,12 @@
PYTHON_VERSION: '3.11'
PYTORCH_NIGHTLY_URL: ''
PYTORCH_NIGHTLY_VERSION: ''
PYTORCH_VERSION: 2.3.0
PYTORCH_VERSION: 2.3.1
TAGS:
- mosaicml/composer:0.23.0
- mosaicml/composer:latest
TARGET: composer_stage
TORCHVISION_VERSION: 0.18.0
TORCHVISION_VERSION: 0.18.1
- AWS_OFI_NCCL_VERSION: ''
BASE_IMAGE: ubuntu:20.04
COMPOSER_INSTALL_COMMAND: mosaicml[all]==0.23.0
@@ -245,9 +245,9 @@
PYTHON_VERSION: '3.11'
PYTORCH_NIGHTLY_URL: ''
PYTORCH_NIGHTLY_VERSION: ''
PYTORCH_VERSION: 2.3.0
PYTORCH_VERSION: 2.3.1
TAGS:
- mosaicml/composer:0.23.0_cpu
- mosaicml/composer:latest_cpu
TARGET: composer_stage
TORCHVISION_VERSION: 0.18.0
TORCHVISION_VERSION: 0.18.1
10 changes: 5 additions & 5 deletions docker/generate_build_matrix.py
@@ -19,12 +19,12 @@
import yaml

PRODUCTION_PYTHON_VERSION = '3.11'
PRODUCTION_PYTORCH_VERSION = '2.3.0'
PRODUCTION_PYTORCH_VERSION = '2.3.1'


def _get_torchvision_version(pytorch_version: str):
    if pytorch_version == '2.3.0':
        return '0.18.0'
    if pytorch_version == '2.3.1':
        return '0.18.1'
    if pytorch_version == '2.2.2':
        return '0.17.2'
    if pytorch_version == '2.1.2':
@@ -42,7 +42,7 @@ def _get_cuda_version(pytorch_version: str, use_cuda: bool):
    # From https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/
    if not use_cuda:
        return ''
    if pytorch_version == '2.3.0':
    if pytorch_version == '2.3.1':
        return '12.1.1'
    if pytorch_version == '2.2.2':
        return '12.1.1'
@@ -167,7 +167,7 @@ def _write_table(table_tag: str, table_contents: str):


def _main():
    python_pytorch_versions = [('3.11', '2.3.0'), ('3.11', '2.2.2'), ('3.10', '2.1.2')]
    python_pytorch_versions = [('3.11', '2.3.1'), ('3.11', '2.2.2'), ('3.10', '2.1.2')]
    cuda_options = [True, False]
    stages = ['pytorch_stage']
    interconnects = ['mellanox', 'EFA']  # mellanox is default, EFA needed for AWS
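
The version lookups in this file are chains of `if` statements, so a bump like this one touches several branches. One possible alternative is a table-driven lookup, sketched below. The names `TORCHVISION_BY_PYTORCH`, `CUDA_BY_PYTORCH`, `get_torchvision_version`, and `get_cuda_version` are hypothetical, and only the version pairs fully visible in this diff are included; the 2.1.2 entries are truncated here and are deliberately left out.

```python
# Hypothetical lookup tables; values are copied from the pairs visible in this commit.
TORCHVISION_BY_PYTORCH = {
    '2.3.1': '0.18.1',
    '2.2.2': '0.17.2',
}

CUDA_BY_PYTORCH = {
    '2.3.1': '12.1.1',
    '2.2.2': '12.1.1',
}


def _lookup(table: dict, pytorch_version: str, what: str) -> str:
    """Return the mapped version, or fail loudly for an unknown PyTorch version."""
    try:
        return table[pytorch_version]
    except KeyError:
        raise ValueError(f'No {what} version known for PyTorch {pytorch_version}') from None


def get_torchvision_version(pytorch_version: str) -> str:
    return _lookup(TORCHVISION_BY_PYTORCH, pytorch_version, 'torchvision')


def get_cuda_version(pytorch_version: str, use_cuda: bool) -> str:
    # CPU images carry an empty CUDA version, matching the YAML entries in this diff.
    if not use_cuda:
        return ''
    return _lookup(CUDA_BY_PYTORCH, pytorch_version, 'CUDA')


if __name__ == '__main__':
    print(get_torchvision_version('2.3.1'))
    print(get_cuda_version('2.3.1', use_cuda=True))
```

With this layout, a future bump would add one entry per table instead of editing each `if` branch, and an unsupported version raises an error instead of silently falling through.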