update doc to use PJRT_DEVICE=CUDA instead of PJRT_DEVICE=GPU (#5754)
* update doc to use PJRT_DEVICE=CUDA instead of PJRT_DEVICE=GPU

* add warning message.

* fix comment and test failure.

* skip dynamic shape model test on cuda.
vanbasten23 authored Nov 4, 2023
1 parent a93e14b commit f01cdb6
Showing 10 changed files with 32 additions and 13 deletions.
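In short, the commit renames the PJRT device string from `GPU` to `CUDA`, keeping the old spelling working (with a warning) through the r2.2 release. Below is a minimal sketch of the preferred usage after this change; it is illustrative only and assumes a CUDA build of PyTorch/XLA with at least one visible GPU:

```python
import os

# Preferred after this commit: PJRT_DEVICE=CUDA. The old PJRT_DEVICE=GPU
# still works for now but emits a deprecation warning (see torch_xla/runtime.py
# below). Set the environment variables before importing torch_xla.
os.environ["PJRT_DEVICE"] = "CUDA"
os.environ["GPU_NUM_DEVICES"] = "1"  # number of local GPUs to use

import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to an XLA device backed by CUDA
print(device)             # e.g. xla:0
```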
2 changes: 2 additions & 0 deletions .circleci/common.sh
@@ -131,6 +131,8 @@ function run_torch_xla_python_tests() {
if [ -x "$(command -v nvidia-smi)" ]; then
# These tests fail on CUDA with 03/30 TF-pin update (https://github.com/pytorch/xla/pull/4840)
PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
+ # TODO(xiowei replace gpu with cuda): remove the PJRT_DEVICE=GPU test below once PJRT_DEVICE=GPU is fully deprecated.
+ PJRT_DEVICE=GPU python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --auto_wrap_policy type_based --use_small_fake_sample --num_epochs=1
XLA_DISABLE_FUNCTIONALIZATION=1 PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
# Syncfree SGD optimizer tests
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -94,7 +94,7 @@ To run the tests, follow __one__ of the options below:
* Run on GPU:

```Shell
- export PJRT_DEVICE=GPU GPU_NUM_DEVICES=${NUM_GPU}
+ export PJRT_DEVICE=CUDA GPU_NUM_DEVICES=${NUM_GPU}
```

For more detail on configuring the runtime, please refer to [this doc](https://github.com/pytorch/xla/blob/master/docs/pjrt.md#quickstart)
2 changes: 1 addition & 1 deletion README.md
@@ -111,7 +111,7 @@ If you're using `DistributedDataParallel`, make the following changes:
Additional information on PyTorch/XLA, including a description of its semantics
and functions, is available at [PyTorch.org](http://pytorch.org/xla/). See the
[API Guide](API_GUIDE.md) for best practices when writing networks that run on
- XLA devices (TPU, GPU, CPU and...).
+ XLA devices (TPU, CUDA, CPU and...).

Our comprehensive user guides are available at:

2 changes: 1 addition & 1 deletion configuration.yaml
@@ -4,7 +4,7 @@ variables:
PJRT_DEVICE:
description:
- Indicates which device is being used with PJRT. It can be either CPU,
- TPU, or GPU
+ TPU, or CUDA
type: string
PJRT_SELECT_DEFAULT_DEVICE:
description:
2 changes: 1 addition & 1 deletion docs/gpu.md
@@ -59,7 +59,7 @@ pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/cuda/117/torch_xl
In order to run the examples below, you need to clone the pytorch/xla repo to access the imagenet example (we already clone it in our Docker image).

```
- (pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=GPU
+ (pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=CUDA
(pytorch) root@20ab2c7a2d06:/# git clone --recursive https://github.com/pytorch/xla.git
(pytorch) root@20ab2c7a2d06:/# python xla/test/test_train_mp_imagenet.py --fake_data
==> Preparing data..
12 changes: 6 additions & 6 deletions docs/pjrt.md
@@ -196,17 +196,17 @@ for more information.

### Single-node GPU training

- To use GPUs with PJRT, simply set `PJRT_DEVICE=GPU` and configure
+ To use GPUs with PJRT, simply set `PJRT_DEVICE=CUDA` and configure
`GPU_NUM_DEVICES` to the number of devices on the host. For example:

```
- PJRT_DEVICE=GPU GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
+ PJRT_DEVICE=CUDA GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
```

You can also use `torchrun` to initiate the single-node multi-GPU training. For example,

```
- PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
+ PJRT_DEVICE=CUDA torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
```

In the above example, `--nnodes` specifies how many machines (physical machines or VMs) to use (1 here, since we run single-node training), and `--nproc-per-node` specifies how many GPU devices to use.
@@ -216,7 +216,7 @@
**Note that this feature only works for CUDA 12+**. Similar to how PyTorch handles multi-node training, you can run the command as below:

```
- PJRT_DEVICE=GPU torchrun \
+ PJRT_DEVICE=CUDA torchrun \
--nnodes=${NUMBER_GPU_VM} \
--node_rank=${CURRENT_NODE_RANK} \
--nproc_per_node=${NUMBER_LOCAL_GPU_DEVICES} \
@@ -231,7 +231,7 @@
For example, if you want to train on 2 GPU machines, machine_0 and machine_1, then on the first GPU machine (machine_0), run

```
- # PJRT_DEVICE=GPU torchrun \
+ # PJRT_DEVICE=CUDA torchrun \
--nnodes=2 \
--node_rank=0 \
--nproc_per_node=4 \
@@ -241,7 +241,7 @@
On the second GPU machine, run

```
- # PJRT_DEVICE=GPU torchrun \
+ # PJRT_DEVICE=CUDA torchrun \
--nnodes=2 \
--node_rank=1 \
--nproc_per_node=4 \
4 changes: 3 additions & 1 deletion test/ds/test_dynamic_shape_models.py
@@ -43,7 +43,9 @@ def forward(self, x):


@unittest.skipIf(
not xm.get_xla_supported_devices("GPU") and
# Currently a change break this test on CUDA. Another change is trying to
# roll back it. Will uncomment the line below once it is rolled back.
# not xm.get_xla_supported_devices("CUDA") and
not xm.get_xla_supported_devices("TPU"),
f"The tests fail on CPU. See https://github.com/pytorch/xla/issues/4298 for more detail."
)
2 changes: 1 addition & 1 deletion test/run_tests.sh
@@ -127,7 +127,7 @@ function run_torchrun {
if [ -x "$(command -v nvidia-smi)" ] && [ "$XLA_CUDA" != "0" ]; then
echo "Running torchrun test for GPU $@"
num_devices=$(nvidia-smi --list-gpus | wc -l)
- PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node $num_devices $@
+ PJRT_DEVICE=CUDA torchrun --nnodes 1 --nproc-per-node $num_devices $@
fi
}

8 changes: 8 additions & 0 deletions torch_xla/core/xla_model.py
@@ -5,6 +5,7 @@
import re
import threading
import time
+ import warnings
from typing import List, Optional
import torch
import torch.distributed._functional_collectives
@@ -88,6 +89,13 @@ def get_xla_supported_devices(devkind=None, max_devices=None):
Returns:
The list of device strings.
"""
+ # TODO(xiowei replace gpu with cuda): Remove the below if statement after the r2.2 release.
+ if devkind and devkind.casefold() == 'gpu':
+   warnings.warn(
+       "GPU as a device name is being deprecated. Please replace it with CUDA, e.g. get_xla_supported_devices(devkind='CUDA'). Similarly, please replace PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA."
+   )
+   devkind = 'CUDA'
+
xla_devices = _DEVICES.value
devkind = [devkind] if devkind else [
'TPU', 'GPU', 'XPU', 'NEURON', 'CPU', 'CUDA', 'ROCM'
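A quick sketch of how the shim above behaves from the caller's side. This is illustrative, not part of the commit; it assumes a CUDA-enabled build, and the exact device strings returned depend on the host:

```python
import torch_xla.core.xla_model as xm

# Old-style call: hits the shim above, emits the deprecation warning,
# and is then treated exactly like devkind='CUDA'.
legacy = xm.get_xla_supported_devices(devkind="GPU")

# Equivalent call with the new spelling; no warning.
current = xm.get_xla_supported_devices(devkind="CUDA")

assert legacy == current  # e.g. both ['xla:0'] on a single-GPU host
```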
9 changes: 8 additions & 1 deletion torch_xla/runtime.py
@@ -40,7 +40,7 @@ def _maybe_select_default_device():
os.environ[xenv.PJRT_DEVICE] = 'TPU'
# TODO(wcromar): Detect GPU device
elif xu.getenv_as(xenv.GPU_NUM_DEVICES, int, 0) > 0:
- logging.warning('GPU_NUM_DEVICES is set. Setting PJRT_DEVICE=GPU')
+ logging.warning('GPU_NUM_DEVICES is set. Setting PJRT_DEVICE=CUDA')
os.environ[xenv.PJRT_DEVICE] = 'CUDA'
else:
logging.warning('Defaulting to PJRT_DEVICE=CPU')
@@ -107,6 +107,13 @@ def xla_device(n: Optional[int] = None,
Returns:
A `torch.device` representing an XLA device.
"""
+ # TODO(xiowei replace gpu with cuda): Remove the warning message at the r2.2 release.
+ pjrt_device = xu.getenv_as(xenv.PJRT_DEVICE, str)
+ if pjrt_device.casefold() == 'gpu':
+   warnings.warn(
+       'PJRT_DEVICE=GPU is being deprecated. Please replace PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA.'
+   )
+
if n is None:
return torch.device(torch_xla._XLAC._xla_get_default_device())

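And a sketch of what the new check in `runtime.xla_device` looks like at runtime. Again illustrative: it assumes the deprecated spelling is set before `torch_xla` initializes, and uses the standard-library warning capture to show the message:

```python
import os
os.environ["PJRT_DEVICE"] = "GPU"  # deprecated spelling, still accepted for now

import warnings
import torch_xla.runtime as xr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Warns: "PJRT_DEVICE=GPU is being deprecated. Please replace
    # PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA."
    device = xr.xla_device()

print(device)
print([str(w.message) for w in caught])
```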
