update doc to use PJRT_DEVICE=CUDA instead of PJRT_DEVICE=GPU #5754

Merged · 4 commits · Nov 4, 2023
2 changes: 2 additions & 0 deletions .circleci/common.sh
@@ -131,6 +131,8 @@ function run_torch_xla_python_tests() {
if [ -x "$(command -v nvidia-smi)" ]; then
# These tests fail on CUDA with 03/30 TF-pin update (https://github.com/pytorch/xla/pull/4840)
PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
+# TODO(xiowei replace gpu with cuda): remove the PJRT_DEVICE=GPU test below once PJRT_DEVICE=GPU is fully deprecated.
+PJRT_DEVICE=GPU python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --auto_wrap_policy type_based --use_small_fake_sample --num_epochs=1
XLA_DISABLE_FUNCTIONALIZATION=1 PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
# Syncfree SGD optimizer tests
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -94,7 +94,7 @@ To run the tests, follow __one__ of the options below:
* Run on GPU:

```Shell
-export PJRT_DEVICE=GPU GPU_NUM_DEVICES=${NUM_GPU}
+export PJRT_DEVICE=CUDA GPU_NUM_DEVICES=${NUM_GPU}
```

For more detail on configuring the runtime, please refer to [this doc](https://github.com/pytorch/xla/blob/master/docs/pjrt.md#quickstart)
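As a quick sanity check that the runtime picked these variables up, here is a minimal sketch (assuming a CUDA-enabled torch_xla build; `xla_device_hw` reports the hardware type backing an XLA device, and the values below are illustrative):

```python
import os

# Illustrative values; set GPU_NUM_DEVICES to match your host (assumption).
os.environ["PJRT_DEVICE"] = "CUDA"
os.environ["GPU_NUM_DEVICES"] = "1"

import torch_xla.core.xla_model as xm

device = xm.xla_device()         # e.g. xla:0
print(xm.xla_device_hw(device))  # expected: 'CUDA' on this setup
```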
2 changes: 1 addition & 1 deletion README.md
@@ -111,7 +111,7 @@ If you're using `DistributedDataParallel`, make the following changes:
Additional information on PyTorch/XLA, including a description of its semantics
and functions, is available at [PyTorch.org](http://pytorch.org/xla/). See the
[API Guide](API_GUIDE.md) for best practices when writing networks that run on
-XLA devices (TPU, GPU, CPU and...).
+XLA devices (TPU, CUDA, CPU and...).

Our comprehensive user guides are available at:

2 changes: 1 addition & 1 deletion configuration.yaml
@@ -4,7 +4,7 @@ variables:
PJRT_DEVICE:
description:
- Indicates which device is being used with PJRT. It can be either CPU,
-  TPU, or GPU
+  TPU, or CUDA
type: string
PJRT_SELECT_DEFAULT_DEVICE:
description:
2 changes: 1 addition & 1 deletion docs/gpu.md
@@ -59,7 +59,7 @@ pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/cuda/117/torch_xl
In order to run the examples below, you need to clone the pytorch/xla repo to access the imagenet example (we already clone it in our docker image).

```
-(pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=GPU
+(pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=CUDA
(pytorch) root@20ab2c7a2d06:/# git clone --recursive https://github.com/pytorch/xla.git
(pytorch) root@20ab2c7a2d06:/# python xla/test/test_train_mp_imagenet.py --fake_data
==> Preparing data..
12 changes: 6 additions & 6 deletions docs/pjrt.md
@@ -196,17 +196,17 @@ for more information.

### Single-node GPU training

-To use GPUs with PJRT, simply set `PJRT_DEVICE=GPU` and configure
+To use GPUs with PJRT, simply set `PJRT_DEVICE=CUDA` and configure
`GPU_NUM_DEVICES` to the number of devices on the host. For example:

```
-PJRT_DEVICE=GPU GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
+PJRT_DEVICE=CUDA GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
```
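To confirm from inside a program which runtime was selected, a minimal sketch (assuming `torch_xla.runtime.device_type()` is available in your installed version):

```python
import torch_xla.runtime as xr

# Reports the PJRT device type resolved from PJRT_DEVICE (or auto-detection).
print(xr.device_type())  # expected: 'CUDA' when PJRT_DEVICE=CUDA is set
```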

You can also use `torchrun` to launch single-node multi-GPU training. For example:

```
-PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
+PJRT_DEVICE=CUDA torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
```

In the above example, `--nnodes` specifies how many machines (physical machines or VMs) to use (1 here, since we run single-node training), and `--nproc-per-node` specifies how many GPU devices to use.
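For reference, the per-process setup that `torchrun` drives looks roughly like this sketch (it assumes the experimental `xla://` init method, registered by importing `torch_xla.experimental.pjrt_backend`; treat the module path as an assumption for your version):

```python
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.experimental.pjrt_backend  # registers the 'xla://' init method (assumed available)

# torchrun supplies the rank/world-size env vars; 'xla://' reads them.
dist.init_process_group('xla', init_method='xla://')

device = xm.xla_device()
print(dist.get_rank(), dist.get_world_size(), device)
```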
@@ -216,7 +216,7 @@
**Note that this feature only works for CUDA 12+.** Similar to how PyTorch supports multi-node training, you can run the command as below:

```
-PJRT_DEVICE=GPU torchrun \
+PJRT_DEVICE=CUDA torchrun \
--nnodes=${NUMBER_GPU_VM} \
--node_rank=${CURRENT_NODE_RANK} \
--nproc_per_node=${NUMBER_LOCAL_GPU_DEVICES} \
@@ -231,7 +231,7 @@
For example, suppose you want to train on 2 GPU machines, machine_0 and machine_1, each with 4 local GPU devices (a global world size of 8). On the first machine, machine_0, run:

```
-# PJRT_DEVICE=GPU torchrun \
+# PJRT_DEVICE=CUDA torchrun \
--nnodes=2 \
--node_rank=0 \
--nproc_per_node=4 \
@@ -241,7 +241,7 @@
On the second GPU machine, run

```
-# PJRT_DEVICE=GPU torchrun \
+# PJRT_DEVICE=CUDA torchrun \
--nnodes=2 \
--node_rank=1 \
--nproc_per_node=4 \
4 changes: 3 additions & 1 deletion test/ds/test_dynamic_shape_models.py
@@ -43,7 +43,9 @@ def forward(self, x):


@unittest.skipIf(
-not xm.get_xla_supported_devices("GPU") and
+# Currently a change breaks this test on CUDA; another change is trying to
+# roll it back. The line below will be uncommented once it is rolled back.
+# not xm.get_xla_supported_devices("CUDA") and
not xm.get_xla_supported_devices("TPU"),
f"The tests fail on CPU. See https://github.com/pytorch/xla/issues/4298 for more detail."
)
2 changes: 1 addition & 1 deletion test/run_tests.sh
@@ -127,7 +127,7 @@ function run_torchrun {
if [ -x "$(command -v nvidia-smi)" ] && [ "$XLA_CUDA" != "0" ]; then
echo "Running torchrun test for GPU $@"
num_devices=$(nvidia-smi --list-gpus | wc -l)
-PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node $num_devices $@
+PJRT_DEVICE=CUDA torchrun --nnodes 1 --nproc-per-node $num_devices $@
fi
}

8 changes: 8 additions & 0 deletions torch_xla/core/xla_model.py
@@ -5,6 +5,7 @@
import re
import threading
import time
+import warnings
from typing import List, Optional
import torch
import torch.distributed._functional_collectives
@@ -88,6 +89,13 @@ def get_xla_supported_devices(devkind=None, max_devices=None):
Returns:
The list of device strings.
"""
+# TODO(xiowei replace gpu with cuda): Remove the below if statement after r2.2 release.
+if devkind and devkind.casefold() == 'gpu':
+  warnings.warn(
+      "GPU as a device name is being deprecated. Please replace it with CUDA, such as get_xla_supported_devices(devkind='CUDA'). Similarly, please replace PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA."
+  )
+  devkind = 'CUDA'

xla_devices = _DEVICES.value
devkind = [devkind] if devkind else [
'TPU', 'GPU', 'XPU', 'NEURON', 'CPU', 'CUDA', 'ROCM'
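To illustrate the shim's behavior, a hedged sketch of calling the API with the legacy device kind (assuming a CUDA host; the exact device strings returned depend on the runtime):

```python
import warnings

import torch_xla.core.xla_model as xm

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Legacy spelling: warns, then behaves as devkind='CUDA'.
    devices = xm.get_xla_supported_devices(devkind="GPU")

print(devices)            # e.g. ['xla:0', 'xla:1', ...] on a multi-GPU host
print(caught[0].message)  # the deprecation notice added in this PR
```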
9 changes: 8 additions & 1 deletion torch_xla/runtime.py
@@ -40,7 +40,7 @@ def _maybe_select_default_device():
os.environ[xenv.PJRT_DEVICE] = 'TPU'
# TODO(wcromar): Detect GPU device
elif xu.getenv_as(xenv.GPU_NUM_DEVICES, int, 0) > 0:
-logging.warning('GPU_NUM_DEVICES is set. Setting PJRT_DEVICE=GPU')
+logging.warning('GPU_NUM_DEVICES is set. Setting PJRT_DEVICE=CUDA')
os.environ[xenv.PJRT_DEVICE] = 'CUDA'
else:
logging.warning('Defaulting to PJRT_DEVICE=CPU')
@@ -107,6 +107,13 @@ def xla_device(n: Optional[int] = None,
Returns:
A `torch.device` representing an XLA device.
"""
+# TODO(xiowei replace gpu with cuda): Remove the warning message at r2.2 release.
+pjrt_device = xu.getenv_as(xenv.PJRT_DEVICE, str)
+if pjrt_device and pjrt_device.casefold() == 'gpu':
+  warnings.warn(
+      'PJRT_DEVICE=GPU is being deprecated. Please replace PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA.'
+  )

if n is None:
return torch.device(torch_xla._XLAC._xla_get_default_device())

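And a sketch of the warning path from user code (assuming a CUDA build where the legacy value still resolves to a working device):

```python
import os
import warnings

os.environ["PJRT_DEVICE"] = "GPU"  # legacy spelling, still accepted for now

import torch_xla.runtime as xr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    device = xr.xla_device()  # the function patched in this PR

print(device)  # e.g. xla:0
print(any('PJRT_DEVICE=GPU' in str(w.message) for w in caught))  # True
```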