update doc to use PJRT_DEVICE=CUDA instead of PJRT_DEVICE=GPU (#5754)
* update doc to use PJRT_DEVICE=CUDA instead of PJRT_DEVICE=GPU

* add warning message.

* fix comment and test failure.

* skip dynamic shape model test on cuda.
vanbasten23 authored Nov 4, 2023
1 parent a93e14b commit f01cdb6
Showing 10 changed files with 32 additions and 13 deletions.
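In short, the commit renames the PJRT device string from `GPU` to `CUDA`, keeping the old spelling working (with a warning) through the r2.2 release. Below is a minimal sketch of the preferred usage after this change; it is illustrative only and assumes a CUDA build of PyTorch/XLA with at least one visible GPU:

```python
import os

# Preferred after this commit: PJRT_DEVICE=CUDA. The old PJRT_DEVICE=GPU
# still works for now but emits a deprecation warning (see torch_xla/runtime.py
# below). Set the environment variables before importing torch_xla.
os.environ["PJRT_DEVICE"] = "CUDA"
os.environ["GPU_NUM_DEVICES"] = "1"  # number of local GPUs to use

import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to an XLA device backed by CUDA
print(device)             # e.g. xla:0
```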
2 changes: 2 additions & 0 deletions .circleci/common.sh
@@ -131,6 +131,8 @@ function run_torch_xla_python_tests() {
if [ -x "$(command -v nvidia-smi)" ]; then
# These tests fail on CUDA with 03/30 TF-pin update (https://github.com/pytorch/xla/pull/4840)
PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
+ # TODO(xiowei replace gpu with cuda): remove the PJRT_DEVICE=GPU test below once PJRT_DEVICE=GPU is fully deprecated.
+ PJRT_DEVICE=GPU python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --auto_wrap_policy type_based --use_small_fake_sample --num_epochs=1
XLA_DISABLE_FUNCTIONALIZATION=1 PJRT_DEVICE=CUDA python test/test_train_mp_imagenet_fsdp.py --fake_data --use_nested_fsdp --use_small_fake_sample --num_epochs=1
# Syncfree SGD optimizer tests
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -94,7 +94,7 @@ To run the tests, follow __one__ of the options below:
* Run on GPU:

```Shell
- export PJRT_DEVICE=GPU GPU_NUM_DEVICES=${NUM_GPU}
+ export PJRT_DEVICE=CUDA GPU_NUM_DEVICES=${NUM_GPU}
```

For more detail on configuring the runtime, please refer to [this doc](https://github.com/pytorch/xla/blob/master/docs/pjrt.md#quickstart)
2 changes: 1 addition & 1 deletion README.md
@@ -111,7 +111,7 @@ If you're using `DistributedDataParallel`, make the following changes:
Additional information on PyTorch/XLA, including a description of its semantics
and functions, is available at [PyTorch.org](http://pytorch.org/xla/). See the
[API Guide](API_GUIDE.md) for best practices when writing networks that run on
- XLA devices (TPU, GPU, CPU and...).
+ XLA devices (TPU, CUDA, CPU and...).

Our comprehensive user guides are available at:

2 changes: 1 addition & 1 deletion configuration.yaml
@@ -4,7 +4,7 @@ variables:
PJRT_DEVICE:
description:
- Indicates which device is being used with PJRT. It can be either CPU,
- TPU, or GPU
+ TPU, or CUDA
type: string
PJRT_SELECT_DEFAULT_DEVICE:
description:
2 changes: 1 addition & 1 deletion docs/gpu.md
@@ -59,7 +59,7 @@ pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/cuda/117/torch_xl
In order to run the examples below, you need to clone the pytorch/xla repo to access the imagenet example (we already clone it in our Docker image).

```
- (pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=GPU
+ (pytorch) root@20ab2c7a2d06:/# export GPU_NUM_DEVICES=1 PJRT_DEVICE=CUDA
(pytorch) root@20ab2c7a2d06:/# git clone --recursive https://github.com/pytorch/xla.git
(pytorch) root@20ab2c7a2d06:/# python xla/test/test_train_mp_imagenet.py --fake_data
==> Preparing data..
12 changes: 6 additions & 6 deletions docs/pjrt.md
@@ -196,17 +196,17 @@ for more information.

### Single-node GPU training

- To use GPUs with PJRT, simply set `PJRT_DEVICE=GPU` and configure
+ To use GPUs with PJRT, simply set `PJRT_DEVICE=CUDA` and configure
`GPU_NUM_DEVICES` to the number of devices on the host. For example:

```
- PJRT_DEVICE=GPU GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
+ PJRT_DEVICE=CUDA GPU_NUM_DEVICES=4 python3 xla/test/test_train_mp_imagenet.py --fake_data --batch_size=128 --num_epochs=1
```

You can also use `torchrun` to initiate the single-node multi-GPU training. For example,

```
- PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
+ PJRT_DEVICE=CUDA torchrun --nnodes 1 --nproc-per-node ${NUM_GPU_DEVICES} xla/test/test_train_mp_imagenet.py --fake_data --pjrt_distributed --batch_size=128 --num_epochs=1
```

In the above example, `--nnodes` specifies how many machines (physical machines or VMs) to use (1 here, since we run single-node training), and `--nproc-per-node` specifies how many GPU devices to use.
@@ -216,7 +216,7 @@
**Note that this feature only works for CUDA 12+**. Similar to how PyTorch handles multi-node training, you can run the command as below:

```
- PJRT_DEVICE=GPU torchrun \
+ PJRT_DEVICE=CUDA torchrun \
--nnodes=${NUMBER_GPU_VM} \
--node_rank=${CURRENT_NODE_RANK} \
--nproc_per_node=${NUMBER_LOCAL_GPU_DEVICES} \
@@ -231,7 +231,7 @@
For example, if you want to train on 2 GPU machines, machine_0 and machine_1, then on the first GPU machine (machine_0), run

```
- # PJRT_DEVICE=GPU torchrun \
+ # PJRT_DEVICE=CUDA torchrun \
--nnodes=2 \
--node_rank=0 \
--nproc_per_node=4 \
@@ -241,7 +241,7 @@
On the second GPU machine, run

```
- # PJRT_DEVICE=GPU torchrun \
+ # PJRT_DEVICE=CUDA torchrun \
--nnodes=2 \
--node_rank=1 \
--nproc_per_node=4 \
4 changes: 3 additions & 1 deletion test/ds/test_dynamic_shape_models.py
@@ -43,7 +43,9 @@ def forward(self, x):


@unittest.skipIf(
not xm.get_xla_supported_devices("GPU") and
# Currently a change break this test on CUDA. Another change is trying to
# roll back it. Will uncomment the line below once it is rolled back.
# not xm.get_xla_supported_devices("CUDA") and
not xm.get_xla_supported_devices("TPU"),
f"The tests fail on CPU. See https://github.com/pytorch/xla/issues/4298 for more detail."
)
2 changes: 1 addition & 1 deletion test/run_tests.sh
@@ -127,7 +127,7 @@ function run_torchrun {
if [ -x "$(command -v nvidia-smi)" ] && [ "$XLA_CUDA" != "0" ]; then
echo "Running torchrun test for GPU $@"
num_devices=$(nvidia-smi --list-gpus | wc -l)
- PJRT_DEVICE=GPU torchrun --nnodes 1 --nproc-per-node $num_devices $@
+ PJRT_DEVICE=CUDA torchrun --nnodes 1 --nproc-per-node $num_devices $@
fi
}

8 changes: 8 additions & 0 deletions torch_xla/core/xla_model.py
@@ -5,6 +5,7 @@
import re
import threading
import time
+ import warnings
from typing import List, Optional
import torch
import torch.distributed._functional_collectives
@@ -88,6 +89,13 @@ def get_xla_supported_devices(devkind=None, max_devices=None):
Returns:
The list of device strings.
"""
+ # TODO(xiowei replace gpu with cuda): Remove the below if statement after the r2.2 release.
+ if devkind and devkind.casefold() == 'gpu':
+   warnings.warn(
+       "GPU as a device name is being deprecated. Please replace it with CUDA, e.g. get_xla_supported_devices(devkind='CUDA'). Similarly, please replace PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA."
+   )
+   devkind = 'CUDA'
+
xla_devices = _DEVICES.value
devkind = [devkind] if devkind else [
'TPU', 'GPU', 'XPU', 'NEURON', 'CPU', 'CUDA', 'ROCM'
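A quick sketch of how the shim above behaves from the caller's side. This is illustrative, not part of the commit; it assumes a CUDA-enabled build, and the exact device strings returned depend on the host:

```python
import torch_xla.core.xla_model as xm

# Old-style call: hits the shim above, emits the deprecation warning,
# and is then treated exactly like devkind='CUDA'.
legacy = xm.get_xla_supported_devices(devkind="GPU")

# Equivalent call with the new spelling; no warning.
current = xm.get_xla_supported_devices(devkind="CUDA")

assert legacy == current  # e.g. both ['xla:0'] on a single-GPU host
```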
9 changes: 8 additions & 1 deletion torch_xla/runtime.py
@@ -40,7 +40,7 @@ def _maybe_select_default_device():
os.environ[xenv.PJRT_DEVICE] = 'TPU'
# TODO(wcromar): Detect GPU device
elif xu.getenv_as(xenv.GPU_NUM_DEVICES, int, 0) > 0:
- logging.warning('GPU_NUM_DEVICES is set. Setting PJRT_DEVICE=GPU')
+ logging.warning('GPU_NUM_DEVICES is set. Setting PJRT_DEVICE=CUDA')
os.environ[xenv.PJRT_DEVICE] = 'CUDA'
else:
logging.warning('Defaulting to PJRT_DEVICE=CPU')
@@ -107,6 +107,13 @@ def xla_device(n: Optional[int] = None,
Returns:
A `torch.device` representing an XLA device.
"""
+ # TODO(xiowei replace gpu with cuda): Remove the warning message at the r2.2 release.
+ pjrt_device = xu.getenv_as(xenv.PJRT_DEVICE, str)
+ if pjrt_device.casefold() == 'gpu':
+   warnings.warn(
+       'PJRT_DEVICE=GPU is being deprecated. Please replace PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA.'
+   )
+
if n is None:
return torch.device(torch_xla._XLAC._xla_get_default_device())

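And a sketch of what the new check in `runtime.xla_device` looks like at runtime. Again illustrative: it assumes the deprecated spelling is set before `torch_xla` initializes, and uses the standard-library warning capture to show the message:

```python
import os
os.environ["PJRT_DEVICE"] = "GPU"  # deprecated spelling, still accepted for now

import warnings
import torch_xla.runtime as xr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Warns: "PJRT_DEVICE=GPU is being deprecated. Please replace
    # PJRT_DEVICE=GPU with PJRT_DEVICE=CUDA."
    device = xr.xla_device()

print(device)
print([str(w.message) for w in caught])
```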
