From c27a493e97ede0f80eb4a724af43c26e94c13e9e Mon Sep 17 00:00:00 2001 From: Will Cromar Date: Tue, 11 Jul 2023 18:03:02 -0700 Subject: [PATCH] Make README more actionable (#5262) * Make README more actionable * move profiling guide link * text wrapping --- README.md | 298 +++++++++++++++++++++++++++++++--------------------- docs/gpu.md | 11 +- 2 files changed, 188 insertions(+), 121 deletions(-) diff --git a/README.md b/README.md index d27440b500d..a2b50e1df7f 100644 --- a/README.md +++ b/README.md @@ -1,125 +1,133 @@ # PyTorch/XLA -Current CI status: [![CircleCI](https://circleci.com/gh/pytorch/xla.svg?style=svg)](https://circleci.com/gh/pytorch/xla) +Current CI status: ![GitHub Actions +status](https://github.com/pytorch/xla/actions/workflows/build_and_test.yml/badge.svg) -PyTorch/XLA is a Python package that uses the -[XLA deep learning compiler](https://www.tensorflow.org/xla) -to connect the [PyTorch deep learning framework](https://pytorch.org/) and -[Cloud TPUs](https://cloud.google.com/tpu/). You can try it right now, for free, -on a single Cloud TPU with [Google Colab](https://colab.research.google.com/), -and use it in production and on Cloud TPU Pods -with [Google Cloud](https://cloud.google.com/gcp). +PyTorch/XLA is a Python package that uses the [XLA deep learning +compiler](https://www.tensorflow.org/xla) to connect the [PyTorch deep learning +framework](https://pytorch.org/) and [Cloud +TPUs](https://cloud.google.com/tpu/). You can try it right now, for free, on a +single Cloud TPU VM with +[Kaggle](https://www.kaggle.com/discussions/product-feedback/369338)! -Take a look at one of our Colab notebooks to quickly try different PyTorch networks -running on Cloud TPUs and learn how to use Cloud TPUs as PyTorch devices: +Take a look at one of our [Kaggle +notebooks](https://github.com/pytorch/xla/tree/master/contrib/kaggle) to get +started: -* [Getting Started with PyTorch on Cloud TPUs](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/getting-started.ipynb) -* [Training AlexNet on Fashion MNIST with a single Cloud TPU Core](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/single-core-alexnet-fashion-mnist.ipynb) -* [Training AlexNet on Fashion MNIST with multiple Cloud TPU Cores](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/multi-core-alexnet-fashion-mnist.ipynb) -* [Fast Neural Style Transfer (NeurIPS 2019 Demo)](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/style_transfer_inference.ipynb) -* [Training A Simple Convolutional Network on MNIST](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/mnist-training.ipynb) -* [Training a ResNet18 Network on CIFAR10](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/resnet18-training.ipynb) -* [ImageNet Inference with ResNet50](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/resnet50-inference.ipynb) -* [Training DC-GAN using Colab Cloud TPU](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/DC-GAN.ipynb) +* [Stable Diffusion with PyTorch/XLA + 2.0](https://github.com/pytorch/xla/blob/master/contrib/kaggle/pytorch-xla-2-0-on-kaggle.ipynb) +* [Distributed PyTorch/XLA + Basics](https://github.com/pytorch/xla/blob/master/contrib/kaggle/distributed-pytorch-xla-basics-with-pjrt.ipynb) -The rest of this README covers: +## Getting Started -* [User Guide & Best Practices](#user-guide--best-practices) -* 
[Running PyTorch on Cloud TPUs and GPU](#running-pytorchxla-on-cloud-tpu-and-gpu)
-Google Cloud also runs networks faster than Google Colab.
-* [Available docker images and wheels](#available-docker-images-and-wheels)
-* [Performance Profiling and Auto-Metrics Analysis](#performance-profiling-and-auto-metrics-analysis)
-* [Troubleshooting](#troubleshooting)
-* [Providing Feedback](#providing-feedback)
-* [Building and Contributing to PyTorch/XLA](#contributing)
-* [Additional Reads](#additional-reads)
+To install PyTorch/XLA on a new TPU VM:
+```
+pip install torch~=2.0.0 https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-2.0-cp38-cp38-linux_x86_64.whl
+```
+To update your existing training loop, make the following changes:

-Additional information on PyTorch/XLA, including a description of its
-semantics and functions, is available at [PyTorch.org](http://pytorch.org/xla/).
-
-## User Guide & Best Practices
-
-Our comprehensive user guides are available at:
-
-[Documentation for the latest release](https://pytorch.org/xla)
-
-[Documentation for master branch](https://pytorch.org/xla/master)

+```
+-import torch.multiprocessing as mp
++import torch_xla.core.xla_model as xm
++import torch_xla.distributed.parallel_loader as pl
++import torch_xla.distributed.xla_multiprocessing as xmp
+
+ def _mp_fn(index):
+   ...
+
++  # Move the model parameters to your XLA device
++  model.to(xm.xla_device())
++
++  # MpDeviceLoader preloads data to the XLA device
++  xla_train_loader = pl.MpDeviceLoader(train_loader, xm.xla_device())
+
+-  for inputs, labels in train_loader:
++  for inputs, labels in xla_train_loader:
+    optimizer.zero_grad()
+    outputs = model(inputs)
+    loss = loss_fn(outputs, labels)
+    loss.backward()
+-   optimizer.step()
++
++   # `xm.optimizer_step` combines gradients across replicas
++   xm.optimizer_step(optimizer)
+
+  if __name__ == '__main__':
+-   mp.spawn(_mp_fn, args=(), nprocs=world_size)
++   # xmp.spawn automatically selects the correct world size
++   xmp.spawn(_mp_fn, args=())
+```
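+
+Put together, a runnable sketch of the converted loop looks like the following.
+The linear model, loss function, and random dataset here are illustrative
+placeholders, not part of the original example:
+
+```
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader, TensorDataset
+import torch_xla.core.xla_model as xm
+import torch_xla.distributed.parallel_loader as pl
+import torch_xla.distributed.xla_multiprocessing as xmp
+
+def _mp_fn(index):
+  device = xm.xla_device()
+  model = nn.Linear(10, 2).to(device)
+  optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
+  loss_fn = nn.CrossEntropyLoss()
+
+  # Placeholder dataset: random features and labels.
+  dataset = TensorDataset(torch.randn(512, 10), torch.randint(0, 2, (512,)))
+  train_loader = DataLoader(dataset, batch_size=32)
+
+  # MpDeviceLoader preloads each batch onto the XLA device.
+  xla_train_loader = pl.MpDeviceLoader(train_loader, device)
+
+  for inputs, labels in xla_train_loader:
+    optimizer.zero_grad()
+    loss = loss_fn(model(inputs), labels)
+    loss.backward()
+    # Combine gradients across replicas before applying the update.
+    xm.optimizer_step(optimizer)
+
+if __name__ == '__main__':
+  xmp.spawn(_mp_fn, args=())
+```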

-See the [API Guide](API_GUIDE.md) for best practices when writing networks that
-run on XLA devices(TPU, GPU, CPU and...)

+If you're using `DistributedDataParallel`, make the following changes:

-## Running PyTorch/XLA on Cloud TPU and GPU
-* [Running on a single Cloud TPU](#running-on-a-single-cloud-tpu-vm)
-* [Running on a Cloud TPU Pod](#how-to-run-on-tpu-vm-pods-distributed-training)
-* [Running on a Cloud GPU](docs/gpu.md)

+```
+ import torch.distributed as dist
+-import torch.multiprocessing as mp
++import torch_xla.core.xla_model as xm
++import torch_xla.distributed.parallel_loader as pl
++import torch_xla.distributed.xla_multiprocessing as xmp
+
+ def _mp_fn(rank, world_size):
+   ...
+
+-  os.environ['MASTER_ADDR'] = 'localhost'
+-  os.environ['MASTER_PORT'] = '12355'
+-  dist.init_process_group("gloo", rank=rank, world_size=world_size)
++  # Rank and world size are inferred from the XLA device runtime
++  dist.init_process_group("xla", init_method='pjrt://')
++
++  model.to(xm.xla_device())
++  # `gradient_as_bucket_view=True` is required for XLA
++  ddp_model = DDP(model, gradient_as_bucket_view=True)
+
+-  model = model.to(rank)
+-  ddp_model = DDP(model, device_ids=[rank])
++  xla_train_loader = pl.MpDeviceLoader(train_loader, xm.xla_device())
+
+-  for inputs, labels in train_loader:
++  for inputs, labels in xla_train_loader:
+    optimizer.zero_grad()
+    outputs = ddp_model(inputs)
+    loss = loss_fn(outputs, labels)
+    loss.backward()
+    optimizer.step()
+
+  if __name__ == '__main__':
+-   mp.spawn(_mp_fn, args=(), nprocs=world_size)
++   xmp.spawn(_mp_fn, args=())
+```

---
-## Running on a Single Cloud TPU VM
+Additional information on PyTorch/XLA, including a description of its semantics
+and functions, is available at [PyTorch.org](http://pytorch.org/xla/). See the
+[API Guide](API_GUIDE.md) for best practices when writing networks that run on
+XLA devices (TPU, GPU, and CPU).

-Google Cloud offers TPU VMs for more transparent and easier access to the TPU hardware. This is our **recommended way** of running PyTorch/XLA on Cloud TPU. Please check out our [Cloud TPU VM User Guide](https://cloud.google.com/tpu/docs/pytorch-xla-ug-tpu-vm). To learn more about the Cloud TPU System Architecture, please check out [this doc](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm#tpu_vms).
+Our comprehensive user guides are available at:
+[Documentation for the latest release](https://pytorch.org/xla)

---
+[Documentation for master branch](https://pytorch.org/xla/master)

-## How to Run on TPU VM Pods (distributed training)
-If a single TPU VM does not suit your requirement, you can consider using TPU Pod. TPU Pod is a collection of TPU devices connected by dedicated high-speed network interfaces. Please checkout our [Cloud TPU VM Pod User Guide](https://cloud.google.com/tpu/docs/pytorch-pods).
+## PyTorch/XLA tutorials
+* [Cloud TPU VM
+  quickstart](https://cloud.google.com/tpu/docs/run-calculation-pytorch)
+* [Cloud TPU Pod slice
+  quickstart](https://cloud.google.com/tpu/docs/pytorch-pods)
+* [Profiling on TPU
+  VM](https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm)
+* [GPU guide](docs/gpu.md)

 ## Available docker images and wheels
-### Docker
-The following pre-built docker images are available. For running dockers, check [this doc](https://cloud.google.com/tpu/docs/pytorch-xla-ug-tpu-vm#docker-tpuvm) for TPUVM and [this doc](https://github.com/pytorch/xla/blob/master/docs/gpu.md#docker) for GPU.
-
-| Version | Cloud TPU VMs Docker |
-| --- | ----------- |
-2.0 | `gcr.io/tpu-pytorch/xla:r2.0_3.8_tpuvm` |
-1.13 | `gcr.io/tpu-pytorch/xla:r1.13_3.8_tpuvm` |
-nightly python 3.10 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm` |
-nightly python 3.8 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_tpuvm` |
-nightly python 3.10(>= 2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm_YYYYMMDD` |
-nightly python 3.8(>= 2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_tpuvm_YYYYMMDD` |
-nightly at date(< 2023/04/25) | `gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm_YYYYMMDD` |
-
- -| Version | GPU CUDA 11.8 + Python 3.8 Docker | -| --- | ----------- | -| 2.0 | `gcr.io/tpu-pytorch/xla:r2.0_3.8_cuda_11.8` | -| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.8` | -| nightly at date(>=2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.8_YYYYMMDD` | -| nightly at date(<2023/04/25) | `gcr.io/tpu-pytorch/xla:nightly_3.8_cuda_11.8_YYYYMMDD` | - -
- -| Version | GPU CUDA 11.7 + Python 3.8 Docker | -| --- | ----------- | -| 2.0 | `gcr.io/tpu-pytorch/xla:r2.0_3.8_cuda_11.7` | -| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7` | -| nightly at date(>=2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7_YYYYMMDD` | -| nightly at date(<2023/04/25) | `gcr.io/tpu-pytorch/xla:nightly_3.8_cuda_11.7_YYYYMMDD` | - -
- -| Version | GPU CUDA 11.2 + Python 3.8 Docker | -| --- | ----------- | -| 1.13 | `gcr.io/tpu-pytorch/xla:r1.13_3.8_cuda_11.2` | - -
- -| Version | GPU CUDA 11.2 + Python 3.7 Docker | -| --- | ----------- | -1.13 | `gcr.io/tpu-pytorch/xla:r1.13_3.7_cuda_11.2` | -1.12 | `gcr.io/tpu-pytorch/xla:r1.12_3.7_cuda_11.2` | - - - -To run on [compute instances with GPUs](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus). ### Wheel + | Version | Cloud TPU VMs Wheel | | --- | ----------- | | 2.0 | `https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-2.0-cp38-cp38-linux_x86_64.whl` | @@ -132,7 +140,9 @@ To run on [compute instances with GPUs](https://cloud.google.com/compute/docs/gp
-Note: For TPU Pod customers using XRT (our legacy runtime), we have custom wheels for `torch`, `torchvision`, and `torch_xla` at `https://storage.googleapis.com/tpu-pytorch/wheels/xrt`.
+Note: For TPU Pod customers using XRT (our legacy runtime), we have custom
+wheels for `torch`, `torchvision`, and `torch_xla` at
+`https://storage.googleapis.com/tpu-pytorch/wheels/xrt`.

 | Package | Cloud TPU VMs Wheel (XRT on Pod, Legacy Only) |
 | --- | ----------- |
@@ -167,11 +177,14 @@ Note: For TPU Pod customers using XRT (our legacy runtime), we have custom wheel
 | --- | ----------- |
 | 2.0 | `https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl` |

-You can also add `+yyyymmdd` after `torch_xla-nightly` to get the nightly wheel of a specified date. To get the companion pytorch and torchvision nightly wheel, replace the `torch_xla` with `torch` or `torchvision` on above wheel links.
+You can also add `+yyyymmdd` after `torch_xla-nightly` to get the nightly wheel
+of a specified date. To get the companion `torch` and `torchvision` nightly
+wheels, replace `torch_xla` with `torch` or `torchvision` in the wheel links
+above.

-### Installing libtpu
+#### Installing libtpu (before PyTorch/XLA 2.0)

-For PyTorch/XLA release r2.0 and older and when developing PyTorch/XLA, install the `libtpu` pip package with the following command:
+For PyTorch/XLA releases r2.0 and older, and when developing PyTorch/XLA,
+install the `libtpu` pip package with the following command:

 ```
 pip3 install torch_xla[tpuvm]
@@ -179,36 +192,87 @@ pip3 install torch_xla[tpuvm]

 This is only required on Cloud TPU VMs.

-## Performance Profiling and Auto-Metrics Analysis
+### Docker

-With PyTorch/XLA we provide a set of performance profiling tooling and auto-metrics analysis which you can check the following resources:
-* [Official tutorial](https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm)
-* [Colab notebook](https://colab.research.google.com/github/pytorch/xla/blob/master/contrib/colab/pytorch-xla-profiling-colab.ipynb)
-* [Sample MNIST training script with profiling](https://github.com/pytorch/xla/blob/master/test/test_profile_mp_mnist.py)
-* [Utility script for capturing performance profiles](https://github.com/pytorch/xla/blob/master/scripts/capture_profile.py)
+| Version | Cloud TPU VMs Docker |
+| --- | ----------- |
+| 2.0 | `gcr.io/tpu-pytorch/xla:r2.0_3.8_tpuvm` |
+| 1.13 | `gcr.io/tpu-pytorch/xla:r1.13_3.8_tpuvm` |
+| nightly python 3.10 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm` |
+| nightly python 3.8 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_tpuvm` |
+| nightly python 3.10 (>= 2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm_YYYYMMDD` |
+| nightly python 3.8 (>= 2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_tpuvm_YYYYMMDD` |
+| nightly at date (< 2023/04/25) | `gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm_YYYYMMDD` |
+
+
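+For example, the 2.0 TPU VM image above can be pulled and started as follows.
+This is a sketch: the `--privileged` and `--net host` flags for TPU device
+access are assumptions based on typical Cloud TPU docker setups, so check the
+Cloud TPU documentation for the authoritative invocation:
+
+```
+sudo docker pull gcr.io/tpu-pytorch/xla:r2.0_3.8_tpuvm
+sudo docker run --privileged --net host -it gcr.io/tpu-pytorch/xla:r2.0_3.8_tpuvm /bin/bash
+```
+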
+
+| Version | GPU CUDA 11.8 + Python 3.8 Docker |
+| --- | ----------- |
+| 2.0 | `gcr.io/tpu-pytorch/xla:r2.0_3.8_cuda_11.8` |
+| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.8` |
+| nightly at date (>= 2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.8_YYYYMMDD` |
+| nightly at date (< 2023/04/25) | `gcr.io/tpu-pytorch/xla:nightly_3.8_cuda_11.8_YYYYMMDD` |
+
+
+
+| Version | GPU CUDA 11.7 + Python 3.8 Docker |
+| --- | ----------- |
+| 2.0 | `gcr.io/tpu-pytorch/xla:r2.0_3.8_cuda_11.7` |
+| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7` |
+| nightly at date (>= 2023/04/25) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7_YYYYMMDD` |
+| nightly at date (< 2023/04/25) | `gcr.io/tpu-pytorch/xla:nightly_3.8_cuda_11.7_YYYYMMDD` |
+
+
+ +| Version | GPU CUDA 11.2 + Python 3.8 Docker | +| --- | ----------- | +| 1.13 | `gcr.io/tpu-pytorch/xla:r1.13_3.8_cuda_11.2` | + +
+
+| Version | GPU CUDA 11.2 + Python 3.7 Docker |
+| --- | ----------- |
+| 1.13 | `gcr.io/tpu-pytorch/xla:r1.13_3.7_cuda_11.2` |
+| 1.12 | `gcr.io/tpu-pytorch/xla:r1.12_3.7_cuda_11.2` |
+
+
+To run PyTorch/XLA on a [compute instance with
+GPUs](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus), see the
+[GPU guide](docs/gpu.md).

 ## Troubleshooting

-If PyTorch/XLA isn't performing as expected, see the
-[troubleshooting guide](TROUBLESHOOTING.md), which has suggestions for
-debugging and optimizing your network(s).
+If PyTorch/XLA isn't performing as expected, see the [troubleshooting
+guide](TROUBLESHOOTING.md), which has suggestions for debugging and optimizing
+your network(s).

 ## Providing Feedback

 The PyTorch/XLA team is always happy to hear from users and OSS contributors!
-The best way to reach out is by filing an issue on this Github. Questions,
-bug reports, feature requests, build issues, etc. are all welcome!
+The best way to reach out is by filing an issue in this GitHub repository.
+Questions, bug reports, feature requests, build issues, etc. are all welcome!

 ## Contributing

 See the [contribution guide](CONTRIBUTING.md).

 ## Disclaimer
-This repository is jointly operated and maintained by Google, Facebook and a number of individual contributors listed in the [CONTRIBUTORS](https://github.com/pytorch/xla/graphs/contributors) file. For questions directed at Facebook, please send an email to opensource@fb.com. For questions directed at Google, please send an email to pytorch-xla@googlegroups.com. For all other questions, please open up an issue in this repository [here](https://github.com/pytorch/xla/issues).
+
+This repository is jointly operated and maintained by Google, Facebook, and a
+number of individual contributors listed in the
+[CONTRIBUTORS](https://github.com/pytorch/xla/graphs/contributors) file. For
+questions directed at Facebook, please send an email to opensource@fb.com. For
+questions directed at Google, please send an email to
+pytorch-xla@googlegroups.com. For all other questions, please open an issue
+in this repository [here](https://github.com/pytorch/xla/issues).
 ## Additional Reads
+
 You can find additional useful reading materials in
-* [Performance debugging on Cloud TPU VM](https://cloud.google.com/blog/topics/developers-practitioners/pytorchxla-performance-debugging-tpu-vm-part-1)
-* [Lazy tensor intro](https://pytorch.org/blog/understanding-lazytensor-system-performance-with-pytorch-xla-on-cloud-tpu/)
-* [Scaling deep learning workloads with PyTorch / XLA and Cloud TPU VM](https://cloud.google.com/blog/topics/developers-practitioners/scaling-deep-learning-workloads-pytorch-xla-and-cloud-tpu-vm)
-* [Scaling PyTorch models on Cloud TPUs with FSDP](https://pytorch.org/blog/scaling-pytorch-models-on-cloud-tpus-with-fsdp/)
+* [Performance debugging on Cloud TPU
+  VM](https://cloud.google.com/blog/topics/developers-practitioners/pytorchxla-performance-debugging-tpu-vm-part-1)
+* [Lazy tensor
+  intro](https://pytorch.org/blog/understanding-lazytensor-system-performance-with-pytorch-xla-on-cloud-tpu/)
+* [Scaling deep learning workloads with PyTorch / XLA and Cloud TPU
+  VM](https://cloud.google.com/blog/topics/developers-practitioners/scaling-deep-learning-workloads-pytorch-xla-and-cloud-tpu-vm)
+* [Scaling PyTorch models on Cloud TPUs with
+  FSDP](https://pytorch.org/blog/scaling-pytorch-models-on-cloud-tpus-with-fsdp/)

diff --git a/docs/gpu.md b/docs/gpu.md
index 99ff8655661..afb8291ba30 100644
--- a/docs/gpu.md
+++ b/docs/gpu.md
@@ -1,11 +1,14 @@
 # How to run with PyTorch/XLA:GPU

-PyTorch/XLA enables PyTorch users to utilize the XLA compiler which supports accelerators including TPU, GPU, CPU and … This doc will go over the basic steps to run PyTorch/XLA on a nvidia gpu instance
+PyTorch/XLA enables PyTorch users to utilize the XLA compiler, which supports accelerators including TPU, GPU, and CPU. This doc will go over the basic steps to run PyTorch/XLA on an NVIDIA GPU instance.

 ## Create a GPU instance
 PyTorch/XLA currently publishes prebuilt docker images and wheels with cuda11.7/8 and python 3.8. We recommend you create a GPU instance with the corresponding config. For a full list of docker images and wheels, please refer to [this doc](https://github.com/pytorch/xla/tree/jackcao/gpu_doc#-available-images-and-wheels).

 ## Environment Setup
+
+To create a GPU VM in Google Compute Engine, follow the [Google Cloud documentation](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).
+
 ### Docker
 ```
 sudo docker pull us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_11.7
@@ -23,7 +26,7 @@ Note that you need to restart the docker to make gpu devices visible in the dock
 ```
 (pytorch) root@20ab2c7a2d06:/# nvidia-smi
-Thu Dec 8 06:24:29 2022 
+Thu Dec 8 06:24:29 2022
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
 |-------------------------------+----------------------+----------------------+
@@ -35,7 +38,7 @@ Thu Dec 8 06:24:29 2022
 | N/A   36C    P0    38W / 300W |      0MiB / 16384MiB |      1%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
-
+
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
@@ -70,4 +73,4 @@ Epoch 1 train begin 06:12:38
 | Training Device=xla:0/0 Epoch=1 Step=120 Loss=2.68816 Rate=388.35 GlobalRate=169.49 Time=06:14:09
 ```
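+
+For reference, a training log like the one above comes from an invocation
+along these lines (a sketch; the exact script path and flags are assumptions
+based on the repository's test scripts):
+
+```
+GPU_NUM_DEVICES=1 PJRT_DEVICE=GPU python3 xla/test/test_train_mp_imagenet.py --fake_data
+```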
 ## AMP (AUTOMATIC MIXED PRECISION)
-AMP is very useful on GPU training and PyTorch/XLA reuse Cuda's AMP rule. You can checkout our [mnist example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_amp.py) and [imagenet example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_amp.py). Note that we also used a modified version of [optimizers](https://github.com/pytorch/xla/tree/master/torch_xla/amp/syncfree) to avoid the additional sync between device and host.
\ No newline at end of file
+AMP is very useful for GPU training, and PyTorch/XLA reuses CUDA's AMP rules. You can check out our [mnist example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_mnist_amp.py) and [imagenet example](https://github.com/pytorch/xla/blob/master/test/test_train_mp_imagenet_amp.py). Note that we also use a modified version of [optimizers](https://github.com/pytorch/xla/tree/master/torch_xla/amp/syncfree) to avoid the additional sync between device and host.
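+
+A minimal sketch of an AMP training step on XLA:GPU follows. It assumes the
+`torch_xla.amp` API used by the examples above; the linear model, random data,
+and the choice of the sync-free SGD variant are illustrative placeholders:
+
+```
+import torch
+import torch.nn as nn
+import torch_xla.core.xla_model as xm
+from torch_xla.amp import autocast, GradScaler, syncfree
+
+device = xm.xla_device()
+model = nn.Linear(10, 2).to(device)
+loss_fn = nn.CrossEntropyLoss()
+# The sync-free optimizer variant avoids an extra device-host sync per step.
+optimizer = syncfree.SGD(model.parameters(), lr=1e-3)
+scaler = GradScaler()
+
+for step in range(10):
+  inputs = torch.randn(8, 10).to(device)
+  labels = torch.randint(0, 2, (8,)).to(device)
+  optimizer.zero_grad()
+  # Run the forward pass in mixed precision.
+  with autocast(device):
+    loss = loss_fn(model(inputs), labels)
+  scaler.scale(loss).backward()
+  scaler.step(optimizer)
+  scaler.update()
+  # Materialize the lazy graph for this step.
+  xm.mark_step()
+```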