diff --git a/.circleci/README.md b/.circleci/README.md
deleted file mode 100644
index d01e6138317..00000000000
--- a/.circleci/README.md
+++ /dev/null
@@ -1,19 +0,0 @@
-# CircleCI Overview
-PyTorch and PyTorch/XLA use CircleCI to lint, build, and test each PR that is submitted. All CircleCI tests should succeed before the PR is merged into master. PyTorch CircleCI pins PyTorch/XLA to a specific commit. On the other hand, PyTorch/XLA CircleCI pulls PyTorch from master unless a pin is manually provided. This README will go through the reasons of these pins, how to pin a PyTorch/XLA PR to an upstream PyTorch PR, and how to coordinate a merge for breaking PyTorch changes.
-
-## Why does PyTorch CircleCI pin PyTorch/XLA?
-As mentioned above, [PyTorch CircleCI pins PyTorch/XLA](https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/common_utils.sh#L119) to a "known good" commit to prevent accidental changes from PyTorch/XLA to break PyTorch CircleCI without warning. PyTorch has hundreds of commits each week, and this pin ensures that PyTorch/XLA as a downstream package does not cause failures in PyTorch CircleCI.
-
-## Why does PyTorch/XLA CircleCI pull from PyTorch master?
-[PyTorch/XLA CircleCI pulls PyTorch from master](https://github.com/pytorch/xla/blob/f3415929683880192b63b285921c72439af55bf0/.circleci/common.sh#L15) unless a PyTorch pin is manually provided. PyTorch/XLA is a downstream package to PyTorch, and pulling from master ensures that PyTorch/XLA will stay up-to-date and works with the latest PyTorch changes.
-
-## Pinning PyTorch PR in PyTorch/XLA PR
-Sometimes a PyTorch/XLA PR needs to be pinned to a specific PyTorch PR to test new featurues, fix breaking changes, etc. Since PyTorch/XLA CircleCI pulls from PyTorch master by default, we need to manually provided a PyTorch pin. In a PyTorch/XLA PR, PyTorch an be manually pinned by creating a `.torch_pin` under `/torch_patches`. The `.torch_pin` should have the corresponding PyTorch PR number prefixed by "#". Take a look at [example here](https://github.com/pytorch/xla/pull/3792/commits/40f41fb98b0f2386d287eeac0bae86e873d4a9d8). Before the PyTorch/XLA PR gets merged, the `.torch_pin` must be deleted.
-
-## Coodinating merges for breaking PyTorch PRs
-When PyTorch PR introduces a breaking change, its PyTorch/XLA CircleCI tests will fail. Steps for fixing and merging such breaking PyTorch change is as following:
-1. Create a PyTorch/XLA PR to fix this issue with `.torch_pin` and rebase with master to ensure the PR is up-to-date with the latest commit on PyTorch/XLA. Once this PR is created, it'll create a commit hash that will be used in step 2. If you have multiple commits in the PR, use the last one's hash. **Important note: When you rebase this PR, it'll create a new commit hash and make the old hash obsolete. Be cautious about rebasing, and if you rebase, make sure you inform the PyTorch PR's author.**
-2. Rebase (or ask the PR owner to rebase) the PyTorch PR with master. Update the PyTorch PR to pin the PyTorch/XLA to the commit hash created in step 1 by updating `pytorch/.github/ci_commit_pins/xla.txt`.
-3. Once CircleCI tests are green on both ends, merge PyTorch PR.
-4. Remove the `.torch_pin` in PyTorch/XLA PR and merge. To be noted, `git commit --amend` should be avoided in this step as PyTorch CI will keep using the commit hash created in step 1 until other PRs update that manually or the nightly buildbot updates that automatically.
-5. Finally, don't delete your branch until 2 days later. See step 4 for explanations.
diff --git a/.circleci/setup_ci_environment.sh b/.circleci/setup_ci_environment.sh
index eba2c373b8a..87a61524e7e 100755
--- a/.circleci/setup_ci_environment.sh
+++ b/.circleci/setup_ci_environment.sh
@@ -58,7 +58,7 @@ sudo apt-get -y remove linux-image-generic linux-headers-generic linux-generic d
 # How to figure out what the correct versions of these packages are?
 # My preferred method is to start a Docker instance of the correct
 # Ubuntu version (e.g., docker run -it ubuntu:16.04) and then ask
-# apt what the packages you need are. Note that the CircleCI image
+# apt what the packages you need are. Note that the CI image
 # comes with Docker.
 #
 # Using 'retry' here as belt-and-suspenders even though we are
diff --git a/.github/README.md b/.github/README.md
new file mode 100644
index 00000000000..c2f4d37426c
--- /dev/null
+++ b/.github/README.md
@@ -0,0 +1,19 @@
+# CI Overview
+PyTorch and PyTorch/XLA use CI to lint, build, and test each PR that is submitted. All CI tests should succeed before the PR is merged into master. PyTorch CI pins PyTorch/XLA to a specific commit. On the other hand, PyTorch/XLA CI pulls PyTorch from master unless a pin is manually provided. This README explains the reasons for these pins, how to pin a PyTorch/XLA PR to an upstream PyTorch PR, and how to coordinate a merge for breaking PyTorch changes.
+
+## Why does PyTorch CI pin PyTorch/XLA?
+As mentioned above, [PyTorch CI pins PyTorch/XLA](https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/common_utils.sh#L119) to a "known good" commit to prevent accidental changes from PyTorch/XLA from breaking PyTorch CI without warning. PyTorch has hundreds of commits each week, and this pin ensures that PyTorch/XLA as a downstream package does not cause failures in PyTorch CI.
+
+## Why does PyTorch/XLA CI pull from PyTorch master?
+[PyTorch/XLA CI pulls PyTorch from master](https://github.com/pytorch/xla/blob/f3415929683880192b63b285921c72439af55bf0/.circleci/common.sh#L15) unless a PyTorch pin is manually provided. PyTorch/XLA is a downstream package to PyTorch, and pulling from master ensures that PyTorch/XLA stays up-to-date and works with the latest PyTorch changes.
+
+## Pinning PyTorch PR in PyTorch/XLA PR
+Sometimes a PyTorch/XLA PR needs to be pinned to a specific PyTorch PR to test new features, fix breaking changes, etc. Since PyTorch/XLA CI pulls from PyTorch master by default, we need to manually provide a PyTorch pin. In a PyTorch/XLA PR, PyTorch can be manually pinned by creating a `.torch_pin` file at the root of the repository. The `.torch_pin` should contain the corresponding PyTorch PR number prefixed by "#". See [this example](https://github.com/pytorch/xla/pull/3792/commits/40f41fb98b0f2386d287eeac0bae86e873d4a9d8). Before the PyTorch/XLA PR gets merged, the `.torch_pin` must be deleted.
+
+## Coordinating merges for breaking PyTorch PRs
+When a PyTorch PR introduces a breaking change, its PyTorch/XLA CI tests will fail. The steps for fixing and merging such a breaking PyTorch change are as follows:
+1. Create a PyTorch/XLA PR that fixes the issue, add a `.torch_pin`, and rebase with master to ensure the PR is up-to-date with the latest commit on PyTorch/XLA. Once this PR is created, it'll create a commit hash that will be used in step 2. If you have multiple commits in the PR, use the last one's hash. **Important note: When you rebase this PR, it'll create a new commit hash and make the old hash obsolete. Be cautious about rebasing, and if you rebase, make sure you inform the PyTorch PR's author.**
+2. Rebase (or ask the PR owner to rebase) the PyTorch PR with master. Update the PyTorch PR to pin PyTorch/XLA to the commit hash created in step 1 by updating `pytorch/.github/ci_commit_pins/xla.txt`.
+3. Once CI tests are green on both ends, merge the PyTorch PR.
+4. Remove the `.torch_pin` in the PyTorch/XLA PR and merge. Note that `git commit --amend` should be avoided in this step, as PyTorch CI will keep using the commit hash created in step 1 until other PRs update it manually or the nightly buildbot updates it automatically.
+5. Finally, don't delete your branch until 2 days later. See step 4 for the explanation.
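The pin-then-clean-up flow that the new README describes can be sketched with plain shell. This is an illustrative sketch only; the PR number `#12345` is a placeholder, not a real pin:

```shell
# Create a .torch_pin at the repo root pointing at a hypothetical PyTorch PR.
echo '#12345' > .torch_pin

# CI (and scripts like apply_patches.sh) simply read the file's contents.
echo "Pinned to PyTorch PR $(cat .torch_pin)"

# The pin must be deleted before the PyTorch/XLA PR lands,
# otherwise the lint check asks you to remove it.
rm .torch_pin
```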
diff --git a/.github/workflows/_build_torch_xla.yml b/.github/workflows/_build_torch_xla.yml
index 3e85b7c4c98..c3200b76ef1 100644
--- a/.github/workflows/_build_torch_xla.yml
+++ b/.github/workflows/_build_torch_xla.yml
@@ -38,7 +38,6 @@ jobs:
         repository: pytorch/pytorch
         path: pytorch
         submodules: recursive
-      # TODO: correct pin
     - name: Checkout PyTorch/XLA Repo
       uses: actions/checkout@v4
       with:
diff --git a/.github/workflows/lintercheck.yml b/.github/workflows/lintercheck.yml
index 6598b98da32..b17c608f883 100644
--- a/.github/workflows/lintercheck.yml
+++ b/.github/workflows/lintercheck.yml
@@ -24,7 +24,7 @@ jobs:
       if: github.event_name == 'push' && github.event.ref == 'refs/heads/master'
       shell: bash
       run: |
-        TORCH_PIN=./torch_patches/.torch_pin
+        TORCH_PIN=./.torch_pin
         if [[ -f "${TORCH_PIN}" ]]; then
           echo "Please remove ${TORCH_PIN} before landing."
           exit 1
diff --git a/OP_LOWERING_GUIDE.md b/OP_LOWERING_GUIDE.md
index b445a1d8998..535d7cf596c 100644
--- a/OP_LOWERING_GUIDE.md
+++ b/OP_LOWERING_GUIDE.md
@@ -25,7 +25,7 @@ All file mentioned below lives under the `xla/torch_xla/csrc` folder, with the e
 7. `ops/` directory contains all `ir::ops` declaration and definition. Smaller nodes can be put in `ops/ops.h/.cpp`. More complicated nodes can be put into a separate file. All ops inherit from `ir::ops::Node` and provide a way to lower input `ir::Value` to a sequence of `XlaOp`.
 
 ## Unit Test
-Our CircleCI runs PyTorch native python tests for every change and every day. Those tests will use XLA implementation if we provide a lowering. We usually don’t need to add additional python tests for PyTorch/XLA unless we want to verify some xla behaviors(like dynamic shape) or we skipped the pytorch native test for some reason. The python test should be added to `xla/test/test_operations.py` if it is required. We also need to add CPP tests in `xla/test/cpp/test_aten_xla_tensor.cpp`. This test should call PyTorch c++ API and verify our implementation yields the same result as PyTorch native implementation. We also need to verify if the xla implementation is called when the tensor is a XLA tensor by checking the `aten::op` and `xla::op` counters.
+Our CI runs PyTorch native python tests for every change and every day. Those tests will use XLA implementation if we provide a lowering. We usually don’t need to add additional python tests for PyTorch/XLA unless we want to verify some xla behaviors(like dynamic shape) or we skipped the pytorch native test for some reason. The python test should be added to `xla/test/test_operations.py` if it is required. We also need to add CPP tests in `xla/test/cpp/test_aten_xla_tensor.cpp`. This test should call PyTorch c++ API and verify our implementation yields the same result as PyTorch native implementation. We also need to verify if the xla implementation is called when the tensor is a XLA tensor by checking the `aten::op` and `xla::op` counters.
 
 ## Tips
 The process of lowering is breaking down the PyTorch operations into a sequence of XlaOp. To provide a good lowering of the PyTorch operation, one needs to have a good grasp of what XLA is capable of. Reading the XlaOp document and looking into how similar ops is lowered is the best way to achieve that. You can find a minimal Op lowering example in [this pr](https://github.com/pytorch/xla/pull/2969). You can also find a slightly more complicated example with backward lowering in [this pr](https://github.com/pytorch/xla/pull/2972).
diff --git a/benchmarks/run_benchmark.sh b/benchmarks/run_benchmark.sh
index fd8a055bccc..e4e483947d9 100644
--- a/benchmarks/run_benchmark.sh
+++ b/benchmarks/run_benchmark.sh
@@ -5,7 +5,7 @@ LOGFILE=/tmp/benchmark_test.log
 
 # Note [Keep Going]
 #
-# Set the `CONTINUE_ON_ERROR` flag to `true` to make the CircleCI tests continue on error.
+# Set the `CONTINUE_ON_ERROR` flag to `true` to make the CI tests continue on error.
 # This will allow you to see all the failures on your PR, not stopping with the first
 # test failure like the default behavior.
 CONTINUE_ON_ERROR="${CONTINUE_ON_ERROR:-0}"
diff --git a/docs/README.md b/docs/README.md
index 33a0ce5bc36..a405597c798 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,12 +1,12 @@
 ## Publish documentation for a new release.
 
-CircleCI job `pytorch_xla_linux_debian11_and_push_doc` is specified to run on `release/*` branches, but it was not
+CI job `pytorch_xla_linux_debian11_and_push_doc` is specified to run on `release/*` branches, but it was not
 run on release branches due to "Only build pull requests" setting. Turning off "Only build pull requests" will result
 in much larger volumes in jobs which is often unnecessary. We're waiting for [this feature request](https://ideas.circleci.com/ideas/CCI-I-215)
 to be implemented so that we could override this setting on some branches.
 
 Before the feature is available on CircleCi side, we'll use a manual process to publish documentation for release.
-[Documentation for master branch](http://pytorch.org/xla/master/) is still updated automatically by the CircleCI job.
+[Documentation for master branch](http://pytorch.org/xla/master/) is still updated automatically by the CI job.
 But we'll need to manually commit the new versioned doc and point http://pytorch.org/xla to the documentation of
 new stable release.
 
@@ -22,4 +22,4 @@ cd /tmp/xla
 git add .
 git commit -m "Publish 1.5 documentation."
 git push origin gh-pages
-```
\ No newline at end of file
+```
diff --git a/infra/ansible/roles/build_srcs/tasks/main.yaml b/infra/ansible/roles/build_srcs/tasks/main.yaml
index d945f150d38..87adde1ed21 100644
--- a/infra/ansible/roles/build_srcs/tasks/main.yaml
+++ b/infra/ansible/roles/build_srcs/tasks/main.yaml
@@ -1,3 +1,18 @@
+- name: Read PyTorch pin
+  ansible.builtin.command: cat {{ (src_root, 'pytorch/xla/.torch_pin') | path_join }}
+  register: torch_pin
+  # Pin may not exist
+  ignore_errors: true
+
+- name: Checkout PyTorch pin
+  # ansible.builtin.git wants to fetch the entire history, so check out the pin manually
+  ansible.builtin.shell:
+    cmd: |
+      git fetch origin {{ torch_pin.stdout }}
+      git checkout --recurse-submodules {{ torch_pin.stdout }}
+    chdir: "{{ (src_root, 'pytorch') | path_join }}"
+  when: torch_pin is succeeded
+
 - name: Build PyTorch
   ansible.builtin.command:
     cmd: python setup.py bdist_wheel
diff --git a/scripts/apply_patches.sh b/scripts/apply_patches.sh
index 923b68c79d4..7ba0a3ef8e3 100755
--- a/scripts/apply_patches.sh
+++ b/scripts/apply_patches.sh
@@ -7,7 +7,7 @@ XDIR=$CDIR/..
 PTDIR=$XDIR/..
 OPENXLADIR=$XDIR/third_party/xla
 
-TORCH_PIN="$XDIR/torch_patches/.torch_pin"
+TORCH_PIN="$XDIR/.torch_pin"
 if [ -f "$TORCH_PIN" ]; then
   CID=$(cat "$TORCH_PIN")
   # If starts with # and it's not merged into master, fetch from origin
diff --git a/test/benchmarks/run_tests.sh b/test/benchmarks/run_tests.sh
index 3832b21ed22..fce6140a4fe 100755
--- a/test/benchmarks/run_tests.sh
+++ b/test/benchmarks/run_tests.sh
@@ -9,7 +9,7 @@ export PYTHONPATH=$PYTHONPATH:$CDIR/../../benchmarks/
 
 # Note [Keep Going]
 #
-# Set the `CONTINUE_ON_ERROR` flag to `true` to make the CircleCI tests continue on error.
+# Set the `CONTINUE_ON_ERROR` flag to `true` to make the CI tests continue on error.
 # This will allow you to see all the failures on your PR, not stopping with the first
 # test failure like the default behavior.
 CONTINUE_ON_ERROR="${CONTINUE_ON_ERROR:-0}"
diff --git a/test/run_tests.sh b/test/run_tests.sh
index e263b64daa7..1c5095baa5a 100755
--- a/test/run_tests.sh
+++ b/test/run_tests.sh
@@ -8,7 +8,7 @@ VERBOSITY=2
 
 # Note [Keep Going]
 #
-# Set the `CONTINUE_ON_ERROR` flag to `true` to make the CircleCI tests continue on error.
+# Set the `CONTINUE_ON_ERROR` flag to `true` to make the CI tests continue on error.
 # This will allow you to see all the failures on your PR, not stopping with the first
 # test failure like the default behavior.
 CONTINUE_ON_ERROR="${CONTINUE_ON_ERROR:-0}"
diff --git a/torch_patches/README.md b/torch_patches/README.md
deleted file mode 100644
index f6476f64ca5..00000000000
--- a/torch_patches/README.md
+++ /dev/null
@@ -1,32 +0,0 @@
-# Guidelines For Patch File Names
-
-Files with extension '.diff' are consider as git patches by apply script.
-
-A file for PyTorch PR _N_ needs to be named 'N.diff'.
-
-Patch files which are not related to PyTorch PRs, should begin with an 'X' character,
-followed by a two digit number, followed by a dash ('-'), a name, and '.diff'.
-Example:
-
-```
-X10-optimizer.diff
-```
-
-Patch file are alphabetically ordered, so PyTorch PR patches are always applied
-before the non PyTorch ones.
-
-
-There's a special file `torch_patches/.torch_pin`, which is used to coordinate landing PRs in
-`pytorch/pytorch` and `pytorch/xla`.
-
-To test a `pytorch/xla` PR against a `pytorch/pytorch` PR or branch,
-put the PR number or branch name in this file.
-Example:
-
-```
-#32451
-# or
-my_awesome_branch # (must live in `pytorch/pytorch`)
-```
-
-In the case where the pytorch/pytorch PR also depends on the pytorch/xla PR, you will also need to update the https://github.com/pytorch/pytorch/blob/main/.github/ci_commit_pins/xla.txt to match the latest hash of your pytorch/xla PR. To be noted, the hash from a PR produced by a fork won't work in this case. Then you need to find someone from the pytorch/xla team to produe a branch PR for you.
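The cross-pinning step described in the removed README above (pointing `xla.txt` in pytorch/pytorch at a pytorch/xla commit) can be sketched in shell, assuming checkouts of both repositories side by side; the directory layout is illustrative:

```shell
# In the pytorch/xla checkout: get the head commit hash of the PR branch.
# Note: per the README above, the branch must live in pytorch/xla itself, not a fork.
XLA_COMMIT=$(git -C pytorch/xla rev-parse HEAD)

# In the pytorch/pytorch checkout: point the XLA commit pin at that hash.
echo "$XLA_COMMIT" > pytorch/.github/ci_commit_pins/xla.txt
```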