Add CI workflow for tests that requires pytorch CUDA. #7073
Force-pushed from 5f27b23 to f79aee7.
Note to myself: it seems that BUILD XLA CUDA plugin requires the env var
Force-pushed from 9a251c7 to 91abb38.
It seems that without the CUDA plugin installed, the tests fail with this error: https://gist.github.com/vanbasten23/7dd6ddeaad93843e57653990c43cf476
Force-pushed from 9c59552 to 723f9d1.
test/test_operations.py
@@ -2574,15 +2574,16 @@ def test_dlpack_non_default_layout(self):
cuda_t = torch.arange(25, device=torch.device('cuda')).reshape(5, 5)

t1 = cuda_t.t()
print('xw32 t1.device=', t1.device)
Please remove this debug print.
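For readers unfamiliar with what "non-default layout" means in the test name: transposing a contiguous 5x5 tensor swaps its strides, so t1 is a view whose layout is no longer row-major. The pure-Python sketch below mimics that stride bookkeeping without torch or a GPU. The helper names are illustrative assumptions and are not part of the test suite.

```python
def contiguous_strides(shape):
    """Row-major (default) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose_2d(shape, strides):
    """Transposing a 2-D view swaps shape and strides; no data is moved."""
    return (shape[1], shape[0]), (strides[1], strides[0])


def is_default_layout(shape, strides):
    """True when the strides match the row-major strides for this shape."""
    return strides == contiguous_strides(shape)


shape = (5, 5)
strides = contiguous_strides(shape)
t_shape, t_strides = transpose_2d(shape, strides)

print(strides, is_default_layout(shape, strides))
print(t_strides, is_default_layout(t_shape, t_strides))
```

The transposed view keeps the same 25 elements but reports swapped strides, which is the situation the DLPack test exercises.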
BAZEL_REMOTE_CACHE: 1
BUILD_CPP_TESTS: 1
steps:
- name: Setup gcloud
This setup is repetitive. I was already on the fence about encapsulating it in a new action, since it already appears in the build action, the test actions, and the docs push. If we add another copy, it really should be encapsulated so we don't have to update a bunch of places at once.
name: torch-with-cuda-xla-with-cuda-wheels
path: /tmp/wheels/
pattern: torch-*.whl
- name: Fetch CUDA plugin
This setup is repetitive. I was already on the fence about encapsulating it in a new action, since it already appears in the build action, the test actions, and the docs push. If we add another copy that's more or less identical, it really should be encapsulated so we don't have to update a bunch of places at once.
Stepping back, is there a way to merge this with _test.yml? You would need to add a parameter for some of the test groups to install the torch CUDA build.
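To make the suggestion concrete, here is a minimal sketch of the idea in Python rather than workflow YAML: each test group carries a flag, and only the torch flavor changes while the other wheels stay shared. The Group class, the install_cuda_torch flag, and the group names are all hypothetical; nothing here reflects the actual inputs of _test.yml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """One test group; both fields are hypothetical stand-ins for workflow inputs."""
    name: str
    install_cuda_torch: bool = False


# Wheels assumed to be common to every group.
SHARED_WHEELS = ("torch-xla", "torch-xla-cuda-plugin", "torchvision")


def wheels_for(group):
    """Return the wheels a group installs; only the torch flavor varies."""
    if group.install_cuda_torch:
        torch_wheel = "torch (CUDA build)"
    else:
        torch_wheel = "torch (default build)"
    return (torch_wheel,) + SHARED_WHEELS


# Hypothetical group names, for illustration only.
for g in (Group("default_tests"), Group("torch_cuda_tests", install_cuda_torch=True)):
    print(g.name, wheels_for(g))
```

With this shape, a single test workflow could serve both kinds of group, and the shared setup steps would live in one place.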
I can see your point. Let me give it a try.
shell: bash
run: |
cd pytorch/xla/infra/ansible
ansible-playbook playbook.yaml -vvv -e "stage=build arch=amd64 accelerator=cuda cuda_compute_capabilities=5.2,7.5 src_root=${GITHUB_WORKSPACE} build_cpp_tests=1 git_versioned_xla_build=1 cache_suffix=-ci build_pytorch_with_cuda=1" --skip-tags=fetch_srcs,install_deps
I know this goes against my philosophy of "put everything in ansible", but what if you just build the PyTorch CUDA wheel directly here? I don't think we should build multiple copies of torch_xla and torchvision.
Building PyTorch with CUDA support is only part of our CI workflow, and it will never be part of our release workflow. It's okay in my mind to just run USE_CUDA=1 python setup.py bdist_wheel directly here and upload only the torch GPU wheel as an artifact.
The test workflow can then use the same torch-xla, torch-xla-cuda-plugin, and torchvision.
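A rough sketch of what "build directly and upload only the torch wheel" could look like, written in Python so it can be checked without a GPU. It only assembles the command and environment for USE_CUDA=1 python setup.py bdist_wheel and does not run the build. The wheel filenames are made up for illustration, and the assumption that torch-*.whl is applied to filenames with shell-style glob semantics is mine, not something the PR confirms.

```python
import fnmatch
import os
import sys


def torch_cuda_build_command(base_env=None):
    """Return (argv, env) for building a CUDA-enabled torch wheel.

    Nothing is executed here. To build, pass the result to
    subprocess.run(argv, env=env, cwd=...) from a PyTorch checkout.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["USE_CUDA"] = "1"
    argv = [sys.executable, "setup.py", "bdist_wheel"]
    return argv, env


def select_torch_wheels(filenames, pattern="torch-*.whl"):
    """Keep only filenames matching the glob; other packages do not match."""
    return [f for f in filenames if fnmatch.fnmatch(f, pattern)]


# Hypothetical wheel filenames, for illustration only.
built = [
    "torch-2.4.0-cp310-cp310-linux_x86_64.whl",
    "torch_xla-2.4.0-cp310-cp310-linux_x86_64.whl",
    "torchvision-0.19.0-cp310-cp310-linux_x86_64.whl",
]

argv, env = torch_cuda_build_command({})
print(argv[1:], env["USE_CUDA"])
print(select_torch_wheels(built))
```

Because the glob requires a hyphen right after "torch", the torch_xla and torchvision wheels fall outside it, so only the torch wheel would be picked up for upload.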
Force-pushed from 7576407 to bdb37d7.
Thanks for working on this. I added myself as a reviewer since I also need this for the Triton tests.
Force-pushed from cbd190a to 1e910fb.
Force-pushed from 1e910fb to bcd007c.
Closing this in favor of #7140.
This PR adds a new CI workflow that builds PyTorch with CUDA enabled from source, builds pytorch/xla with CUDA enabled from source, and then runs the tests. The intention is to run the tests that require PyTorch with CUDA.
In detail, this PR adds 2 more jobs to .github/workflows/build_and_test.yml