Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bring TPU CI bakc to green by reinstall torch/torchvision #6458

Merged
merged 7 commits into from
Feb 5, 2024

Conversation

ManfeiBai
Copy link
Collaborator

@ManfeiBai ManfeiBai commented Feb 2, 2024

the TPU CI has used package_version as 2.1.0 for a long time, and we update it to 2.3.0 since PyTorch/XLA has announced 2.2 release

add short term hack pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu to the TPU CI script to bring TPU CI green before replaced with the new TPU CI

@ManfeiBai ManfeiBai marked this pull request as ready for review February 2, 2024 18:38
@ManfeiBai
Copy link
Collaborator Author

ok, according to the log of TPUCI(https://github.com/pytorch/xla/pull/6458/checks?check_run_id=21166603603), rename the package_version didn't change a lot, the chaneg would only rename the wheel, and this error might be a real error: RuntimeError: operator torchvision::nms does not exist

@ManfeiBai
Copy link
Collaborator Author

so, successed in other tests and failed in TPU CI, so there are some different when we built the wheel used in test, maybe like some env variable?

@ManfeiBai
Copy link
Collaborator Author

ManfeiBai commented Feb 2, 2024

let's build on TPU and test to see whether we could see this failure or not


update: met RuntimeError: operator torchvision::nms does not exist when built on TPUVM directly

@@ -49,5 +49,6 @@ pip:
# Packages that will be installed with the `--nodeps` flag.
pkgs_nodeps:
release_common:
- torch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this comment: #6439 (review)

I agree with @vanbasten23's hack in the short term: #6439 (comment)

Basically, just add pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu to the TPU CI script. This is going to fail some of the time when the last nightly doesn't include required changes from the current head. But, failing some of the time is better than failing all of the time. TPU CI script: https://github.com/pytorch/xla/blob/master/test/tpu/xla_test_job.yaml

A short term hack is fine, since we're trying to replace the current TPU CI anyway. cc @mbzomowski

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Will, make sense and TPUCI passed with this change: https://github.com/pytorch/xla/runs/21241385079

let's add this short term hack and wait for TPU CI replacement too

@ManfeiBai ManfeiBai changed the title update TPU CI trigger used package_version bring TPU CI bakc to green by reinstall torch/torchvision Feb 5, 2024
Copy link
Collaborator

@will-cromar will-cromar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -41,6 +41,7 @@ spec:
- bash
- -cxe
- |
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment with the error we're trying to work around?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good point, added comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants