Bring TPU CI back to green by reinstalling torch/torchvision #6458
OK, according to the TPU CI log (https://github.com/pytorch/xla/pull/6458/checks?check_run_id=21166603603), rename the
So it succeeded in the other tests and failed only in TPU CI, so something must differ in how the wheel used in that test was built, perhaps an environment variable?
Let's build on TPU and test to see whether this failure reproduces. Update: met
infra/ansible/config/pip.yaml
Outdated
@@ -49,5 +49,6 @@ pip:
  # Packages that will be installed with the `--nodeps` flag.
  pkgs_nodeps:
    release_common:
      - torch
See this comment: #6439 (review)
I agree with @vanbasten23's hack in the short term: #6439 (comment)
Basically, just add pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
to the TPU CI script. This is going to fail some of the time when the last nightly doesn't include required changes from the current head. But, failing some of the time is better than failing all of the time. TPU CI script: https://github.com/pytorch/xla/blob/master/test/tpu/xla_test_job.yaml
A short term hack is fine, since we're trying to replace the current TPU CI anyway. cc @mbzomowski
Thanks, Will, that makes sense, and TPU CI passed with this change: https://github.com/pytorch/xla/runs/21241385079
Let's add this short-term hack and wait for the TPU CI replacement.
Thanks!
@@ -41,6 +41,7 @@ spec:
- bash
- -cxe
- |
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121 |
Can you add a comment with the error we're trying to work around?
Thanks, good point, added comment
The TPU CI has used package_version 2.1.0 for a long time; we update it to 2.3.0 since PyTorch/XLA has announced the 2.2 release.
Add a short-term hack, pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu, to the TPU CI script to bring TPU CI back to green until it is replaced with the new TPU CI.
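The thread quotes the install command with the nightly/cpu index, while the diff hunk for the job spec shows nightly/cu121. As a rough illustration of how the index variant could be made explicit in a CI script, here is a minimal sketch. The function names and the variant argument are hypothetical and are not part of the PR. The `.dev` check assumes that nightly wheels carry a version string such as 2.3.0.dev20240201+cpu. The script only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not taken from the PR.

# Print the nightly wheel index URL for a variant such as "cpu" or "cu121".
nightly_index_url() {
  echo "https://download.pytorch.org/whl/nightly/${1:-cpu}"
}

# Print (do not run) the install command for the given variant.
nightly_install_cmd() {
  echo "pip3 install --pre torch torchvision torchaudio --index-url $(nightly_index_url "${1:-cpu}")"
}

# Succeed if a version string looks like a nightly build (assumed to contain ".dev").
is_nightly_version() {
  case "$1" in
    *.dev*) return 0 ;;
    *) return 1 ;;
  esac
}

# Dry run: show what the CI step would execute for the cpu variant.
nightly_install_cmd cpu
```

In a real job, the printed command would be executed before the tests, and the installed torch version could be passed to is_nightly_version and logged, which helps when a run fails because the latest nightly lacks a change from the current head.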