Getting "undefined symbol: _ZN5torch4lazy13MetricFnValueB5cxx11E" with torch-xla nightly wheel for 2.6 #8406
Comments
@tengyifei did we switch the default ABI config?
Shoot, it looks like I screwed up the cxx11 ABI for In the meantime @jeffhataws if you're blocked, you can try a filename with a date:
This one doesn't have C++11 ABI.
@tengyifei just wondering if this is fixed.
Hi, I'm just back from vacation. This is still on my radar to be fixed. Does the workaround work for you, or are you still blocked on it?
Hi @tengyifei, welcome back. The workaround works, but since we want to test nightly builds, we don't want to hardcode the date and have to change it every day, so it's best if the nightlies also provide the non-C++11 ABI version.
This fixes #8406.

The existing "Rename and append +YYYYMMDD suffix to nightly wheels" ansible action is pretty confusing since it operates on files in both pytorch/xla/dist and /tmp/staging-wheels. Inadvertently this causes the next "Add cxx11 suffix to wheels built with C++11 ABI" action to miss renaming "torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl", which means we're uploading a C++11 ABI wheel to a non-C++11 location.

I've refactored the ansible actions to only operate under /tmp/staging-wheels. Under local ansible test runs:

When cxx_abi=0, ansible creates these files under /dist:
torch-2.6.0.dev-cp310-cp310-linux_x86_64.whl
torch-2.6.0.dev20241206-cp310-cp310-linux_x86_64.whl
torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl
torch_xla-2.6.0.dev20241206-cp310-cp310-linux_x86_64.whl
torchvision-0.19.0a0+d23a6e1-cp310-cp310-linux_x86_64.whl

When cxx_abi=1, ansible creates these files under /dist:
torch-2.6.0.dev.cxx11-cp310-cp310-linux_x86_64.whl
torch-2.6.0.dev20241206.cxx11-cp310-cp310-linux_x86_64.whl
torch_xla-2.6.0.dev.cxx11-cp310-cp310-linux_x86_64.whl
torch_xla-2.6.0.dev20241206.cxx11-cp310-cp310-linux_x86_64.whl
torchvision-0.19.0a0+d23a6e1.cxx11-cp310-cp310-linux_x86_64.whl

The files under /dist are then uploaded to GCS.

I also added documentation about C++11 ABI wheels to the README.
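For readers who want to see the naming scheme in one place, here is a minimal Python sketch that reproduces the file names listed in the test runs. It is not the ansible logic from the PR: the function name `staged_names` is hypothetical, and the rule that only versions ending in `.dev` get a dated copy is an assumption inferred from the listing (torchvision has no dated copy there).

```python
def staged_names(wheel: str, date: str, cxx_abi: bool) -> list[str]:
    """Return the wheel file names expected under /dist for one built wheel.

    Hypothetical helper: mirrors the names listed in the PR description,
    not the actual ansible actions.
    """
    # Wheel file names look like <name>-<version>-<python>-<abi>-<platform>.whl,
    # so the first two "-"-separated fields are the name and the version.
    name, version, rest = wheel.split("-", 2)

    versions = [version]
    # Assumption: only nightly versions ending in ".dev" get a dated copy.
    if version.endswith(".dev"):
        versions.append(version + date)

    # Wheels built with the C++11 ABI carry a ".cxx11" version suffix.
    suffix = ".cxx11" if cxx_abi else ""
    return [f"{name}-{v}{suffix}-{rest}" for v in versions]


if __name__ == "__main__":
    for abi in (False, True):
        for built in (
            "torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl",
            "torchvision-0.19.0a0+d23a6e1-cp310-cp310-linux_x86_64.whl",
        ):
            for out in staged_names(built, "20241206", abi):
                print(f"cxx_abi={int(abi)}: {out}")
```

The point of the sketch is that every name produced for a C++11 ABI build carries the `.cxx11` suffix, so no C++11 ABI wheel can land on a non-C++11 file name.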
#8465 should fix it.
🐛 Bug
Currently, if I install the nightly using the instructions in README.md and run a simple test, I see "undefined symbol: _ZN5torch4lazy13MetricFnValueB5cxx11E". The "cxx11" in the symbol name suggests that the torch-xla nightly may be built with the C++11 ABI on by default.
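As a rough illustration of that reasoning, the sketch below checks a mangled symbol for the `B5cxx11` ABI tag, which is how the Itanium C++ name mangling encodes GCC's `cxx11` ABI tag (`B`, then the tag length, then the tag). The helper name `uses_cxx11_abi` is hypothetical, and the check is a plain substring heuristic rather than a real demangler.

```python
# Itanium mangling encodes an ABI tag as "B" + <length> + <tag>,
# so the cxx11 tag appears as "B5cxx11" inside the mangled name.
CXX11_ABI_TAG = "B5cxx11"


def uses_cxx11_abi(mangled_symbol: str) -> bool:
    """Heuristic: True if a mangled C++ symbol carries the cxx11 ABI tag.

    Hypothetical helper for illustration; it does not demangle the symbol.
    """
    return mangled_symbol.startswith("_Z") and CXX11_ABI_TAG in mangled_symbol


if __name__ == "__main__":
    # Symbol taken from the error message in this issue.
    print(uses_cxx11_abi("_ZN5torch4lazy13MetricFnValueB5cxx11E"))
```

On the torch side, `torch.compiled_with_cxx11_abi()` reports which ABI the installed torch wheel was built with; a mismatch between that value and the tag on the missing symbol is consistent with the error reported here.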
To Reproduce
Install nightly torch-xla together with torch using instructions from https://github.com/pytorch/xla/blob/master/README.md#installation.
Then run
Expected behavior
Since the default torch build is still the non-cxx11 version, it would be good to keep the default torch-xla nightly non-cxx11 as well.
Environment
Additional context