
Getting "undefined symbol: _ZN5torch4lazy13MetricFnValueB5cxx11E" with torch-xla nightly wheel for 2.6 #8406

Closed
jeffhataws opened this issue Nov 21, 2024 · 6 comments · Fixed by #8465

@jeffhataws
Collaborator

🐛 Bug

Currently, if I install the nightly wheels following the instructions in README.md and run a simple test, I get "undefined symbol: _ZN5torch4lazy13MetricFnValueB5cxx11E". The "cxx11" in the symbol name suggests that torch-xla nightly may have been built with the C++11 ABI on by default:

(aws_neuron_venv_pytorch_pt26) ubuntu@ip-10-0-8-190:~$ python -c "import torch; import torch_xla; x=torch.ones((10,10), device='xla'); print(x*x)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/aws_neuron_venv_pytorch_pt26/lib/python3.10/site-packages/torch_xla/__init__.py", line 20, in <module>
    import _XLAC
ImportError: /home/ubuntu/aws_neuron_venv_pytorch_pt26/lib/python3.10/site-packages/_XLAC.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch4lazy13MetricFnValueB5cxx11E
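For context, the `cxx11` fragment in the undefined symbol comes from Itanium C++ name mangling: libstdc++'s dual-ABI `[abi:cxx11]` tag is mangled as `B5cxx11` (`B` marks an ABI tag, `5` is the length of `cxx11`). A small sketch of how to spot this in a symbol name (the helper function is illustrative, not part of any library):

```python
# The [abi:cxx11] ABI tag from libstdc++'s dual ABI is mangled as
# "B5cxx11" in Itanium C++ name mangling. Its presence in an
# unresolved symbol is a strong hint that the library exporting it
# was built with _GLIBCXX_USE_CXX11_ABI=1.
def has_cxx11_abi_tag(symbol: str) -> bool:
    return "B5cxx11" in symbol

# The symbol from the ImportError above:
missing = "_ZN5torch4lazy13MetricFnValueB5cxx11E"
print(has_cxx11_abi_tag(missing))  # True
```

So the _XLAC extension here was linked expecting the C++11-ABI flavor of `torch::lazy::MetricFnValue`, while the installed torch exports only the pre-C++11-ABI symbol.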

To Reproduce

Install nightly torch-xla together with torch using instructions from https://github.com/pytorch/xla/blob/master/README.md#installation.

pip3 install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl' -f https://storage.googleapis.com/libtpu-releases/index.html

Then run

python -c "import torch; import torch_xla; x=torch.ones((10,10), device='xla'); print(x*x)"

Expected behavior

Since the default torch build is still the non-cxx11 (pre-C++11 ABI) version, it would be good to keep the torch-xla nightly default non-cxx11 as well.

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: CPU/Neuron
  • torch_xla version: 2.6

Additional context

@JackCaoG
Collaborator

@tengyifei did we switch the default ABI config?

@tengyifei
Collaborator

Shoot, it looks like I screwed up the cxx11 ABI for torch_xla-2.6.0.dev-cp311-cp311-linux_x86_64.whl in particular. I'll assign this bug to myself and fix it.

In the meantime @jeffhataws if you're blocked, you can try a filename with a date:

pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0.dev20241121-cp310-cp310-linux_x86_64.whl

This one was built without the C++11 ABI.

@tengyifei self-assigned this Nov 21, 2024
@jeffhataws
Collaborator Author

@tengyifei just wondering if this is fixed.

@tengyifei
Collaborator

Hi, I'm just back from vacation. This is still on my radar to be fixed. Does the workaround work for you, or are you still blocked?

@jeffhataws
Collaborator Author

Hi @tengyifei, welcome back. The workaround works, but since we want to test nightly builds, we don't want to hardcode the date and change it every day, so it's best if we have the non-C++11-ABI version for nightlies.
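For anyone who does need the dated workaround in the meantime, the URL can be derived from the naming scheme shown earlier in this thread rather than hand-edited daily. A sketch (the helper name is hypothetical, and whether a wheel actually exists for a given date is not guaranteed):

```python
from datetime import date

# Base URL and filename pattern taken from the pip commands earlier
# in this issue; only the YYYYMMDD portion varies per nightly.
BASE = "https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm"

def nightly_wheel_url(day: date, version: str = "2.6.0",
                      py: str = "cp310") -> str:
    """Build the dated torch_xla nightly wheel URL for a given day."""
    return (f"{BASE}/torch_xla-{version}.dev{day:%Y%m%d}"
            f"-{py}-{py}-linux_x86_64.whl")

# The wheel suggested as a workaround above:
print(nightly_wheel_url(date(2024, 11, 21)))
```

A CI script could call this with today's (or yesterday's) date and pass the result to `pip install`, though that still depends on the non-cxx11 nightly being published correctly.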

tengyifei added a commit that referenced this issue Dec 6, 2024
This fixes #8406. The existing "Rename and append +YYYYMMDD suffix to
nightly wheels" ansible action is pretty confusing since it operates on
files in both pytorch/xla/dist and /tmp/staging-wheels. Inadvertently
this causes the next "Add cxx11 suffix to wheels built with C++11 ABI"
action to miss renaming
"torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl", which means we're
uploading a C++11 ABI wheel to a non-C++11 location. I've refactored the
ansible actions to only operate under /tmp/staging-wheels.

Under local ansible test runs:

When cxx_abi=0, ansible creates these files under /dist:

  torch-2.6.0.dev-cp310-cp310-linux_x86_64.whl
  torch-2.6.0.dev20241206-cp310-cp310-linux_x86_64.whl
  torch_xla-2.6.0.dev-cp310-cp310-linux_x86_64.whl
  torch_xla-2.6.0.dev20241206-cp310-cp310-linux_x86_64.whl
  torchvision-0.19.0a0+d23a6e1-cp310-cp310-linux_x86_64.whl

When cxx_abi=1, ansible creates these files under /dist:

  torch-2.6.0.dev.cxx11-cp310-cp310-linux_x86_64.whl
  torch-2.6.0.dev20241206.cxx11-cp310-cp310-linux_x86_64.whl
  torch_xla-2.6.0.dev.cxx11-cp310-cp310-linux_x86_64.whl
  torch_xla-2.6.0.dev20241206.cxx11-cp310-cp310-linux_x86_64.whl
  torchvision-0.19.0a0+d23a6e1.cxx11-cp310-cp310-linux_x86_64.whl

The files under /dist are then uploaded to GCS.

I also added documentation about C++11 ABI wheels to the README.
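The naming scheme described in the commit message can be sketched as a small function: each package gets both an undated "rolling" wheel and a dated one, and C++11-ABI builds additionally get a `.cxx11` suffix. This is an illustration of the intended outputs, not the actual ansible code:

```python
from datetime import date

# Illustrative sketch of the wheel-naming scheme from the commit
# message above: one undated and one dated filename per package,
# with a ".cxx11" suffix inserted when the wheel is built with the
# C++11 ABI (cxx_abi=1).
def wheel_names(pkg: str, version: str, day: date, cxx_abi: bool,
                py: str = "cp310") -> list[str]:
    abi = ".cxx11" if cxx_abi else ""
    tail = f"{py}-{py}-linux_x86_64.whl"
    return [
        f"{pkg}-{version}.dev{abi}-{tail}",
        f"{pkg}-{version}.dev{day:%Y%m%d}{abi}-{tail}",
    ]

for name in wheel_names("torch_xla", "2.6.0", date(2024, 12, 6), True):
    print(name)
```

The bug was effectively that one of the cxx_abi=1 outputs escaped the `.cxx11` renaming step, so it was uploaded under the non-C++11 filename.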
tengyifei added a second commit that referenced this issue Dec 6, 2024 (same commit message as above).
@tengyifei
Collaborator

#8465 should fix it
