ABI break between clang 17 & 18 causing linkage errors with abseil #87

Closed
h-vetinari opened this issue Dec 30, 2024 · 12 comments · Fixed by #88

@h-vetinari
Member

h-vetinari commented Dec 30, 2024

The change from clang 17 to clang 18 seems to have caused an ABI break with respect to symbol mangling, causing failures of the following kind when builds from the two compilers are mixed:

dyld[23066]: Symbol not found: __ZN4absl12lts_2024072212log_internal10LogMessagelsIiLi0EEERS2_RKT_
  Referenced from: <7DD6F527-7A4A-3649-87B6-D68B25F8B594> /Users/runner/miniconda3/envs/test/lib/libprotobuf.28.2.0.dylib
  Expected in:     <D623F952-8116-35EC-859D-F7F8D5DD7699> /Users/runner/miniconda3/envs/test/lib/libabsl_log_internal_message.2407.0.0.dylib
Subprocess aborted
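Several reports in this thread share the same dyld failure shape: a missing mangled symbol, the library that references it, and the library expected to provide it. When triaging many such reports, a small parser can pull out those three fields. This is a minimal sketch, assuming the three-line format shown above (optionally prefixed by pytest's `E` markers); `parse_missing_symbol` and `MissingSymbol` are hypothetical names, not part of any conda-forge tooling.

```python
import re
from typing import NamedTuple, Optional


class MissingSymbol(NamedTuple):
    symbol: str           # mangled name the loader could not resolve
    referenced_from: str  # library or extension module that needs the symbol
    expected_in: str      # library the loader expected to export it


# Each image path may be preceded by a "<UUID>" that dyld prints.
_PATTERN = re.compile(
    r"Symbol not found:\s*(?P<symbol>\S+)\s+"
    r"Referenced from:\s*(?:<[0-9A-Fa-f-]+>\s*)?(?P<referenced_from>\S+)\s+"
    r"Expected in:\s*(?:<[0-9A-Fa-f-]+>\s*)?(?P<expected_in>\S+)"
)


def parse_missing_symbol(log: str) -> Optional[MissingSymbol]:
    """Extract the three fields of a dyld 'Symbol not found' report, or None."""
    # Drop pytest's leading "E " failure markers so wrapped reports parse too.
    cleaned = re.sub(r"(?m)^E\s+", "", log)
    match = _PATTERN.search(cleaned)
    return MissingSymbol(**match.groupdict()) if match else None


example = (
    "dyld[23066]: Symbol not found: __ZN4absl12lts_2024072212log_internal10LogMessagelsIiLi0EEERS2_RKT_\n"
    "  Referenced from: <7DD6F527-7A4A-3649-87B6-D68B25F8B594> /Users/runner/miniconda3/envs/test/lib/libprotobuf.28.2.0.dylib\n"
    "  Expected in:     <D623F952-8116-35EC-859D-F7F8D5DD7699> /Users/runner/miniconda3/envs/test/lib/libabsl_log_internal_message.2407.0.0.dylib\n"
)
report = parse_missing_symbol(example)
print(report.symbol)
print(report.referenced_from)
```

Grouping reports by `symbol` and `expected_in` makes it easier to see that the protobuf, jaxlib and libgpr failures all point at the same abseil library.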

To fix this, I added #86, but this has just moved the problem further down the stack. We've rebuilt

so far, but post-merge comments in #86 suggest that other packages, such as arrow, are also affected.

@h-vetinari
Member Author

Ah well... conda-forge/libprotobuf-feedstock#244 was applied to 5.28.3, but our pinning is on 5.28.2 😑

However, at least that means we can just migrate for 5.28.3 to ensure stuff gets built consistently

@traversaro
Contributor

traversaro commented Jan 2, 2025

After #86, I think I am starting to observe the opposite problem in downstream CI, in builds that depend on jax.

E   ImportError: dlopen(/Users/runner/micromamba/envs/adamdev/lib/python3.13/site-packages/jaxlib/xla_extension.so, 0x0002): Symbol not found: __ZN4absl12lts_2024072212log_internal10LogMessagelsIPKvLi0EEERS2_RKT_
E     Referenced from: <05E679BA-8DDA-37BF-BA62-E2CC01CA7001> /Users/runner/micromamba/envs/adamdev/lib/python3.13/site-packages/jaxlib/xla_extension.so
E     Expected in:     <D623F952-8116-35EC-859D-F7F8D5DD7699> /Users/runner/micromamba/envs/adamdev/lib/libabsl_log_internal_message.2407.0.0.dylib

see ami-iit/adam#115 for more details.

To recap briefly, as I understand it the situation is as follows:

| clang version | symbol present | demangled symbol | libabseil build (20240722.0) |
|---|---|---|---|
| 17 | `__ZN4absl12lts_2024072212log_internal10LogMessagelsIPKvLi0EEERS2_RKT_` | `_absl::lts_20240722::log_internal::LogMessage& absl::lts_20240722::log_internal::LogMessage::operator<< <void const*, 0>(void const* const&)` | cxx17_hf9b8971_1 |
| 18 | `__ZN4absl12lts_2024072212log_internal10LogMessagelsIiLi0EEERS2_RKT_` | `_absl::lts_20240722::log_internal::LogMessage& absl::lts_20240722::log_internal::LogMessage::operator<< <int, 0>(int const&)` | cxx17_h07bc746_2 |

The problem is that the latest build of jaxlib was built with clang 17, so it is not compatible with libabseil compiled with clang 18.
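The recap can be read as a simple compatibility check: a consumer loads only if the libabseil build it is paired with exports every symbol the consumer references. The sketch below models that under a loud simplifying assumption: each build is represented only by the single symbol the table lists for it, whereas the real libraries export many more. `missing_symbols` and `compatible_builds` are hypothetical helpers for illustration.

```python
# Simplified model of the recap table: one listed symbol per libabseil build.
# Real builds export far more symbols; this only illustrates the mismatch.
VOID_PTR_SYMBOL = "__ZN4absl12lts_2024072212log_internal10LogMessagelsIPKvLi0EEERS2_RKT_"
INT_SYMBOL = "__ZN4absl12lts_2024072212log_internal10LogMessagelsIiLi0EEERS2_RKT_"

ABSEIL_EXPORTS: dict[str, set[str]] = {
    "cxx17_hf9b8971_1": {VOID_PTR_SYMBOL},  # listed for the clang 17 build
    "cxx17_h07bc746_2": {INT_SYMBOL},       # listed for the clang 18 build
}


def missing_symbols(required: set[str], build: str) -> set[str]:
    """Symbols a consumer needs that the given libabseil build does not export."""
    return required - ABSEIL_EXPORTS[build]


def compatible_builds(required: set[str]) -> list[str]:
    """Builds that export every symbol the consumer needs, in sorted order."""
    return sorted(b for b in ABSEIL_EXPORTS if not missing_symbols(required, b))


# jaxlib (built with clang 17) references the `<void const*, 0>` instantiation.
jaxlib_needs = {VOID_PTR_SYMBOL}
print(compatible_builds(jaxlib_needs))  # ['cxx17_hf9b8971_1']
```

In this model, jaxlib only resolves against the `_1` build, which matches the failure reported against `libabsl_log_internal_message.2407.0.0.dylib` once the `_2` build is installed.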

@sanurielf

sanurielf commented Jan 2, 2025

I found a similar issue, but with libgpr:

ImportError: dlopen(.pixi/envs/dev/lib/python3.12/site-packages/grpc/_cython/cygrpc.cpython-312-darwin.so, 0x0002): Symbol not found: __ZN4absl12lts_2024072212log_internal10LogMessagelsIiLi0EEERS2_RKT_
  Referenced from: <4DB83F7F-C593-3BDC-AEB2-798416EDF135> .pixi/envs/dev/lib/libgpr.44.0.0.dylib
  Expected in:     <D623F952-8116-35EC-859D-F7F8D5DD7699> .pixi/envs/dev/lib/libabsl_log_internal_message.2407.0.0.dylib

@traversaro
Contributor

> The problem is that the latest build of jaxlib was built with clang 17, so it is not compatible with libabseil compiled with clang 18.

I opened conda-forge/jaxlib-feedstock#296 for visibility.

@h-vetinari
Member Author

It would be nice if we could patch abseil to contain both symbols.

@anjos

anjos commented Jan 2, 2025

Unfortunately, importing pytorch with this version of libabseil also fails with the same errors reported here.

@traversaro
Contributor

Just in case it is not clear: temporarily pinning libabseil to =20240722.0=*_1 (version 20240722.0, build string ending in _1) should fix loading of libraries that were compiled with clang 17 (such as jax and pytorch).

@anjos

anjos commented Jan 2, 2025

Right! I just came to report a similar fix with libabseil = { version = "20240722.0", build = "cxx17_*_1" } (pixi manifest).
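Whether a pin helps depends on which build strings its glob admits, which also explains why the `libabseil * cxx17*` constraint that pytorch declares does not protect it. As a rough sketch, assuming the build field behaves like a shell-style glob (Python's `fnmatch.fnmatchcase` is used as a stand-in here; it is not conda's or pixi's actual matcher), and using the two build strings from the recap table:

```python
from fnmatch import fnmatchcase

# Build strings of libabseil 20240722.0 taken from the recap table.
BUILDS = ["cxx17_hf9b8971_1", "cxx17_h07bc746_2"]


def matching_builds(pattern: str, builds: list[str]) -> list[str]:
    """Return the build strings matched by a shell-style glob.

    Assumption: fnmatch approximates how conda/pixi match the build field;
    it is not their implementation.
    """
    return [b for b in builds if fnmatchcase(b, pattern)]


print(matching_builds("cxx17*", BUILDS))     # pytorch's constraint: admits both builds
print(matching_builds("*_1", BUILDS))        # =20240722.0=*_1: clang 17 build only
print(matching_builds("cxx17_*_1", BUILDS))  # pixi build = "cxx17_*_1": clang 17 build only
```

A `cxx17*` glob only distinguishes the C++ standard, not the compiler that produced the build, so both the clang 17 and clang 18 builds satisfy it.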

@peterbygrave

> Ah well... conda-forge/libprotobuf-feedstock#244 was applied to 5.28.3, but our pinning is on 5.28.2 😑
>
> However, at least that means we can just migrate for 5.28.3 to ensure stuff gets built consistently

For pytorch, this breaks all current and historical packages because of the hard patch-level pin on libprotobuf:

% conda search "pytorch=2.5.1=*py313*7" --info
Loading channels: done
pytorch 2.5.1 cpu_generic_py313_h44dfc17_7
------------------------------------------
file name   : pytorch-2.5.1-cpu_generic_py313_h44dfc17_7.conda
name        : pytorch
version     : 2.5.1
build       : cpu_generic_py313_h44dfc17_7
build number: 7
size        : 25.0 MB
license     : BSD-3-Clause
subdir      : osx-arm64
url         : https://conda.anaconda.org/conda-forge/osx-arm64/pytorch-2.5.1-cpu_generic_py313_h44dfc17_7.conda
md5         : ea4d54af88ab35078ddbc7c2e035c83d
timestamp   : 2024-12-25 10:52:26 UTC
constraints : 
  - pytorch-gpu ==99999999
  - pytorch-cpu ==2.5.1
dependencies: 
  - __osx >=11.0
  - filelock
  - fsspec
  - jinja2
  - libabseil * cxx17*
  - libabseil >=20240722.0,<20240723.0a0
  - libcblas >=3.9.0,<4.0a0
  - libcxx >=18
  - liblapack >=3.9.0,<4.0a0
  - libprotobuf >=5.28.2,<5.28.3.0a0
  - libtorch 2.5.1.*
  - libuv >=1.49.2,<2.0a0
  - llvm-openmp >=18.1.8
  - networkx
  - nomkl
  - numpy >=1.21,<3
  - python >=3.13,<3.14.0a0
  - python >=3.13,<3.14.0a0 *_cp313
  - python_abi 3.13.* *_cp313
  - setuptools
  - sleef >=3.7,<4.0a0
  - sympy >=1.13.1,!=1.13.2
  - typing_extensions

I'm not sure where the best place to address this is.
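The patch-level pin in the pytorch metadata (`libprotobuf >=5.28.2,<5.28.3.0a0`) is why these pytorch builds cannot be installed alongside a rebuilt libprotobuf 5.28.3. A simplified sketch of that range check, assuming plain dotted numeric versions only; `parse_version` and `satisfies_patch_pin` are hypothetical names, and conda's real version ordering (including how it treats the `.0a0` suffix) is not modeled:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version like '5.28.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def satisfies_patch_pin(version: str, lower: str = "5.28.2", upper: str = "5.28.3") -> bool:
    """Simplified check of `libprotobuf >=5.28.2,<5.28.3.0a0`.

    Assumption: only plain numeric versions are handled; the `.0a0` suffix
    of the real upper bound is left to conda's own ordering rules.
    """
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


for candidate in ["5.28.2", "5.28.3"]:
    print(candidate, satisfies_patch_pin(candidate))
# 5.28.2 True
# 5.28.3 False
```

Because the upper bound sits at the next patch release, a migration to 5.28.3 alone cannot help already-published packages carrying this pin; they would need a rebuild or a repodata patch.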

@isuruf
Member

isuruf commented Jan 3, 2025

#88 should fix this. See also https://releases.llvm.org/18.1.6/tools/clang/docs/ReleaseNotes.html#c-specific-potentially-breaking-changes

@traversaro
Contributor

Thanks a lot for working on this @isuruf !

@peterbygrave

Nice work on this! I can confirm it fixes my local tests on osx-arm64 when importing pytorch in a fresh environment (via pixi) 🚀
