linux torch builds report an incompatible version string #315

Closed
1 task done
bschindler opened this issue Jan 8, 2025 · 27 comments
Labels
bug Something isn't working

Comments

@bschindler

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

When installing pytorch from conda-forge, the version reported through importlib doesn't match the version published on conda-forge: it carries an extra version suffix. In the case of 2.4.1, the version reported is:

2.4.1.post300

This causes a number of issues. In "simple" cases, it is that semver cannot parse this string:

>>> import importlib.metadata
>>> v = importlib.metadata.version("torch")
>>> v
'2.4.1.post300'
>>> import semver
>>> semver.Version.parse(v, optional_minor_and_patch=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/miniconda/envs/env/lib/python3.10/site-packages/semver/version.py", line 646, in parse
    raise ValueError(f"{version} is not valid SemVer string")
ValueError: 2.4.1.post300 is not valid SemVer string

It can also cause issues installing pip packages that depend on specific torch versions. For example, we ran into a case where the package requirements were torch >2.4,<=2.4.1. That requirement is not satisfied by this version string, because 2.4.1.post300 sorts after 2.4.1.
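The failing requirement can be reproduced with the packaging library, which implements PEP 440 version ordering. This is a minimal sketch, assuming packaging is installed; the variable names are only illustrative. A post-release sorts after the release it is based on, so it falls outside an inclusive upper bound of 2.4.1:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# The requirement quoted above, as the downstream package declared it.
requirement = SpecifierSet(">2.4,<=2.4.1")

plain = Version("2.4.1")
post = Version("2.4.1.post300")

# A post-release orders after the release it is based on.
post_is_newer = post > plain

# The plain release satisfies the requirement; the post-release does not.
plain_ok = requirement.contains(plain)
post_ok = requirement.contains(post)

print(post_is_newer, plain_ok, post_ok)
```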

It would be fantastic if the conda package could report just 2.4.1, assuming this is feasible.

Knowing that backports are hard, I am fine with this being fixed just for 2.5.

Installed packages

# packages in environment at /opt/miniconda/envs/torch:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
bzip2                     1.0.8                h4bc722e_7    conda-forge
c-ares                    1.34.4               hb9d3cd8_0    conda-forge
ca-certificates           2024.12.14           hbcca054_0    conda-forge
cpp-expected              1.1.0                hf52228f_0    conda-forge
cpython                   3.10.16         py310hd8ed1ab_1    conda-forge
cuda-cudart               12.6.77              h5888daf_0    conda-forge
cuda-cudart_linux-64      12.6.77              h3f2d84a_0    conda-forge
cuda-nvrtc                12.6.85              hbd13f7d_0    conda-forge
cuda-nvtx                 12.6.77              hbd13f7d_0    conda-forge
cuda-version              12.6                 h7480c83_3    conda-forge
cudnn                     9.3.0.75             h62a6f1c_2    conda-forge
filelock                  3.16.1             pyhd8ed1ab_1    conda-forge
fmt                       11.0.2               h434a139_0    conda-forge
fsspec                    2024.12.0          pyhd8ed1ab_0    conda-forge
gmp                       6.3.0                hac33072_2    conda-forge
gmpy2                     2.1.5           py310he8512ff_3    conda-forge
jinja2                    3.1.5              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.3               h659f571_0    conda-forge
ld_impl_linux-64          2.43                 h712a8e2_2    conda-forge
libabseil                 20240722.0      cxx17_hbbce691_4    conda-forge
libarchive                3.7.7                h4585015_3    conda-forge
libblas                   3.9.0           26_linux64_openblas    conda-forge
libcblas                  3.9.0           26_linux64_openblas    conda-forge
libcublas                 12.6.4.1             hbd13f7d_0    conda-forge
libcufft                  11.3.0.4             hbd13f7d_0    conda-forge
libcurand                 10.3.7.77            hbd13f7d_0    conda-forge
libcurl                   8.11.1               h332b0f4_0    conda-forge
libcusolver               11.7.1.2             hbd13f7d_0    conda-forge
libcusparse               12.5.4.2             hbd13f7d_0    conda-forge
libedit                   3.1.20240808    pl5321h7949ede_0    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc                    14.2.0               h77fa898_1    conda-forge
libgcc-ng                 14.2.0               h69a702a_1    conda-forge
libgfortran               14.2.0               h69a702a_1    conda-forge
libgfortran5              14.2.0               hd5240d6_1    conda-forge
libgomp                   14.2.0               h77fa898_1    conda-forge
libhwloc                  2.11.2          default_h0d58e46_1001    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
liblapack                 3.9.0           26_linux64_openblas    conda-forge
liblzma                   5.6.3                hb9d3cd8_1    conda-forge
libmagma                  2.8.0                h566cb83_2    conda-forge
libmagma_sparse           2.8.0                h0af6554_0    conda-forge
libmamba                  2.0.5                h49b8a8d_1    conda-forge
libnghttp2                1.64.0               h161d5f1_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libnvjitlink              12.6.85              hbd13f7d_0    conda-forge
libopenblas               0.3.28          pthreads_h94d23a6_1    conda-forge
libprotobuf               5.28.2               h5b01275_0    conda-forge
libsolv                   0.7.30               h3509ff9_0    conda-forge
libsqlite                 3.47.2               hee588c1_0    conda-forge
libssh2                   1.11.1               hf672d98_0    conda-forge
libstdcxx                 14.2.0               hc0a3c3a_1    conda-forge
libstdcxx-ng              14.2.0               h4852527_1    conda-forge
libtorch                  2.4.1           cuda120_hcf1373b_303    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.49.2               hb9d3cd8_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.13.5               h0d44e9d_1    conda-forge
libzlib                   1.3.1                hb9d3cd8_2    conda-forge
llvm-openmp               19.1.6               h024ca30_0    conda-forge
lz4-c                     1.10.0               h5888daf_1    conda-forge
lzo                       2.10              hd590300_1001    conda-forge
mamba                     2.0.5                h8871ed4_1    conda-forge
markupsafe                3.0.2           py310h89163eb_1    conda-forge
mkl                       2023.2.0         h84fe81f_50496    conda-forge
mpc                       1.3.1                h24ddda3_1    conda-forge
mpfr                      4.2.1                h90cbb55_3    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_1    conda-forge
nccl                      2.24.3.1             hb92ee24_0    conda-forge
ncurses                   6.5                  he02047a_1    conda-forge
networkx                  3.4.2              pyh267e887_2    conda-forge
nlohmann_json             3.11.3               he02047a_1    conda-forge
numpy                     2.2.1           py310h5851e9f_0    conda-forge
openssl                   3.4.0                h7b32b05_1    conda-forge
pip                       24.3.1             pyh8b19718_2    conda-forge
python                    3.10.16         he725a3c_1_cpython    conda-forge
python_abi                3.10                    5_cp310    conda-forge
pytorch                   2.4.1           cuda120_py310hf7eb567_303    conda-forge
readline                  8.2                  h8228510_1    conda-forge
reproc                    14.2.5.post0         hb9d3cd8_0    conda-forge
reproc-cpp                14.2.5.post0         h5888daf_0    conda-forge
setuptools                75.6.0             pyhff2d567_1    conda-forge
simdjson                  3.11.4               h84d6215_0    conda-forge
sleef                     3.7                  h1b44611_2    conda-forge
spdlog                    1.15.0               h10c9db5_0    conda-forge
sympy                     1.13.3           pyh2585a3b_105    conda-forge
tbb                       2021.13.0            hceb3a55_1    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
typing_extensions         4.12.2             pyha770c72_1    conda-forge
tzdata                    2024b                hc8b5060_0    conda-forge
wheel                     0.45.1             pyhd8ed1ab_1    conda-forge
yaml-cpp                  0.8.0                h59595ed_0    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

Environment info

active environment : torch
    active env location : /opt/miniconda/envs/torch
            shell level : 2
       user config file : /root/.condarc
 populated config files : /opt/miniconda/.condarc
                          /root/.condarc
          conda version : 24.11.2
    conda-build version : not installed
         python version : 3.10.16.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=cascadelake
                          __conda=24.11.2=0
                          __cuda=12.2=0
                          __glibc=2.35=0
                          __linux=6.1.100=0
                          __unix=0=0
       base environment : /opt/miniconda  (writable)
      conda av data dir : /opt/miniconda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /opt/miniconda/pkgs
                          /root/.conda/pkgs
       envs directories : /opt/miniconda/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/24.11.2 requests/2.31.0 CPython/3.10.16 Linux/6.1.100+ ubuntu/22.04.4 glibc/2.35 solver/libmamba conda-libmamba-solver/24.11.1 libmambapy/2.0.5
                UID:GID : 0:0
             netrc file : None
           offline mode : False
bschindler added the bug label Jan 8, 2025
@mgorny
Contributor

mgorny commented Jan 8, 2025

It's not up to me, but I'm going to be the devil's advocate here:

it is that semver cannot parse this string

I don't understand why the semver package would be relevant here. Python packages follow https://packaging.python.org/en/latest/specifications/version-specifiers/#public-version-identifiers. For example, packaging can be used to parse Python version identifiers:

>>> import importlib.metadata
>>> import packaging.version
>>> packaging.version.Version(importlib.metadata.version("torch"))
<Version('2.5.1.post108')>

we ran into a case where the package requirements were torch >2.4,<=2.4.1

I would suggest reporting that anyway, since <= deps are generally a bad idea — they probably wanted <2.5 instead, or <2.4.2 if they really insist.
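As a rough check of that suggestion (a sketch assuming packaging is installed; the candidate list and the `accepted` helper are made up for illustration), exclusive upper bounds admit the post-release while still excluding the next release they are meant to cap:

```python
from packaging.specifiers import SpecifierSet

# Hypothetical set of versions a resolver might be choosing between.
candidates = ["2.4.1", "2.4.1.post300", "2.4.2", "2.5.0"]


def accepted(spec: str) -> list:
    """Return the candidates that satisfy the given PEP 440 specifier set."""
    specifier = SpecifierSet(spec)
    return [v for v in candidates if specifier.contains(v)]


original = accepted(">2.4,<=2.4.1")     # inclusive bound from the report
below_2_5 = accepted(">2.4,<2.5")       # first suggested alternative
below_2_4_2 = accepted(">2.4,<2.4.2")   # stricter suggested alternative

print(original)
print(below_2_5)
print(below_2_4_2)
```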

@hmaarrfk
Contributor

hmaarrfk commented Jan 8, 2025

We can try to align with the versions reported by the pip packages.

The CPU version is

In [1]: import torch
In [2]: torch.__version__
'2.5.1+cpu'

The CUDA version is

In [1]: import torch
In [2]: torch.__version__
'2.5.1+cu124'

These seem to be semver parsable.

But... I agree that semver parsability is not something we strive for. What we do strive for is "matching upstream behavior", and in this case the two happen to be aligned.

so we would have

'2.5.1+cpu'
and
'2.5.1+cu126'
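For comparison, here is a sketch of how a local version label such as +cu126 behaves under PEP 440 matching. It assumes packaging is installed, and the requirement string is the pattern from the original report transposed to 2.5.1 purely as an illustration. The label is ignored when the specifier itself carries none, so it does not trip an inclusive upper bound the way a .post suffix does:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

local = Version("2.5.1+cu126")

# The public part is what ordered comparisons look at; the label is separate.
public_part = local.public
label = local.local

# Hypothetical requirement, shaped like the one in the original report.
spec = SpecifierSet(">2.5,<=2.5.1")

local_ok = spec.contains(local)            # local label is disregarded
post_ok = spec.contains("2.5.1.post300")   # post-release exceeds the bound

print(public_part, label, local_ok, post_ok)
```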

But... I swear I tried to match PyPI behavior when I first looked at this.

So as next steps:

  • We can try to match upstream behavior for 2.5.1. A PR is appreciated if you want to investigate it, @bschindler.
  • I don't know if we are going to go back and rebuild the 2.4.X releases. We generally don't build too many historical releases at conda-forge.

We could.... but we would need a champion to investigate this on various platforms since pytorch compilation behavior has been hard to predict.

@mgorny
Contributor

mgorny commented Jan 8, 2025

For the record, I can play with it later this week, if you want me to. If we were to go back to 2.4.x, it would probably make sense to backport the libtorch_python.so symlink fix too, as requested somewhere else.

@hmaarrfk
Contributor

hmaarrfk commented Jan 8, 2025

Those two small fixes would be good to bundle together.

@bschindler
Author

The ask about libtorch_python.so also came from me. I have two (rather ugly) workarounds for both, so I don't rely on you guys backporting the fix.

But... I agree that semver parsability is not something we strive for. What we do strive for is "matching upstream behavior", and in this case the two happen to be aligned.

Thinking about this, I agree. But to be fair, this package was the very first package to fail the semver test, and our environment is rather large with lots of exotic packages.

So let's rephrase the request as "matching upstream behavior"; I guess that would be useful.

@h-vetinari
Member

I created a v2.4.x branch. It's about 150 commits behind main, so... let's limit the cherry-picking to fixes, and avoid backporting features

@mgorny
Contributor

mgorny commented Jan 9, 2025

Here's the fix:

diff --git a/recipe/build.sh b/recipe/build.sh
index 3bf2963..9593991 100644
--- a/recipe/build.sh
+++ b/recipe/build.sh
@@ -63,7 +63,9 @@ CMAKE_FIND_ROOT_PATH+=";$SRC_DIR"
 unset CMAKE_INSTALL_PREFIX
 export TH_BINARY_BUILD=1
 export PYTORCH_BUILD_VERSION=$PKG_VERSION
-export PYTORCH_BUILD_NUMBER=$PKG_BUILDNUM
+# Always pass 0 to avoid appending ".post" to version string.
+# https://github.com/conda-forge/pytorch-cpu-feedstock/issues/315
+export PYTORCH_BUILD_NUMBER=0
 
 export INSTALL_TEST=0
 export BUILD_TEST=0
diff --git a/recipe/bld.bat b/recipe/bld.bat
index 30cc5d4..85d8fd2 100644
--- a/recipe/bld.bat
+++ b/recipe/bld.bat
@@ -2,7 +2,9 @@
 
 set TH_BINARY_BUILD=1
 set PYTORCH_BUILD_VERSION=%PKG_VERSION%
-set PYTORCH_BUILD_NUMBER=%PKG_BUILDNUM%
+rem Always pass 0 to avoid appending ".post" to version string.
+rem https://github.com/conda-forge/pytorch-cpu-feedstock/issues/315
+set PYTORCH_BUILD_NUMBER=0
 
 if "%pytorch_variant%" == "gpu" (
     set build_with_cuda=1

It's a bit weird to be changing that, given that upstream clearly intended it to be used like that. I'm not opening a PR, to avoid triggering another CI run; @h-vetinari can probably fold it into one of the existing PRs before merging (or before retrying CI).

With it applied, I can confirm:

>>> import importlib.metadata
>>> importlib.metadata.version("torch")
'2.5.1'
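To illustrate why pinning the build number to 0 drops the suffix, here is a simplified model of how the two environment variables plausibly combine. The upstream logic is not shown in this thread, so the function name and the exact threshold are assumptions; only the two outcomes reported above (2.4.1.post300 before the patch, 2.5.1 after) come from the discussion:

```python
def compose_version(build_version: str, build_number: int) -> str:
    """Hypothetical model of combining PYTORCH_BUILD_VERSION and PYTORCH_BUILD_NUMBER."""
    version = build_version
    # Assumed rule: a build number above 1 is appended as a PEP 440 post-release.
    # The real threshold lives in upstream's version script and may differ.
    if build_number > 1:
        version += f".post{build_number}"
    return version


# Before the patch: the conda build number is forwarded to the build.
before_fix = compose_version("2.4.1", 300)

# After the patch: the build number is always 0, so nothing is appended.
after_fix = compose_version("2.5.1", 0)

print(before_fix, after_fix)
```

The conda build number is still recorded in the conda package metadata; the patch only stops it from leaking into the Python-level version string.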

@mgorny
Contributor

mgorny commented Jan 9, 2025

(edited the diff to add Windows change as well)

@h-vetinari
Member

Thanks a lot for figuring this out! I'd suggest picking this into the next PR; I'm hoping the current CI in #305 passes and we can merge that as-is.

@hmaarrfk
Contributor

Do we want to instead match upstream's crazy +cpu and +cu160??

@rgommers

The patch above to avoid .postxxx seems most important.

The local version specifiers +cpu/+cu160 don't really have much value I think. PyTorch provides public APIs to introspect what hardware the package supports (e.g., torch.cuda.is_available), which should be used rather than those local version specifiers. PEP 440 also recommends that the local version specifiers are not uploaded to a public index server (of course it misses that that is actually necessary to tell wheels apart).
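A minimal sketch of that kind of introspection, in place of parsing a +cpu or +cu suffix out of the version string. The helper name and the returned keys are illustrative; `torch.cuda.is_available()` and `torch.version.cuda` are the only torch APIs used, and the import is guarded so the snippet also runs where torch is absent:

```python
def describe_torch_build() -> dict:
    """Report accelerator support via torch's own APIs, not the version string."""
    try:
        import torch
    except ImportError:
        # torch is absent from this environment; nothing to introspect.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        # CUDA toolkit version the build targets; None for builds without CUDA.
        "built_for_cuda": torch.version.cuda,
        # True only if a usable CUDA device and driver are present at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_torch_build()
print(info)
```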

I don't see harm in amending the version strings to match upstream's wheels, but I also don't think it's worth the effort of patching to get to that state.

@hmaarrfk
Contributor

I don't see harm in amending the version strings to match upstream's wheels, but I also don't think it's worth the effort of patching to get to that state.

Agreed.

@h-vetinari
Member

The patch above to avoid .postxxx seems most important.

This is part of #316, and is only waiting for the open-gpu server to come back online.

@h-vetinari
Member

Closing as fixed as part of #316.

@hmaarrfk
Contributor

A PR with the associated fixes targeted at the v2.4.x branch will be considered.

@mgorny
Contributor

mgorny commented Jan 15, 2025

Sorry, does that mean you'll be doing it or you will consider accepting it when somebody else does? I.e. should I add it to my TODO? :-)

@hmaarrfk
Contributor

that mean you'll be doing it

no.

you will consider accepting it

yes

I.e. should I add it to my TODO? :-)

I can't answer the "should". But hopefully the answers to the other two questions can guide you on how you spend your own time ^_^

@hmaarrfk
Contributor

But I also want to encourage others to try as well. It doesn't have to be you.

@mgorny
Contributor

mgorny commented Jan 15, 2025

Well, I have a build setup, resources and an idea what needs to be done :-).

@mgorny
Contributor

mgorny commented Jan 15, 2025

@hmaarrfk, @h-vetinari, I see that 2.4.1 was built for CUDA 11.8 and 12.0. Should I modify the recipe to force building for 12.0 again or just update it for 12.6?

@hmaarrfk
Contributor

@hmaarrfk, @h-vetinari, I see that 2.4.1 was built for CUDA 11.8 and 12.0. Should I modify the recipe to force building for 12.0 again or just update it for 12.6?

no opinion

@mgorny
Contributor

mgorny commented Jan 15, 2025

I'm going to stay with 12.0 for now, without rerendering — since apparently the branch was somehow made like that.

@h-vetinari
Member

h-vetinari commented Jan 15, 2025

I'd build it for 11.8 and 12.6 (if you're asking for my preference)

@hmaarrfk
Contributor

without rerendering

Just FYI: sometimes our tooling will break. There are still some parts of the build process that will pull in the latest versions, so it's not always guaranteed that this works.

@jakirkham
Member

In conda-forge, we were building with CUDA 12.0 originally to provide the broadest range of support for CUDA 12 minor versions.

After discussion in issue (conda-forge/conda-forge-pinning-feedstock#6630), we decided to move to CUDA 12.x where x is latest (currently 12.6).

@h-vetinari
Member

h-vetinari commented Jan 15, 2025

... and part of that discussion was that the move to 12.6 doesn't reduce support in any meaningful way (otherwise I wouldn't have suggested it for the 2.4 builds)

@jakirkham
Member

Yes. I wasn't trying to push the decision one way or another.

I just got the impression that Michal was confused about why the change occurred, so I wanted to provide context.

Thank you for filling in the bits I missed above 🙏
