Move libtorch bits to PREFIX & enable cuDSS, cuSPARSELt, CUPTI on win; fix duplicate .pyc versions #328

Open · wants to merge 41 commits into main

Conversation

@mgorny (Contributor) commented Jan 19, 2025

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #319
Fixes #320
Fixes #321
Fixes #327
Fixes #329

@conda-forge-admin (Contributor) commented Jan 19, 2025

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe/meta.yaml:

  • ℹ️ The recipe is not parsable by parser conda-souschef (grayskull). This parser is not currently used by conda-forge, but may be in the future. We are collecting information to see which recipes are compatible with grayskull.
  • ℹ️ The recipe is not parsable by parser conda-recipe-manager. The recipe can only be automatically migrated to the new v1 format if it is parseable by conda-recipe-manager.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12939294964. Examine the logs at this URL for more detail.

@h-vetinari (Member)

Would also be good to address #327 in this PR.

@hmaarrfk marked this pull request as draft — January 19, 2025 23:31
@hmaarrfk (Contributor)

(draft until all builds are enabled again)

@h-vetinari (Member)

So the extra CUDA deps on windows seem to work fine 🥳

@mgorny (Contributor, Author) commented Jan 20, 2025

Okay, I'll try adding some logic for that here later today.

@mgorny (Contributor, Author) commented Jan 20, 2025

Ok, I've decided to instead start by adding error handling to .bat, to see which of the existing commands fail.

By the way, any chance you could give me the power to run CI myself?

@h-vetinari (Member)

By the way, any chance you could give me the power to run CI myself?

You already should have AFAIU, and I don't have the power to change it further. This should be a matter of Quansight/open-gpu-server#52 plus a sync of the setup to https://github.com/conda-forge/.cirun (which happened), so I'm really not sure where this goes wrong. Pinging @jaimergp & @aktech

One conceivable angle is that the lack of authorization for the windows agents prevents you from starting any job at all (though IIRC you already weren't able to start things before we merged windows support). For that, you could make a PR like conda-forge/.cirun#27

@h-vetinari (Member) left a review comment

I'd prefer to add

if %ERRORLEVEL% neq 0 exit 1

to calls that can fail, rather than adding || goto error everywhere. There's no functional difference AFAIK, and although both styles have their pros & cons, the goto-variant is quite rare in conda-forge (a quick search shows 170 feedstocks using goto, while ~1.6k feedstocks use an errorlevel variant).
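
For illustration, a minimal sketch of the two styles side by side (the build command is a placeholder, not taken from the actual bld.bat):

rem errorlevel style (the common one in conda-forge): check right after each fallible call
cmake --build . --target install
if %ERRORLEVEL% neq 0 exit 1

rem goto style: jump to a shared error label on failure
cmake --build . --target install || goto :error
exit /b 0
:error
exit /b 1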

Review comment on recipe/bld.bat (outdated, resolved)
@aktech commented Jan 20, 2025

@mgorny You need to add your name here: https://github.com/conda-forge/.cirun/blob/218150c90899c7dfeb771d0bd9b176ca191848f3/.access.yml#L89, like in this PR: conda-forge/.cirun#27. This is subject to @wolfv's approval, I believe, as they (prefix.dev) are the ones sponsoring the Windows machine.

And you need to be in any of these roles for this repo to be able to trigger workflows that involve GPU CI runs. cc @jaimergp

@h-vetinari (Member) left a review comment

We're going to need existence tests for the DLLs to be under %LIBRARY_BIN%.

The tests are already there, and just need to be modified, e.g.

- if not exist %SP_DIR%\torch\lib\{{ each_lib }}.dll exit 1 # [win]

(similar below)
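
Presumably the modification is just a matter of swapping the directory, e.g. (a sketch that keeps the existing Jinja loop variable and selector):

- if not exist %LIBRARY_BIN%\{{ each_lib }}.dll exit 1  # [win]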

@h-vetinari (Member)

Thanks for the quick response @aktech!

and you need to be in any of these roles for this repo to be able to trigger workflows that involves GPU CI runs.

Ah, perhaps it's just a question of making @mgorny a maintainer here? That's a hypothesis we can test easily (given your involvement here, I'm assuming you have nothing against this @mgorny) 😉

@h-vetinari (Member)

By the way, any chance you could give me the power to run CI myself?

You already should have AFAIU, and I don't have the power to change it further.

Well, perhaps I did have the power after all: 4ff92b3 ;-)

We'll find out on your next push here!

@mgorny (Contributor, Author) commented Jan 20, 2025

Ok, I'll replace || goto :error with if errorlevel checks, then push and take a break.

@h-vetinari (Member) commented Jan 20, 2025

We'll find out on your next push here!

Sorry, I was talking about the linux builds. For the windows agents, we'll definitely still need to wait for conda-forge/.cirun#32

@h-vetinari (Member)

Ok, I'll replace || goto :error with if errorlevel checks

You're probably aware, but just for completeness (and because batch is an insane language): if errorlevel is a very different construct from if %errorlevel%. In particular, the more "advanced" relational operators like NEQ are not supported; more details
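
For illustration, a minimal sketch of the difference (the command is a placeholder):

rem "if errorlevel 1" is true whenever the exit code is greater than or equal to 1
ninja install
if errorlevel 1 exit 1

rem "if %ERRORLEVEL% neq 0" expands the variable and allows comparison operators like NEQ
ninja install
if %ERRORLEVEL% neq 0 exit 1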

@mgorny (Contributor, Author) commented Jan 20, 2025

Ah, okay, sorry. In either case, I really need to take a break now, so maybe it'll be done when I get back.

@h-vetinari (Member)

Ah, okay, sorry. In either case, I really need to take a break now, so maybe it'll be done when I get back.

No worries. Perhaps the PR will even be merged by then. ;-)

When you do get back, please squash f9fe7bf & ad726bc to avoid the back-and-forth.

@h-vetinari force-pushed the more-win branch 2 times, most recently from 2934eba to d569bbb — January 23, 2025 03:12
@h-vetinari (Member)

Well, that last commit to see the internals of the failed call to ninja was a wash. Not sure if this is just missing a flush, but in any case, it's not obvious where this is going wrong. I looked at the sources of the two failing tests, and it's not obvious what issues they could encounter - certainly it doesn't look like they're using exotic symbols, but I guess we can always run into missing transitive dependencies that require some of the static libs we've removed (if that is even the problem).

@h-vetinari (Member)

OK, so I downloaded the previous pytorch cpu build, set up the test environment, and was able to run both tests successfully

(test) E:\upstreams\pytorch>python test\test_autograd.py TestAutograd.test_multi_grad_all_hooks
Using C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu as PyTorch extensions root...
Emitting ninja build file C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu\test_autograd_cpp_node\build.ninja...
Building extension module test_autograd_cpp_node...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] cl /showIncludes -DTORCH_EXTENSION_NAME=test_autograd_cpp_node -DTORCH_API_INCLUDE_EXTENSION_H -IE:\miniforge\envs\test\Lib\site-packages\torch\include -IE:\miniforge\envs\test\Lib\site-packages\torch\include\torch\csrc\api\include -IE:\miniforge\envs\test\Lib\site-packages\torch\include\TH -IE:\miniforge\envs\test\Lib\site-packages\torch\include\THC -IE:\miniforge\envs\test\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -c C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu\test_autograd_cpp_node\main.cpp /Fomain.o
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30158 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

[2/2] "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64/link.exe" main.o /nologo /DLL c10.lib torch_cpu.lib torch.lib /LIBPATH:E:\miniforge\envs\test\Lib\site-packages\torch\lib torch_python.lib /LIBPATH:E:\miniforge\envs\test\libs /out:test_autograd_cpp_node.pyd
   Creating library test_autograd_cpp_node.lib and object test_autograd_cpp_node.exp
Loading extension module test_autograd_cpp_node...
.
----------------------------------------------------------------------
Ran 1 test in 10.331s

OK

(test) E:\upstreams\pytorch>python test\test_custom_ops.py TestCustomOp.test_autograd_function_backed_op
Using C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu as PyTorch extensions root...
Creating extension directory C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu\mylib...
Emitting ninja build file C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu\mylib\build.ninja...
Building extension module mylib...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] cl /showIncludes -DTORCH_EXTENSION_NAME=mylib -DTORCH_API_INCLUDE_EXTENSION_H -IE:\miniforge\envs\test\Lib\site-packages\torch\include -IE:\miniforge\envs\test\Lib\site-packages\torch\include\torch\csrc\api\include -IE:\miniforge\envs\test\Lib\site-packages\torch\include\TH -IE:\miniforge\envs\test\Lib\site-packages\torch\include\THC -IE:\miniforge\envs\test\Include -D_GLIBCXX_USE_CXX11_ABI=0 /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc /std:c++17 -c C:\Users\[...]\AppData\Local\torch_extensions\torch_extensions\Cache\py312_cpu\mylib\main.cpp /Fomain.o
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30158 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

[2/2] "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64/link.exe" main.o /nologo /DLL c10.lib torch_cpu.lib torch.lib /LIBPATH:E:\miniforge\envs\test\Lib\site-packages\torch\lib torch_python.lib /LIBPATH:E:\miniforge\envs\test\libs /out:mylib.pyd
   Creating library mylib.lib and object mylib.exp
Loading extension module mylib...
.
----------------------------------------------------------------------
Ran 1 test in 12.550s

OK

Then I moved all the DLLs to %LIBRARY_PREFIX%\bin, deleted the torch_extensions cache, and ran it again - also ran fine.

Note though the /LIBPATH:E:\miniforge\envs\test\Lib\site-packages\torch\lib for the linker - IOW, this is where the test expects the .lib files. The reason our windows compilers don't check in %LIBRARY_LIB% by default is that this is only done when building packages (which isn't the case during the test phase). This is in contrast to what our unix compilers do, which also explains why moving everything to $PREFIX/lib works there.

For completeness, I also tried again after deleting the static libraries (not the import libraries for the DLLs), and again these two tests ran fine (so deleting fmt.lib wasn't the issue as I had feared).

I think the nicest approach here would be to modify our compiler activation on windows to look into %LIBRARY_LIB% by default. For that I opened conda-forge/vc-feedstock#90, let's see how that goes. If that is not acceptable for some reason, then the easiest will be to just move the DLLs, and leave the .libs in %SP_DIR%\torch\lib (even if that means we might have to patch downstream packages to look there specifically).
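
A minimal sketch of what the simpler fallback (moving only the binaries) could look like in bld.bat; the wildcard moves are an assumption, and the actual recipe may select files individually:

rem move the runtime DLLs next to the other libraries in %LIBRARY_BIN%
move %SP_DIR%\torch\lib\*.dll %LIBRARY_BIN%\
if %ERRORLEVEL% neq 0 exit 1

rem with conda-forge/vc-feedstock#90, the import libraries could go to %LIBRARY_LIB% as well
move %SP_DIR%\torch\lib\*.lib %LIBRARY_LIB%\
if %ERRORLEVEL% neq 0 exit 1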

@h-vetinari (Member)

OK, with conda-forge/vc-feedstock#90, I can now move all the .libs to %LIBRARY_LIB% and the tests still work. However, what didn't work was moving the headers to %LIBRARY_INC%. That's because the torch headers actually have a nested structure:

Lib/site-packages/torch/include/torch/csrc/api/include/torch/all.h
                        ^^^^^^^                ^^^^^^^
                     entry point 1          entry point 2

And just moving torch/csrc/api/include/torch/all.h to %LIBRARY_INC% will not resolve #include <torch/all.h>. Pytorch generally adds both paths to the includes, and so does the test infrastructure.

I don't understand how this works out on unix TBH (where we're moving the headers to $PREFIX/include, but not handling torch/csrc/api/include explicitly AFAICT). Maybe it somehow gets persisted in INTERFACE_INCLUDE_DIRECTORIES?

Anyway, I think I'll try patching this, unless someone has better ideas?

@mgorny (Contributor, Author) commented Jan 23, 2025

I don't understand how this works out on unix TBH (where we're moving the headers to $PREFIX/include, but not handling torch/csrc/api/include explicitly AFAICT). Maybe it somehow gets persisted in INTERFACE_INCLUDE_DIRECTORIES?

On Linux, we still have a bunch of headers in ${SP_DIR}/torch/include and symlinks for the directories moved to ${CONDA_PREFIX}/include.

Anyway, I think I'll try patching this, unless someone has better ideas?

Patching sounds good to me. For CMake, I guess changing TORCH_INSTALL_PREFIX should work. However, given that they explicitly repeat it in tests, I'm worried that other Python packages also hardwire these paths.
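
For reference, a Windows analogue of the Linux symlink trick would be to copy both header entry points into %LIBRARY_INC% — a sketch only, with the caveat that duplicating the headers costs package size, and not necessarily what this PR ends up doing:

rem entry point 1: the top-level include tree
robocopy %SP_DIR%\torch\include %LIBRARY_INC% /E
if %ERRORLEVEL% geq 8 exit 1
rem entry point 2: the nested api include tree (robocopy exit codes below 8 indicate success)
robocopy %SP_DIR%\torch\include\torch\csrc\api\include %LIBRARY_INC% /E
if %ERRORLEVEL% geq 8 exit 1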

@h-vetinari (Member)

symlinks for the directories moved to ${CONDA_PREFIX}/include.

Ah right, forgot the symlink!

For CMake, I guess changing TORCH_INSTALL_PREFIX should work. However, given that they explicitly repeat it in tests, I'm worried that other Python packages also hardwire these paths.

We might need to do both? For now I'd let this run through and see what comes out. Feel free to push more fixes, I'll be offline for the next ~8h.

@mgorny (Contributor, Author) commented Jan 23, 2025

I guess it would make sense to remove the symlinks from Linux too, for consistency.

@hmaarrfk (Contributor)

I'm worried about moving too many files around without patching. I'm too busy in the coming weeks to read through the thread below, but here it is for reference:

pytorch/pytorch#144690

I'll remind us that the main goal of having the .so files in our shared library location is to spare downstream packages from having to think about whether or not they need to add something to the missing DSO whitelist.

@mgorny (Contributor, Author) commented Jan 23, 2025

Okay, I'm not going to touch that part and will try #330 instead.

Commit: Include Python in the build env also when building natively. Otherwise, we end up with Python 3.13 in the build env, and Python files are byte-compiled for both the target Python and 3.13.
@h-vetinari (Member)

Phew, I'm glad this finally passed. #329 still isn't solved, let me reinstate your a514e05.

I'm fine to keep the symlink on linux - it makes life easier, but yeah, the lack of symlinks on windows will probably bite us in a few ways here. I'm mostly keen to get this PR merged and check again what happens when building torchvision etc.

At least for now we've matched the design from the unix side pretty closely (modulo the symlinks) in how the libs are all in %LIBRARY_{LIB,BIN}%, and the ATen, c10, shm, torch headers are in %LIBRARY_INC%, while the rest stays in %SP_DIR%\torch\include.

@mgorny (Contributor, Author) commented Jan 23, 2025

Don't you want to enable the remaining Windows builds? I wanted to add removal of TH_BINARY_BUILD too. If that's fine, I'll push it along with a rerender to restore all build targets.

@h-vetinari (Member)

Don't you want to enable the remaining Windows builds?

Of course, but I wanted to solve all the current issues first, and also ask @hmaarrfk and @isuruf for review, if they have time. CC also @danpetry if interested.

I wanted to add removal of TH_BINARY_BUILD too. If that's fine, I'll push it along with rerender to restore all build targets.

How high is your confidence that this will work first try? I think this PR is already doing a lot, and I'd like to get it out so we can start building dependent packages and test out whether things work as intended. I wouldn't be surprised if follow-ups turn out to be necessary anyway, so I'd prefer to put TH_BINARY_BUILD into the next round.

@h-vetinari changed the title from "Attempt reenabling cuDSS, cuSPARSELt and CUPTI on Windows" to "Move libtorch bits to PREFIX & enable cuDSS, cuSPARSELt, CUPTI on win; fix duplicate .pyc versions" — January 23, 2025
@h-vetinari (Member)

FWIW, #329 is now fixed as well, so I think this is ready (modulo a revert of 3ebf1a9 + a rerender, which I'm holding off on for now, in case we need some more rounds after feedback).

@h-vetinari marked this pull request as ready for review — January 23, 2025 22:05
@h-vetinari (Member)

Never mind, I found an issue with the windows CUDA builds that needs fixing anyway, so I'm pushing a rerendered build, but will cancel everything but the windows+CUDA job. It'll also be good to see if CI on osx passes with the couple of relevant changes here.
