-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move libtorch bits to PREFIX & enable cuDSS, cuSPARSELt, CUPTI on win; fix duplicate .pyc
versions
#328
base: main
Are you sure you want to change the base?
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12939294964. Examine the logs at this URL for more detail. |
Would also be good to address #327 in this PR. |
(draft until all builds are enabled again) |
So the extra CUDA deps on windows seem to work fine 🥳 |
Okay, I'll try adding some logic for that here later today. |
Ok, I've decided to instead start by adding error handling to By the way, any chance you could give me the power to run CI myself? |
You already should have AFAIU, and I don't have the power to change it further. This should be a matter of Quansight/open-gpu-server#52 plus a sync of the setup to https://github.com/conda-forge/.cirun (which happened), so I'm really not sure where this goes wrong. Pinging @jaimergp & @aktech One conceivable angle is that the lack of authorization for the windows agents prevents you from starting any job at all (though IIRC you already weren't able to start things before we merged windows support). For that, you could make a PR like conda-forge/.cirun#27 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to add
if %ERRORLEVEL% neq 0 exit 1
to calls that can fail, rather than adding || goto error
everywhere. There's no functional difference AFAIK, and although both styles have their pros & cons, the goto-variant is quite rare in conda-forge (a quick search shows 170 feedstocks using goto, while ~1.6 feedstocks use an errorlevel variant).
@mgorny You need to add your name here: https://github.com/conda-forge/.cirun/blob/218150c90899c7dfeb771d0bd9b176ca191848f3/.access.yml#L89 like this PR: conda-forge/.cirun#27 and this is subject to @wolfv's approval I believe as they (prefix.dev) are the ones sponsoring the windows machine. and you need to be in any of these roles for this repo to be able to trigger workflows that involves GPU CI runs. cc @jaimergp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to need existence tests for the DLLs to be under %LIBRARY_BIN%
.
The tests are already there, and just need to be modified, e.g.
pytorch-cpu-feedstock/recipe/meta.yaml
Line 199 in 39b84c3
- if not exist %SP_DIR%\torch\lib\{{ each_lib }}.dll exit 1 # [win] |
(similar below)
Thanks for the quick response @aktech!
Ah, perhaps it's just a question of making @mgorny a maintainer here? That's a hypothesis we can test easily (given your involvement here, I'm assuming you have nothing against this @mgorny) 😉 |
Well, perhaps I did have the power after all: 4ff92b3 ;-) We'll find out on your next push here! |
Ok, I'll replace |
Sorry, I was talking about the linux builds. For the windows agents, we'll definitely still need to wait for conda-forge/.cirun#32 |
You're probably aware, but just for completeness (and because batch is an insane language): |
Ah, okay, sorry. Either case, I really need to take a break now, so maybe it'll be done when I get back. |
2934eba
to
d569bbb
Compare
Well, that last commit to see the internals of the failed call to |
OK, so I downloaded the previous pytorch cpu build, set up the test environment, and was able to run both tests successfully
Then I moved all the DLLs to Note though the For completeness, I also tried again after deleting the static libraries (not the import libraries for the DLLs), and again these two tests ran fine (so deleting I think the nicest approach here would be to modify our compiler activation on windows to look into |
OK, with conda-forge/vc-feedstock#90, I can now move all the
And just moving I don't understand how this works out on unix TBH (where we're moving the headers to Anyway, I think I'll try patching this, unless someone has better ideas? |
On Linux, we still have a bunch of headers in
Patching sounds good to me. For CMake, I guess changing |
Ah right, forgot the symlink!
We might need to do both? For now I'd let this run through and see what comes out. Feel free to push more fixes, I'll be offline for the next ~8h. |
I guess it would make sense to remove the symlinks from Linux too, for consistency. |
I’m worried about moving too many files around without patching. I’m too busy in the coming weeks to read through the bottom thread but here it is for reference I’ll remind us that the main goal of having the so in our shared library is to avoid conda packaging from giving downstream packages the need to think about whether or not they have to add something to the missing dso whitelist. |
Okay, I'm going not to touch that part and try #330 instead. |
Include Python in build env also when building natively. Otherwise, we end up with Python 3.13 in build env and Python files are byte-compiled for both target Python and 3.13.
Phew, I'm glad this finally passed. #329 still isn't solved, let me reinstate your a514e05. I'm fine to keep the symlink on linux - it makes life easier, but yeah, the lack of symlinks on windows will probably bite use in a few ways here. I'm mostly keen to get this PR merged and check again what happens when building torchvision etc. At least for now we've matched the design from the unix side pretty closely (modulo the symlinks) in how the libs are all in |
Don't you want to enable the remaining Windows builds? I wanted to add removal of |
Of course, but I wanted to solve all the current issues first, and also ask @hmaarrfk and @isuruf for review, if they have time. CC also @danpetry if interested.
How high is your confidence that this will work first try? I think this PR is already doing a lot, and I'd like to get it out so we can start building dependent packages and test out whether things work as intended. I wouldn't be surprised if follow-ups turn out to be necessary anyway, so I'd prefer to put |
.pyc
versions
Nevermind, I mind found an issue with the windows CUDA builds that needs fixing anyway, so pushing a rerendered build, but will cancel everything but the windows+CUDA job. And it'll also be good to see if CI on osx passes with the couple relevant changes here. |
…nda-forge-pinning 2025.01.23.20.59.37
This reverts commit fc5693c.
Checklist
0
(if the version changed)conda-smithy
(Use the phrase@conda-forge-admin, please rerender
in a comment in this PR for automated rerendering)Fixes #319
Fixes #320
Fixes #321
Fixes #327
Fixes #329