feat: yet another attempt to add windows builds #231
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR. I do have some suggestions for making it better though... For recipe:
Both pipelines failed because they ran out of disk space:
What would be the most idiomatic way to solve this issue?
Try following https://conda-forge.org/docs/maintainer/conda_forge_yml/#azure to clear some disk space. Set this in conda-forge.yml and then rerender the feedstock.
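For reference, the Azure disk-space setting being discussed presumably looks like the snippet below — the key name is taken from the conda-forge docs, and a later comment in this thread confirms it is already enabled on this feedstock:

```yaml
# conda-forge.yml — sketch: ask the Azure agents to free up disk space
# before the build starts (already set on this feedstock, per a later comment)
azure:
  free_disk_space: true
```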
I think there’s little we can do - the Azure free disk space setting is already enabled. I’d try and see if these build locally. Perhaps there is a way to use the Quansight servers for Windows as well, the same way they are used for Linux builds? If not, and if there are volunteers to build these locally, then that would be an option - I did that for aarch64 for a while for qt. Conda-forge has a Windows server too, but disk space has always been quite restricted there as well, so it might be a bit of a pain.
Perhaps cross-compiling for Windows from Linux is worth trying? Here is a different feedstock PR that does this (conda-forge/polars-feedstock#187). If we were to use Quansight resources for Windows, being able to run the build on Linux (so cross-compiling) would be very helpful.
Sadly that's already set: pytorch-cpu-feedstock/conda-forge.yml Line 2 in 9e99e03
I assume you mean the runners provided through open-gpu-server by Quansight and MetroStar? This PR only builds the CPU-only version, but if we also start building for CUDA I think this is the only possible way forward (let alone for other related repositories like tensorflow). However, the open-gpu-servers don't seem to provide any Windows images. Do you know who I should contact to get the ball rolling?
That would be an option, but I'd prefer to automate and open-source things as much as possible. Having something hooked up to this repository would be ideal.
The native code of the example you linked uses Rust, which makes this much easier. I doubt that this would be easy to achieve with pytorch.
I also expect another error once actual linking starts. On my local machine that takes at least 16 GB of memory. The CUDA version will most likely require more.
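If linking memory does become the bottleneck, one possible mitigation (a sketch, not something done in this PR) is to cap build parallelism: PyTorch's build scripts honor the MAX_JOBS environment variable, which could be injected on the Windows agents via conda-forge.yml. The settings_win/variables keys are assumed from the conda-forge docs:

```yaml
# Hypothetical conda-forge.yml fragment: limit parallel compile/link jobs
# so that concurrent link steps don't exhaust the agent's RAM.
azure:
  settings_win:
    variables:
      MAX_JOBS: "2"  # PyTorch's build reads MAX_JOBS to cap parallelism
```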
If we don't try, we won't know
Although that is technically true, it's already hard enough to build pytorch natively. Adding cross-compilation into the mix seems to me to complicate this even further. I'd much rather first focus on getting native builds working, even if we need to modify the infrastructure to do so. I think having the ability to do resource-intensive Windows builds would be a huge benefit for the conda-forge ecosystem in general. However, if all else fails, cross-compiling seems like a worthwhile avenue to explore.
One thing to try is to move the build from
You should have roughly 70 GB free on
Thanks! I added that to the PR. I quickly searched GitHub and it seems c:\bld\ is used more often, so I tried that.
Just make sure that the directory exists and is writable. Also, you need to rerender for the variable to be set. This comment should trigger the bot. @conda-forge-admin, please rerender
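A sketch of how such a per-platform build-location variable can be injected via conda-forge.yml — the settings_win/variables keys are taken from the conda-forge docs, and CONDA_BLD_PATH is an assumption about which variable the comments above refer to:

```yaml
# Hypothetical conda-forge.yml fragment: move conda-build's work area to a
# short path on the roomier drive (c:\bld\ per the comment above), then
# rerender so the variable actually reaches the CI configuration.
azure:
  settings_win:
    variables:
      CONDA_BLD_PATH: "C:\\bld\\"
```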
A bit of history. Back when this feedstock was created 6 years ago, the pytorch team officially suggested that people install two distinct packages. I personally feel that for Windows users, not having a GPU package in 2024 would HURT their experience.
Couldn't agree more. I started with CPU-only to be able to make incremental progress. My goal is definitely to be able to build the CUDA version too!
Well, a few things:
Typically we "stop" the compilation on the CIs when we reach your stage (seems like it is working OK enough...).
Hi @baszalmstra @hmaarrfk - do you have any updates on this? It would be amazing to see this happen :)!
@Tobias-Fischer I'm still working on the CUDA builds, but it's a slow process because it takes ages to build them locally, so iteration times are suuuper slow. In parallel we are also looking into getting large Windows runners into the conda-forge infrastructure.
I got to the testing stage and noticed this: pytorch-cpu-feedstock/recipe/meta.yaml Line 300 in f9fd731
However this seems to always fail with (this is from the logs of the latest release):
Some dependencies are missing. Particularly:
(as can be seen here https://github.com/pytorch/pytorch/blob/6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0/test/run_test.py#L1705) Given that the test is allowed to fail (due to
The more we fix, the better. If it's really a lot of failures, we might not fix them right away (though depending on the severity of the failures, we might want to think twice about releasing something in that state). In any case, let's leave the testing in, add the required dependencies, and pick up as many fixes as we can.
So something in my last few commits fixed the builds. Also, my gut feeling that the
And this PR provides a hint about the likely source of the mkldnn segfaults: pytorch/pytorch#138834 We could either try to use this PR and see if it solves our issues (I am still not well enough versed to understand the OMP implementations here on conda-forge and in this PR), or try to use llvm-openmp instead of intel-openmp already in this PR, as that seems to be the source of the conflict (but again, I might be wrong). @isuruf - could you please take a look at pytorch/pytorch#138834 and let me know your opinion?
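For concreteness, the swap being proposed would amount to something like the following in the recipe's requirements — a hypothetical sketch only; both packages exist on conda-forge, but whether this actually resolves the segfaults is exactly what is under discussion here:

```yaml
# meta.yaml (sketch): depend on LLVM's OpenMP runtime instead of Intel's
requirements:
  run:
    - llvm-openmp    # instead of intel-openmp
```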
I don't understand. What's the issue with using intel-openmp?
It has somehow resolved after the merge, apologies for the noise! PS: I was trying to run the test suite for pip-installed pytorch on the conda-forge Windows server, and I think I killed it - double apologies :(
I would suggest letting the dust settle for a while. Having a first PyTorch package on Windows is high-value; the rest is lower-value. CUDA would add more value than non-mkl, in case one would like to try a next build config later. The upstream Windows CUDA packages were discussed as candidates for dropping multiple times, since they're a lot of work and not all that relevant to production needs, only to local development (and it's possible to use WSL for that too).
No additional test failures with the mkldnn tests enabled, new summary:
Ok - let me see how the CUDA mkl build is going.
For whatever it's worth, it looks fine to me. As pointed out, it's comparable to the number of failures in their pip package. Pip's getting TypeErrors while these are all math accuracy errors, so less critical afaics.
CUDA+mkl build succeeded - hooray! I've marked this PR as ready for review @conda-forge/pytorch-cpu - looking for any feedback before enabling the full build pipeline. My plan would be to mark the
# TODO Temporary pin, remove
{% set mkl = "<2025" %}
This is noted as temporary - what's the plan/status here?
To be honest I'm not sure if the right (compatible with mkl) version of intel-openmp would be pulled in without it. I can test after getting some more feedback; I want to avoid running CI more than needed right now.
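For context on why the pin matters, a Jinja variable like the one under review is typically consumed in the recipe's requirements sections. A hypothetical sketch — the exact sections this feedstock pins may differ:

```yaml
# meta.yaml (sketch): the "<2025" constraint steers which mkl (and thus
# which intel-openmp) the solver picks for both build and install time.
{% set mkl = "<2025" %}

requirements:
  host:
    - mkl {{ mkl }}
  run:
    - mkl {{ mkl }}
```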
I'm attempting a merge of this PR and #305 in #316. All the commits here are maintained 1:1, and this PR will show up as merged if/once #316 is merged. 🤞 we get everything passing this time. The only question that remains: who writes a blog post about this epic journey? 😛 Huge thanks to @baszalmstra @Tobias-Fischer for the work here, and of course to prefix.dev for sponsoring the server!!! 🙏 🥳
Happy to write a blog post - very happy to jointly write with others involved @baszalmstra et al. :) |
Amazing! I'd be happy to contribute to a blog post!
Checklist
- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
Fixes #32
This PR is another attempt to add Windows builds (see #134).
For now I disabled all other builds to be able to test the Windows part first. I made this PR a draft so we don't accidentally merge it.