cudaPackages.cudatoolkit: use nix-built dependencies to avoid spurious failures #224646
Comments
I was wondering if I accidentally dropped some bash that might have been removing these libraries, but looking at a pre-merge revision, we have always had them:
❯ nix build --impure --expr '(import (builtins.getFlake github:NixOS/nixpkgs/c819f0adc75eccb71d95df4bd7c23c06471ccf43) { config.cudaSupport = true; config.allowUnfree = true; }).cudaPackages.cudatoolkit'
❯ ls result/host-linux-x64/
CrashReporter libboost_iostreams.so.1.70.0 libDevicePropertyProto.so libInterfaceSharedLoggers.so libQt5Core.so.5 libQt5Qml.so.5 libQt5WebEngineCore.so.5 libstdc++.so.6 nsys-ui.png
ImportNvtxt libboost_program_options.so.1.70.0 libDeviceProperty.so libInterfaceShared.so libQt5DBus.so.5 libQt5QuickParticles.so.5 libQt5WebEngine.so.5 libStreamSections.so NVIDIA_SLA.pdf
libAgentAPI.so libboost_python35.so.1.70.0 libexec libLinuxPerf.so libQt5DesignerComponents.so.5 libQt5Quick.so.5 libQt5WebEngineWidgets.so.5 libSymbolAnalyzerLight.so nvlog.config.template
libAnalysisData.so libboost_regex.so.1.70.0 libexporter.so libnvlog.so libQt5Designer.so.5 libQt5QuickTest.so.5 libQt5Widgets.so.5 libSymbolDemangler.so Plugins
libAnalysisProto.so libboost_serialization.so.1.70.0 libGenericHierarchy.so libNvQtGui.so libQt5Gui.so.5 libQt5QuickWidgets.so.5 libQt5X11Extras.so.5 libTimelineAssert.so python
libAnalysis.so libboost_system.so.1.70.0 libGpuInfo.so libpapi.so.5 libQt5Help.so.5 libQt5Script.so.5 libQt5XcbQpa.so.5 libTimelineCommon.so QdstrmImporter
libAppLibInterfaces.so libboost_thread.so.1.70.0 libGpuTraits.so libpfm.so.4 libQt5MultimediaQuick.so.5 libQt5ScriptTools.so.5 libQt5XmlPatterns.so.5 libTimelineUIUtils.so reports
libAppLib.so libboost_timer.so.1.70.0 libicudata.so.56 libProcessLauncher.so libQt5Multimedia.so.5 libQt5Sensors.so.5 libQt5Xml.so.5 libTimelineWidget.so ResolveSymbols
libAssert.so libCommonProtoServices.so libicui18n.so.56 libProtobufCommClient.so libQt5MultimediaWidgets.so.5 libQt5Sql.so.5 libQtPropertyBrowser.so libz.so resources
libboost_atomic.so.1.70.0 libCommonProtoStreamSections.so libicuuc.so.56 libProtobufCommProto.so libQt5Network.so.5 libQt5Svg.so.5 libsqlite3-shared.so libz.so.1.2.7 rules
libboost_chrono.so.1.70.0 libCore.so libInjectionCommunicator.so libProtobufComm.so libQt5OpenGL.so.5 libQt5Test.so.5 libSshClient.so Mesa Scripts
libboost_container.so.1.70.0 libcrypto.so libInterfaceData.so libprotobuf-shared.so libQt5Positioning.so.5 libQt5WaylandClient.so.5 libssh.so nsys-ui sqlite3
libboost_date_time.so.1.70.0 libcrypto.so.1.1 libInterfaceSharedBase.so libQt5Charts.so.5 libQt5PrintSupport.so.5 libQt5WaylandCompositor.so.5 libssl.so nsys-ui.bin translations
libboost_filesystem.so.1.70.0 libCudaDrvApiWrapper.so libInterfaceSharedCore.so libQt5Concurrent.so.5 libQt5QmlModels.so.5 libQt5WebChannel.so.5 libssl.so.1.1 nsys-ui.desktop.template |
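A listing like the one above can be narrowed down to the files that look like vendored copies of libraries nixpkgs already packages. The following is a minimal sketch, not part of the original comment: the function name `list_vendored_libs` and the set of name patterns are assumptions chosen to match entries visible in the listing (libstdc++, zlib, OpenSSL, Qt5, Boost, ICU, SQLite).

```shell
#!/bin/sh
# Hypothetical helper: print the entries of a directory whose names look like
# vendored copies of common third-party libraries. The pattern list is an
# assumption based on the listing above, not an exhaustive inventory.
list_vendored_libs() {
  for f in "$1"/*; do
    name=${f##*/}   # strip the directory prefix
    case "$name" in
      libstdc++.so*|libz.so*|libssl.so*|libcrypto.so*|libQt5*.so*|libboost_*.so*|libicu*.so*|libsqlite3*.so*)
        printf '%s\n' "$name" ;;
    esac
  done
}

# Example (path taken from the listing above):
#   list_vendored_libs result/host-linux-x64
```

Anything the helper prints is a candidate for removal or for replacement with a symlink into the nix store; files it does not print (for example `libAnalysis.so`) are presumably NVIDIA's own and would stay.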
@SomeoneSerge I think these changes are breaking cuda for us. We are using
pkgs.linuxPackages_6_1.nvidia_x11
pkgs.cudaPackages_12.cudatoolkit
pkgs.cudaPackages_12.cudatoolkit.lib
in this flake and just noticed this because it blocks
Error log: error.txt
Last couple lines of log:
cc @SpamDoodler |
Hi @aaronmondal! Yup, that's my fault, I should've tested Note that we are trying to get rid of the legacy
Oh 🙃 this is actually intended! Libcuda is the userspace driver which has to go hand in hand with the system's |
@SomeoneSerge Ah thanks for clearing that up! I'll try out the patch later today 😊 @SpamDoodler It's new, it's shiny, I want it 😄. I think #224646 (comment) was the insight we needed which explains why our non-local cuda toolchains are so fragile. We should rework our |
@aaronmondal offtopic, but it would seem that you and your colleagues are familiar with Bazel? It is my impression that there's quite a bit of struggle with Bazel in nixpkgs: platform-dependent P.S. I'd have asked on the discord linked on eomii.org, but it insists that I provide a phone number 😅 |
@SomeoneSerge Ahh yeah of course we'd like to help 😊 We're trying to get a solid Nix/Bazel interop to work at this very moment, and also noticed that this is actually fairly hard to get working. Jax is also on the list of libraries we'll want to support well in |
Describe the bug
As seen in #222273, #178440 introduced a regression:
I guess the hotfix is to either remove the bundled libstdc++ or to replace it with a symlink to our own libstdc++.
Note that we had not noticed this issue during the initial review of #178440, a long time before the gcc11->gcc12 update: this must be because the libstdc++ shipped by cudatoolkit was actually compatible with the pytorch we were building back then.
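The symlink variant of the hotfix could look roughly like the sketch below. It is an illustration under stated assumptions rather than the change that was actually merged: `replace_with_symlink` is a hypothetical helper, and both the directory holding the vendored library and the path of the nix-built replacement have to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper: replace a vendored shared library with a symlink to a
# replacement built elsewhere (for example a nix-built libstdc++).
# Arguments: <directory with the vendored copy> <library file name> <replacement path>
replace_with_symlink() {
  vendored_dir=$1
  lib_name=$2
  replacement=$3
  target="$vendored_dir/$lib_name"
  if [ -e "$target" ] || [ -L "$target" ]; then
    rm -f -- "$target"                 # drop the vendored copy
    ln -s -- "$replacement" "$target"  # point the old name at the replacement
    echo "replaced $target -> $replacement"
  else
    echo "skipped $lib_name: not present in $vendored_dir" >&2
  fi
}

# Example (both paths are placeholders, not real store paths):
#   replace_with_symlink "$out/host-linux-x64" libstdc++.so.6 "$gcc_lib/lib/libstdc++.so.6"
```

Keeping the original file name as a symlink, instead of deleting it outright, means that anything which still resolves the library through that directory picks up the replacement.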
An ls reveals there are many more dependencies in cudatoolkit that could have been symlinks to nix store paths, which probably would have been less dangerous:
EDIT 2023-04-06: As a matter of fact it's very likely that they all can be removed: they're probably used by the profiling tools through $ORIGIN, and I would bet that we already replace $ORIGIN with the paths to respective nixpkgs packages
Notify maintainers
CC @NixOS/cuda-maintainers
Expected response
The barest minimum is that we fix the libstdc++ error