-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT] clang -fuse-ld=lld
output is low priority
#102000
Conversation
Fixes: dotnet#101727 Context: 410aa0a Context: https://github.com/dotnet/msbuild/blob/1c3b240ce7417223672c62862a6ff7e884e6997a/src/Tasks/Exec.cs#L385 Context: https://github.com/dotnet/msbuild/blob/1c3b240ce7417223672c62862a6ff7e884e6997a/src/Shared/TaskLoggingHelper.cs#L1380 Imagine you want to build a NativeAot library for linux-bionic-arm64: % dotnet new classlib -n HelloBionicSharedLib % cd HelloBionicSharedLib % dotnet publish -c Release -r linux-bionic-arm64 -p:PublishAotUsingRuntimePack=true -p:PublishAot=true …*but your environment is wrong*. Because your environment is wrong, it will error out. (It *should* error out!) Are the generated errors *useful*? % dotnet publish -c Release -r linux-bionic-arm64 -p:PublishAotUsingRuntimePack=true -p:PublishAot=true … clang : error : invalid linker name in argument '-fuse-ld=lld' $HOME/.nuget/packages/microsoft.dotnet.ilcompiler/8.0.4/build/Microsoft.NETCore.Native.Unix.targets(236,5): error : Symbol stripping tool ('llvm-objcopy' or 'objcopy') not found in PATH. Try installing appropriate package for llvm-objcopy or objcopy to resolve the problem or set the StripSymbols property to false to disable symbol stripping. There are two errors here: one from `clang`, and one from the `microsoft.dotnet.ilcompiler` NuGet package. The first one from `clang` is useless ("what does it mean?! how do I fix it?!"), and the latter message is the *useful* error message. Additionally, the first error *shouldn't even be emitted*! It comes from commit 410aa0a, which attempts to get the `lld` version: clang -fuse-ld=lld -Wl,--version This check is supposed to be *optional*; if it fails, that's "fine."" The wrapping `<Exec/>` captures stdout and the error code, and wants the output to be Low priority so that it isn't be shown by default. The problem is that it wasn't *really* fine; if there is no `lld`, then the above command will fail: % clang -fuse-ld=lld -Wl,--version [stderr] clang: error: invalid linker name in argument '-fuse-ld=lld' [exit code is 1] and the [default "warning error format"][0] checks for the strings `error` and `warning`. Meaning *because* `clang -fuse-ld=lld -Wl,--version` writes a string containing `error` to stderr, that is converted into an MSBuild error. Because this invocation is supposed to be optional, update the `<Exec/>` to set [`Exec.IgnoreStandardErrorWarningFormat`][1]=true and [`Exec.StandardErrorImportance`][2]=Low: <Exec Command=""$(CppLinker)" -fuse-ld=lld -Wl,--version" … IgnoreStandardErrorWarningFormat="true" StandardErrorImportance="Low" /> This ensures that the error messages from `clang` are *not* reported as errors, removing the unnecessary error message from the build. Review the other `<Exec/>` invocations and set `IgnoreStandardErrorWarningFormat="true"` and `StandardErrorImportance="Low"` when `Exec.IgnoreExitCode`=true or `Exec.StandardOutputImportance`=Low. [0]: https://github.com/dotnet/msbuild/blob/1c3b240ce7417223672c62862a6ff7e884e6997a/src/Shared/CanonicalError.cs#L77 [1]: https://learn.microsoft.com/dotnet/api/microsoft.build.tasks.exec.ignorestandarderrorwarningformat?view=msbuild-17-netcore [2]: https://learn.microsoft.com/dotnet/api/microsoft.build.utilities.tooltask.standarderrorimportance?view=msbuild-17-netcore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This check is not optional. It is needed to ensure that lld versions > 12 includes linker script to get a successful build. If this PR is enabling cross-building for linux-bionic on macOS, then we need to treat it as such; "enabler" PR. First, these errors are real and they are supposed to fail the build fast (early on). Lets examine it:
Since clang 13 on linux and freebsd, we need sections.ld version script to keep symbols like __modules. Without it, we were running into these errors: #84493.
Because it's only logical to ensure we have a working stripping tool when StripSymbols is enabled. You can turn off StripSymbols in your project if you don't have the stripping tool on the build machine. I think we should keep the current behavior as we generally do not support cross-os-build (but only the cross-arch-build) for NativeAOT. linux-bionic is an exception because Android NDK provides a complete toolchain (compiler, libs and the whole nine) and we just invoke NDK toolchain. So we should figure out why it's not working as expected on macOS. You always have option to use cross-compile on linux in your setup (we have several docker images are available to give you a head start). |
Is it possible that some of these checks should be blocking and some shouldn't be? For instance, <Exec Command="command -v "$(Xcrun)"" IgnoreExitCode="true" StandardOutputImportance="Low" StandardErrorImportance="Low" IgnoreStandardErrorWarningFormat="true" Condition="'$(_IsiOSLikePlatform)' == 'true'">
<Output TaskParameter="ExitCode" PropertyName="_WhereXcrun" />
</Exec>
<Error Condition="'$(_WhereXcrun)' != '0' and '$(_IsiOSLikePlatform)' == 'true'"
Text="'$(Xcrun)' not found in PATH. Make sure '$(Xcrun)' is available in PATH." /> It doesn't seem like a problem if that first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know when grep
/command -v
would write to stderr? Wondering under what circumstances these are needed.
Otherwise LGTM, thanks!
<Output TaskParameter="ExitCode" PropertyName="_WhereXcrun" /> | ||
</Exec> | ||
<Error Condition="'$(_WhereXcrun)' != '0' and '$(_IsiOSLikePlatform)' == 'true'" | ||
Text="'$(Xcrun)' not found in PATH. Make sure '$(Xcrun)' is available in PATH." /> | ||
|
||
<Exec Command=""$(Xcrun)" --sdk $(_AppleSdkName) --show-sdk-path" IgnoreExitCode="true" StandardOutputImportance="Low" Condition="'$(SysRoot)' == '' and '$(_IsiOSLikePlatform)' == 'true'" ConsoleToMsBuild="true"> | ||
<Exec Command=""$(Xcrun)" --sdk $(_AppleSdkName) --show-sdk-path" IgnoreExitCode="true" StandardOutputImportance="Low" StandardErrorImportance="Low" IgnoreStandardErrorWarningFormat="true" Condition="'$(SysRoot)' == '' and '$(_IsiOSLikePlatform)' == 'true'" ConsoleToMsBuild="true"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little bit suspicious about the pre-existing IgnoreExitCode
here. @ivanpovazan do we want to ignore exit code here or was it just a copy-paste mistake? If we want ignored exit code, the addition looks reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it seems like a copy-paste mistake (good catch, thanks!).
If this command fails, we shouldn't be continuing the build as SysRoot
has to be resolved when '$(_IsiOSLikePlatform)' == 'true'
.
I suggest we remove IgnoreExitCode="true"
here.
Cross-building for linux-bionic from macOS already works. The top post in this PR describes how this was hit - if you have a misconfigured toolchain (the clang on the PATH is not the clang from the bionic SDK), on a mac you'll hit the codepath where The fix/bug itself is not bionic specific - this codepath is hittable on Linux-targeting-Linux as well. It just happened to be found with misconfigured toolchain on mac targeting bionic. |
@MichalStrehovsky, macOS has ld64 linker which is used by AppleClang. If lld is needed for cross-build, then the stock toolchain needs to be installed from homebrew. It does not hit on linux because we have appropriate conditions in place. Ignoring these errors is not the right thing to do here. These errors were intentional as I give a breakdown in my previous comment. |
This PR is touching more lines than was strictly needed for this bug. The discussion is about this line: runtime/src/coreclr/nativeaot/BuildIntegration/Microsoft.NETCore.Native.Unix.targets Lines 288 to 291 in de4fa45
Currently if clang doesn't understand |
I think we would need: - <Exec Command=""$(CppLinker)" -fuse-ld=lld -Wl,--version" Condition="'$(LinkerFlavor)' == 'lld'" IgnoreExitCode="true" StandardOutputImportance="Low" ConsoleToMSBuild="true">
+ <Exec Command=""$(CppLinker)" -fuse-ld=lld -Wl,--version 2>/dev/null" Condition="'$(LinkerFlavor)' == 'lld'" IgnoreExitCode="true" StandardOutputImportance="Low" ConsoleToMSBuild="true"> to match what we use in other places, e.g. runtime/eng/native/build-commons.sh Line 45 in de4fa45
(it will still captures the exit code) Reminds me of this feature request opened a while back: dotnet/msbuild#7970 to idiomatically filter out selective errors in |
What's the difference between that and adding |
The current error mode of |
We currently have this handling:
Is the bug in |
I think this does not happen because we always get an error in case lld was not located by clang driver, but requested by user either explicitly (or e.g. on FreeBSD, where it's set as default). Yes, this is not the explicit way of handing it and we are relying on MSBuild's error-detection behavior. As it stands in main branch, msbuild will error out.
objcopy has separate conditions which select llvm-objcopy for llvm toolchain and falls back to binutil's objcopy. We had a discussion to enhance this in order to align with our cmake implementation which searches for exact toolchain matching objcopy, e.g. llvm-objcopy-18 (convention followed by all major linux distros) and not just the un-suffixed one on PATH llvm-objcopy (convention followed by our manually compiled toolchain in CBL-mariner/AzureLinux); but our runtime/eng/native/configuretools.cmake Line 60 in 9926d60
At the end it was getting too complicated so we ended up handling it differently by providing ObjCopyName msbuild property so user can set toolchain specific names .
Back on If linux-bionic compilation works with ld64 only choice provided by AppleClang (which is a bit surprising), we can extend the condition from
|
I'll ask this way: what would break if we delete the What we want to fix is for the build to not fail with two different errors at the same time. It looked like we need to suppress error message from a handled error condition, but it looks like the error condition is not actually handled and should fail the build instead. |
Yes, we are not subsequently checking for error and that result in odd looking error message later because it's ultimately a no-go. e.g. on Ubuntu 18 with $ dotnet9 publish -p:PublishAot=true -p:LinkerFlavor=lld | tee
...
wapi1 -> /wapi1/bin/Release/net9.0/linux-arm64/wapi1.dll
clang : error : invalid linker name in argument '-fuse-ld=lld' [/wapi1/wapi1.csproj]
Generating native code
/root/.nuget/packages/microsoft.dotnet.ilcompiler/9.0.0-preview.5.24256.1/build/Microsoft.NETCore.Native.targets(349,96): error MSB4086: A numeric comparison was attempted on "$(_LinkerVersion)" that evaluates to "" instead of a number, in condition "'$(LinkerFlavor)' == 'lld' and '$(_LinkerVersion)' > '12'". [/wapi1/wapi1.csproj] We can either:
|
Sorry I'm a bit late to the discussion. Just to clarify a few points that @am11 already made (I'm only talking about the linker version check, not sure if this applies to the other checks you've touched):
Which of these issues are you intending to solve @jonpryor?
If we only want to solve 1) then I'd suggest removing |
Reading through #101727 I think the main point was about "1. Multiple errors are reported (unavailable lld and some downstream, potentially unrelated, errors)". If we switch this to
(First line has an error message but absolutely zero context about it and it makes it look like the subsequent line is context) To:
This is enough for people to start troubleshooting because the message is now relevant to the failing MSBuild line. I would not go as far as "2. The error message about missing lld is not friendly" because that has other questions like "why don't we do similar detection when the linker is not lld?", "why don't we add detection for other misconfigured toolchains - like trying to target MUSL but using a GLIBC toolchain, or missing libz-dev, or any of the many ways the native toolchain could be misconfigured". We have to rely on an externally supplied toolchain that could be in all sorts of bad state for what the user wants to do. I worry about adding various handling for this because we could introduce bugs in it and things that would compile fine will get blocked because we wanted to go above and beyond with messages and detect things that could be wrong. The message we produce could also be just misleading because we don't really know what's wrong. |
It's better to handle the lld<v13 case explicitly. |
I've extracted the "don't IgnoreExitCode for LLD version check" part to #102278. |
@jonpryor I think we pulled the highest value items. Is this PR still relevant? |
Closing as obsolete for now. Can re-open if there's still work to be done here. |
Fixes: #101727
Context: 410aa0a
Context: https://github.com/dotnet/msbuild/blob/1c3b240ce7417223672c62862a6ff7e884e6997a/src/Tasks/Exec.cs#L385
Context: https://github.com/dotnet/msbuild/blob/1c3b240ce7417223672c62862a6ff7e884e6997a/src/Shared/TaskLoggingHelper.cs#L1380
Imagine you want to build a NativeAot library for linux-bionic-arm64:
…but your environment is wrong. Because your environment is wrong, it will error out. (It should error out!)
Are the generated errors useful?
There are two errors here: one from
clang
, and one from themicrosoft.dotnet.ilcompiler
NuGet package. The first one fromclang
is useless ("what does it mean?! how do I fix it?!"), and the latter message is the useful error message.Additionally, the first error shouldn't even be emitted! It comes from commit 410aa0a, which attempts to get the
lld
version:This check is supposed to be optional; if it fails, that's "fine."" The wrapping
<Exec/>
captures stdout and the error code, and wants the output to be Low priority so that it isn't be shown by default.The problem is that it wasn't really fine; if there is no
lld
, then the above command will fail:and the default "warning error format" checks for the strings
error
andwarning
. Meaning becauseclang -fuse-ld=lld -Wl,--version
writes a string containingerror
to stderr, that is converted into an MSBuild error.Because this invocation is supposed to be optional, update the
<Exec/>
to setExec.IgnoreStandardErrorWarningFormat
=true andExec.StandardErrorImportance
=Low:This ensures that the error messages from
clang
are not reported as errors, removing the unnecessary error message from the build.Review the other
<Exec/>
invocations and setIgnoreStandardErrorWarningFormat="true"
andStandardErrorImportance="Low"
whenExec.IgnoreExitCode
=true orExec.StandardOutputImportance
=Low.