-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Much faster symlinking, safer interactions with rustc
#386
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,6 +8,7 @@ | |
, rsync | ||
, stdenv | ||
, vendorCargoDeps | ||
, writeShellApplication | ||
, zstd | ||
}: | ||
|
||
|
@@ -39,13 +40,55 @@ let | |
"pnameSuffix" | ||
"stdenv" | ||
]; | ||
|
||
rustcWrapper = writeShellApplication { | ||
name = "rustc-wrapper"; | ||
text = '' | ||
set -euo pipefail | ||
|
||
args=("$@") | ||
|
||
temp_dir=$(mktemp -d) | ||
trap 'rm -rf -- "$temp_dir"' EXIT | ||
|
||
for i in "''${!args[@]}"; do | ||
if [[ "''${args[i]}" == --out-dir=* ]]; then | ||
current_out_dir="''${args[i]#--out-dir=}" | ||
args[i]="--out-dir=$temp_dir" | ||
break | ||
elif [[ "''${args[i]}" == --out-dir ]]; then | ||
current_out_dir="''${args[i+1]}" | ||
args[i+1]="$temp_dir" | ||
break | ||
fi | ||
done | ||
|
||
if [[ -v current_out_dir ]]; then | ||
stderr="$temp_dir/crane_stderr" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why bother with capturing stderr and stdout? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh. Cargo uses it to know when a process finished, instead of waiting for it to fully terminate? :/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thought: But it just occurred to me that it's There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One could use https://osterlund.xyz/posts/2018-03-12-interceptiong-functions-c.html A bit involved, but could probably work perfectly and be 100% invisible. Though I don't know if it works on MacOS, and it will only work as long as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's no good ld_preload equivalent on macos which is why i didn't go down that path - and that seems extremely brittle to recompiles. Ideally Nix would let us mount fuse filesystems inside a build - that'd be the 'reliable' solution to this. I felt this traded off nicely between being robust to change but still getting the job done - like - it depends on public apis of rustc rather than private. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I don't see a better way than the current one. |
||
set +e | ||
stdout=$("''${args[@]}" 2>"$stderr") | ||
exit_code=$? | ||
set -e | ||
stderr_text=$(cat "$stderr") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just a thought - if you know out dir ... wouldn't it be faster & easier to delete it beforehand instead of changing it and copying from it? Technically cargo might want to read something from it first... but something tells me it never actually happen. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also be a bit worries about these temp paths being written somewhere and then causing rebuilds etc. Instead of making them random, maybe they should be fixed. Eg. always There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I considered this. The problem is that what rustc outputs is dependent on what rustc has been asked to do (tell it to make a dylib? it'll make one!), so it's much more tightly coupled. Whereas the output dir is loosely coupled. Regarding the temp paths being written somewhere - I logicked myself that this should never happen. In particular, compilers put a lot of energy into being reproducible - it's dodgy if the output is dependent on the path the output is to be written to. |
||
rm "$stderr" | ||
cp -r "$temp_dir/." "$current_out_dir" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
echo -n "$stdout" | ||
echo -n "$stderr_text" >&2 | ||
exit $exit_code | ||
else | ||
exec "$@" | ||
fi | ||
''; | ||
}; | ||
in | ||
chosenStdenv.mkDerivation (cleanedArgs // { | ||
inherit cargoArtifacts; | ||
|
||
pname = "${args.pname or crateName.pname}${args.pnameSuffix or ""}"; | ||
version = args.version or crateName.version; | ||
|
||
RUSTC_WRAPPER="${rustcWrapper}/bin/rustc-wrapper"; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't even know crane has functionality like this and I was using my wrapper myself anyway. It seems to me like an unnecessary functionality to expose to the user. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While OK for PoC use, I think a proper wrapper should be handled by crane internally:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm strongly opposed to the forced usage of (not to mention that plenty of builds work just fine without needing this workaround, so it's the exceptional builds which should opt into different behavior than forcing it on everyone) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ack - I can handle the wrapper internally (and then also wrap with the user's wrapper). Like, it should be easily possible to wrap twice. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm still concerned that this would be too brittle to maintain at the library level (you can set a rustc-wrapper in |
||
|
||
# Controls whether cargo's `target` directory should be copied as an output | ||
doInstallCargoArtifacts = args.doInstallCargoArtifacts or true; | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,10 +50,9 @@ inheritCargoArtifacts() { | |
# being built (since changing `Cargo.lock` would rebuild everything anyway), which makes them | ||
# good candidates for symlinking (esp. since they can make up 60-70% of the artifact directory | ||
# on most projects). Thus we ignore them when copying all other artifacts below as we will | ||
# symlink them afterwards. Note that we scope these checks to the `/deps` subdirectory; the | ||
# workspace's own .rlib and .rmeta files appear one directory up (and these may require being | ||
# writable depending on how the actual workspace build is being invoked, so we'll leave them | ||
# alone). | ||
# symlink them afterwards. Note that we scope these checks to the `/deps` subdirectory, as | ||
# this directory contains only rustc output. We have a rustc wrapper which ensures that any | ||
# writes to these symlinks do not try to change the (immutable) symlink targets. | ||
# | ||
# NB: keep the executable bit only if set on the original file | ||
# but make all files writable as sometimes read-only files will make the build choke | ||
|
@@ -66,27 +65,15 @@ inheritCargoArtifacts() { | |
--times \ | ||
--chmod=u+w \ | ||
--executability \ | ||
--exclude 'deps/*.rlib' \ | ||
--exclude 'deps/*.rmeta' \ | ||
--exclude '*/deps/*' \ | ||
--exclude '.cargo-lock' \ | ||
"${preparedArtifacts}/" \ | ||
"${cargoTargetDir}/" | ||
|
||
local linkCandidates=$(mktemp linkCandidatesXXXX.txt) | ||
find "${preparedArtifacts}" \ | ||
'(' -path '*/deps/*.rlib' -or -path '*/deps/*.rmeta' ')' \ | ||
-printf "%P\n" \ | ||
>"${linkCandidates}" | ||
|
||
# Next create any missing directories up front so we can avoid redundant checks later | ||
cat "${linkCandidates}" \ | ||
| xargs --no-run-if-empty -n1 dirname \ | ||
| sort -u \ | ||
| (cd "${cargoTargetDir}"; xargs --no-run-if-empty mkdir -p) | ||
|
||
# Lastly do the actual symlinking | ||
cat "${linkCandidates}" \ | ||
| xargs -P ${NIX_BUILD_CORES} -I '##{}##' ln -s "${preparedArtifacts}/##{}##" "${cargoTargetDir}/##{}##" | ||
|
||
symlinks="${cargoArtifacts}/symlinks.tar" | ||
if [[ -f "${symlinks}" ]]; then | ||
tar -xf "${symlinks}" -C "${cargoTargetDir}" --strip-components=1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would still compress. Probably lots of redundancy in these paths and that point CPU use will pay for itself in saved IO. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Question: Now that everything is a symlink, but the symlink is invisible to Nix because it's in a tar archive, what is going to prevent Nix from GCing the step being linked to? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dpc's observation here is correct. Nix will perform a naive scan of the results for any store paths, but will not notice them if they are behind a tarball, meaning a store GC would end up invalidating references. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. However, when compressed, I'd expect it's hard for nix to find what it's looking for, hence the explicit choice to not compress. Does that make sense? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That might work, but I wonder: would compressing + putting 1 symlink somewhere in the output (outside of what is copied to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it'd definitely work - mostly was going for simplicity vs optimality. A 345MB deps dir on my machine ends up with a 900kB symlinks.tar. Might be more clean to explicitly add a symlink though from the perspective of making the purpose clear. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As long as there is a single line somewhere in the output (which can be a separate text file and not necessarily the tarball) which includes |
||
fi | ||
fi | ||
else | ||
echo unable to copy cargo artifacts, \"${preparedArtifacts}\" looks invalid | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
I don't agree with changing the defaults here because
cargoClippy
is intended to be allow chaining with other builds by defaultThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed this to be a large slowdown in practice - specifically around copying and patching interpreter paths in the output. My intuition was that folks are using cargoClippy as a linting task, and so like cargoDoc in the last version, it was an oversight which would primarily affect new users.
What would cargoClippy chaining look like for you? Is there an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree that cargo clippy should be a a leaf node 99% of the time, if someone has a good reason to run it other way it should work. It technically can produce artifacts (same as
cargo check
iirc).If I read the code here right this change doesn't just change the default, but hardcodes
false
.Probably best as a separate PR with changelog info about change in behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep if you'd like to propose a different default value here feel free to submit a separate PR for it (then we can at least have a more focused discussion on it!)