Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent nix/store references in static builds #3194

Merged

Conversation

wolfgangwalther
Copy link
Member

@wolfgangwalther wolfgangwalther commented Jan 28, 2024

This adds a check to confirm that the static executable we build via nix does not have any references to /nix/store anymore. This currently fails with paths to the following derivations, as noted in #3169 (comment):

# haskell
/nix/store/j9pp5jp32z3mp84p2fsnwdg4si7qgm0z-postgrest-static-x86_64-unknown-linux-musl-11.2.0
/nix/store/a35zz3304b1fzv4613mfr5lwgz1n0wnj-warp-static-x86_64-unknown-linux-musl-3.3.25
/nix/store/icwvcmmdblflirrnbkhs73hlr2z1ix0b-HTTP-static-x86_64-unknown-linux-musl-4000.4.1

# non-haskell
/nix/store/mkig2hb4ksyk0j006rwyfq8b84nhh2pb-libpq-x86_64-unknown-linux-musl-16.1/etc/postgresql
/nix/store/bf84fd2c0kdr70chri3d21mn8za1ab8n-libkrb5-static-x86_64-unknown-linux-musl-1.20.2
/nix/store/qk1hy5cv7fjf44g3xyw8d94imvhl04xf-openssl-static-x86_64-unknown-linux-musl-3.0.12-etc
/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-openssl-static-x86_64-unknown-linux-musl-3.0.12

Most of those were introduced in 259d97a and c94aa9c - but the openssl-etc reference was there before, already. Before the switch to pkgsStatic, this didn't blow up closure size that much, so went unnoticed.

But even before, the references that OpenSSL was storing to the sysconfdir (/etc) would not have worked on a non-nix system. This should be possible to fix with a --sysconfdir configure flag. Not sure, yet, about the other openssl reference, which has already been replaced by nixpkgs' remove-references-to.

The libpq reference is just an oversight when I introduced the new libpq derivation which I fixed in the second commit. The libkrb5 references can probably be solved similarly - we just didn't build with GSSAPI support, so that's new.

This leaves us with three haskell dependencies - and at first glance, this seems quite random. The thing here is, that in the new executable, the auto-generated Paths_xxx modules for those packages are included. Those three are likely the only dependencies we use that have such a module and that's why they show up. IIRC those Paths modules are actually Template Haskell - and should not be referenced in the final bundle at all. The fact they show up means that GHC is linking way too much stuff into our executable now. The unpacked size of the build artifact is a whopping 51 MB for it now - compared to 17 MB before the switch.

How what happened here and why? I don't exactly know, yet, but looking at the last CI run before the switch at https://github.com/PostgREST/postgrest/actions/runs/7660083430 we have:

  • A statically linked artifact for linux of around 5 MB.
  • A dynamically linked artifact for macos of around 3-4 MB.

Those sizes are zipped, but correspond to the sizes mentioned above. Now, those two seem to match up nice: We expect the dynamic build to be smaller than the static build, right?

But note that we also have zipped sizes of:

  • 13 MB for freebsd
  • 17 MB for ubuntu-aarch64
  • 13 MB for ubuntu-x64
  • 14 MB for windows

Those are very similar to the new static build, which is around 12-13 MB zipped: https://github.com/PostgREST/postgrest/actions/runs/7670588137. In fact, those dynamically linked builds also contain all the Paths modules, just without a /nix/store path, because they were build outside. Example:

/home/runner/work/postgrest/postgrest/.stack-work/install/x86_64-linux-tinfo6/7fb34a25383af7e6721ce2222081f7f4af7b1ed5f3f45443771c11e72a32092d/9.4.5/etc

This means, that the new static build didn't "break" this in an isolated way, but rather:

  • The macos and static-haskell-nix builds did "something right".
  • All the other builds are missing something.

Interestingly, the dynamic package build via nix-build -A postgrestPackage (not exposed as artifcat anywhere) is also small.

I have not been able to find out exactly why this happens, but here are my observations and ideas:

  • For any haskell dependency we always get a .so file, in the static overlay we also get a .a file.
  • Because nix enables --enable-split-sections, we also get a bunch of .hi, .p_hi and dyn_hi files - one for each module a package exposes.
  • GHC actually links the haskell dependencies statically anyway - even in the very small MacOS case. The difference between the static and dynamic builds is only in the non-haskell libs and how they are linked.

My best guess is that somehow GHC looses the ability to use the .hi files directly, so that it only links in the code we need. Instead we seem to link in all of the .a files, so basically each package in full.

@wolfgangwalther
Copy link
Member Author

So I tried to enable split-sections on stack and cabal builds:

  • For linux and freebsd, this worked nicely: The dynamic executables built for that platform are now a lot smaller (5-6 MB zipped vs. 13 MB before).
  • MacOs gives us a warning (which is then treated as an error): -split-sections is not useful on this platform since it always uses subsections via symbols. Ignoring.. This explains why the macos builds have always been small.
  • Windows build errors out with this setting. Not surprising, because everywhere I read about split-sections, it mentions that this only really works on Linux.
  • Linux ARM is not build in the PR, but it should be smaller as well. Tested locally via cabal and it worked (at least on x64).

This means I will try to make the split-sections setting depend on the OS type. And then I will have to look into pkgsStatic and why it doesn't work there as expected.

@wolfgangwalther
Copy link
Member Author

This means I will try to make the split-sections setting depend on the OS type.

Seems hard to do with stack.yaml. A better option might be to not build the dynamic ubuntu executable with stack anymore, but take one of the cabal builds that run anyway instead. Then, cabal will build with -split-sections and stack without.

But even before, the references that OpenSSL was storing to the sysconfdir (/etc) would not have worked on a non-nix system. This should be possible to fix with a --sysconfdir configure flag. Not sure, yet, about the other openssl reference, which has already been replaced by nixpkgs' remove-references-to.

The libpq reference is just an oversight when I introduced the new libpq derivation which I fixed in the second commit. The libkrb5 references can probably be solved similarly - we just didn't build with GSSAPI support, so that's new.

All of those are now solved. The "only" references remaining are the haskell libraries via Paths_ modules.

@steve-chavez
Copy link
Member

A better option might be to not build the dynamic ubuntu executable with stack anymore, but take one of the cabal builds that run anyway instead.

Yeah, that LGTM. We don't publish that ubuntu executable anyway. IIRC it's just for verifying stack works.

@wolfgangwalther
Copy link
Member Author

Yeah, that LGTM. We don't publish that ubuntu executable anyway. IIRC it's just for verifying stack works.

Ah, right. That makes it easy. I removed the ubuntu executable from the list of artifacts for each MR, no need to store it there, then.

At the same time, I also re-enabled the freebsd executable as part of a release, which was disabled because of Cirrus timeouts? I hope that this works again.

@wolfgangwalther
Copy link
Member Author

At the same time, I also re-enabled the freebsd executable as part of a release, which was disabled because of Cirrus timeouts? I hope that this works again.

And... immediately the first freebsd build times out. Well the job to fetch the executable from cirrus times out - and that's very likely because this job runs before the cirrus CI job even exists. The latest commit adds a fix to the fetch script, which should deal with this case properly.

This makes the static build fail in case any references to the nix store
are left over. Those will increase the closure size of the nix derivation
massively and lead to a huge docker image.

At the same time, those references will not be functional on non-nix systems,
to which the static executable is distributed, anyway.
By passing -split-sections to all dependencies, GHC will link only the
modules we actually use and not the full package for each dependency.
This does neither work on MacOS nor Windows, thus we don't do it for
stack right now.

Stripping unused symbols in CI will further decrease the size of those files.
This is not used in the release process anyway, because we are using the static build.
This was temporarily disabled, because of timeouts in Cirrus. This seems to work well again.
@wolfgangwalther
Copy link
Member Author

Turns out that, for pkgsStatic and all other pkgsCross variations, the ghc-options passed to configure never make it to GHC. I opened an issue in nixpkgs about that and also a pull request to work around it.

For now, we'll need to apply the patch in my pull request to use --enable-split-sections - then this should work.

…aller again

This works around NixOS/nixpkgs#286285 to use -split-sections
in a cross-compiling scenario. This will reduce the size of the static executable and also
remove the remaining references to /nix/store/.. reducing closure size dramatically.

This also fixes the docker image blowing up in size since we switched to pkgsStatic.
@wolfgangwalther wolfgangwalther marked this pull request as ready for review February 4, 2024 16:39
@wolfgangwalther
Copy link
Member Author

This worked nicely, the docker image is now around 5 MB again.

The prepopulate nix job is currently running out of diskspace, though:

2024-02-04T18:48:13.4890376Z dist/build/libHSCabal-syntax-3.10.2.0-26Pj78yn6nj2QTg3vbuRfV.a: copyFile: resource exhausted (No space left on device)
2024-02-04T18:48:13.5285917Z ##[warning]You are running out of disk space. The runner will stop working when the machine runs out of disk space. Free space left: 94 MB

This is because we are re-building the whole dependency chain for haskell packages at once in this PR. I will have to look into setting up the cachix token again and push this from my local machine where it is already built.

@wolfgangwalther
Copy link
Member Author

Ofc, I can't seed the macos job locally. The other test failures seem to be random github actions failures. I now pushed the branch to the postgrest repo instead of my fork and started a manually triggered run of the Cachix workflow to seed everything: https://github.com/PostgREST/postgrest/actions/runs/7785206895

@wolfgangwalther
Copy link
Member Author

I now pushed the branch to the postgrest repo instead of my fork and started a manually triggered run of the Cachix workflow to seed everything

Seems like I will need to fix the "Seed Cachix" workflow first. Just building the postgrest-push-cachix tool forces the MacOS job to rebuild everything.. which defeats the purpose of the staged build & push the seed job is supposed to provide.

@wolfgangwalther
Copy link
Member Author

Loadtest is still failing, but that's unrelated. Everything else passed after seeding the cache. Merging.

@wolfgangwalther wolfgangwalther merged commit ea02729 into PostgREST:main Feb 6, 2024
33 of 35 checks passed
@wolfgangwalther wolfgangwalther deleted the no-static-references branch February 6, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants