Add "external" ccache to speed up builds; preserve caches outside Docker container #83
SeedSigner OS builds need to be "cleaned" (i.e. all build artifacts deleted) when switching to a different build target (e.g. pi0 to pi4). This forces the compiler to rebuild everything from scratch. But in reality, a huge portion of the build artifacts are identical across targets.

`ccache` (see: https://ccache.dev) is a system-wide compiler cache that keeps build artifacts in its cache and quickly retrieves them if the exact same source file with the exact same compiler settings is due to be compiled again.

BuildRoot already uses its own internal `ccache` at `/root/.buildroot-ccache` to speed up builds. But this cache does not persist across `docker compose` up/down cycles.

This PR adds a second "external" `ccache` to the parts of the build process that are not cached by BuildRoot's "internal" `ccache`. My rough understanding is that there are initial steps to build the BuildRoot tooling itself, and then that tooling is used to build the specified board target. This PR's new "external" `ccache` speeds up the steps that build the BuildRoot tooling.

This PR also ensures that both `ccache` caches persist across `docker compose` cycles by writing them to a subdir on the host machine.

Changes
- Adds the `ccache` `apt` dependency to the `Dockerfile` container setup.
- Updates `docker-compose.yml` so the two `ccache` caches can persist across `docker compose` cycles.
- Adds `PATH` injection so that everything within the `make` process automatically leverages `ccache`.
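The `PATH` injection can be sketched roughly as follows. This is a hedged illustration, not the PR's literal Dockerfile lines: on Debian-based images the `ccache` package installs compiler symlinks under `/usr/lib/ccache`, and the cache location shown is a placeholder for the host-mounted subdir.

```shell
# Illustrative sketch (not the PR's exact wiring).
# Debian's ccache package ships compiler symlinks (gcc, g++, cc, ...) in
# /usr/lib/ccache; putting that dir first on PATH routes every compiler
# call inside `make` through ccache transparently.
export PATH="/usr/lib/ccache:$PATH"

# Point the external cache at a directory that docker-compose bind-mounts
# from the host, so it survives `docker compose` up/down cycles.
export CCACHE_DIR="/opt/build-cache/external-ccache"   # placeholder path
```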
Performance Tests
Test machines:

- Ryzen 5 5600X: a fairly powerful mid- to upper-range workhorse CPU from late 2020 that still competes pretty well against more recent CPUs. 6 cores / 12 threads. CPU Mark:
- Ryzen 5 PRO 2400GE: a low-power CPU for mini-PC builds from 2018. Still respectable and a great choice for running a node, but very weak compared to a modern desktop CPU. 4 cores / 8 threads. CPU Mark:
Results Data
All results are in seconds.
The yellow columns were run against the current `0.8.5-rc1`, which only has the built-in BuildRoot ("BR") `ccache`.

The green columns were run against this PR, which adds the "external" `ccache`.

The "sequence" list indicates that 8 builds were performed sequentially, with the `ccache`(s) accumulating cache data along the way.

🚨 If you want to run your own sequence test, see: https://gist.github.com/kdmukai/b82f74bf5ff4f84c60e98f999968c189
The "solo" section lists 4 completely isolated runs, each from a totally empty cache state, to provide a baseline "first build" data set.
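For anyone reproducing these numbers: per-run hit rates and cache sizes can be read out with `ccache`'s built-in statistics flags (`-z`/`--zero-stats` and `-s`/`--show-stats` are standard `ccache` options; the cache directory here is a throwaway demo path, not the PR's actual mount point):

```shell
# Sketch: zero the stats before a test run, then report them afterward.
# Uses a temp dir so the demo is self-contained; the real cache would
# live in the host-mounted subdir instead.
CACHE_DIR="${TMPDIR:-/tmp}/ccache-demo"
if command -v ccache >/dev/null 2>&1; then
    CCACHE_DIR="$CACHE_DIR" ccache -z    # zero the hit/miss counters
    # ... run a build here ...
    CCACHE_DIR="$CACHE_DIR" ccache -s    # print hits, misses, cache size
    result="stats-shown"
else
    result="ccache-not-installed"        # environment without ccache
fi
echo "$result"
```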
Results Analysis
Comparing first build times (totally empty cache(s)), the external `ccache` incurs a modest performance penalty as it does the additional work to build its initial cache. The BuildRoot compilation task has some redundant objects, so the BR `ccache` has a roughly 15% hit rate; it slightly speeds up even an initial build from an empty cache. The external `ccache` has almost no hits: ~0.5%.

But when we switch to a new build target, the external `ccache` now provides some benefit to that new build: a 12-13% speedup. In this phase, the BR `ccache` yields a 50.6% hit rate and the external `ccache` a whopping 98.6%!

And when we do a follow-up rebuild of the same target, the external `ccache` gives us a 16.5-19.1% boost. Not surprisingly, the BR `ccache` yields a 99.9% hit rate and the external `ccache`
is at 99.4%.

Cache sizes
The BR cache is roughly 570MB for any single build target. And given the 50.6% hit rate from above (half of the BR cache is useful when switching build targets), it's not surprising that each additional target adds roughly another half-cache's worth of new objects. Which gets us almost exactly to what was observed on disk after building all four targets.
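As a back-of-the-envelope check (my arithmetic, not a figure quoted in the PR): if ~49.4% of each subsequent target's objects are misses that get added to the BR cache, four targets should total about 570MB + 3 × 49.4% × 570MB, i.e. roughly 1.4GB:

```shell
# Rough estimate of total BR cache size after 4 targets; the miss rate is
# expressed in tenths of a percent to stay in POSIX integer arithmetic.
first_mb=570          # cache size after the first target
miss_per_mille=494    # 100% - 50.6% hit rate, in tenths of a percent
total_mb=$(( first_mb + 3 * first_mb * miss_per_mille / 1000 ))
echo "${total_mb} MB"   # ~1414 MB, i.e. roughly 1.4 GB
```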
Whereas the external `ccache`'s incredibly high hit rate (98.6%) means that there isn't anything left to add to the cache after the first build, so it stays more or less the same size.

Note that while the build results are still deterministic, interestingly the two `ccache`s show some variation between runs. Best guess is that this is random timing happenstance due to parallel threaded compiles within `make` steps. In fact, the builds on the weaker 2400GE, which supports fewer threads, have much more consistent (though slightly larger?) cache sizes than the 5600X, which has 50% more threads.

Note that if you haven't changed build targets and are just re-compiling with a different SeedSigner
`--app-repo`, `--app-branch`, or `--app-commit-id`, you can build with the `--no-clean` switch, which still yields the fastest possible rebuild since there's no recompilation whatsoever; `ccache` has no effect.

`--no-clean` build times:

Future considerations
These reusable caches could potentially be added to their own repo and pulled by a Github Action to speed up automated CI builds. Probably not a best practice for end-users trying to verify their software via reproducible builds, but would greatly aid testing PRs if CI automatically produced pi0 and pi02w builds of the proposed PR. That would make it easier for more people to test a PR since they wouldn't even need a dev environment. Just have to be careful and clear about not using that image with any real seeds.