
Add "external" ccache to speed up builds; preserve caches outside Docker container #83

Open

wants to merge 2 commits into base: 0.8.5-rc1
Conversation

kdmukai
Contributor

@kdmukai kdmukai commented Jan 9, 2025

SeedSigner OS builds need to be "cleaned" (i.e. delete all build artifacts) when switching to a different build target (e.g. pi0 to pi4). This forces the compiler to rebuild everything from scratch. But in reality, a huge portion of the build artifacts are identical across targets.

ccache (see: https://ccache.dev) is a system-wide compiler cache that keeps build artifacts in its cache and quickly retrieves them if the exact same source file with the exact same compiler settings is due to be compiled again.

BuildRoot already uses its own internal ccache at /root/.buildroot-ccache to speed up builds. But this cache does not persist across docker compose up/down cycles.

This PR adds a second "external" ccache to the parts of the build process that are not cached by BuildRoot's "internal" ccache. My rough understanding is that there are initial steps to build the BuildRoot tooling itself and then that tooling is used to build the specified board target. This PR's new "external" ccache speeds up the steps to build the BuildRoot tooling.

This PR also ensures that both ccache caches persist across docker compose cycles by writing them to a subdir on the host machine.
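A minimal sketch of what that persistence looks like in docker-compose (service and path names here are illustrative, not necessarily the PR's exact diff; Buildroot's internal cache does live at `/root/.buildroot-ccache` by default):

```yaml
# docker-compose.yml (sketch)
services:
  seedsigner-os-build-images:
    build: .
    volumes:
      # Persist Buildroot's internal ccache across `docker compose up/down` cycles
      - ./ccache/buildroot:/root/.buildroot-ccache
      # Persist the new "external" ccache the same way
      - ./ccache/external:/root/.cache/ccache
```

Bind-mounting host subdirectories (rather than anonymous volumes) is what lets the caches survive container teardown and makes them easy to inspect or wipe from the host.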


Changes

  • Add ccache apt dependency to the Dockerfile container setup.
  • Create new volumes in docker-compose.yml so the two ccache caches can persist across docker compose cycles.
  • Inject ccache into the PATH so that every compiler invocation within the make process automatically goes through ccache.

Performance Tests

Test machines:

Ryzen 5 5600x

This is a fairly powerful mid- to upper-range workhorse CPU from late 2020 that still competes well against more recent CPUs.

6 cores, 12 threads
CPU Mark:

  • multithreaded: 21,885
  • single core: 3,362

Ryzen 5 PRO 2400GE

This is a low-power CPU for mini PC builds from 2018. Still respectable, and a great choice for running a node, but very weak compared to a modern desktop CPU.

4 cores, 8 threads
CPU Mark:

  • multithreaded: 7,665
  • single core: 2,116

Results Data

[Screenshot: results data table, 2025-01-09]

All results are in seconds.

The yellow columns were run against the current 0.8.5-rc1 which only has the built-in BuildRoot ("BR") ccache.

The green columns were run against this PR which adds the "external" ccache.

The "sequence" list indicates that 8 builds were performed sequentially with the ccache(s) accumulating cache data along the way.

🚨 If you want to run your own sequence test, see: https://gist.github.com/kdmukai/b82f74bf5ff4f84c60e98f999968c189

The "solo" section lists 4 completely isolated runs, each from a totally empty cache state, to provide a baseline "first build" data set.


Results Analysis

[Screenshot: results analysis, 2025-01-09]

Comparing first-build times (totally empty caches), the external ccache incurs a modest performance penalty as it does the additional work of populating its initial cache. The BuildRoot compilation task has some redundant objects, so the BR ccache achieves a roughly 15% hit rate and slightly speeds up even an initial build from an empty cache. The external ccache gets almost no hits: ~0.5%.

But when we switch to a new build target, the external ccache now provides some benefit to that new build: 12-13% speedup. In this phase, the BR ccache yields a 50.6% hit rate and the external ccache is a whopping 98.6%!

And when we do a follow-up rebuild of the same target, the external ccache gives us a 16.5-19.1% boost. Not surprisingly, the BR ccache yields a 99.9% hit rate and the external ccache is at 99.4%.


Cache sizes

[Screenshot: cache sizes, 2025-01-09]

The BR cache is roughly 570MB for any single build target. And given the 50.6% hit rate from above (half of the BR cache is useful when switching build targets), it's not surprising that:

# First build size + another 50% for each subsequent build target
570 + 3*570/2 = 1425

Which gets us almost exactly to what was observed on disk after building all four targets.
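The back-of-envelope arithmetic can be checked directly (570 MB is the observed size of one target's BR cache):

```shell
# First target fills the ~570MB cache; each of the 3 remaining targets adds ~50% new data.
echo $((570 + 3 * 570 / 2))   # prints 1425
```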

The external ccache's extremely high hit rate (98.6%), on the other hand, means there is almost nothing left to add to the cache after the first build. So it stays more or less the same size.

Note that while the build results are still deterministic, the two ccaches interestingly show some size variation between runs. The best guess is that this is random timing happenstance from parallel threaded compiles within make steps. In fact, builds on the weaker 2400GE, which supports fewer threads, have much more consistent (though slightly larger?) cache sizes than on the 5600X, which has 50% more threads.


Note that if you haven't changed build targets and are just re-compiling with a different SeedSigner --app-repo, --app-branch, or --app-commit-id, you can build with the --no-clean switch. This still yields the fastest possible rebuild, since there's no recompilation whatsoever; ccache has no effect.

--no-clean build times:

  • 5600X: 23 seconds
  • 2400GE: 39 seconds

Future considerations

These reusable caches could potentially be added to their own repo and pulled by a Github Action to speed up automated CI builds. Probably not a best practice for end-users trying to verify their software via reproducible builds, but would greatly aid testing PRs if CI automatically produced pi0 and pi02w builds of the proposed PR. That would make it easier for more people to test a PR since they wouldn't even need a dev environment. Just have to be careful and clear about not using that image with any real seeds.

@jdlcdl

jdlcdl commented Jan 13, 2025

ACK tested as of 0b3a7f3

On a 4 core Xeon E3-1226 @ 3.3GHz system w/ 8GB RAM, running Ubuntu:

  • just less than 3m to recreate the seedsigner-os-build-images container,
  • 58m36s to build for pi0 the first time,
  • 18m3s to build for pi0 the second time,
  • 1m3s to build for pi0 a third time with --no-clean
  • 33s to build for pi0 a fourth time with --no-clean --skip-repo
    ...reproduced same binaries each time.

image (0.8.5-rc1 from Nick's merges last night) appears to run fine on pi0.
