Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: embed and check githash in .olean #2766

Merged
merged 2 commits into from
Nov 27, 2023
Merged

Conversation

Kha
Copy link
Member

@Kha Kha commented Oct 25, 2023

This is an additional safety net on top of #2749: it protects users that circumvent the build system (e.g. with lake env) as well as obviates the need for TOCTOU-like race condition checks in the build system.

The check is activated by CHECK_OLEAN_VERSION=ON, which now defaults to OFF as the sensible default for local development. When activated, USE_GITHASH=ON is also force-enabled for stage 0 in order to make sure that stage 1 can load its own core library.

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Oct 25, 2023
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Oct 25, 2023
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Oct 25, 2023
@leanprover-community-mathlib4-bot
Copy link
Collaborator

leanprover-community-mathlib4-bot commented Oct 25, 2023

src/library/module.cpp Outdated Show resolved Hide resolved
@kim-em
Copy link
Collaborator

kim-em commented Oct 26, 2023

Options for a successful PR run:

  • rebase this PR onto d126c09 (the commit underlying nightly-2023-10-25), which should give an immediate run.
  • wait until nightly-2023-10-26 lands and I have fixed Std+Mathlib for it (approximately +10 hours from now), and then push an empty commit (or anything) to this branch.

@kim-em kim-em removed the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Oct 26, 2023
@digama0
Copy link
Collaborator

digama0 commented Oct 26, 2023

Note: this will break leantar / lake exe cache, keep me appraised when it hits mathlib.

@digama0
Copy link
Collaborator

digama0 commented Oct 26, 2023

While we're changing the olean format, I also have a request: add a version field, pinned to 1 for the time being. We could make it part of the oleanfile header, say using 12 bytes for oleanfile!!! and then a 4 byte version field.

I don't want to start the discussion of ABI versioning yet, but it would be nice if we had a little foresight...

@digama0
Copy link
Collaborator

digama0 commented Oct 26, 2023

I have a patch for leantar up at https://github.com/digama0/leangz/tree/olean_v1 . It is backward compatible with earlier versions, so I will make a release for it as soon as the new olean format is finalized and this PR is merged.

@Kha Kha marked this pull request as draft October 28, 2023 07:43
@digama0
Copy link
Collaborator

digama0 commented Nov 3, 2023

I'll repeat my request for adding a version field. leantar needs this to detect any ABI breaking change to core data structures, in particular Expr, for which there have been two PRs in the recent past (both from me and not yet accepted, but still) to be able to detect changes. leantar has to work with multiple versions of lean simultaneously, because users can jump between different versions of the toolchain when switching branches, so it is helpful for any changes to the olean format to be accompanied by a version bump. There are generally very few of these, but this PR and the two mentioned ones would involve an olean version bump. Git hashes are not sufficient, they allow detecting that the version is wrong but not whether it is an old version or a new version, and neither git hashes nor toolchain names are easily placed in a linear order.

@Kha
Copy link
Member Author

Kha commented Nov 4, 2023

I added a version field for the .olean header format itself but I don't see how we could realistically version the payload format itself given that any change to a user environment extension can be an ABI change. And even if we restricted ourselves to core, it seems infeasible to me without some automation double-checking the contract.

leantar has to work with multiple versions of lean simultaneously

It doesn't have to, cache could refer to and download an exact leantar version instead of a lower bound. It would be a simpler solution than olean versioning by orders of magnitude.

@digama0
Copy link
Collaborator

digama0 commented Nov 4, 2023

I added a version field for the .olean header format itself but I don't see how we could realistically version the payload format itself given that any change to a user environment extension can be an ABI change. And even if we restricted ourselves to core, it seems infeasible to me without some automation double-checking the contract.

Thanks. A "good enough" solution is fine for the leantar usecase, as if it gets the format wrong the worst that happens is the compression ratio goes way down. The main idea here is that when we anticipate breakage due to use cases in the wild we have a number we can bump to fix them. For lean itself it makes sense that you would want to use the exact githash.

It doesn't have to, cache could refer to and download an exact leantar version instead of a lower bound. It would be a simpler solution than olean versioning by orders of magnitude.

The problem here is that one would have to know which leantar version to use given an olean, and unless it is being distributed with the toolchain and being released with every commit I don't how to do that. The number of olean version changes that are relevant to leantar that have happened in the last year can be counted on one hand, so it is perfectly feasible to release new leantar versions when needed and have complete backward compatibility to every change made to the format so far. Maybe this won't be the case forever but the changes so far have been quite tame and predictable.

I have some ideas for how to do complete olean versioning using "schema generation" from the lean environment; a lot of this is already implemented in oleandump, which reads lean declarations to determine how to deserialize oleans. That would make it possible to detect and automatically bump the version whenever there is an ABI breaking change. But I'm not proposing that at this time, only a version that is manually bumped to avoid the worst fallout of ABI breaks.

@Kha
Copy link
Member Author

Kha commented Nov 6, 2023

The problem here is that one would have to know which leantar version to use given an olean, and unless it is being distributed with the toolchain and being released with every commit I don't how to do that

Why does my suggestion of hardcoding the leantar version in the cache source code not work for that? As you said it would only have to be adjusted in a rare few lean-toolchain bumps in mathlib.

@digama0
Copy link
Collaborator

digama0 commented Nov 6, 2023

Why does my suggestion of hardcoding the leantar version in the cache source code not work for that? As you said it would only have to be adjusted in a rare few lean-toolchain bumps in mathlib.

Oh I see, yes that would work, you are saying that the "olean version" would be tracked in the cache source code instead of the lean core source code. This can be done without needing to pin leantar versions too, cache could pass an olean version as a command line argument.

I would go for all of the above: a lean embedded olean version number, a cache version command line arg, and leantar versioned releases. The reason is because these each have slightly different development processes and this gives us flexibility to detect and correct issues caused by any of these places.

@Kha Kha marked this pull request as ready for review November 6, 2023 11:53
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Nov 6, 2023
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Nov 6, 2023
@Kha Kha requested a review from leodemoura November 7, 2023 11:37
@Kha Kha marked this pull request as draft November 7, 2023 12:29
@Kha
Copy link
Member Author

Kha commented Nov 7, 2023

(ready for review but I want to do one final test round before merge)

@digama0
Copy link
Collaborator

digama0 commented Nov 7, 2023

Updated https://github.com/digama0/leangz/tree/olean_v1 , I will make a release once this PR is merged. Fortunately or unfortunately, old leantar versions will detect that these oleans have a funny format and will instead use a generic (and much less efficient) compression format, so if we are not on the ball bumping the leantar version at the right time we will simply have some large cache files. The new leantar is backward compatible though, so I will try to get it merged in advance of this change landing in mathlib, and then hopefully it will not be an issue.

@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added builds-mathlib CI has verified that Mathlib builds against this PR and removed breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan labels Nov 14, 2023
@Kha Kha marked this pull request as ready for review November 14, 2023 15:13
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Nov 14, 2023
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan and removed builds-mathlib CI has verified that Mathlib builds against this PR labels Nov 14, 2023
@kim-em
Copy link
Collaborator

kim-em commented Nov 15, 2023

Mathlib used parts of ProofWidgets in tests, that were not being built by lake build. Hence this had the wrong hash, and lake was not being given the chance to rebuild them. I've added lake build ProofWidgets in the CI test step for Mathlib, and cherry-picked that to lean-pr-testing-2766. Hopefully this will go green again shortly!

@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added builds-mathlib CI has verified that Mathlib builds against this PR and removed breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan labels Nov 15, 2023
mathlib-bors bot pushed a commit to leanprover-community/mathlib4 that referenced this pull request Nov 19, 2023
After leanprover/lean4#2766 this becomes critical.



Co-authored-by: Scott Morrison <[email protected]>
mathlib-bors bot pushed a commit to leanprover-community/mathlib4 that referenced this pull request Nov 19, 2023
After leanprover/lean4#2766 this becomes critical.



Co-authored-by: Scott Morrison <[email protected]>
@Kha Kha requested a review from kim-em as a code owner November 20, 2023 08:15
alexkeizer pushed a commit to leanprover-community/mathlib4 that referenced this pull request Nov 21, 2023
After leanprover/lean4#2766 this becomes critical.



Co-authored-by: Scott Morrison <[email protected]>
@Kha Kha added this pull request to the merge queue Nov 27, 2023
@Kha Kha removed this pull request from the merge queue due to the queue being cleared Nov 27, 2023
@Kha Kha changed the title Embed and check githash in .olean feat: embed and check githash in .olean Nov 27, 2023
@Kha Kha added this pull request to the merge queue Nov 27, 2023
Merged via the queue into leanprover:master with commit 79251f5 Nov 27, 2023
17 checks passed
@Kha
Copy link
Member Author

Kha commented Nov 27, 2023

@digama0 It's merged!

mathlib-bors bot pushed a commit to leanprover-community/mathlib4 that referenced this pull request Nov 28, 2023
After leanprover/lean4#2766 lands, the git hash of will be embedded in every olean. `lean4checker` will thus reject oleans unless it was compiled on the same toolchain as they were!

We may want to relax this later, but for now, since we are building `lean4checker` in CI anyway, let's just make sure it is on the same toolchain as mathlib.

Co-authored-by: Scott Morrison <[email protected]>
mathlib-bors bot pushed a commit to leanprover-community/mathlib4 that referenced this pull request Nov 29, 2023
leanprover/lean4#2766 changed the format of .olean files, so `leantar` also needs to be updated to support it. `leantar 0.1.9` supports both the new format and the old one, and is fully backward compatible, so it is safe to land this in advance of the next RC bump. (This change has already been tested on [`lean-pr-testing-2766`](0c9d123).)

In fact, leantar is also forward-compatible in this release: old leantar can pack and unpack oleans made by new lean, but it does not recognize the files as oleans, so it falls back to plain old gzip (with a significant loss in compression quality). But it's probably best to land this first so we don't waste space and download time.

Co-authored-by: Mario Carneiro <[email protected]>
grunweg pushed a commit to leanprover-community/mathlib4 that referenced this pull request Dec 15, 2023
After leanprover/lean4#2766 this becomes critical.



Co-authored-by: Scott Morrison <[email protected]>
awueth pushed a commit to leanprover-community/mathlib4 that referenced this pull request Dec 19, 2023
After leanprover/lean4#2766 lands, the git hash of will be embedded in every olean. `lean4checker` will thus reject oleans unless it was compiled on the same toolchain as they were!

We may want to relax this later, but for now, since we are building `lean4checker` in CI anyway, let's just make sure it is on the same toolchain as mathlib.

Co-authored-by: Scott Morrison <[email protected]>
awueth pushed a commit to leanprover-community/mathlib4 that referenced this pull request Dec 19, 2023
leanprover/lean4#2766 changed the format of .olean files, so `leantar` also needs to be updated to support it. `leantar 0.1.9` supports both the new format and the old one, and is fully backward compatible, so it is safe to land this in advance of the next RC bump. (This change has already been tested on [`lean-pr-testing-2766`](0c9d123).)

In fact, leantar is also forward-compatible in this release: old leantar can pack and unpack oleans made by new lean, but it does not recognize the files as oleans, so it falls back to plain old gzip (with a significant loss in compression quality). But it's probably best to land this first so we don't waste space and download time.

Co-authored-by: Mario Carneiro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
builds-mathlib CI has verified that Mathlib builds against this PR toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants