-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write metadata for all output files to manifest.json #724
Conversation
bloodearnest
commented
Apr 24, 2024
- Write output metadata to manifest.json
- Add a cli command to backfill manifests for all workspaces
The main motivation for this was allowing airlock to access commit data for level 4. However, it was simple to extend this idea to include a) all outputs and b) much more metadata, like job, action, size, timestamp, content hash, and whether a file has been excluded from level 4 and why. This data is useful for airlock, but also potentially for other systems and general reporting.
f8f43a9
to
77ed5f7
Compare
I have manually tested this using the db from the test backend. I have not included automated tests, as a) this is a one off migration command, b) they would be quite hard to write and c) this is an offline process that is not mission critical
77ed5f7
to
b638470
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
One more general question I have is under what circumstances old outputs get deleted? It used to be that if you ran action_x
and it produced a different set of outputs to last time then any previous outputs would get deleted (and removed from the manifest). But I think that general logic has changed at some point? In any case, as long as the manifest reflects what's actually on disk then I think that's OK.
I think that is still done? https://github.com/opensafely-core/job-runner/blob/main/jobrunner/run.py#L384 It's just done in the scheduler, not the executor (which has no knowledge of any outputs except the current job's outputs) |
Ah yeah, I see. Doesn't that mean though that these don't get removed from the manifest and so that will end up with stale outputs in it which don't actually exist on disk? |
Oh rats, yes it will. I think we'll need to make the executor APIs |