Add desi_update_processing_table_statuses and update dry_run_level's #2385
Conversation
There are two crash-inducing "typos" noted inline which appear to be leftovers of your IDE crashing and writing broken files. Please fix those and double-check the rest, since our unit test coverage isn't catching them.
I have a knee-jerk reaction against the long name of this script; I can't even think the words quickly. Consider desi_update_proctable_status? Given that we won't be typing this by hand regularly, it doesn't matter that much.
Do you envision running this as a separate scronjob after the last desi_proc_night call? Or run as part of the same script that runs the tsnr_afterburner and nightqa?
Thanks for these comments. I've resolved all of the comments, including changing the script name. I envision adding this to the wrapper script that calls the tsnr afterburner and nightqa, so that it is run every morning and each time we do reprocessing on daily. Documentation tests passed at NERSC, so I can't reproduce the failure GitHub is claiming for the commit.
The GitHub doctest is failing due to this warning:
which is also failing on other PRs, independent of the docstring updates you've made here. I think we can merge this and sort out the GitHub docstring tests separately. Thanks for the updates.
Summary
This adds a script, desi_update_processing_table_statuses, that updates only the STATUS column of the processing table. This is important because after 6 months NERSC deletes all knowledge of the job statuses. The new operating procedure would be to add this as a step in the tsnr afterburner and nightqa script, so that every time we sign off on a night we also run this script and all jobs are reflected as COMPLETE.

In adding that, I went down a rabbit hole to update the dry_run_level behavior and make it more uniform. I also updated the documentation to be uniform as well. This was a challenge for the unit tests, since some assumed behavior that deviated from the new levels; those required changes to the levels used in order to reflect the expected behavior in each case.
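The STATUS-only refresh described above can be sketched as follows. This is a hypothetical illustration, not the desispec implementation: every name except STATUS (e.g. `query_slurm_state`, `refresh_statuses`, the `LATEST_QID` column) is an assumption for the sake of the example.

```python
import subprocess
import pandas as pd

def query_slurm_state(jobid):
    """Ask sacct for the state of a job (e.g. 'COMPLETED', 'FAILED').

    Returns None if Slurm no longer knows about the job.
    """
    out = subprocess.run(
        ["sacct", "-j", str(jobid), "-X", "-n", "-o", "State"],
        capture_output=True, text=True,
    )
    lines = [line.strip() for line in out.stdout.splitlines() if line.strip()]
    return lines[0] if lines else None

def map_state(slurm_state):
    """Translate a Slurm state name into a processing-table STATUS value."""
    return "COMPLETE" if slurm_state == "COMPLETED" else slurm_state

def refresh_statuses(ptable, query=query_slurm_state):
    """Update only the STATUS column for jobs that are not yet resolved."""
    unresolved = ptable["STATUS"].isin(["SUBMITTED", "PENDING", "RUNNING"])
    for idx in ptable.index[unresolved]:
        state = query(ptable.loc[idx, "LATEST_QID"])
        if state is not None:
            ptable.loc[idx, "STATUS"] = map_state(state)
    return ptable
```

Injecting the `query` function keeps the Slurm dependency out of the update logic, which is handy both for unit testing and for dry runs.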
New dry_run_level documentation:
The one notable change was in desi_proc_night (actually proc_night.py). The test environment claims it is at NERSC for simulation reasons and wants to write out files to check the outputs, but the tests run off-site don't have access to Slurm. If we went with a dry_run_level large enough to avoid querying Slurm, then we wouldn't write out the files. To get around this, I've added "if" statements to proc_night.py so that it doesn't try to update the tables when not submitting jobs to Slurm, or when the dry_run_level is large enough to avoid Slurm (dry_run_level = 4 or 5), should we want to differentiate the Slurm querying from this.
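The guard just described might look like the following sketch. The constant and function names here are assumptions made for illustration; the actual logic lives in proc_night.py.

```python
# Per the PR, dry_run_level values 4 and 5 never touch Slurm.
SLURM_FREE_LEVELS = (4, 5)

def should_update_tables(dry_run_level, submit_to_slurm=True):
    """Only update the processing tables when jobs actually go to Slurm
    and the dry_run_level is low enough that Slurm can be queried."""
    if not submit_to_slurm:
        return False
    if dry_run_level in SLURM_FREE_LEVELS:
        return False
    return True
```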
Test
I believe the unit tests provide good coverage of the dry_run_level changes. I tested the new desi_update_processing_table_statuses script by running it on the last 6 months of data. There were no errors and the output tables are updated: there are no remaining "SUBMITTED", "PENDING", or "RUNNING" jobs.

Output tables are here:
/global/cfs/cdirs/desi/spectro/redux/ptab_test/daily/processing_tables/
One example would be to compare the original processing table for 20241004 with the updated output in:
/global/cfs/cdirs/desi/spectro/redux/ptab_test/daily/processing_tables/processing_table_daily-20241004.csv
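As a quick spot-check that only the STATUS column changed between an original table and its updated copy, a generic comparison helper (hypothetical, not part of desispec) could be:

```python
import pandas as pd

def changed_columns(before, after):
    """Return the names of columns whose values differ between two tables."""
    assert list(before.columns) == list(after.columns), "tables must share columns"
    return [c for c in before.columns if not before[c].equals(after[c])]

# Usage (substitute the real original and ptab_test paths):
#   diffs = changed_columns(pd.read_csv(original_path), pd.read_csv(updated_path))
# For a STATUS-only update, diffs should be [] or ["STATUS"].
```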