full dump reporting #1148
In the logs for the number_of_records task, there is output we could pull out into an email:
In the calculate_start_stop tasks, the logs in the mapped tasks show:
☝️ this might be the cause of some duplication? Shouldn't the next mapped task have …

I see in the logs for the transform_marc_records_add_holdings task that we get output like this, which might be good to pull out for a reporting email:
For the transform_marc_records_clean_serialize task, this output in the logs might also be good to add to an email report. For the serializing and removing-fields logging, maybe add the filename, and skip pulling the smart_open call out of the log.
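Something like the following could do that extraction, as a minimal sketch only: the patterns and the smart_open filter are assumptions about what the report-worthy lines look like, not the actual task output.

```python
import re
from pathlib import Path

# Hypothetical patterns for log lines worth surfacing in a report email; the
# real patterns would match the output of number_of_records,
# calculate_start_stop, transform_marc_records_add_holdings, and
# transform_marc_records_clean_serialize.
REPORT_PATTERNS = [
    re.compile(r"number of records", re.IGNORECASE),
    re.compile(r"serializ", re.IGNORECASE),
    re.compile(r"removing fields", re.IGNORECASE),
]


def extract_report_lines(log_path: Path) -> list[str]:
    """Collect log lines matching any report pattern, skipping smart_open noise."""
    report_lines = []
    for line in log_path.read_text().splitlines():
        if "smart_open" in line:
            continue
        if any(pattern.search(line) for pattern in REPORT_PATTERNS):
            report_lines.append(line.strip())
    return report_lines
```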
In the meantime, I have been running this script I wrote around marccli.
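The script itself isn't attached here, but as an illustration only, roughly that kind of counting could be done with pymarc instead of marccli, assuming the 001 control field is the record identifier (both the field choice and the output format are assumptions):

```python
import sys
from collections import Counter

from pymarc import MARCReader  # pymarc stands in for marccli in this sketch


def count_records(paths: list[str]) -> None:
    """Print per-file record counts plus unique and duplicate totals across files."""
    id_counts = Counter()
    for path in paths:
        per_file = 0
        with open(path, "rb") as marc_file:
            for record in MARCReader(marc_file):
                if record is None:  # skip records pymarc could not parse
                    continue
                per_file += 1
                # Assume the 001 control field holds the record identifier.
                for field in record.get_fields("001"):
                    id_counts[field.data] += 1
        print(f"{path}: {per_file} records")

    duplicate_ids = sum(1 for count in id_counts.values() if count > 1)
    print(
        f"{len(paths)} files, {len(id_counts)} unique record ids, "
        f"{duplicate_ids} ids appearing more than once"
    )


if __name__ == "__main__":
    count_records(sys.argv[1:])
```

Run against the full dump files (e.g. `python count_marc_records.py *.mrc`), this gives per-file counts and a cross-file duplicate total in one pass.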
We need better reporting on the full dump: how many unique records it contains, possibly how many per file, how many files make up the full dump, how many duplicates (if any) are in the materialized view, etc. This could help us troubleshoot issues such as POD only getting 5.4 million unique records and duplicates occurring across files.
With previous full dumps from Symphony, we'd get an email to the sul-unicorn-devs list with counts. Maybe consider doing this when the full dump selection DAG finishes.
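One way to wire that in, sketched under assumptions: a final reporting task on the full dump selection DAG that assembles counts gathered by upstream tasks and emails them with Airflow's send_email helper (which requires SMTP to be configured). The task name, list address, and the shape of the counts dict are placeholders, not the project's actual code.

```python
from airflow.decorators import task
from airflow.utils.email import send_email


@task
def email_full_dump_report(counts: dict) -> None:
    """Email a summary of full dump counts when the DAG finishes.

    `counts` is assumed to be assembled by upstream tasks, e.g. number of
    files in the dump, records per file, unique record total, and duplicates
    found in the materialized view.
    """
    body = "<br>".join(f"{label}: {value}" for label, value in counts.items())
    send_email(
        to=["sul-unicorn-devs@example.com"],  # placeholder for the real list address
        subject="Full dump record counts",
        html_content=body,
    )
```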