Batch up job archiving code a bit #2139
Conversation
LGTM
app/aws/s3.py
Outdated
    response = bucket.delete_objects(Delete={"Objects": [{"Key": key} for key in object_keys]})
except botocore.exceptions.ClientError:
    current_app.logger.exception("Couldn't delete any objects from bucket %s.", bucket.name)
    raise
Why log it and then raise it? We could just let Celery log it itself, no? We might end up with two exception logs here, per my understanding.
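To make the concern concrete, here is a minimal sketch of the pattern being questioned, assuming a wrapper function like the one below (the function name and arguments are placeholders; only the try/except body mirrors the diff above). Logging with `logger.exception` and then re-raising can report the same error twice: once from this handler, and once when Celery logs the failed task.

```python
import botocore.exceptions
from flask import current_app


def delete_job_objects(bucket, object_keys):
    """Hypothetical wrapper around the delete shown in the diff above."""
    try:
        bucket.delete_objects(Delete={"Objects": [{"Key": key} for key in object_keys]})
    except botocore.exceptions.ClientError:
        # This log entry plus the re-raise means the same traceback can appear
        # twice: once here, and once more when Celery records the task failure.
        current_app.logger.exception("Couldn't delete any objects from bucket %s.", bucket.name)
        raise
```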
app/celery/nightly_tasks.py
Outdated
for job in jobs:
    s3.remove_job_from_s3(job.service_id, job.id)
    dao_archive_job(job)
    current_app.logger.info("Job ID {} has been removed from s3.".format(job.id))
We could dump the list instead of iterating through each individual entry as well.
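A hedged sketch of that suggestion, assuming `jobs` is the list iterated in the diff above (the message wording is a placeholder): emit one log entry for the whole batch instead of one line per job.

```python
from flask import current_app

# One log entry for the whole batch rather than one per job; `jobs` is the
# list iterated in the diff above, and the message wording is a placeholder.
archived_job_ids = [str(job.id) for job in jobs]
current_app.logger.info("Removed %s jobs from s3: %s", len(archived_job_ids), archived_job_ids)
```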
Summary | Résumé
The task that archives jobs is currently archiving about 230 jobs / minute, which is pretty slow given that we expect it to handle about 40K per day. This PR batches up both the database operation that marks the jobs as archived and the deletion of the files from S3.
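A rough sketch of the batched shape described above, under stated assumptions: the S3 key format, the chunking, and the `archive_jobs_in_db` callable are placeholders for illustration, not the code added by this PR.

```python
import boto3


def archive_jobs_batch(jobs, bucket_name, archive_jobs_in_db):
    """Archive a batch of jobs: a handful of S3 delete calls, one DB call.

    The key format and the `archive_jobs_in_db` callable are assumptions,
    not the PR's actual code.
    """
    bucket = boto3.resource("s3").Bucket(bucket_name)

    # delete_objects accepts at most 1,000 keys per request, so chunk the batch.
    object_keys = [f"service-{job.service_id}-notify/{job.id}.csv" for job in jobs]
    for start in range(0, len(object_keys), 1000):
        chunk = object_keys[start : start + 1000]
        bucket.delete_objects(Delete={"Objects": [{"Key": key} for key in chunk]})

    # Single database operation to mark the whole batch as archived.
    archive_jobs_in_db(jobs)
```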
Related Issues | Cartes liées
Test instructions | Instructions pour tester la modification
"schedule": 60,
(to run the task every minute)end_date = today + timedelta(days=7)
(to remove all jobs, not just old ones)20000
to2
make run-celery-purge
make run-celery-local
andmake run
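For the schedule change, a hypothetical excerpt of the Celery beat entry being edited; the entry name, task name, and queue are placeholders, and only the `"schedule": 60,` line comes from the instructions above.

```python
CELERYBEAT_SCHEDULE = {
    "remove-old-job-files": {  # entry and task names are placeholders
        "task": "remove-old-job-files",
        "schedule": 60,  # run every 60 seconds while testing, instead of the usual nightly schedule
        "options": {"queue": "periodic-tasks"},
    },
}
```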
Bonus points!
Release Instructions | Instructions pour le déploiement
Check the speed at which jobs are being archived; it should be much faster than 230 / minute.
Reviewer checklist | Liste de vérification du réviseur