Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cron job maintenance #448

Merged
merged 30 commits into from
Apr 18, 2024
Merged

Cron job maintenance #448

merged 30 commits into from
Apr 18, 2024

Conversation

ksuderman
Copy link
Contributor

@ksuderman ksuderman commented Feb 9, 2024

Adds cron jobs that clean up the jobs_directory, datasets, and histories. The jobs are defined in the values.yaml file and templates generated from the cronjob-maintenance.yaml template. The cron jobs are based on the Ansible Playbook used on main

Since the /galaxy/server/database/tmp directory is cleaned up I am not sure the job that cleans the tus_upload_store is needed since it also lives in /galaxy/server/database/tmp, but I've included it for completeness.

The pgcleanup.py script writes logs to the tmp directory, but we should likely create a separate logs directory for these and then also prune the logs as needed.

Should we just call the maintenance.sh script, or a modified version of the script?

NOTE I have set the cron jobs to runAsUser: 0 but we should probably run these as the Galaxy user once we have determined why mutliqc is creating files and directories as nobody.

This is untested as of yet.

Closes #408

@ksuderman ksuderman self-assigned this Feb 9, 2024
@ksuderman ksuderman marked this pull request as draft February 9, 2024 21:35
galaxy/values.yaml Outdated Show resolved Hide resolved
Copy link
Member

@nuwang nuwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the idea of providing overridable cronjob templates with unlimited cron jobs. Given however, that the default cron jobs are all run daily, what if we just have a single bash script that is invoked by a single cronjob, which will in turn invoke the 3 cleanup commands sequentially? This can be overridden or replaced as desired.

@ksuderman
Copy link
Contributor Author

what if we just have a single bash script that is invoked by a single cronjob, which will in turn invoke the 3 cleanup commands sequentially? This can be overridden or replaced as desired.

I was going to suggest the same thing, but I wanted to ensure the syntax for defining multiple cron jobs was working correctly first. There is already a maintenance.sh script we could use as-is, or modify for our needs.

@nuwang
Copy link
Member

nuwang commented Feb 14, 2024

Makes sense to switch to the maintenance script if we can.

galaxy/values.yaml Outdated Show resolved Hide resolved
@ksuderman
Copy link
Contributor Author

Please ensure #453 has been merged before merging this.

Copy link
Member

@nuwang nuwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Can this be taken out of draft?

galaxy/values.yaml Outdated Show resolved Hide resolved
galaxy/values.yaml Outdated Show resolved Hide resolved
galaxy/values.yaml Outdated Show resolved Hide resolved
galaxy/values.yaml Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
galaxy/values.yaml Outdated Show resolved Hide resolved
@ksuderman ksuderman marked this pull request as ready for review March 12, 2024 19:35
@ksuderman ksuderman requested a review from nuwang March 12, 2024 19:52
galaxy/values.yaml Show resolved Hide resolved

To disable a cron job after Galaxy has been deployed simply set the schedule to a date that
can never occur such as midnight on Februrary 30th:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add an enabled flag as we have for everything else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but in other cases when enabled: false is set we don't render the template. However, in this case it means the CronJob is not defined and not available to be run manually. By setting the schedule to a time that never occurs the CronJob is defined and is available to be run manually if desired. At least that was my thought process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading more about cron jobs, maybe I should be using .spec.suspend here. I'll play around with that and see if I get the desired behavior.

Copy link
Member

@nuwang nuwang Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to investigate suspend. Maybe we should add both. Some instructions on how to manually invoke the cronjob would also be useful.

Copy link
Contributor Author

@ksuderman ksuderman Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using .spec.suspend looks to be less than ideal, particularly if one wants to run the job once but keep it suspended. In that case the user must:

  1. Patch the cron job to set .spec.suspend=false
  2. Run the job if it does not get triggered automatically, and
  3. Patch the cron job again to re-set .spec.suspend=true

Given users can always disable a cron job with the schedule I am not sure .spec.suspend is worth it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding an enabled flag would still be useful. Users who want to run the cronjob manually can still use the trick you suggested. Just that it feels weird to have to think of an invalid date for the simplest case of not wanting the cron job.

galaxy/values.yaml Outdated Show resolved Hide resolved
galaxy/values.yaml Outdated Show resolved Hide resolved
galaxy/templates/cronjob-maintenance.yaml Show resolved Hide resolved
Copy link
Member

@nuwang nuwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ksuderman Looks like we have a conflict, and I've made some more minor comments, but other than that, this looks like it's good to go?


To disable a cron job after Galaxy has been deployed simply set the schedule to a date that
can never occur such as midnight on Februrary 30th:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that adding an enabled flag would still be useful. Users who want to run the cronjob manually can still use the trick you suggested. Just that it feels weird to have to think of an invalid date for the simplest case of not wanting the cron job.

galaxy/templates/cronjob-maintenance.yaml Outdated Show resolved Hide resolved
@nuwang nuwang merged commit 72f95ad into galaxyproject:master Apr 18, 2024
2 checks passed
@ksuderman ksuderman deleted the 408-maintenance branch April 26, 2024 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Maintenance job should also call maintenance.sh script and others
2 participants