Cancelled database jobs aren't cancelled while in db-maintenance mode #766

evansd · 2024-12-04T12:38:35Z

In one sense this isn't a problem, as the jobs are cancelled as soon as we come out of maintenance mode. But as well as being a source of user confusion, this potentially makes our stats/dashboard a bit misleading, because we continue to consider the jobs active until maintenance mode ends.

We could address this with an extra case in this branch (if a job isn't running but is cancelled, then move it immediately to failed):

job-runner/jobrunner/run.py

Lines 228 to 241 in 4fba743

    
    if mode == "db-maintenance" and job_definition.allow_database_access:
        if job.state == State.RUNNING:
            log.warning(f"DB maintenance mode active, killing db job {job.id}")
            # we ignore the JobStatus returned from these API calls, as this is not a hard error
            api.terminate(job_definition)
            api.cleanup(job_definition)
        # reset state to pending and exit
        set_code(
            job,
            StatusCode.WAITING_DB_MAINTENANCE,
            "Waiting for database to finish maintenance",
        )
        return

But I'm wary of adding even more complexity to the state manipulation code here. Maybe there's a more principled way of refactoring things here to get the behaviour we want?
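
For illustration only, here is a minimal, self-contained sketch of what the extra case might look like. It does not use the real job-runner types: `Job`, `State`, `StatusCode`, the `cancelled` flag, the `CANCELLED_BY_USER` code and the `handle_db_maintenance` function are all stand-ins invented for this example, and the terminate/cleanup calls for running jobs are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


class StatusCode(Enum):
    WAITING_DB_MAINTENANCE = "waiting_db_maintenance"
    CANCELLED_BY_USER = "cancelled_by_user"  # assumed name, for illustration


@dataclass
class Job:
    # Simplified stand-in for the real job model (assumption).
    id: str
    state: State
    cancelled: bool = False  # assumed flag set when a user cancels the job
    status_code: Optional[StatusCode] = None
    status_message: str = ""


def handle_db_maintenance(job: Job, allow_database_access: bool, mode: str) -> bool:
    """Return True if the maintenance branch handled the job."""
    if not (mode == "db-maintenance" and allow_database_access):
        return False

    # Proposed extra case: a cancelled job that isn't running has nothing
    # to wait for, so move it straight to failed.
    if job.state != State.RUNNING and job.cancelled:
        job.state = State.FAILED
        job.status_code = StatusCode.CANCELLED_BY_USER
        job.status_message = "Cancelled by user"
        return True

    # Existing behaviour: (kill the job if running, omitted here, then)
    # reset state to pending and wait for maintenance to finish.
    job.state = State.PENDING
    job.status_code = StatusCode.WAITING_DB_MAINTENANCE
    job.status_message = "Waiting for database to finish maintenance"
    return True
```

In this sketch a running job that has been cancelled is still reset to pending first, and would then be moved to failed the next time it is handled.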

Slack thread:
https://bennettoxford.slack.com/archives/C069YDR4NCA/p1733311815069519

evansd · 2024-12-11T16:16:56Z

It turns out this is more of a problem than just a confusing UX. Job Runner will refuse to schedule a new job running action X while there is an existing job for action X pending. If you realise there's a problem with one of your pending jobs then the natural thing to do is to cancel it and schedule a new version with the fixed code. But in database maintenance mode you can't do this: the cancellation doesn't take effect, so the old job stays pending and blocks the new one.
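
To make the interaction concrete, here is a toy model of the rule described above. It is not the actual Job Runner scheduling code: `Job`, `State`, `ACTIVE_STATES` and `can_schedule` are hypothetical names, and the only behaviour modelled is "no new job for an action while an existing job for that action is still active".

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


# Assumption: pending and running jobs both count as "active".
ACTIVE_STATES = {State.PENDING, State.RUNNING}


@dataclass
class Job:
    # Simplified stand-in for the real job model (assumption).
    action: str
    state: State
    cancelled: bool = False


def can_schedule(action: str, existing_jobs: list) -> bool:
    """A new job is refused while any job for the same action is active."""
    return not any(
        job.action == action and job.state in ACTIVE_STATES
        for job in existing_jobs
    )


# During maintenance the cancelled job is still pending, so it blocks.
stuck = Job(action="X", state=State.PENDING, cancelled=True)
blocked = can_schedule("X", [stuck])

# Once the cancellation is applied the job is no longer active.
stuck.state = State.FAILED
unblocked = can_schedule("X", [stuck])
```

Here `blocked` is `False` and `unblocked` is `True`: the cancel flag alone does not free up the action, only the state change does.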

Slack thread:
https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1733933228089519

bloodearnest · 2024-12-13T11:20:56Z

Oh yikes. Yeah, we need to fix this.

rebkwok mentioned this issue Dec 5, 2024

Lack of visibility of job status / DB maintenance mode opensafely-core/job-server#4760

Open
