Jobs stuck at "running" #86

mpermana · 2020-04-24T05:43:30Z

This can happen when the ndscheduler process dies while a job is running.

To reproduce:
Create a shell job that sleeps for a while, e.g.:
["bash","-c","sleep 3600"]

While the job is running, send a kill signal to the ndscheduler process. The next time ndscheduler starts, the job will be stuck at "running".

palto42 · 2020-06-30T15:16:59Z

I can confirm this behavior.
What is needed is a database cleanup when ndscheduler starts that changes the status of those jobs to "failed", since they most likely did not complete.
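
A minimal sketch of such a start-up cleanup, assuming a SQLite datastore. The table name `executions`, the `state` column, and the string values "running" and "failed" are hypothetical and chosen only for illustration; the real ndscheduler schema may differ.

```python
import sqlite3

# Hypothetical state values; ndscheduler's real schema may store these differently.
RUNNING = "running"
FAILED = "failed"


def fail_interrupted_executions(conn):
    """Mark every execution still in RUNNING state as FAILED.

    Intended to be called once at start-up, before any new job is scheduled,
    so that only executions left over from a previous process are affected.
    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE executions SET state = ? WHERE state = ?",
        (FAILED, RUNNING),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a throwaway in-memory table and run the cleanup on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE executions (id TEXT PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO executions VALUES (?, ?)",
        [("a", "succeeded"), ("b", RUNNING), ("c", RUNNING)],
    )
    updated = fail_interrupted_executions(conn)
    states = dict(conn.execute("SELECT id, state FROM executions"))
    conn.close()
    return updated, states
```

Running the update before the scheduler starts matters: once new jobs are running, the same query would also mark live executions as failed.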

palto42 · 2020-07-04T16:39:02Z

I submitted PR #90, which cleans up such interrupted executions in the database at start-up.

In my case the interruption was caused by running ndscheduler from a systemd unit, which sends SIGTERM on stop/restart rather than the SIGINT that ndscheduler expects. To get a graceful stop, the unit's stop signal can be changed to SIGINT (e.g. `KillSignal=SIGINT` in the `[Service]` section). Alternatively, a SIGTERM handler could be added in server.py alongside the existing SIGINT handler.
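
The second option could look roughly like the sketch below. It is not the actual code in server.py: `install_stop_handlers` and the `stop_event` parameter are hypothetical names, and how ndscheduler really shuts down on SIGINT is not shown in this thread. The sketch only illustrates registering one handler for both signals with the standard `signal` module.

```python
import signal
import threading


def install_stop_handlers(stop_event, signals=(signal.SIGINT, signal.SIGTERM)):
    """Register one handler for SIGINT and SIGTERM that sets stop_event.

    Must be called from the main thread, because signal.signal() is only
    allowed there. Returns the handler so callers can inspect or reuse it.
    """

    def request_stop(signum, frame):
        # Keep the handler small: record the request and let the main loop
        # perform the actual graceful shutdown.
        stop_event.set()

    for sig in signals:
        signal.signal(sig, request_stop)
    return request_stop
```

A main loop would then wait on `stop_event` (for example with `stop_event.wait()`) and run the same shutdown path no matter which of the two signals arrived.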

palto42 mentioned this issue Jul 4, 2020

update executions in status 'running' at start-up #90

Closed
