Cecilia reported a workflow run where the jobs were printing a message from MiniWDL saying that they saw signal 2, and failing.
I think the jobs are receiving the timeout signal from Slurm, but MiniWDL's signal handler replaces the one we install (or the default Python one), so we never see the timeout signal that we expect to make the worker fail the job in a way we recognize as a timeout. As a result, we don't get any of the useful user-facing timeout logging, and the user thinks the job is actually failing rather than just timing out.
We need to either hack MiniWDL's signal handlers or detect when MiniWDL raises its WDL.runtime.error.Terminated exception and treat it as a timeout (at least under Slurm).
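A rough sketch of the second option, as a starting point rather than a proposed patch. Everything named here is hypothetical: `TerminatedStandIn` stands in for WDL.runtime.error.Terminated so the example runs without MiniWDL installed, and `JobTimedOut`, `preserved_handlers`, and `run_task` are illustrative names, not existing Toil or MiniWDL APIs. It also restores whatever signal handlers were installed before the task ran, which only helps after the call returns, not while MiniWDL's handler is active.

```python
import signal
from contextlib import contextmanager


class TerminatedStandIn(Exception):
    """Hypothetical stand-in for WDL.runtime.error.Terminated."""


class JobTimedOut(Exception):
    """Hypothetical error that the worker would report as a timeout."""


@contextmanager
def preserved_handlers(signums):
    """Put back the handlers for signums after the block, even if code
    inside the block replaced them. Must be used from the main thread,
    because signal.signal() only works there."""
    saved = {s: signal.getsignal(s) for s in signums}
    try:
        yield
    finally:
        for signum, handler in saved.items():
            # getsignal() returns None for handlers not installed from
            # Python; those cannot be reinstalled with signal.signal().
            if handler is not None:
                signal.signal(signum, handler)


def run_task(task, under_slurm, terminated_type=TerminatedStandIn):
    """Run task() and translate a signal-triggered termination into a
    timeout when running under Slurm (an assumption about the cause)."""
    try:
        with preserved_handlers([signal.SIGINT, signal.SIGTERM]):
            return task()
    except terminated_type as e:
        if under_slurm:
            raise JobTimedOut(
                "Task was terminated by a signal; treating it as a Slurm timeout"
            ) from e
        raise
```

In real use, `terminated_type` would be the actual MiniWDL exception class, and `under_slurm` would come from however the worker knows which batch system it is running on. Treating every termination as a timeout is a guess about the cause, so it is probably only safe where Slurm is known to be the sender.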
┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1637