esm_runscripts does not stop on model crashes #179
Comments
I thought this was solved quite some time ago. Can you please provide the versions and the machine?
I am not sure but probably I actually developed a better runtime scheduler for that, which checks the Slurm output and decides whether it needs to submit the new job or to kill everything. I.e. think of it as Unfortunately some system admins block the access to @seb-wahl, do you think that your problem is related to this issue? Then I can bring this to the table at Thursday's meeting.
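The "check the Slurm output, then submit or kill" idea described above could look roughly like the following. This is only a minimal sketch and not the scheduler mentioned in the comment: the function name `decide_next_step` and the marker strings are hypothetical, and a real list of markers would depend on the machine, the compiler, and the model.

```python
# Hypothetical, illustrative markers; a real list depends on machine and model.
ERROR_MARKERS = ("srun: error", "forrtl: severe", "CANCELLED")


def decide_next_step(log_text, markers=ERROR_MARKERS):
    """Return "kill" if any marker appears in the job log, otherwise "submit"."""
    for line in log_text.splitlines():
        if any(marker in line for marker in markers):
            return "kill"
    return "submit"
```

A caller would read the job's output file, pass its contents to `decide_next_step`, and only submit the next job when the result is `"submit"`.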
Yes, that's exactly what I'm looking for. An equivalent of
We solved the case of
I am still surprised that ESM-Tools moves forward after the model crashes. If you look at the following lines in I think it is worth investigating, but I need the versions for that.
I guess it's because of
If the model (e.g. echam or any component of the coupled setup) crashes in the Fortran code, e.g.
esm_runscripts continues and tries to move files, sets up the next leg of the run, etc. In bash something like
(of course we need to handle echam's possible return code of 127) would do the trick. This has been an issue for us for a long time and can be annoying at times.
Is there a way we can solve this in esm_runscripts?
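For illustration, the kind of check asked for here could be sketched in Python as follows. This is a sketch under stated assumptions, not how esm_runscripts launches the model: `run_model` and `tolerated_codes` are hypothetical names, and whether echam's return code of 127 should count as a normal exit is left to the caller.

```python
import subprocess


def run_model(command, tolerated_codes=(0,)):
    """Run the model command and stop the workflow on an unexpected exit code.

    `command` is an argument list as accepted by subprocess.run.
    `tolerated_codes` is a hypothetical parameter: add 127 to it only if
    that code is known to mean a normal exit for the model in question.
    """
    result = subprocess.run(command)
    if result.returncode not in tolerated_codes:
        # Raising here prevents any later step (moving files,
        # preparing the next leg of the run) from being reached.
        raise RuntimeError(
            f"model exited with return code {result.returncode}; stopping"
        )
    return result.returncode
```

With this pattern, the post-processing steps run only when `run_model` returns, so a crash in the model halts the chain instead of being silently ignored.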