-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly set the exit code when an exception is raised in the atexit callback _prepare_to_exit. #6383
Conversation
Ok, by this far, I can repro the failure by re-applying #6247. I can see the error msg |
torch_xla/__init__.py
Outdated
if int(os.environ.get('PT_XLA_DEBUG', '0')): | ||
_summarize_fn_tracker() | ||
except Exception as e: | ||
print('Caught an exception in the atexit callback: ', e, flush=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you make this logging.error("...", exc_info=e)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you!
except Exception as e: | ||
logging.error( | ||
"Caught an exception when exiting the process. Exception: ", exc_info=e) | ||
os._exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, one more micro-nit: make this exit(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried and it didn't set the exit code correctly. Per https://docs.python.org/3/library/constants.html#exit, calling exit(1)
raises a SystemExit
exception. But due to https://bugs.python.org/issue27035, raising an exception won't set the exit code correctly.
Also, per https://docs.python.org/3/library/constants.html#exit, exit
or quit
"are useful for the interactive interpreter shell and should not be used in programs.".
I'll add a comment.
… callback _prepare_to_exit. (pytorch#6383)
… callback _prepare_to_exit. (#6383)
This PR is intended to correctly set the exit code when an exception is raised in the atexit callback
_prepare_to_exit
. Consequently, the CI can catch such issues whenever an exception happens in the callback, as opposed to silently failing and escaping the CI. See #6385.Currently, when an exception is raised in the call back, the exit code is still 0 (success), which makes the CI unable to catch issues under the circumstance. The incorrect exit code is due to https://bugs.python.org/issue27035: python 2 can set the correct exit code (expect to be non-zero) if an exception is raised in the atexit callback but python 3 cannot. But the bug seems still to be open.
In the callback, neither
sys.exit(1)
orraise Exception
is able to set the exit code correctly so a os._exit(1) is used. It may not be ideal but it's better for us to catch the mishap. Also it should be ok because it's in the less frequently executed path.