Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly set the exit code when an exception is raised in the atexit callback _prepare_to_exit. #6383

Merged
merged 11 commits into from
Feb 9, 2024

Conversation

vanbasten23
Copy link
Collaborator

@vanbasten23 vanbasten23 commented Jan 25, 2024

This PR is intended to correctly set the exit code when an exception is raised in the atexit callback _prepare_to_exit. Consequently, the CI can catch such issues whenever an exception happens in the callback, as opposed to silently failing and escaping the CI. See #6385.

Currently, when an exception is raised in the call back, the exit code is still 0 (success), which makes the CI unable to catch issues under the circumstance. The incorrect exit code is due to https://bugs.python.org/issue27035: python 2 can set the correct exit code (expect to be non-zero) if an exception is raised in the atexit callback but python 3 cannot. But the bug seems still to be open.

In the callback, neither sys.exit(1) or raise Exception is able to set the exit code correctly so a os._exit(1) is used. It may not be ideal but it's better for us to catch the mishap. Also it should be ok because it's in the less frequently executed path.

@vanbasten23
Copy link
Collaborator Author

vanbasten23 commented Jan 26, 2024

Ok, by this far, I can repro the failure by re-applying #6247. I can see the error msg Check failed: dist_runtime_client_->Connect().ok() even though the GPU CI succeeded.

@vanbasten23 vanbasten23 changed the title [DO NOT REVIEW] Fix CI. Correctly set the exit code when an exception is raised in the atexit callback _prepare_to_exit. Feb 6, 2024
@vanbasten23 vanbasten23 marked this pull request as ready for review February 7, 2024 01:11
if int(os.environ.get('PT_XLA_DEBUG', '0')):
_summarize_fn_tracker()
except Exception as e:
print('Caught an exception in the atexit callback: ', e, flush=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you make this logging.error("...", exc_info=e)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Copy link
Contributor

@yeounoh yeounoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you!

except Exception as e:
logging.error(
"Caught an exception when exiting the process. Exception: ", exc_info=e)
os._exit(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, one more micro-nit: make this exit(1)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried and it didn't set the exit code correctly. Per https://docs.python.org/3/library/constants.html#exit, calling exit(1) raises a SystemExit exception. But due to https://bugs.python.org/issue27035, raising an exception won't set the exit code correctly.

Also, per https://docs.python.org/3/library/constants.html#exit, exit or quit "are useful for the interactive interpreter shell and should not be used in programs.".

I'll add a comment.

@vanbasten23 vanbasten23 merged commit 1cd3f46 into master Feb 9, 2024
18 checks passed
amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants