-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bad error message when PjRtComputationClient
throws exception
#5946
Conversation
what would the error message be after your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Will!
0e06652
to
fde5b87
Compare
It would just be the first error, |
fde5b87
to
4e3e9cb
Compare
There seems to be a consistent error during teardown during the GPU tests. Creating a GPU dev environment to see what's going on |
Setting up a GPU VM is taking too long, so I came up with a better idea. We don't have to carefully prevent Let's see if this works on GPU... |
Some of this GPU tests are starting to pass, so it looks like this tweak works. |
|
||
ComputationClient* client; | ||
ComputationClient* GetComputationClient() { | ||
static std::unique_ptr<ComputationClient> client = []() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this and the original approach, since both rely on the static initializer? It seems like the logic has just been moved out of CreateClient and into the lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's right. This lets me skip checking the case where this function gets called twice, the handling of which causes the bad error message. C++11 statics will ensure that it only completes once, and the function is anonymous now, which will prevent future code from erroneously calling CreateClient
somewhere else in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, the lambda is guaranteed to only ever be called once. Do we know how CreateClient
was called twice in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added that check to defend against future errors (likely by future me). There was never a case in the original code where CreateClient
actually completed twice or was called concurrently.
The bad error messaging occurred when the constructor of PjRtComputationClient
threw an exception. g_computation_client_initialized
was never reset to false
, GetComputationClientIfInitialized
calls GetComputationClient
during teardown, which re-runs CreateClient
, which complains because g_computation_client_initialized
never got reset.
In this PR, I only set g_computation_client_initialized
after the actual runtime init completed without breaking, since there are guaranteed to be no other concurrent callers of CreateClient
.
The
g_computation_client_initialized
doesn't get reset in this case, leading to incorrect behavior inGetComputationClientIfInitialized
. MakeCreateClient
anonymous so we can't call it twice by accident.Example from @jonb377:
ComputationClient
was never initialized, so it doesn't make sense to printComputationClient already initialized
.