-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"add sled" needs a longer timeout #5116
Conversation
Thanks for the quick fix @davepacheco |
I'd like to test this, ideally on real hardware but maybe even just in the testbed just to make sure it didn't somehow break anything. I'm out for a lot of tomorrow but if it's useful for this to be landed, feel free to do it! |
I can give it a test on testbed if John doesn't take it for a spin on madrid first. Seems relatively innocuous, but worth testing. |
I'm going to hold off on another madrid run until we get more of #5111 knocked down. 👍 on giving this a testbed spin, but agreed it looks good. |
Tested out on testbed. The add works fine. Node gets added to the sled table in CRDB and bootstore learned its share on the added node (which is a prereq to it showing up in CRDB). |
Test failure also seems like a fluke. It's definitely unrelated to this code.
|
In #5111:
PUT /sleds
to some sled agentPUT /sleds
itself blocks until the new sled's sled agent has startedSo as a result, the "add sled" request failed, even though the operation ultimately succeeded.
In this PR, I bump the timeout to 5 minutes. I do wonder if we should remove it altogether, or if we should consider the other changes mentioned in #5111 (like not blocking sled agent startup on this, or not blocking these API calls in this way). But for now, this seems like a low-risk way to improve this situation.