-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test failure: integration_tests::instances::test_instance_failed_by_instance_watcher_automatically_reincarnates
#6727
Comments
This is my test, so I'm gonna try to figure out what's gone wrong here once once I finish more urgent work. |
integration_tests::instances::test_instance_failed_by_instance_watcher_automatically_reincarnates
Saw this on https://github.com/oxidecomputer/omicron/runs/31050364405 / https://buildomat.eng.oxide.computer/wg/0/details/01J9A281VZ5AWSENGSBPS2SZBM/YgsiBVpZeZs8xKwEs3NG8vJWRHaj6HRg4P2sOa8fSLBWOjXN/01J9A28XVZQ35MEGH8KPZZ9Q26. At a glance it looks basically the same (other than a new |
A possible test flake exists in the tests for instance reincarnation. This is caused by a race that occurs when the request to simulate an instance's state transition is sent to the simulated sled-agent *before* an instance-start saga sends the request to start the instance to that sled-agent. When this occurs, the simulated state transition is lost, and the test keeps waiting for it forever. See [this comment][1] for details. This commit resolves this by adding a new `instance_wait_for_simulated_transition` helper, which is identical to `instance_wait_for_state`, but with the addition of `instance_poke` requests every time the instance is not observed to be in the desired state. This is a bit of a blunt instrument, but it ensures that the simulated sled-agent will always be told to simulate a state transition after it's requested by the control plane. Hopefully fixes #6727 [1]: #6727 (comment)
A possible test flake exists in the tests for instance reincarnation. This is caused by a race that occurs when the request to simulate an instance's state transition is sent to the simulated sled-agent *before* an instance-start saga sends the request to start the instance to that sled-agent. When this occurs, the simulated state transition is lost, and the test keeps waiting for it forever. See [this comment][1] for details. This commit resolves this by adding a new `instance_wait_for_simulated_transition` helper, which is identical to `instance_wait_for_state`, but with the addition of `instance_poke` requests every time the instance is not observed to be in the desired state. This is a bit of a blunt instrument, but it ensures that the simulated sled-agent will always be told to simulate a state transition after it's requested by the control plane. Hopefully fixes #6727 [1]: #6727 (comment)
#7295 should fix this. |
On branch #6726 commit b3346d9, I saw this test failure:
Reran the test and it appears to pass. I wonder if it just took slightly longer than 2 minutes for the instance to transition back from
Starting
->Running
? Feels like a pretty long time though...Log file:
test_all-afb586eba2d02dc7-test_instance_failed_by_instance_watcher_automatically_reincarnates.2148721.0.log
NOTE: Consider attaching any log files produced by the test.
The text was updated successfully, but these errors were encountered: