Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tests] time out tests in test_all after a duration #4780

Conversation

sunshowers
Copy link
Contributor

@sunshowers sunshowers commented Jan 8, 2024

We're seeing some tests in test_all hang forever (#4779). Set a reasonable
upper bound on test duration.

This will also cause stdout and stderr for failing tests to be printed. Doing
so on SIGTERM in general is tracked at
nextest-rs/nextest#1208.

Also, bump up the required nextest version to 0.9.64 to make use of the
binary_id predicate.

Created using spr 1.3.5
@sunshowers sunshowers requested a review from smklein January 8, 2024 22:33
Comment on lines 36 to 40
[[profile.default.overrides]]
filter = 'binary_id(omicron-nexus::test_all)'
# As of 2023-01-08, the slowest test in test_all takes 196s on a Ryzen 7950X.
# 900s is a good upper limit that adds a comfortable buffer.
slow-timeout = { period = '60s', terminate-after = 15 }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great, but I've seen tests on my laptop that take up to 1000 seconds -- the saga 'test_action_failure_can_unwind' tests are particularly slow, and definitely need optimizing.

I'm not even sure I'd say this timeout is "wrong" - that test needs fixing - but figured I'd mention it regardless

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that test isn't part of test_all.

Comment on lines 36 to 40
[[profile.default.overrides]]
filter = 'binary_id(omicron-nexus::test_all)'
# As of 2023-01-08, the slowest test in test_all takes 196s on a Ryzen 7950X.
# 900s is a good upper limit that adds a comfortable buffer.
slow-timeout = { period = '60s', terminate-after = 15 }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great, but I've seen tests on my laptop that take up to 1000 seconds -- the saga 'test_action_failure_can_unwind' tests are particularly slow, and definitely need optimizing.

I'm not even sure I'd say this timeout is "wrong" - that test needs fixing - but figured I'd mention it regardless

@davepacheco
Copy link
Collaborator

FWIW there is a timeout already on the whole test run and we went with a few hours:

ptime -m timeout 2h cargo nextest run --profile ci --locked --verbose

Clulow's Lament applies here: it's a tradeoff between false positives (spurious test failures because we didn't wait long enough) vs. taking too long to get feedback when they're hung. Personally, if a test hangs when I'm running it locally, I'd always rather it hang (and not kill it) because that will preserve a lot of useful runtime state I might want to debug it. In CI I don't think I care either way. But false positives here would certainly suck.

@sunshowers
Copy link
Contributor Author

In CI I don't think I care either way. But false positives here would certainly suck.

Fair, I'll restrict it to CI.

Created using spr 1.3.5
@sunshowers
Copy link
Contributor Author

FWIW there is a timeout already on the whole test run and we went with a few hours:

Yes -- there's currently a bug in nextest where it isn't printing out stdout/stderr on SIGTERM (nextest-rs/nextest#1208) -- I'll get around to fixing it soon but wanted to get some output going before then.

@sunshowers sunshowers enabled auto-merge (squash) January 9, 2024 00:07
Created using spr 1.3.5
Created using spr 1.3.5
@sunshowers sunshowers disabled auto-merge January 9, 2024 06:59
@sunshowers sunshowers enabled auto-merge (squash) January 9, 2024 06:59
@sunshowers sunshowers merged commit 9fe8a3c into main Jan 9, 2024
20 checks passed
@sunshowers sunshowers deleted the sunshowers/spr/tests-time-out-tests-in-test_all-after-a-duration branch January 9, 2024 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants