Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZTS: Fix zpool_status_008_pos false positive #16769

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

behlendorf
Copy link
Contributor

@behlendorf behlendorf commented Nov 15, 2024

Motivation and Context

Resolve false positives reported by the CI for the zpool_status_008_pos test case.

https://github.com/openzfs/zfs/actions/runs/11848141397/job/33027753860?pr=16755
https://github.com/openzfs/zfs/actions/runs/11848141397/job/33027755132?pr=16755

Description

When checking that healthy vdevs[1-3] aren't shown omit the -s flag so slow vdevs are not considered as described by the comment. This avoids the possibility of a false positive in the CI when ZIO_SLOW_IO_MS is reduce to 160ms. Additionally, clear the fault injection as soon as it is no longer required for the test case.

Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold to 750ms to avoid false positives due to unrelated slow IOs which may occur in the CI environment. Additionally, clear the fault injection as soon as it is no longer required for the test case.

How Has This Been Tested?

Updated only the test case which will be tested by the CI.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 15, 2024
@amotin
Copy link
Member

amotin commented Nov 15, 2024

I wonder if the actual problem of the test is that supposedly healthy devices really do not fit into the timeout? Sure your patch should "work", but it might just hide valid failure case when vdevs are falsely reported slow, that is tested now. I wonder if we should increase the timeouts instead.

Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf
Copy link
Contributor Author

behlendorf commented Nov 15, 2024

I wonder if the actual problem of the test is that supposedly healthy devices really do not fit into the timeout?

That's exactly my suspicion. I'm game to try increasing the timeouts first and see if that resolves it. There's not going to be any right value so we'll have to pick something and see how it goes. We may end up just making the failure less likely, but not impossible.

Edit: updated PR to set slow IO threshold at 750ms and inject 1000ms delays.

@behlendorf
Copy link
Contributor Author

behlendorf commented Nov 19, 2024

I've run this through the CI 3 times now without reproducing this failure which is a good sign. Only after we merge this will we know for certain if the larger values are enough to resolve this entirely.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 19, 2024
@behlendorf behlendorf merged commit 49a377a into openzfs:master Nov 20, 2024
24 checks passed
behlendorf added a commit to behlendorf/zfs that referenced this pull request Nov 21, 2024
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#16769
behlendorf added a commit to behlendorf/zfs that referenced this pull request Nov 21, 2024
Increase the injected delay to 1000ms and the ZIO_SLOW_IO_MS threshold
to 750ms to avoid false positives due to unrelated slow IOs which may
occur in the CI environment.  Additionally, clear the fault injection as
soon as it is no longer required for the test case.

Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#16769
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants