test: add retry for anp/banp connection test #273
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hongzhen-ma

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Welcome @hongzhen-ma!
Hi @hongzhen-ma. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
hongzhen-ma force-pushed from 5be85c8 to 32ed979

Signed-off-by: 马洪贞 <[email protected]>

hongzhen-ma force-pushed from 32ed979 to 0a1140e
Is this needed? I didn't see this test flake in CI, and adding retries for network policies means the pods will not be protected during that time, which is bad for the users.
> is this needed?

Another point is that the retry is about the connection results after an ANP or BANP is applied; it does not modify policy rules during the retry window.
Modifying tests in a way that may hide problems lowers the quality bar ... the tests are there to assert on behaviors, not just to pass.
Thanks for the reply.
FWIW, I've seen some flakes on Calico too. Haven't analysed them in detail yet but there are classes of updates that can race with connectivity checks. Calico's model is as follows:
Some examples of races with that model:
I expect that
We discussed this in Slack, and just to clarify (apologies if this came across wrong): I'm not advocating for zero latency, just for a better understanding of the problem before bumping timeouts. It's most probably bias, but it happens a lot to me that the immediate fix for a flake is adding a sleep or a retry, and a considerable number of times there is a bug that we ignore. In k/k there is 10s IIUC: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/network/netpol/kubemanager.go#L328-L350. I think ANP has 3 seconds for the tests (see network-policy-api/conformance/utils/config/timeout.go, lines 50 to 58 at b1d9608).

I'm +1 on adapting the tests to allow 10s for the programming latency, as it is the same we have in k/k, but I really think we should not go beyond that. Sounds fair?

/reopen
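For illustration, here is a minimal sketch of what a 10s programming-latency allowance could look like in Go, using the `k8s.io/apimachinery` wait helpers. The function and parameter names are assumptions for this sketch, not the conformance suite's actual API:

```go
// Sketch only: poll a connectivity probe until it reports the expected
// verdict or the programming-latency budget (e.g. 10s) runs out.
// probeFn is a hypothetical stand-in for the suite's real probe.
package conformance

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func waitForExpectedConnectivity(ctx context.Context, budget time.Duration,
	probeFn func(ctx context.Context) (done bool, err error)) error {
	// Probe immediately, then every second, up to the budget.
	return wait.PollUntilContextTimeout(ctx, 1*time.Second, budget, true, probeFn)
}
```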
@aojea: Reopened this PR.

In response to this:

> /reopen
Thanks for the information. |
@aojea yep, 100% agree. Calico's FV tests also use a 10s timeout for connectivity/policy convergence. I think the best strategy I've found for these kinds of tests is to measure convergence time (with a very long abort timeout) and then assert that the time taken was less than a "reasonable" time. Then you get one of three results instead of just pass/fail:

- converged within the reasonable time (pass)
- converged, but took longer than we want
- did not converge even after an unreasonably long time
Being able to distinguish "converged but took longer than we want" from "did not converge after an unreasonably long time" makes the debugging a lot easier.
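A rough sketch of that measure-then-assert strategy (all names are illustrative, not taken from Calico's FV suite or this repository):

```go
// Sketch of the measure-then-assert strategy: poll with a long abort
// timeout, record the time to converge, then compare it against the
// shorter "reasonable" budget. All names here are illustrative.
package conformance

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func checkConvergence(ctx context.Context, reasonable, abort time.Duration,
	probe func(ctx context.Context) (bool, error)) error {
	start := time.Now()
	err := wait.PollUntilContextTimeout(ctx, 1*time.Second, abort, true, probe)
	elapsed := time.Since(start)
	switch {
	case err != nil:
		// Did not converge even after an unreasonably long time.
		return fmt.Errorf("did not converge within abort timeout %v: %w", abort, err)
	case elapsed > reasonable:
		// Converged, but took longer than we want: a distinct failure mode.
		return fmt.Errorf("converged after %v, exceeding budget %v", elapsed, reasonable)
	default:
		return nil // converged within the reasonable time
	}
}
```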
I love that.
Add retry for the ANP/BANP connection tests: the policies need some time to take effect, but the tests run the check just once, which leads to unnecessary failures (see the sketch below).
part of #108
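As a rough illustration of the change this PR describes (hypothetical names, standard library only; this is not the PR's actual diff), the single-shot check becomes a bounded retry:

```go
// Hypothetical before/after sketch of the change described above,
// using only the standard library. checkConnectivityOnce stands in
// for the suite's real single-shot probe.
package conformance

import (
	"fmt"
	"time"
)

func checkConnectivityOnce() error {
	// ... send a probe and compare against the expected verdict ...
	return nil
}

// checkConnectivityWithRetry re-runs the single-shot check until it
// passes or the budget expires, since policies take time to program.
func checkConnectivityWithRetry(budget, interval time.Duration) error {
	deadline := time.Now().Add(budget)
	for {
		err := checkConnectivityOnce()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("connectivity check still failing after %v: %w", budget, err)
		}
		time.Sleep(interval)
	}
}
```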