aws-custom-route-controller ControlPlane health checks are failing because of an Event that is already resolved #733
Labels
area/quality
Output qualification (tests, checks, scans, automation in general, etc.) related
kind/bug
Bug
lifecycle/rotten
Nobody worked on this for 12 months (final aging stage)
platform/aws
Amazon web services platform/infrastructure
How to categorize this issue?
/area quality
/kind bug
/platform aws
What happened:
#688 introduced the following health check: it checks for events created by aws-route-controller and fails the ControlPlane health check when there is an event of type Warning.
While this can help surface issues that cannot resolve on their own, it also leads to a lot of false-positive results.
For example, the route-controller failed to update routes 50 minutes ago:
This issue was only temporary and resolved on its own. According to the health check logic, however, provider-aws keeps failing the ControlPlane health check until the event is gone from the system. In other words, a transient failure that has already resolved keeps the ControlPlane health check failing even though there is no actual issue.
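To make the false positive concrete, here is a minimal sketch of the behaviour described above. It is written in Python for brevity and is not the provider-aws implementation: the `Event` class, the `control_plane_healthy` function, and the `RoutesUpdateFailed` reason are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    """Simplified, hypothetical stand-in for a Kubernetes Event."""
    type: str                 # "Normal" or "Warning"
    reason: str               # illustrative only
    last_timestamp: datetime  # when the event was last observed


def control_plane_healthy(events: list[Event]) -> bool:
    """Behaviour as described in this issue: any Warning event fails the check,
    no matter how old it is."""
    return not any(e.type == "Warning" for e in events)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    # The route update failed once, 50 minutes ago, and has since recovered.
    Event("Warning", "RoutesUpdateFailed", now - timedelta(minutes=50)),
]

# The stale event is still present, so the check keeps reporting unhealthy.
print(control_plane_healthy(events))  # prints False
```

The check never looks at the event's timestamp, so it reports unhealthy for as long as the event object exists.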
What you expected to happen:
provider-aws should not fail the ControlPlane health check on such false-positive results.
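One possible way to get this behaviour, sketched in Python under stated assumptions, is to consider only Warning events observed within a grace window. This is not a proposal taken from the issue itself: the function name, the `Event` class, and the 5-minute `max_age` default are all hypothetical. It also assumes that a recurring failure keeps refreshing the event's last-observed timestamp, so a persistent problem would still be reported.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    """Simplified, hypothetical stand-in for a Kubernetes Event."""
    type: str                 # "Normal" or "Warning"
    last_timestamp: datetime  # when the event was last observed


def control_plane_healthy_with_grace(
    events: list[Event],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed threshold
) -> bool:
    """Fail only if a Warning event was observed within the last `max_age`."""
    return not any(
        e.type == "Warning" and now - e.last_timestamp <= max_age
        for e in events
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = [Event("Warning", now - timedelta(minutes=50))]
recent = [Event("Warning", now - timedelta(minutes=1))]

print(control_plane_healthy_with_grace(stale, now))   # prints True
print(control_plane_healthy_with_grace(recent, now))  # prints False
```

The trade-off is that a one-off failure still marks the ControlPlane unhealthy for up to `max_age`, so the threshold would need tuning against how often the controller reconciles.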
How to reproduce it (as minimally and precisely as possible):
See above.
Anything else we need to know?:
Environment:
kubectl version