aws-custom-route-controller ControlPlane health checks are failing because of an Event that is already resolved #733

Open
ialidzhikov opened this issue Apr 7, 2023 · 0 comments
Labels
area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/bug Bug lifecycle/rotten Nobody worked on this for 12 months (final aging stage) platform/aws Amazon web services platform/infrastructure

How to categorize this issue?

/area quality
/kind bug
/platform aws

What happened:
#688 introduced a health check that looks at the events created by aws-custom-route-controller and fails the ControlPlane health check whenever there is an event of type Warning.
While this can help surface issues that do not resolve on their own, it also leads to a lot of false-positive results.
For example, the route controller failed to update routes about 50 minutes ago:

53m         Warning   RoutesUpdateFailed             serviceaccount/aws-custom-route-controller   creating route 10.101.10.0/24 -> i-<uid> in table rtb-<uid> failed: IncorrectInstanceState: Instance with state 'shutting-down' is not valid for this operation....
53m         Warning   RoutesUpdateFailed             serviceaccount/aws-custom-route-controller   creating route 10.101.10.0/24 -> i-<uid> in table rtb-<uid> failed: IncorrectInstanceState: Instance with state 'shutting-down' is not valid for this operation....
53m         Warning   RoutesUpdateFailed             serviceaccount/aws-custom-route-controller   creating route 10.101.10.0/24 -> i-<uid> in table rtb-<uid> failed: InvalidParameterValue: Invalid value 'i-<uid>' for instance ID. Instance is not in a VPC....
52m         Warning   RoutesUpdateFailed             serviceaccount/aws-custom-route-controller   creating route 10.101.10.0/24 -> i-<uid> in table rtb-<uid> failed: InvalidParameterValue: Invalid value 'i-<uid>' for instance ID. Instance is not in a VPC....
52m         Warning   RoutesUpdateFailed             serviceaccount/aws-custom-route-controller   creating route 10.101.10.0/24 -> i-<uid> in table rtb-<uid> failed: InvalidParameterValue: Invalid value 'i-<uid>' for instance ID. Instance is not in a VPC....

This issue was only temporary and resolved on its own. However, according to the health check logic, provider-aws keeps failing the ControlPlane health check until the event is gone from the system. In other words, a transient failure that has already resolved itself causes the ControlPlane health check to fail even though there is no actual issue.
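
To make the false positive concrete, here is a minimal Go sketch. It is not the provider-aws implementation: the `Event` type, the helper names, the fixed `referenceTime`, and the 10-minute `defaultWindow` are all assumptions made up for illustration. It contrasts a check that fails on any Warning event with one possible alternative that only considers Warning events seen within a recent time window.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical, trimmed-down stand-in for a Kubernetes event.
type Event struct {
	Type     string // "Normal" or "Warning"
	Reason   string
	LastSeen time.Time
}

// referenceTime is a fixed "now" so that the example is deterministic.
var referenceTime = time.Date(2023, time.April, 7, 12, 0, 0, 0, time.UTC)

// defaultWindow is an arbitrary value chosen for this sketch only.
const defaultWindow = 10 * time.Minute

// minutesAgo returns a timestamp n minutes before referenceTime.
func minutesAgo(n int) time.Time {
	return referenceTime.Add(-time.Duration(n) * time.Minute)
}

// anyWarning mirrors the behaviour described in this issue: it reports a
// problem as long as any Warning event still exists, regardless of its age.
func anyWarning(events []Event) bool {
	for _, e := range events {
		if e.Type == "Warning" {
			return true
		}
	}
	return false
}

// recentWarning is one possible mitigation (an assumption, not a proposal
// taken from the issue): only Warning events last seen within the given
// window count as a problem.
func recentWarning(events []Event, now time.Time, window time.Duration) bool {
	for _, e := range events {
		if e.Type == "Warning" && now.Sub(e.LastSeen) <= window {
			return true
		}
	}
	return false
}

func main() {
	// Two of the events from the listing above, reduced to the fields used here.
	events := []Event{
		{Type: "Warning", Reason: "RoutesUpdateFailed", LastSeen: minutesAgo(53)},
		{Type: "Warning", Reason: "RoutesUpdateFailed", LastSeen: minutesAgo(52)},
	}
	fmt.Println("any warning:", anyWarning(events))
	fmt.Println("warning within window:", recentWarning(events, referenceTime, defaultWindow))
}
```

With the events from the listing, `anyWarning` still reports a problem more than 50 minutes later, while `recentWarning` does not. A time window is only one option and has its own trade-off: a persistent failure is reported only as long as the controller keeps emitting fresh Warning events.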

What you expected to happen:
provider-aws should not fail the ControlPlane health check with such false-positive results.

How to reproduce it (as minimally and precisely as possible):
See above.

Anything else we need to know?:

Environment:

  • Gardener version (if relevant):
  • Extension version: v1.42.2
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
@gardener-robot gardener-robot added area/quality Output qualification (tests, checks, scans, automation in general, etc.) related kind/bug Bug platform/aws Amazon web services platform/infrastructure labels Apr 7, 2023
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Dec 15, 2023
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Aug 23, 2024