-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-16728: Require Service Deletion for LB Type Updates #1142
base: master
Are you sure you want to change the base?
OCPBUGS-16728: Require Service Deletion for LB Type Updates #1142
Conversation
@gcs278: This pull request references Jira Issue OCPBUGS-16728, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
65eea90
to
8a299a9
Compare
@gcs278 tested and found one issue:
The command to revert the change is wrong, it should be |
Another concern is the ingresscontroller status, should it always show the actual status? If yes, then the |
8a299a9
to
26a1492
Compare
@lihongan Thanks, good catch! Fixed.
Yea it's a great observation, I just talked about this with @Miciah yesterday. This PR won't address having the LB Type in the status reflect the actual LB Type status. It's complicated, and will require much more refactoring than I'm willing to bite off with this PR. I think we should consider this a technical debt item that we should try to solve later. The LB Type in the status will mostly be equivalent to the LB Type in the spec (for now). |
9d3a208
to
b02db5f
Compare
I've taken the WIP off the PR. I'm pretty content with the approach taken from some ideas of @Miciah (i.e. two disjoint "managed" annotation lists). Some parts of the code actually were simplified and code was removed as a result. I've also dropped the usage of "auto effectuation" or "manual effectuation" as I don't think this aptly conveys the idea that you need to delete/recreate the service (based on conversations with other team members). I went for |
e2e-aws-operator unrelated failure:
/test e2e-aws-operator Install failure unrelated |
@gcs278 Another issue is observed. Because PROXY proxy is enabled on Classic but disabled on NLB, so before deleting the LB service to effectuate the change, I can see the router deployment is rolled out with ENV |
Ah good catch. I forgot that somethings rely on the status LB Type to be accurate. I need to think about how to fix this. |
b02db5f
to
5e2d300
Compare
5e2d300
to
c6fd8b6
Compare
c6fd8b6
to
07b7cbb
Compare
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay.
managedLBServiceAnnotations = func() []string { | ||
result := []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use the generic sets.Set instead of the deprecated sets.Strings
? It seems to me that the idea for using a set was to prevent accidental duplicates which are possible with a slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that's a fair point. Updated to use sets.Set
currentVal, have := current.Annotations[annotation] | ||
expectedVal, want := expected.Annotations[annotation] | ||
if (want && (!have || currentVal != expectedVal)) || (have && !want) { | ||
changed = true | ||
changedAnnotations = append(changedAnnotations, annotation) | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't you mean to collect all changed annotations? We still break
once a change is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops good catch. The changed list is just for logging at the moment, but still important.
func loadBalancerServiceIsProgressing(ic *operatorv1.IngressController, service *corev1.Service, platform *configv1.PlatformStatus, subnetsAWSEnabled bool, eipAllocationsAWSEnabled bool) error { | ||
var errs []error | ||
// Tricky: Scope in the status indicates the *desired* scope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will getting the desired state from ic.Spec
be a step towards the right direction (reading the desired state from the spec and actual from the service)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking hypothetically? Or are you suggesting to change how LB scope spec/status work?
Generally speaking, @Miciah and I have mostly agreed that we should push that new API fields in the direction of spec == desired state and status == actual state. We did EIPAllocations, Subnets, AllowedSourceRanges, and OpenStack Floating IP in this manner (status == actual state).
We aren't going to be able to change the behavior of existing spec/status fields, but we can at least implement new ones in this way. For scope, the spec won't indicate the full desire (user desire + defaulting), so we can't change this without changing the behavior.
So yea, I think it's the right direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scope, the spec won't indicate the full desire (user desire + defaulting), so we can't change this without changing the behavior.
Right, I forgot about the defaulting. Then should the comment say something like:
// Tricky: Scope in the status indicates the *desired* scope. | |
// Tricky: Scope in the status indicates the *desired* scope with the defaulting taken into account. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea sure it's tricky language.
I'll clarify.
if platform.Type == configv1.AWSPlatformType && subnetsAWSEnabled { | ||
var ( | ||
// Tricky: Subnets in the status indicates the *actual* scope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question. Can we read subnets from the service annotation directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subnets and EIPs status do reflect the actual state, but yes, we can also read it from the service annotation directly.
It doesn't really matter either way. I implemented this way because I liked the idea of putting the annotation into the IC status value, then using the status as a single point to access the actual value.
// clearIngressControllerInactiveAWSLBTypeParametersStatus clears AWS parameters for the inactive load balancer type, | ||
// unless there is a load balancer type change is pending. When a change is pending, setDefaultPublishingStrategy | ||
// sets the desired parameters for the new LB type. To avoid removing the desired state, we allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of setDefaultPublishingStategy
puzzled me in other reviews too. Do I understand correctly that this function reverts the changes done by setDefaultPublishingStategy
syncing the status with the actual
state of the service (good direction)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but why can't we just stop updating the status in setDefaultPublishingStategy
and let syncIngressControllerStatus
take care of the status exclusively?
If the type of the LB is changed by the user:
- we save the desired state in
spec
- update the status to instruct the user to recreate the service
- keep the LB type in
status
the same as in the service's annotation
Once the service is manually recreated the reconciliation will be triggered and we will reset the status to the new type taken from the service annotation.
Am I missing some hidden value in what setDefaultPublishingStategy
is doing w.r.t. to the status?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I originally did this status NLB/CLB parameters clearing in setDefaultPublishingStategy
, then I noticed during pending Scope changes that setDefaultPublishingStategy
was only keeping the desired parameters and deleting the actual parameter's status (it doesn't know or care about pending changes, it just know and cares about desired state).
So, we can't just clear the NLB/CLB parameters based on desired state, we need to know the actual state (if a change is pending), so we don't prematurely clear status that represents the actual state.
Not for this PR but why can't we just stop updating the status in setDefaultPublishingStategy and let syncIngressControllerStatus take care of the status exclusively?
Most of the time yes. The only exception is maybe some defaulting logic, especially in setDefaultProviderParameters
that may need to be added to or maintained. I wanted to put a comment in setDefaultPublishingStategy
to prevent future developers from adding logic to it accidentally (like what happened with Floating IP). I might go ahead and do that.
If the type of the LB is changed by the user:
we save the desired state in spec
update the status to instruct the user to recreate the service
keep the LB type in status the same as in the service's annotation
Yes, in a perfect world, but to be clear, this PR doesn't keep the LB type in status the same as in the service's annotation
, it actually immediately updates the LB type in the status to the desired state while it's pending. Hongan was asking the same question. API Team said that changing the behavior of a status field is incompatible, and we also use the LB Type status field for defaulting, which cause us to loose our defaulting. So this was a dead end.
Am I missing some hidden value in what setDefaultPublishingStategy is doing w.r.t. to the status?
Not really, but maybe some in memory defaulting to avoid nil pointer exceptions like I mentioned earlier.
9722b17
to
39cf30c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. I still need to rebase due to the fact that #1147 introduced a spec
field that effectively a managedAtServiceCreationLBServiceAnnotations
, but not an annotation. So I will need to refactor.
managedLBServiceAnnotations = func() []string { | ||
result := []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that's a fair point. Updated to use sets.Set
currentVal, have := current.Annotations[annotation] | ||
expectedVal, want := expected.Annotations[annotation] | ||
if (want && (!have || currentVal != expectedVal)) || (have && !want) { | ||
changed = true | ||
changedAnnotations = append(changedAnnotations, annotation) | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops good catch. The changed list is just for logging at the moment, but still important.
func loadBalancerServiceIsProgressing(ic *operatorv1.IngressController, service *corev1.Service, platform *configv1.PlatformStatus, subnetsAWSEnabled bool, eipAllocationsAWSEnabled bool) error { | ||
var errs []error | ||
// Tricky: Scope in the status indicates the *desired* scope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking hypothetically? Or are you suggesting to change how LB scope spec/status work?
Generally speaking, @Miciah and I have mostly agreed that we should push that new API fields in the direction of spec == desired state and status == actual state. We did EIPAllocations, Subnets, AllowedSourceRanges, and OpenStack Floating IP in this manner (status == actual state).
We aren't going to be able to change the behavior of existing spec/status fields, but we can at least implement new ones in this way. For scope, the spec won't indicate the full desire (user desire + defaulting), so we can't change this without changing the behavior.
So yea, I think it's the right direction.
if platform.Type == configv1.AWSPlatformType && subnetsAWSEnabled { | ||
var ( | ||
// Tricky: Subnets in the status indicates the *actual* scope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subnets and EIPs status do reflect the actual state, but yes, we can also read it from the service annotation directly.
It doesn't really matter either way. I implemented this way because I liked the idea of putting the annotation into the IC status value, then using the status as a single point to access the actual value.
// clearIngressControllerInactiveAWSLBTypeParametersStatus clears AWS parameters for the inactive load balancer type, | ||
// unless there is a load balancer type change is pending. When a change is pending, setDefaultPublishingStrategy | ||
// sets the desired parameters for the new LB type. To avoid removing the desired state, we allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I originally did this status NLB/CLB parameters clearing in setDefaultPublishingStategy
, then I noticed during pending Scope changes that setDefaultPublishingStategy
was only keeping the desired parameters and deleting the actual parameter's status (it doesn't know or care about pending changes, it just know and cares about desired state).
So, we can't just clear the NLB/CLB parameters based on desired state, we need to know the actual state (if a change is pending), so we don't prematurely clear status that represents the actual state.
Not for this PR but why can't we just stop updating the status in setDefaultPublishingStategy and let syncIngressControllerStatus take care of the status exclusively?
Most of the time yes. The only exception is maybe some defaulting logic, especially in setDefaultProviderParameters
that may need to be added to or maintained. I wanted to put a comment in setDefaultPublishingStategy
to prevent future developers from adding logic to it accidentally (like what happened with Floating IP). I might go ahead and do that.
If the type of the LB is changed by the user:
we save the desired state in spec
update the status to instruct the user to recreate the service
keep the LB type in status the same as in the service's annotation
Yes, in a perfect world, but to be clear, this PR doesn't keep the LB type in status the same as in the service's annotation
, it actually immediately updates the LB type in the status to the desired state while it's pending. Hongan was asking the same question. API Team said that changing the behavior of a status field is incompatible, and we also use the LB Type status field for defaulting, which cause us to loose our defaulting. So this was a dead end.
Am I missing some hidden value in what setDefaultPublishingStategy is doing w.r.t. to the status?
Not really, but maybe some in memory defaulting to avoid nil pointer exceptions like I mentioned earlier.
Some links in `load_balancer_service.go` were invalid due to K8S documentation URL changes.
39cf30c
to
cc22531
Compare
e424420
to
32fba0a
Compare
/lgtm LGTM, holding for @Miciah to have a look. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Refactor to introduce a new map, `managedAtServiceCreationLBServiceAnnotations`, which tracks annotations that are managed by the operator but are only applied to the load balancer at the time of service creation. These annotations are also responsible for triggering the `Progressing: True` status when modified. This addition not only improves documentation of these annotations, but also simplifies the code. The `loadBalancerServiceAnnotationsChanged` function has been adapted to replace existing logic in `shouldRecreateLoadBalancer`, allowing the removal of the `scopeEqual` function as well.
Due to the AWS CCM leaking resources when the load balancer type is changed on a service, the cloud team is now blocking updates to the load balancer type. As a result, the Ingress Operator will require the service to be deleted and recreated when the Ingress Controller's load balancer type changes. This change introduces a `Progressing: True` status message when the load balancer type is modified, instructing the user on how to effectuate the change. This commit does not alter the function of the LB type in the status, which has always (incorrectly) represented the desired state. However, with this change, a new scenario is introduced where the desired LB type may differ from the actual state. Consequently, the LB type status may no longer reliably reflect the actual state as it previously did, potentially surprising users who mistakenly assumed it always indicated the current state.
IsProxyProtocolNeeded must get the LB Type from the service, as the status now may not accurately reflect the actual LB Type during an update. However, we can't rely only on the service because it doesn’t exist during the initial bootstrap (and potentially when it gets deleted). In that case, it's safe to use the status to determine the anticipated LB Type. Without this commit results in a misalignment of the proxy protocol value between the load balancer and the router deployment when an LB type change is pending. This commit will be squashed into the previous one, as both are dependent on each other. I've kept them separate for easier review.
Previously, the fix for OCPBUGS-38217 cleared inactive LB parameters in the status in setDefaultPublishingStrategy. This was valid at the time, as the desired LB type would always become the actual LB type. However, with the introduction of pending LB type changes, setDefaultPublishingStrategy was now prematurely clearing parameters for the current (but soon-to-be inactive) LB type. This caused issues because certain LB parameters, such as classicLoadBalancer.subnets or networkLoadBalancer.eipAllocations, reflect the actual state. As a result, setDefaultPublishingStrategy was clearing them too early, without knowing a type change was pending. The solution is to move the clearing logic to syncIngressControllerStatus in status.go, where it can detect pending LB type changes and avoid clearing parameters for the still-active LB type. This means that during a pending type change, both networkLoadBalancer and classicLoadBalancer parameters are present. This commit will be squashed into the previous one, as both are dependent on each other. I've kept them separate for easier review.
32fba0a
to
b27545b
Compare
@alebedev87 thanks for the review. Just added a unit test |
/lgtm |
/retest |
@gcs278: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Due to the AWS CCM leaking resources when the load balancer type is changed on a service, the cloud team is now blocking updates to the load balancer type. As a result, the Ingress Operator will require the service to be deleted and recreated when the Ingress Controller's load balancer type changes.
This change introduces a Progressing=True status message when the load balancer type is modified, instructing the user on how to effectuate the change. Additionally, the
service.beta.kubernetes.io/aws-load-balancer-type
annotation is now added to themanagedAtServiceCreationLBServiceAnnotations
map along with other annotations that require service deletion.This commit does not alter the function of the LB type in the status, which has always (incorrectly) represented the desired state. However, with this change, a new scenario is introduced where the desired LB type may differ from the actual state. Consequently, the LB type status may no longer reliably reflect the actual state as it previously did, potentially surprising users who mistakenly assumed it always indicated the current state.