OCPBUGS-43745: Add support for IdleCloseTerminationPolicy #1166
base: master
Conversation
@frobware: This pull request references Jira Issue OCPBUGS-43745, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/hold
Many commits should never merge.
Force-pushed from a7dea46 to 991d367.
/testwith openshift/cluster-ingress-operator/release-4.18/e2e-aws-operator openshift/router#639
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639
Force-pushed from 991d367 to 052f6b6.
Force-pushed from 052f6b6 to 0aa6244.
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639
Force-pushed from 0aa6244 to 1072f8f.
Force-pushed from f89fbb2 to 9707f72.
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639
Force-pushed from 9707f72 to 77673b3.
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639
Force-pushed from 77673b3 to 3029807.
/assign
/assign @grzpiotrowski
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639
Force-pushed from 650e796 to 6449f4c.
/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639
Add a `/custom-response` endpoint that returns a configurable response via the `CUSTOM_RESPONSE` environment variable.
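For context, a minimal sketch of what such an endpoint could look like, assuming a plain net/http server; the wiring below is illustrative, not necessarily the PR's actual test-server code:

```go
package main

import (
	"net/http"
	"os"
)

func main() {
	http.HandleFunc("/custom-response", func(w http.ResponseWriter, r *http.Request) {
		// The response body comes from the CUSTOM_RESPONSE environment
		// variable, falling back to an illustrative default when unset.
		body := os.Getenv("CUSTOM_RESPONSE")
		if body == "" {
			body = "default response"
		}
		w.Write([]byte(body))
	})
	http.ListenAndServe(":8080", nil)
}
```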
Pickup openshift/api#2102

% git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc
commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc
Merge: de9de05a8 b7417509c
Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com>
Date: Wed Dec 18 10:31:50 2024 +0000

    Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response

    OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec

Vendoring steps:

$ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc
$ go mod tidy
$ go mod vendor
$ make update
Introduce logic in desiredRouterDeployment to set the environment variable `ROUTER_IDLE_CLOSE_ON_RESPONSE` when the `IdleConnectionTerminationPolicy` field in the IngressController spec is set to `Deferred`. This change enables configuring HAProxy with the `idle-close-on-response` option for better control over idle connection termination behaviour.
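A rough sketch of the shape that logic takes, assuming the operator's usual pattern of appending corev1.EnvVar entries; the env var name and the Deferred policy value come from the commit message, while the exact Go constant name is an assumption based on the API PR:

```go
import (
	operatorv1 "github.com/openshift/api/operator/v1"
	corev1 "k8s.io/api/core/v1"
)

// idleCloseEnvVar is illustrative only; the PR's actual code inside
// desiredRouterDeployment may be structured differently.
func idleCloseEnvVar(policy operatorv1.IngressControllerConnectionTerminationPolicy) []corev1.EnvVar {
	if policy == operatorv1.IngressControllerConnectionTerminationPolicyDeferred {
		return []corev1.EnvVar{{Name: "ROUTER_IDLE_CLOSE_ON_RESPONSE", Value: "true"}}
	}
	return nil
}
```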
Force-pushed from 6449f4c to 60b3714.
openshift/api#2102 has merged. Vendor bump in 6a882ac. This is ready for review but needs openshift/router#639 for the new e2e test to pass. /hold cancel
/jira refresh
@frobware: This pull request references Jira Issue OCPBUGS-43745, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@frobware: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The code change LGTM. All the comments are about the e2e.
expectedBackendName := fmt.Sprintf("be_http:%s:%s", routeName.Namespace, routeName.Name)
expectedServerName := fmt.Sprintf("pod:%s:%s:http:%s:%d", tc.pods[serviceIndex].Name, service.Name, tc.pods[serviceIndex].Status.PodIP, service.Spec.Ports[0].Port)

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
if err := waitForHAProxyConfigUpdate(ctx, t, ic, expectedBackendName, expectedServerName); err != nil {
	return nil, fmt.Errorf("error waiting for HAProxy configuration update for route %s to point to service %s/%s: %w", routeName, service.Namespace, service.Name, err)
}
That adds a lot of moving parts (== code) to the test. Also, will it work in the DCM case? The server address may be updated dynamically (I have to check how DCM behaves during a server switch). Maybe a simple time.Sleep(6 * time.Second) would be enough (and would suit the DCM case too).
Yes, all good points. Given this is a parallel test, I could just sleep 30s (for busy clusters).
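For reference, a generic sketch of the polling alternative to a fixed sleep, using wait.PollUntilContextTimeout from k8s.io/apimachinery; the check callback stands in for whatever reads the router's haproxy.config (e.g. the internals of the PR's waitForHAProxyConfigUpdate), and the interval/timeout values are illustrative:

```go
import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForCondition polls until check reports true or the timeout elapses;
// transient errors from check are swallowed so polling continues.
func waitForCondition(ctx context.Context, check func(context.Context) (bool, error)) error {
	return wait.PollUntilContextTimeout(ctx, 10*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			ok, err := check(ctx)
			if err != nil {
				// Keep polling on transient errors instead of failing fast.
				return false, nil
			}
			return ok, nil
		})
}
```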
podList, err := getPods(t, kclient, deployment)
if err != nil {
	return nil, fmt.Errorf("failed to fetch pods for deployment %s/%s: %w", deployment.Namespace, deployment.Name, err)
}

if len(podList.Items) == 0 {
	return nil, fmt.Errorf("no pods in deployment %s/%s", deployment.Namespace, deployment.Name)
}

for i := range podList.Items {
	tc.pods = append(tc.pods, &podList.Items[i])
}
Do we need this for the config check? I'm asking because the check for a Complete deployment already guarantees us available pods.
These were local assertions, or post-conditions: exiting this function should have set up everything required by the remainder of the test code.
if err := waitForDeploymentComplete(t, kclient, deployment, 2*time.Minute); err != nil {
	return nil, fmt.Errorf("deployment %s/%s is not ready: %w", deployment.Namespace, deployment.Name, err)
}
We already check for the deployment to be Complete in idleConnectionCreateBackendService.
if len(tc.deployments) != 2 {
	return nil, fmt.Errorf("expected 2 deployments, but got %d", len(tc.deployments))
}

if len(tc.services) != 2 {
	return nil, fmt.Errorf("expected 2 services, but got %d", len(tc.services))
}

if len(tc.pods) != 2 {
	return nil, fmt.Errorf("expected 2 pods, but got %d", len(tc.pods))
}
These checks seem redundant: if we reached this point in the code, the API server has already assured us that these objects were created in idleConnectionCreateBackendService.
if i == 0 && policy == currentIdleTerminationPolicy {
	t.Logf("Skipping initial policy switch as current policy %q already matches %q", currentIdleTerminationPolicy, policy)
} else {
	if err := idleConnectionSwitchIdleTerminationPolicy(t, ic, icName, policy); err != nil {
		t.Fatalf("failed to switch to policy %q: %v", policy, err)
	}
}
I'm wondering whether 2 test cases (1 for Immediate and 1 for Deferred) would be a simpler approach. There would be no need 1) for the idleConnectionSwitchIdleTerminationPolicy function, or 2) to care about the order of the policies.
Seems like a lot of repetition to have two test cases for 10 lines of code.
I believe it’s important to test the case where the termination policy is switched. If we had two private ICs pre-configured with Deferred and Immediate policies, we would never perform an end-to-end test to validate the behaviour of switching between them.
	return nil
}

func idleConnectionCreateDeployment(ctx context.Context, namespace string, serviceNumber int, serverResponse, image string) (*appsv1.Deployment, error) {
Do you want your test workload to be a Deployment for resilience reasons? I'm asking because the build*Pod functions from utils_test.go (example) are used in most cases and look shorter.
{Name: "TLS_CERT", Value: "/etc/serving-cert/tls.crt"}, | ||
{Name: "TLS_KEY", Value: "/etc/serving-cert/tls.key"}, |
Why do we need all the hassle with the serving cert secret if in the end we hit the plain HTTP port?
Because I rely on the server in test/http2, which requires certificates for its listener on the HTTPS port.
I see now. Does it make sense to make listening on the HTTPS port conditional (== don't listen on HTTPS if the env var is not provided)?
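A sketch of what that suggestion could look like, assuming the test server uses plain net/http; the TLS_CERT/TLS_KEY names mirror the quoted diff, the rest is illustrative:

```go
import (
	"log"
	"net/http"
	"os"
)

func serve() {
	// Always serve plain HTTP; start the HTTPS listener only when both
	// certificate env vars are provided (the reviewer's suggestion).
	certFile, keyFile := os.Getenv("TLS_CERT"), os.Getenv("TLS_KEY")
	if certFile != "" && keyFile != "" {
		go func() {
			log.Fatal(http.ListenAndServeTLS(":8443", certFile, keyFile, nil))
		}()
	}
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```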