Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-31550: Gateway API - recreating SMCP which breaks Gateway API #1115

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

anirudhAgniRedhat
Copy link
Contributor

Recreating the SMCP and subscription, in case the SMCP or subscriptions are deleted manually.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2024
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Aug 1, 2024
@openshift-ci-robot
Copy link
Contributor

@anirudhAgniRedhat: This pull request references Jira Issue OCPBUGS-31550, which is invalid:

  • expected the bug to target the "4.17.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Recreating the SMCP and subscription, in case the SMCP or subscriptions are deleted manually.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 1, 2024
@anirudhAgniRedhat anirudhAgniRedhat changed the title [WIP] OCPBUGS-31550: Gateway API - recreating SMCP which breaks Gateway API OCPBUGS-31550: Gateway API - recreating SMCP which breaks Gateway API Aug 1, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2024
@anirudhAgniRedhat
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot
Copy link
Contributor

@anirudhAgniRedhat: This pull request references Jira Issue OCPBUGS-31550, which is invalid:

  • expected the bug to target the "4.17.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@anirudhAgniRedhat
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot
Copy link
Contributor

@anirudhAgniRedhat: This pull request references Jira Issue OCPBUGS-31550, which is invalid:

  • expected the bug to target the "4.17.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@anirudhAgniRedhat
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Aug 1, 2024
@openshift-ci-robot
Copy link
Contributor

@anirudhAgniRedhat: This pull request references Jira Issue OCPBUGS-31550, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Aug 1, 2024
@openshift-ci openshift-ci bot requested a review from lihongan August 1, 2024 12:47
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the OCPBUGS-31550-deleting_SMCP_breaks_Gateway_API branch from 33e66d8 to 93c20b7 Compare August 7, 2024 07:24
@anirudhAgniRedhat
Copy link
Contributor Author

/hold
need to add watch

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2024
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the OCPBUGS-31550-deleting_SMCP_breaks_Gateway_API branch 7 times, most recently from 41c7e73 to 9fe51e2 Compare August 9, 2024 12:09
@rfredette
Copy link
Contributor

Hmm, on my cluster, adding a watch for SMCPs when the operator starts is causing some issues.
The operator logs are hitting the error that the ServiceMeshControlPlane.maistra.io CRD doesn't exist:

2024-08-09T18:41:15.444Z ERROR operator.init.controller-runtime.source.EventHandler wait/loop.go:87 if kind is a CRD, it should be installed before calling Start{"kind": "ServiceMeshControlPlane.maistra.io", "error": "failed to get restmapping: no matches for kind \"ServiceMeshControlPlane\" in group \"maistra.io\""}

The OSSM operator installs the maistra.io CRDs, but because of this error, the gatewayclass controller never starts, and therefore it never creates the subscription for the OSSM operator.

We need to create the SMCP watch after the OSSM operator is installed. I messed around with the code and got a version where it seems to work. It may still need some polishing, but this is what I changed on my end: https://gist.github.com/rfredette/366a1f36ed4cf7425a5cc0f4ef270a24

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the OCPBUGS-31550-deleting_SMCP_breaks_Gateway_API branch 2 times, most recently from 57d1c8d to addca2f Compare August 12, 2024 10:09
@anirudhAgniRedhat
Copy link
Contributor Author

@rfredette Added your suggestions Also added the watch to openshift-operator namespace as it was not reconciling on the given type.
PLMK if you think it should be done any other way??

@anirudhAgniRedhat
Copy link
Contributor Author

/assign @candita

@anirudhAgniRedhat
Copy link
Contributor Author

/retest

@candita
Copy link
Contributor

candita commented Aug 23, 2024

/test e2e-aws-gatewayapi

@candita
Copy link
Contributor

candita commented Aug 23, 2024

@anirudhAgniRedhat please add an e2e test for this to the existing e2e-aws-gatewayapi test.

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the OCPBUGS-31550-deleting_SMCP_breaks_Gateway_API branch from addca2f to ef6dbff Compare August 23, 2024 20:46
Copy link
Contributor

openshift-ci bot commented Aug 23, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from candita. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@anirudhAgniRedhat anirudhAgniRedhat force-pushed the OCPBUGS-31550-deleting_SMCP_breaks_Gateway_API branch from ef6dbff to bab5407 Compare August 23, 2024 20:48
@anirudhAgniRedhat
Copy link
Contributor Author

Hey @candita Added the e2e tests in testGatewayAPIIstioInstallation tests which is passing.
I think test testGatewayAPIObjects which is failing is irrespective of my change and is failing across different PRs.
PTAL Thanks!!

@gcs278
Copy link
Contributor

gcs278 commented Sep 3, 2024

DNS Propagation bug should be fixed, retesting gatewayapi:
/test e2e-aws-gatewayapi

@anirudhAgniRedhat
Copy link
Contributor Author

Thanks @gcs278 for the update!!!
/test e2e-azure-ovn

@candita
Copy link
Contributor

candita commented Sep 18, 2024

/assign @rfredette

@Miciah Miciah added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Nov 13, 2024
@anirudhAgniRedhat anirudhAgniRedhat force-pushed the OCPBUGS-31550-deleting_SMCP_breaks_Gateway_API branch from bab5407 to bd84ff9 Compare December 4, 2024 08:37
@rhamini3
Copy link

rhamini3 commented Dec 5, 2024

pre-merge-verified

1. create the gatewayclass and gateway 
oc create -f -<<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller
EOF

oc create -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
  namespace: openshift-ingress
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: http
    hostname: "*.$gwapi_domain"
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
  - name: https
    hostname: "*.$gwapi_domain"
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - name: gwapi-wildcard
    allowedRoutes:
      namespaces:
        from: All
EOF

2. Delete servicemeshcontrolplane and check if it is recreated % oc delete servicemeshcontrolplane openshift-gateway -n openshift-ingress servicemeshcontrolplane.maistra.io "openshift-gateway" deleted iamin@iamin-mac Downloads % oc get servicemeshcontrolplane --all-namespaces NAMESPACE NAME READY STATUS PROFILES VERSION AGE openshift-ingress openshift-gateway 5/5 ComponentsReady ["default"] 2.6.3 6s

3. create the HTTProute and check whether it is properly created and connected to the gateway
oc create -f - <<EOF                                               
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: myroute1
spec:
  parentRefs:
  - name: gateway
    namespace: openshift-ingress
  hostnames: ["test.$gwapi_domain"]
  rules:
  - backendRefs:
    - name: service-unsecure
      port: 27017
EOF
httproute.gateway.networking.k8s.io/myroute1 created

% oc get httproute myroute1 -oyaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  creationTimestamp: "2024-12-05T17:40:06Z"
  generation: 1
  name: myroute1
  namespace: default
  resourceVersion: "56779"
  uid: 1fc69414-c4ff-43d6-9dcb-61c3e6f552fb
spec:
  hostnames:
  - test.gwapi.ci-ln-xmn87rt-76ef8.aws-2.ci.openshift.org
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: gateway
    namespace: openshift-ingress
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: service-unsecure
      port: 27017
      weight: 1
    matches:
    - path:
        type: PathPrefix
        value: /
status:
  parents:
  - conditions:
    - lastTransitionTime: "2024-12-05T17:40:06Z"
      message: Route was valid, bound to 2 parents
      observedGeneration: 1
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2024-12-05T17:40:06Z"
      message: All references resolved <--
      observedGeneration: 1
      reason: ResolvedRefs
      status: "True" <--
      type: ResolvedRefs
    controllerName: openshift.io/gateway-controller
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: gateway
      namespace: openshift-ingress

Route is bound to the gateway and SMCP is recreated after controller reconciles
Mark as verified
/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Dec 5, 2024
@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Dec 5, 2024
@openshift-ci-robot
Copy link
Contributor

@anirudhAgniRedhat: This pull request references Jira Issue OCPBUGS-31550, which is invalid:

  • expected the bug to target either version "4.19." or "openshift-4.19.", but it targets "4.17.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Recreating the SMCP and subscription, in case the SMCP or subscriptions are deleted manually.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@anirudhAgniRedhat
Copy link
Contributor Author

/retest

Copy link
Contributor

openshift-ci bot commented Dec 6, 2024

@anirudhAgniRedhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-gatewayapi bd84ff9 link false /test e2e-aws-gatewayapi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

errs = append(errs, err)
} else {
r.startSMCPWatch.Do(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want to start a watch every time we reconcile?

Copy link
Contributor Author

@anirudhAgniRedhat anirudhAgniRedhat Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @candita, We are using sync.Once here, to start the watch for SMCP, which should run only once. So, this should not run on every reconciliation.

Since we need to create the SMCP watch after the OSSM operator is installed, we cannot start the watch as usual.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. priority/backlog Higher priority than priority/awaiting-more-evidence. qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants