Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the reconciler for SubnetConnectionBindingMap #901

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wenyingd
Copy link
Contributor

@wenyingd wenyingd commented Nov 18, 2024

  1. Implement the reconciler for SubnetConnectionBindingMap, it may update the SubnetConnectionBindingMap status with condition ready is false if its dependent Subnet or SubnetSet is not ready (or realized) or it hits errors when realizing NSX SubnetConnectionBindingMaps. It updates the status with ready condition as true if it is successfully realized on NSX. The reconciler also watches the Subnet/SubnetSet CR events to sync the connection binding maps.
  2. The change also modifies the Subnet/SubnetSet reconciler to watch SubnetConnectionBindingMap CR events. If a Subnet/SubnetSet is used by a SubnetConnectionBindingMap, a finalizer is added on the corresponding Subnet/SubnetSet CR, and the finalizer is removed automatically if the CR is not used by any SubnetConnectionBindingMaps.

Test Done:
A child Subnet CR and a SubnetConnectionBindingMap CR are prepared,

# cat test.yaml 
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  name: subnet-child
  namespace: svc-tkg-domain-c10
spec:
  accessMode: Public
  subnetDHCPConfig:
    mode: DHCPDeactivated
  ipAddresses:
  - 192.168.200.0/24
---
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SubnetConnectionBindingMap
metadata:
  name: binding1
  namespace: svc-tkg-domain-c10
spec:
  subnetName: subnet-child
  targetSubnetSetName: pod-default
  vlanTrafficTag: 201
---
  1. When the dependent child Subnet is not ready
# kubectl get subnetconnectionbindingmaps -n svc-tkg-domain-c10 -oyaml
apiVersion: v1
items:
- apiVersion: crd.nsx.vmware.com/v1alpha1
  kind: SubnetConnectionBindingMap
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"crd.nsx.vmware.com/v1alpha1","kind":"SubnetConnectionBindingMap","metadata":{"annotations":{},"name":"binding1","namespace":"svc-tkg-domain-c10"},"spec":{"subnetName":"subnet-child","targetSubnetSetName":"pod-default","vlanTrafficTag":201}}
    creationTimestamp: "2024-11-18T10:18:50Z"
    generation: 1
    name: binding1
    namespace: svc-tkg-domain-c10
    resourceVersion: "5527560"
    uid: 8abcf1c8-956b-45cc-bf18-5b4a33120a7a
  spec:
    subnetName: subnet-child
    targetSubnetSetName: pod-default
    vlanTrafficTag: 201
  status:
    conditions:
    - lastTransitionTime: "2024-11-18T10:19:35Z"
      message: Subnet CR subnet-child is not realized on NSX
      reason: DependencyNotReady
      status: "False"
      type: Ready
kind: List
metadata:
  resourceVersion: ""

After the SubnetConnectionBindingMap is successfully realized, its ready condition is updated as true

# kubectl get subnetbindings -n svc-tkg-domain-c10 -oyaml
apiVersion: v1
items:
- apiVersion: crd.nsx.vmware.com/v1alpha1
  kind: SubnetConnectionBindingMap
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"crd.nsx.vmware.com/v1alpha1","kind":"SubnetConnectionBindingMap","metadata":{"annotations":{},"name":"binding1","namespace":"svc-tkg-domain-c10"},"spec":{"subnetName":"subnet-child","targetSubnetSetName":"pod-default","vlanTrafficTag":201}}
    creationTimestamp: "2024-11-19T03:02:48Z"
    generation: 1
    name: binding1
    namespace: svc-tkg-domain-c10
    resourceVersion: "669501"
    uid: dcfbf90e-f20a-4b9e-b980-41f04eda1f4f
  spec:
    subnetName: subnet-child
    targetSubnetSetName: pod-default
    vlanTrafficTag: 201
  status:
    conditions:
    - lastTransitionTime: "2024-11-19T03:20:16Z"
      status: "True"
      type: Ready
kind: List
metadata:
  resourceVersion: ""
  1. The finalizer is added on the dependent subnet and/or subnetset,
# kubectl get subnets -n svc-tkg-domain-c10 -oyaml subnet-child
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"crd.nsx.vmware.com/v1alpha1","kind":"Subnet","metadata":{"annotations":{},"name":"subnet-child","namespace":"svc-tkg-domain-c10"},"spec":{"accessMode":"Public","ipAddresses":["192.168.200.0/24"],"subnetDHCPConfig":{"mode":"DHCPDeactivated"}}}
  creationTimestamp: "2024-11-19T04:23:28Z"
  finalizers:
  - subnet.nsx.vmware.com/finalizer
  generation: 2
  name: subnet-child
  namespace: svc-tkg-domain-c10
  resourceVersion: "759588"
  uid: 6cf18f78-5363-4e0c-8765-f039b20623e0
spec:
  accessMode: Public
  ipAddresses:
  - 192.168.200.0/24
  ipv4SubnetSize: 16
  subnetDHCPConfig:
    mode: DHCPDeactivated
status:
 ....


# kubectl get subnetsets -n svc-tkg-domain-c10 -oyaml pod-default
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SubnetSet
metadata:
  creationTimestamp: "2024-11-18T10:53:14Z"
  finalizers:
  - subnetset.nsx.vmware.com/finalizer
  generation: 2
  labels:
    nsxoperator.vmware.com/default-subnetset-for: Pod
  name: pod-default
  namespace: svc-tkg-domain-c10
  resourceVersion: "759589"
  uid: 8e223cfd-fd47-4011-9a14-11debe823d5d
spec:
  accessMode: PrivateTGW
  ipv4SubnetSize: 16
  subnetDHCPConfig: {}
status:
  ...
  1. If deleting the dependent subnet (child subnet in the example), it is blocking and adding a condition deletionFailed with reason SubnetInUse
# kubectl delete subnets -n svc-tkg-domain-c10 subnet-child
subnet.crd.nsx.vmware.com "subnet-child" deleted
^C
# kubectl get subnets -n svc-tkg-domain-c10 -oyaml subnet-child
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: Subnet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"crd.nsx.vmware.com/v1alpha1","kind":"Subnet","metadata":{"annotations":{},"name":"subnet-child","namespace":"svc-tkg-domain-c10"},"spec":{"accessMode":"Public","ipAddresses":["192.168.200.0/24"],"subnetDHCPConfig":{"mode":"DHCPDeactivated"}}}
  creationTimestamp: "2024-11-19T04:23:28Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-11-19T05:35:13Z"
  finalizers:
  - subnet.nsx.vmware.com/finalizer
  generation: 3
  name: subnet-child
  namespace: svc-tkg-domain-c10
  resourceVersion: "761558"
  uid: 6cf18f78-5363-4e0c-8765-f039b20623e0
spec:
  accessMode: Public
  ipAddresses:
  - 192.168.200.0/24
  ipv4SubnetSize: 16
  subnetDHCPConfig:
    mode: DHCPDeactivated
status:
  conditions:
  - lastTransitionTime: "2024-11-19T04:23:29Z"
    message: NSX Subnet with DHCPDeactivated has been successfully created/updated
    reason: SubnetReady
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-11-19T05:35:13Z"
    message: Subnet is used by SubnetConnectionBindingMap binding1 and not able to
      delete
    reason: SubnetInUse
    status: "True"
    type: DeletionFailed
  gatewayAddresses:
  - 192.168.200.1/24
  networkAddresses:
  - 192.168.200.0/24
  1. After the SubnetConnectionBindingMap is deleted, the Subnet is deleted, and the finalizers are removed automatically.
# kubectl delete subnetbindings binding1 -n svc-tkg-domain-c10
subnetconnectionbindingmap.crd.nsx.vmware.com "binding1" deleted

# kubectl get subnets -n svc-tkg-domain-c10 -oyaml subnet-child
Error from server (NotFound): subnets.crd.nsx.vmware.com "subnet-child" not found

# kubectl get subnetsets -n svc-tkg-domain-c10 -oyaml pod-default
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SubnetSet
metadata:
  creationTimestamp: "2024-11-18T10:53:14Z"
  generation: 2
  labels:
    nsxoperator.vmware.com/default-subnetset-for: Pod
  name: pod-default
  namespace: svc-tkg-domain-c10
  resourceVersion: "763022"
  uid: 8e223cfd-fd47-4011-9a14-11debe823d5d
spec:
  accessMode: PrivateTGW
  ipv4SubnetSize: 16
  subnetDHCPConfig: {}
status:
  ...
  1. SubnetSet scale-out and scale-in.
    After scale-out the target SubnetSet, the number of generated NSX SubnetConnectionBindingMaps is increased from 1 to 2.
    After scale-in the target SubnetSet, the number of generated NSX SubnetConnectionBindingMaps is decreased from 2 to 1,
    No changes was updated to SubnetConnectionBindingMap CR unless errors occurred.

  2. If the dependent SubnetSet has no NSX SubnetPort, and all NSX VpcSubnets are gc, the SubnetConnectionBindingMap CR status is changed to not-ready.

# kubectl describe subnetbindings -n svc-tkg-domain-c10 binding1 
Name:         binding1
Namespace:    svc-tkg-domain-c10
Labels:       <none>
Annotations:  <none>
API Version:  crd.nsx.vmware.com/v1alpha1
Kind:         SubnetConnectionBindingMap
Metadata:
  Creation Timestamp:  2024-11-19T06:57:29Z
  Generation:          1
  Resource Version:    866270
  UID:                 54dce36d-871e-4325-85fa-4ab523ad9f3b
Spec:
  Subnet Name:             subnet-child
  Target Subnet Set Name:  pod-default
  Vlan Traffic Tag:        201
Status:
  Conditions:
    Last Transition Time:  2024-11-19T08:11:28Z
    Message:               SubnetSet CR pod-default is not realized on NSX
    Reason:                DependencyNotReady
    Status:                False
    Type:                  Ready
Events:
  Type     Reason            Age                From                                   Message
  ----     ------            ----               ----                                   -------
  Normal   SuccessfulUpdate  38m                subnetconnectionbindingmap-controller  SubnetConnectionBindingMap CR has been successfully updated
  Warning  FailUpdate        2s (x13 over 22s)  subnetconnectionbindingmap-controller  no existing NSX VpcSubnet created by SubnetSet CR 'svc-tkg-domain-c10/pod-default'

@codecov-commenter
Copy link

codecov-commenter commented Nov 19, 2024

Codecov Report

Attention: Patch coverage is 74.76556% with 296 lines in your changes missing coverage. Please review.

Project coverage is 73.42%. Comparing base (1cf8cb7) to head (000adf5).

Files with missing lines Patch % Lines
pkg/controllers/subnetset/subnetset_controller.go 34.69% 57 Missing and 7 partials ⚠️
pkg/controllers/subnet/subnet_controller.go 8.33% 50 Missing and 5 partials ⚠️
pkg/controllers/subnetbinding/controller.go 84.01% 37 Missing and 2 partials ⚠️
pkg/controllers/subnetset/subnetbinding_handler.go 55.71% 26 Missing and 5 partials ⚠️
pkg/nsx/services/subnetbinding/tree.go 81.21% 21 Missing and 10 partials ⚠️
pkg/controllers/subnet/subnetbinding_handler.go 59.72% 25 Missing and 4 partials ⚠️
pkg/nsx/services/subnetbinding/store.go 79.24% 19 Missing and 3 partials ⚠️
cmd/main.go 0.00% 9 Missing ⚠️
pkg/controllers/subnetbinding/subnets_handler.go 88.57% 6 Missing and 2 partials ⚠️
pkg/controllers/common/dependency_watcher.go 94.23% 2 Missing and 1 partial ⚠️
... and 2 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #901      +/-   ##
==========================================
+ Coverage   73.38%   73.42%   +0.03%     
==========================================
  Files         109      119      +10     
  Lines       15248    16366    +1118     
==========================================
+ Hits        11190    12016     +826     
- Misses       3307     3560     +253     
- Partials      751      790      +39     
Flag Coverage Δ
unit-tests 73.42% <74.76%> (+0.03%) ⬆️
Files with missing lines Coverage Δ
pkg/clean/clean.go 77.58% <100.00%> (+1.00%) ⬆️
pkg/nsx/client.go 83.69% <100.00%> (+0.27%) ⬆️
pkg/nsx/services/common/types.go 100.00% <ø> (ø)
pkg/nsx/services/subnetbinding/builder.go 100.00% <100.00%> (ø)
pkg/util/utils.go 85.71% <100.00%> (+0.17%) ⬆️
pkg/nsx/services/subnetbinding/compare.go 87.50% <87.50%> (ø)
pkg/controllers/common/dependency_watcher.go 94.23% <94.23%> (ø)
pkg/nsx/services/subnetbinding/subnetbinding.go 97.76% <97.76%> (ø)
pkg/controllers/subnetbinding/subnets_handler.go 88.57% <88.57%> (ø)
cmd/main.go 0.00% <0.00%> (ø)
... and 7 more
---- 🚨 Try these New Features:

// Try to delete stale NSX SubnetConnectionBindingMaps if exists
if err := r.SubnetBindingService.DeleteSubnetConnectionBindingMapsByCRName(req.Name, req.Namespace); err != nil {
log.Error(err, "Failed to delete NSX SubnetConnectionBindingMap", "SubnetConnectionBindingMap", req.NamespacedName)
return ResultRequeue, nil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returns nil in Reconcile function even if an error happened because we want to use the ResultRequeue to control the delay to re-queue the request rather than the framework's default value. Otherwise, it may print such logs.

2024-11-19 08:16:55.803 INFO    controller/controller.go:314    Warning: Reconciler returned both a non-zero result and a non-nil error. The result will always be ignored if the error is non-nil and the non-nil error causes reqeueuing with exponential backoff. For more details, see: https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/reconcile#Reconciler        {"controller": "subnetconnectionbindingmap", "controllerGroup": "crd.nsx.vmware.com", "controllerKind": "SubnetConnectionBindingMap", "SubnetConnectionBindingMap": {"name":"binding1","namespace":"svc-tkg-domain-c10"}, "namespace": "svc-tkg-domain-c10", "name": "binding1", "reconcileID": "b504bdcb-9a6b-4009-bc02-f00db6a9dfd1"}
2024-11-19 08:16:55.803 ERROR   controller/controller.go:316    Reconciler error        {"controller": "subnetconnectionbindingmap", "controllerGroup": "crd.nsx.vmware.com", "controllerKind": "SubnetConnectionBindingMap", "SubnetConnectionBindingMap": {"name":"binding1","namespace":"svc-tkg-domain-c10"}, "namespace": "svc-tkg-domain-c10", "name": "binding1", "reconcileID": "b504bdcb-9a6b-4009-bc02-f00db6a9dfd1", "error": "no existing NSX VpcSubnet created by SubnetSet CR 'svc-tkg-domain-c10/pod-default'"}

@wenyingd wenyingd force-pushed the subnetbinding branch 3 times, most recently from 0a603b9 to 15795be Compare November 21, 2024 07:52
Copy link

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not finished.

pkg/nsx/services/common/types.go Outdated Show resolved Hide resolved
pkg/nsx/client.go Outdated Show resolved Hide resolved
Comment on lines +81 to +86
if !IsObjectReady(newBindingMap.Status.Conditions) {
return false
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved before line 78 to return earlier with less checks in some cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we shall support two cases with readiness change: 1) unready->ready, 2) ready->unready.
I use it here is to support the third case that binding map is ready, but its spec has changed.

Comment on lines 90 to 92
CreateFunc: func(e event.CreateEvent) bool {
return false
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one and GenericFunc can be removed, the default is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having checked the code in the library used by nsx-operator, the default return value is true if we don't explicitly provide the functions.

pkg/controllers/common/dependency_watcher_test.go Outdated Show resolved Hide resolved
pkg/controllers/subnet/subnet_controller.go Outdated Show resolved Hide resolved
Comment on lines +298 to +303
if msg != "" {
newConditions[0].Message = msg
}
if reason != "" {
newConditions[0].Reason = reason
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's ok to assign directly without checking since the default value of string is "", it won't impact the final result if we assign it when msg and reason are empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the empty will overwrite the default values in line 290-296.

)

func requeueSubnetByBindingMapUpdate(ctx context.Context, c client.Client, objOld client.Object, objNew client.Object, q workqueue.TypedRateLimitingInterface[reconcile.Request]) {
bindingMap := objNew.(*v1alpha1.SubnetConnectionBindingMap)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bindingMap := objNew.(*v1alpha1.SubnetConnectionBindingMap)
newBM := objNew.(*v1alpha1.SubnetConnectionBindingMap)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Comment on lines 40 to 48
reasonDependencyNotReady = "DependencyNotReady"
reasonConfigureFailure = "ConfigureFailed"
msgGetSubnetCR = "Unable to get Subnet CR %s"
msgGetSubnetSetCR = "Unable to get SubnetSet CR %s"
msgGetNSXSubnetsBySubnet = "Subnet CR %s is not realized on NSX"
msgGetNSXSubnetsBySubnetSet = "SubnetSet CR %s is not realized on NSX"
msgChildWorkAsParent = "Subnet CR %s is working as target by %s"
msgParentWorkAsChild = "Target Subnet CR %s is attached by %s"
msgRealizeSubnetBinding = "Failed to realize SubnetConnectionBindingMap %s on NSX"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, please check if it's necessary to define these in global variables.
I would avoid define them here if it's used once in one function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, although I prefer to place such CR condition messages/reasons together to help quickly understand what conditions may exist for this CR.


var (
log = &logger.Log
MetricResTypeSubnetConnectionBindingMap = common.MetricResTypeSubnetConnectionBindingMap

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this assignment needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned from other nsx services in nsx-operator.

1. Implement the reconciler for SubnetConnectionBindingMap, it may update the
SubnetConnectionBindingMap status with condition ready is false if its dependent
Subnet or SubnetSet is not ready (or realized) or it hits errors when realizing
NSX SubnetConnectionBindingMaps. It updates the status with ready condition as
true if it is successfully realized on NSX. The reconciler also watches the
Subnet/SubnetSet CR events to sync the connection binding maps.
2. The change also modifies the Subnet/SubnetSet reconciler to watch
SubnetConnectionBindingMap CR events. If a Subnet/SubnetSet is used by a
SubnetConnectionBindingMap, a finalizer is added on the corresponding
Subnet/SubnetSet CR, and the finalizer is removed automatically if the CR is not
used by any SubnetConnectionBindingMaps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants