add cilium netpol creation #1502

QuantumEnigmaa · 2024-02-05T14:17:53Z

Towards https://github.com/giantswarm/giantswarm/issues/29698

⚠️ This is still a work in progress.

This PR enables PMO (prometheus-meta-operator) to create a CiliumNetworkPolicy resource for each MC Prometheus.

Checklist

I have:

Described why this change is being introduced
Separated out refactoring/reformatting in a dedicated PR
Updated changelog in CHANGELOG.md

QuantumEnigmaa · 2024-02-06T13:13:11Z

Issue with the Prometheus resource:

cannot use corev1.ResourceRequirements{…} (value of type "k8s.io/api/core/v1".ResourceRequirements) as "k8s.io/api/core/v1".VolumeResourceRequirements value in struct literal

Copilot explanation :

"The error message you're seeing is indicating that you're trying to use a value of type corev1.ResourceRequirements where a value of type corev1.VolumeResourceRequirements is expected.

In the Kubernetes API, ResourceRequirements and VolumeResourceRequirements are different types and are not interchangeable. ResourceRequirements is used to set the CPU and memory requests and limits for a container, while VolumeResourceRequirements is used to set the storage requests and limits for a volume.

In your code, you're trying to set the storage requests for a volume, so you should be using VolumeResourceRequirements, not ResourceRequirements.

However, as of Kubernetes API version v1, there is no VolumeResourceRequirements type. Instead, you should use ResourceRequirements to set the storage requests for a volume, like you're doing in your code.

If you're seeing this error, it's possible that you're using a different version of the Kubernetes API where VolumeResourceRequirements is a valid type. In that case, you should check the documentation for that version of the API to see how to set the storage requests for a volume.

If you're using Kubernetes API version v1, your code is correct and you should not be seeing this error. In that case, the error might be caused by something else in your code. Could you provide more information about your code and the context in which this error is occurring?"

This occurred because running go mod tidy bumped k8s.io/api from v0.28.4 to v0.29.0-rc.1.
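A minimal sketch of why the compiler rejects the old literal, using local stand-in types rather than the real k8s.io/api package. The field sets are simplified assumptions and `toVolumeRequirements` is a hypothetical helper. In Go, two differently named struct types are not assignable to each other even when their fields overlap, so the storage requests have to be placed in the type the field now expects:

```go
package main

import "fmt"

// Stand-ins for the k8s.io/api/core/v1 types, simplified for illustration.
type ResourceList map[string]string

type ResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
	Claims   []string // only meaningful for containers
}

type VolumeResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
}

// toVolumeRequirements (hypothetical helper) copies the fields both types
// share and drops Claims, which has no volume equivalent.
func toVolumeRequirements(r ResourceRequirements) VolumeResourceRequirements {
	return VolumeResourceRequirements{Limits: r.Limits, Requests: r.Requests}
}

func main() {
	old := ResourceRequirements{Requests: ResourceList{"storage": "100Gi"}}
	// var v VolumeResourceRequirements = old // does not compile: distinct named types
	v := toVolumeRequirements(old)
	fmt.Println(v.Requests["storage"])
}
```

In the operator itself, the simpler fix is likely to write the struct literal with the new type directly, or to pin k8s.io/api to the previous version.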

QuantumEnigmaa · 2024-02-06T16:07:26Z

When deploying it :

"stack":{"annotation":"Object 'Kind' is missing in 'unstructured object has no kind'","kind":"unknown","stack: [{"file":"/root/project/service/controller/resource/ciliumnetpol/create.go","line":26},[...]]
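The error above is what a client reports when an unstructured object carries no `kind`. A minimal stdlib-only sketch, using a plain map as a stand-in for the object content. `newPolicyObject` and `checkTypeMeta` are hypothetical names, and the `apiVersion` and `kind` values are the ones that appear in the policy output later in this thread:

```go
package main

import (
	"errors"
	"fmt"
)

// newPolicyObject builds the top-level content of the object. Both
// apiVersion and kind must be set explicitly on unstructured content.
func newPolicyObject(name, namespace string) map[string]interface{} {
	return map[string]interface{}{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumNetworkPolicy",
		"metadata": map[string]interface{}{
			"name":      name,
			"namespace": namespace,
		},
	}
}

// checkTypeMeta reports an error if apiVersion or kind is absent or empty.
func checkTypeMeta(obj map[string]interface{}) error {
	for _, key := range []string{"apiVersion", "kind"} {
		if s, ok := obj[key].(string); !ok || s == "" {
			return errors.New("object is missing " + key)
		}
	}
	return nil
}

func main() {
	obj := newPolicyObject("gaia-prometheus", "gaia-prometheus")
	fmt.Println(checkTypeMeta(obj))

	delete(obj, "kind") // reproduce the failing shape
	fmt.Println(checkTypeMeta(obj))
}
```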

QuantumEnigmaa · 2024-02-07T13:15:07Z

PMO doesn't seem to like the dynamic client:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a3ae0d]

goroutine 212 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1c9b2c0?, 0x32a5d90?})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:914 +0x21f
github.com/giantswarm/prometheus-meta-operator/v2/service/controller/resource/ciliumnetpol.(*Resource).EnsureCreated(0xc0005d41a0, {0x2297308, 0xc000a1ee40}, {0x1f4fde0, 0xc0006b0a00})
        /root/project/service/controller/resource/ciliumnetpol/create.go:28 +0x12d
github.com/giantswarm/operatorkit/v7/pkg/resource/wrapper/retryresource.(*basicResource).EnsureCreated.func1()
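The trace points at create.go:28 inside EnsureCreated, which is consistent with (though not proof of) a client field that was never set on the resource. A sketch of validating dependencies in the constructor, so that a missing client fails at startup instead of during reconciliation. `Client`, `Config`, `Resource`, `New`, and `fakeClient` are hypothetical stand-ins, not PMO's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical stand-in for the Kubernetes dynamic client.
type Client interface {
	Create(name string) error
}

type Config struct {
	DynamicClient Client
}

type Resource struct {
	client Client
}

// New rejects a config whose client was never wired in.
func New(c Config) (*Resource, error) {
	if c.DynamicClient == nil {
		return nil, errors.New("config.DynamicClient must not be nil")
	}
	return &Resource{client: c.DynamicClient}, nil
}

// EnsureCreated can rely on r.client being set because New checked it.
func (r *Resource) EnsureCreated(name string) error {
	return r.client.Create(name)
}

// fakeClient records the names it was asked to create.
type fakeClient struct{ created []string }

func (f *fakeClient) Create(name string) error {
	f.created = append(f.created, name)
	return nil
}

func main() {
	if _, err := New(Config{}); err != nil {
		fmt.Println("startup error:", err)
	}

	fc := &fakeClient{}
	r, err := New(Config{DynamicClient: fc})
	if err != nil {
		panic(err)
	}
	if err := r.EnsureCreated("gaia-prometheus"); err != nil {
		panic(err)
	}
	fmt.Println(fc.created)
}
```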

QuantumEnigmaa · 2024-02-07T15:59:05Z

There is some progress :

"stack":{"annotation":"ciliumnetworkpolicies.cilium.io \"gaia-prometheus\" is forbidden: User \"system:serviceaccount:monitoring:prometheus-meta-operator\" cannot get resource \"ciliumnetworkpolicies\" in API group \"cilium.io\" at the cluster scope","kind":"unknown"

QuantumEnigmaa · 2024-02-07T16:48:57Z

Adding permissions for Cilium network policies to the ClusterRole didn't change the error.

helm/prometheus-meta-operator/templates/rbac.yaml

Co-authored-by: Fernando Ripoll <[email protected]>

QuantumEnigmaa · 2024-02-12T09:31:41Z

When trying to create the CiliumNetworkPolicy for WCs, PMO panics with a segmentation violation (nil pointer dereference):

2024-02-12T09:20:56Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "Cluster": {"name":"gj83r","namespace":"org-zirko"}, "namespace": "org-zirko", "name": "gj83r", "reconcileID": "13c22583-216b-48e5-87c5-3d2a277eb2f1"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x19ffe0d]

goroutine 111 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1ca3220?, 0x32b5d50?})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:914 +0x21f
github.com/giantswarm/prometheus-meta-operator/v2/service/controller/resource/ciliumnetpol.(*Resource).EnsureCreated(0xc000358600, {0x22a0cb8, 0xc00034def0}, {0x1f4a000, 0xc0006bab60})
        /root/project/service/controller/resource/ciliumnetpol/create.go:28 +0x12d
github.com/giantswarm/operatorkit/v7/pkg/resource/wrapper/retryresource.(*basicResource).EnsureCreated.func1()
        /go/pkg/mod/github.com/giantswarm/operatorkit/[email protected]/pkg/resource/wrapper/retryresource/basic_resource.go:50 +0x39
github.com/cenkalti/backoff/v4.RetryNotifyWithTimer.Operation.withEmptyData.func1()

QuantumEnigmaa · 2024-02-12T10:21:57Z

Tested it on gaia with a manually created WC, and it works:

NAMESPACE          NAME                                                    AGE
gaia-prometheus    gaia-prometheus                                         22s
gj83r-prometheus   gj83r-prometheus                                        22s

zirco@Zirko-laptop:~/scripts$ k get ciliumnetworkpolicies.cilium.io -n gj83r-prometheus gj83r-prometheus -oyaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  creationTimestamp: "2024-02-12T10:19:26Z"
  generation: 1
  labels:
    app.kubernetes.io/name: prometheus
  name: gj83r-prometheus
  namespace: gj83r-prometheus
  resourceVersion: "1535543869"
  uid: 41cdf603-0732-4a65-964a-fa8a5f670562
spec:
  egress:
  - toEntities:
    - kube-apiserver
    - cluster
  - toEntities:
    - world
    toPorts:
    - ports:
      - port: "443"
      - port: "6443"
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  ingress:
  - fromEntities:
    - cluster
    toPorts:
    - ports:
      - port: "9090"

go.mod

service/controller/managementcluster/resource.go

service/controller/resource/ciliumnetpol/create.go

service/service.go

go.mod

QuentinBisson

Apart from the Go client, it looks alright.

QuantumEnigmaa · 2024-02-12T11:19:58Z

Gonna do a last round of testing before merging

QuantumEnigmaa · 2024-02-12T11:30:58Z

So it still works :)

QuentinBisson · 2024-02-12T11:37:40Z

Can you test it on gerbil as well?

QuantumEnigmaa · 2024-02-12T11:48:41Z

Seems to be working as well:

NAMESPACE           NAME                                                                   AGE
gerbil-prometheus   gerbil-prometheus                                                      6m42s
monitoring          prometheus                                                             6m52s

QuentinBisson · 2024-02-12T12:16:05Z

Can you check that alertmanager can communicate with all prometheus instances? Also, I think that prometheus cnp in monitoring is not needed but alright :)

QuantumEnigmaa · 2024-02-12T12:35:19Z

Can you check that alertmanager can communicate with all prometheus instances?

From what I see, it seems alright as the alerts redirect to the right prometheus instance. Don't know if there are more checks to do 🤷

I think that prometheus cnp in monitoring is not needed

Yeah but I don't see a quick way to avoid this in the code :/

QuentinBisson · 2024-02-12T12:42:05Z

The cnp is coming from the helmchart 🙈

Now for alertmanager, you can open hubble using port-forwarding and check if there are any dropped connections in the monitoring ns and gerbil-prometheus :) That should be okay

QuantumEnigmaa · 2024-02-12T13:05:42Z

The cnp is coming from the helmchart 🙈

Oh yeah, indeed! I can remove it if you want :)

Concerning hubble:

QuentinBisson · 2024-02-12T13:16:24Z

If you do not see any dropped flows, then let's go

QuantumEnigmaa · 2024-02-12T13:21:16Z

Everything is "forwarded" :)

add cilium netpol creation

07adcac

QuantumEnigmaa self-assigned this Feb 5, 2024

QuantumEnigmaa force-pushed the add-cilium-netpol branch 2 times, most recently from b7788f8 to 5bc6db1 on February 5, 2024 15:50

fix code errors

7661b8d

QuantumEnigmaa force-pushed the add-cilium-netpol branch from 5bc6db1 to 7661b8d on February 5, 2024 15:51

remove unused libraries

f406834

QuantumEnigmaa force-pushed the add-cilium-netpol branch 4 times, most recently from c4b743d to ade7067 on February 5, 2024 16:37

fix go build error

1644d09

QuantumEnigmaa force-pushed the add-cilium-netpol branch from ade7067 to 1644d09 on February 6, 2024 08:02

QuantumEnigmaa added 3 commits February 6, 2024 12:13

use correct k8s client

2eaf4b4

merge with main

1633c76

add ciliumnetpol resource to managementcluster controller

d5625a4

QuantumEnigmaa added 2 commits February 6, 2024 14:19

fix prometheus resource

2a1a677

add missing kind and apiVersion to object

e2f1c6a

QuantumEnigmaa added 3 commits February 7, 2024 11:48

try with dynamic client

e317e83

add dynamic client to controller

b7aea77

merge with main

bb32750

add dynamic client definition in service

8448a1a

add cilium netpol in clusterrole

4648317

pipo02mix reviewed Feb 8, 2024

helm/prometheus-meta-operator/templates/rbac.yaml

Update helm/prometheus-meta-operator/templates/rbac.yaml

15833da

Co-authored-by: Fernando Ripoll <[email protected]>

QuantumEnigmaa force-pushed the add-cilium-netpol branch from 37a687a to 986c0a1 on February 8, 2024 14:50

add cilium netpol creation to clusterapi controller

f402eb1

add dynamic client to clusterapi controller in service

900a025

QuantumEnigmaa marked this pull request as ready for review February 12, 2024 10:21

QuantumEnigmaa requested a review from a team as a code owner February 12, 2024 10:22