add cilium netpol creation #1502

Merged
merged 23 commits into from
Feb 12, 2024

Conversation

QuantumEnigmaa
Contributor

Towards https://github.com/giantswarm/giantswarm/issues/29698

⚠️ This is still a work in progress.

This PR enables PMO to create a CiliumNetworkPolicy resource for each MC Prometheus.

Checklist

I have:

  • Described why this change is being introduced
  • Separated out refactoring/reformatting in a dedicated PR
  • Updated changelog in CHANGELOG.md

@QuantumEnigmaa QuantumEnigmaa self-assigned this Feb 5, 2024
@QuantumEnigmaa QuantumEnigmaa force-pushed the add-cilium-netpol branch 2 times, most recently from b7788f8 to 5bc6db1 Compare February 5, 2024 15:50
@QuantumEnigmaa QuantumEnigmaa force-pushed the add-cilium-netpol branch 4 times, most recently from c4b743d to ade7067 Compare February 5, 2024 16:37
@QuantumEnigmaa
Contributor Author

QuantumEnigmaa commented Feb 6, 2024

Issue with the prometheus resource :

cannot use corev1.ResourceRequirements{…} (value of type "k8s.io/api/core/v1".ResourceRequirements) as "k8s.io/api/core/v1".VolumeResourceRequirements value in struct literal

Copilot explanation :

"The error message you're seeing is indicating that you're trying to use a value of type corev1.ResourceRequirements where a value of type corev1.VolumeResourceRequirements is expected.

In the Kubernetes API, ResourceRequirements and VolumeResourceRequirements are different types and are not interchangeable. ResourceRequirements is used to set the CPU and memory requests and limits for a container, while VolumeResourceRequirements is used to set the storage requests and limits for a volume.

In your code, you're trying to set the storage requests for a volume, so you should be using VolumeResourceRequirements, not ResourceRequirements.

However, as of Kubernetes API version v1, there is no VolumeResourceRequirements type. Instead, you should use ResourceRequirements to set the storage requests for a volume, like you're doing in your code.

If you're seeing this error, it's possible that you're using a different version of the Kubernetes API where VolumeResourceRequirements is a valid type. In that case, you should check the documentation for that version of the API to see how to set the storage requests for a volume.

If you're using Kubernetes API version v1, your code is correct and you should not be seeing this error. In that case, the error might be caused by something else in your code. Could you provide more information about your code and the context in which this error is occurring?"

This occurred because go mod tidy bumped k8s.io/api from v0.28.4 to v0.29.0-rc.1.
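For context, the compiler error above suggests that in the newer k8s.io/api version the field being populated (presumably the storage section of a volume claim spec, though the thread does not show the code) expects VolumeResourceRequirements rather than ResourceRequirements. The sketch below uses simplified stand-in types, not the real corev1 ones, to show why a struct literal of one type is rejected where the other is expected and how the shared fields can be carried over. The helper name toVolumeRequirements is hypothetical.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for corev1.ResourceList
// (the real type maps resource names to resource.Quantity values).
type ResourceList map[string]string

// ResourceRequirements is a stand-in for the container-level type.
// The Claims field is simplified to []string for this sketch.
type ResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
	Claims   []string
}

// VolumeResourceRequirements is a stand-in for the volume-level type,
// which only carries limits and requests.
type VolumeResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
}

// toVolumeRequirements (hypothetical helper) copies the fields both types share.
// A direct Go conversion is not possible because the struct layouts differ.
func toVolumeRequirements(r ResourceRequirements) VolumeResourceRequirements {
	return VolumeResourceRequirements{Limits: r.Limits, Requests: r.Requests}
}

func main() {
	old := ResourceRequirements{Requests: ResourceList{"storage": "100Gi"}}
	fmt.Println(toVolumeRequirements(old).Requests["storage"])
}
```

With the real API types, the usual fix is simply to write the struct literal with the type the field now expects, or to pin k8s.io/api to the earlier version.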

@QuantumEnigmaa
Contributor Author

When deploying it :

"stack":{"annotation":"Object 'Kind' is missing in 'unstructured object has no kind'","kind":"unknown","stack: [{"file":"/root/project/service/controller/resource/ciliumnetpol/create.go","line":26},[...]]
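The "Object 'Kind' is missing" message typically appears when an untyped (unstructured) object is handed to a Kubernetes client without its apiVersion and kind set. The following is a minimal sketch using plain Go maps instead of the real unstructured type; the function names and the error text are illustrative assumptions, not the operator's code or the library's message.

```go
package main

import (
	"errors"
	"fmt"
)

// newCiliumNetworkPolicy builds an untyped manifest as nested maps.
// apiVersion and kind are set explicitly because an untyped object
// carries no Go type the client could infer them from.
func newCiliumNetworkPolicy(name, namespace string) map[string]any {
	return map[string]any{
		"apiVersion": "cilium.io/v2",
		"kind":       "CiliumNetworkPolicy",
		"metadata": map[string]any{
			"name":      name,
			"namespace": namespace,
		},
	}
}

// checkKind is a hypothetical pre-flight check that rejects objects
// whose kind is absent or empty.
func checkKind(obj map[string]any) error {
	kind, _ := obj["kind"].(string)
	if kind == "" {
		return errors.New("object has no kind")
	}
	return nil
}

func main() {
	fmt.Println(checkKind(newCiliumNetworkPolicy("gaia-prometheus", "gaia-prometheus")))
	fmt.Println(checkKind(map[string]any{"metadata": map[string]any{}}))
}
```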

@QuantumEnigmaa
Contributor Author

QuantumEnigmaa commented Feb 7, 2024

PMO doesn't seem to like the dynamic client:

panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a3ae0d]

goroutine 212 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1c9b2c0?, 0x32a5d90?})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:914 +0x21f
github.com/giantswarm/prometheus-meta-operator/v2/service/controller/resource/ciliumnetpol.(*Resource).EnsureCreated(0xc0005d41a0, {0x2297308, 0xc000a1ee40}, {0x1f4fde0, 0xc0006b0a00})
        /root/project/service/controller/resource/ciliumnetpol/create.go:28 +0x12d
github.com/giantswarm/operatorkit/v7/pkg/resource/wrapper/retryresource.(*basicResource).EnsureCreated.func1()
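The trace points at create.go:28 inside EnsureCreated, but the thread does not say which value was nil. One common cause of this kind of panic is a resource whose client field was never populated. The sketch below shows a constructor guard that fails at start-up instead of at reconcile time; PolicyClient, Config, Resource, and fakeClient are hypothetical names for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// PolicyClient is a hypothetical interface standing in for whatever
// client the resource uses to talk to the API server.
type PolicyClient interface {
	Get(namespace, name string) (string, error)
}

// Config carries the dependencies of the resource.
type Config struct {
	Client PolicyClient
}

// Resource holds the validated dependencies.
type Resource struct {
	client PolicyClient
}

// New rejects a missing client up front, so a wiring mistake surfaces
// as a start-up error rather than a nil pointer dereference later.
func New(config Config) (*Resource, error) {
	if config.Client == nil {
		return nil, errors.New("config.Client must not be empty")
	}
	return &Resource{client: config.Client}, nil
}

// fakeClient is a trivial implementation used only for the demo.
type fakeClient struct{}

func (fakeClient) Get(namespace, name string) (string, error) {
	return namespace + "/" + name, nil
}

func main() {
	if _, err := New(Config{}); err != nil {
		fmt.Println("rejected:", err)
	}
	r, _ := New(Config{Client: fakeClient{}})
	got, _ := r.client.Get("gaia-prometheus", "gaia-prometheus")
	fmt.Println(got)
}
```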

@QuantumEnigmaa
Contributor Author

There is some progress :

"stack":{"annotation":"ciliumnetworkpolicies.cilium.io \"gaia-prometheus\" is forbidden: User \"system:serviceaccount:monitoring:prometheus-meta-operator\" cannot get resource \"ciliumnetworkpolicies\" in API group \"cilium.io\" at the cluster scope","kind":"unknown"

@QuantumEnigmaa
Contributor Author

Adding permissions for Cilium network policies to the ClusterRole didn't change the error.
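As a rough illustration of what the "forbidden" message is checking: the API server looks for a rule that covers the API group, the resource, and the verb of the request. The sketch below models that matching with a simplified stand-in for a ClusterRole rule; it ignores resource names, subresources, and bindings, and the type and function names are assumptions. Note also that the message says "at the cluster scope", which may indicate that the get request was issued without a namespace.

```go
package main

import "fmt"

// PolicyRule is a simplified stand-in for one rule of a ClusterRole.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether want is listed, treating "*" as a wildcard.
func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want || item == "*" {
			return true
		}
	}
	return false
}

// allows reports whether any rule covers the group, resource and verb.
func allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	rules := []PolicyRule{{
		APIGroups: []string{"cilium.io"},
		Resources: []string{"ciliumnetworkpolicies"},
		Verbs:     []string{"get", "create", "update"},
	}}
	fmt.Println(allows(rules, "cilium.io", "ciliumnetworkpolicies", "get"))
	fmt.Println(allows(rules, "cilium.io", "ciliumnetworkpolicies", "delete"))
}
```

A matching rule is necessary but not sufficient: the role also has to be bound to the prometheus-meta-operator service account, and the running operator has to be the one deployed with the updated role.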

@QuantumEnigmaa
Contributor Author

When trying to create the CiliumNetworkPolicy for WCs, PMO panics with a segmentation violation (nil pointer dereference):

2024-02-12T09:20:56Z    INFO    Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference        {"controller": "cluster", "controllerGroup": "cluster.x-k8s.io", "controllerKind": "Cluster", "Cluster": {"name":"gj83r","namespace":"org-zirko"}, "namespace": "org-zirko", "name": "gj83r", "reconcileID": "13c22583-216b-48e5-87c5-3d2a277eb2f1"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x19ffe0d]

goroutine 111 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1ca3220?, 0x32b5d50?})
        /go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:914 +0x21f
github.com/giantswarm/prometheus-meta-operator/v2/service/controller/resource/ciliumnetpol.(*Resource).EnsureCreated(0xc000358600, {0x22a0cb8, 0xc00034def0}, {0x1f4a000, 0xc0006bab60})
        /root/project/service/controller/resource/ciliumnetpol/create.go:28 +0x12d
github.com/giantswarm/operatorkit/v7/pkg/resource/wrapper/retryresource.(*basicResource).EnsureCreated.func1()
        /go/pkg/mod/github.com/giantswarm/operatorkit/[email protected]/pkg/resource/wrapper/retryresource/basic_resource.go:50 +0x39
github.com/cenkalti/backoff/v4.RetryNotifyWithTimer.Operation.withEmptyData.func1()

@QuantumEnigmaa
Contributor Author

QuantumEnigmaa commented Feb 12, 2024

Tested it on gaia with a manually created WC and it works :

NAMESPACE          NAME                                                    AGE
gaia-prometheus    gaia-prometheus                                         22s
gj83r-prometheus   gj83r-prometheus                                        22s
zirco@Zirko-laptop:~/scripts$ k get ciliumnetworkpolicies.cilium.io -n gj83r-prometheus gj83r-prometheus -oyaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  creationTimestamp: "2024-02-12T10:19:26Z"
  generation: 1
  labels:
    app.kubernetes.io/name: prometheus
  name: gj83r-prometheus
  namespace: gj83r-prometheus
  resourceVersion: "1535543869"
  uid: 41cdf603-0732-4a65-964a-fa8a5f670562
spec:
  egress:
  - toEntities:
    - kube-apiserver
    - cluster
  - toEntities:
    - world
    toPorts:
    - ports:
      - port: "443"
      - port: "6443"
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  ingress:
  - fromEntities:
    - cluster
    toPorts:
    - ports:
      - port: "9090"

@QuantumEnigmaa QuantumEnigmaa marked this pull request as ready for review February 12, 2024 10:21
@QuantumEnigmaa QuantumEnigmaa requested a review from a team as a code owner February 12, 2024 10:22
Contributor

@QuentinBisson QuentinBisson left a comment

Apart from the Go client, it's alright.

@QuantumEnigmaa
Contributor Author

Gonna do a last round of testing before merging

@QuantumEnigmaa
Contributor Author

So it still works :)

@QuentinBisson
Contributor

Can you test it on gerbil as well ?

@QuantumEnigmaa
Contributor Author

Seems to be working as well :

NAMESPACE           NAME                                                                   AGE
gerbil-prometheus   gerbil-prometheus                                                      6m42s
monitoring          prometheus                                                             6m52s

@QuentinBisson
Contributor

Can you check that Alertmanager can communicate with all Prometheus instances? Also, I think the Prometheus CNP in monitoring is not needed, but alright :)

@QuantumEnigmaa
Contributor Author

Can you check that Alertmanager can communicate with all Prometheus instances?

From what I see, it seems alright as the alerts redirect to the right prometheus instance. Don't know if there are more checks to do 🤷

I think the Prometheus CNP in monitoring is not needed

Yeah but I don't see a quick way to avoid this in the code :/

@QuentinBisson
Contributor

The cnp is coming from the helmchart 🙈

Now for Alertmanager, you can open Hubble using port-forwarding and check whether there are any dropped connections in the monitoring namespace and in gerbil-prometheus :) That should be okay

@QuantumEnigmaa
Contributor Author

The cnp is coming from the helmchart 🙈

Oh yeah indeed ! I can remove it if you want :)

Concerning hubble :
[screenshot]

@QuentinBisson
Contributor

If you do not see any dropped flows below, then let's go

@QuantumEnigmaa
Contributor Author

Everything is "forwarded" :)

@QuantumEnigmaa QuantumEnigmaa merged commit 96ea389 into master Feb 12, 2024
5 checks passed
@QuantumEnigmaa QuantumEnigmaa deleted the add-cilium-netpol branch February 12, 2024 13:31
3 participants