[Bug]: unable to create a cluster with 3 control plane replicas #383

Open
pli01 opened this issue Oct 29, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@pli01

pli01 commented Oct 29, 2024

What happened

Unable to create a cluster with 3 control plane replicas using any of the templates provided in the templates or example directories.

Only configurations with 1 control plane work.

Steps to reproduce

Choose any template, or the default https://github.com/outscale/cluster-api-provider-outscale/blob/main/templates/cluster-template.yaml
Choose any image named ubuntu-2204-2204-kubernetes-v1xxxx
Set replicas to 3 in the control-plane section
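
For readers reproducing this, the change in the last step might look roughly like the fragment below. This is only a sketch: it assumes the template defines the control plane as a Cluster API `KubeadmControlPlane` object at `controlplane.cluster.x-k8s.io/v1beta1`, and it reuses the object name that appears in the controller logs of this report. Everything other than `replicas` is elided and should stay as the chosen template has it.

```yaml
# Illustrative fragment, not the full template.
# Assumption: the control plane is a KubeadmControlPlane (v1beta1).
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: osc-c1-dev-control-plane   # name as seen in the capi-controller-manager logs
  namespace: default
spec:
  replicas: 3   # the value changed for this report
  # remaining spec fields unchanged from the template
```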

Expected to happen

A cluster with 3 control planes.

Add anything

The second control plane failed to join the cluster.

...
[  388.990481] cloud-init[1077]: [2024-10-29 14:39:23] {"level":"warn","ts":"2024-10-29T14:39:23.610631Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001f8e00/10.0.4.234:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
[  388.990591] cloud-init[1077]: [2024-10-29 14:39:25] [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[  388.990708] cloud-init[1077]: [2024-10-29 14:39:25] The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[  388.990820] cloud-init[1077]: [2024-10-29 14:39:25] [mark-control-plane] Marking the node ip-10-0-4-95 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[  388.990986] cloud-init[1077]: [2024-10-29 14:39:25] [mark-control-plane] Marking the node ip-10-0-4-95 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[  388.991056] cloud-init[1077]: [2024-10-29 14:39:58] [kubelet-check] Initial timeout of 40s passed.
[  388.991170] cloud-init[1077]: [2024-10-29 14:41:25] error execution phase control-plane-join/mark-control-plane: error applying control-plane label and taints: nodes "ip-10-0-4-95" not found
[  388.991285] cloud-init[1077]: [2024-10-29 14:41:25] To see the stack trace of this error execute with --v=5 or higher
[  388.991403] cloud-init[1077]: [2024-10-29 14:41:25] 2024-10-29 14:41:25,857 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
[  388.991514] cloud-init[1077]: [2024-10-29 14:41:25] 2024-10-29 14:41:25,857 - util.py[WARNING]: Running module scripts_user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed

cluster-api output

# logs capi-controller-manager
I1029 14:36:39.251111       1 machine_controller_noderef.go:61] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="484c2cb6-c494-4601-9107-1c965c79ee2a" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"
I1029 14:44:32.319060       1 machine_controller_phases.go:306] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="cd3f6a37-ecb7-4354-bdcb-a64c5d0b8cb4" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"
I1029 14:44:32.319170       1 machine_controller_noderef.go:61] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="cd3f6a37-ecb7-4354-bdcb-a64c5d0b8cb4" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"

Environment

- Kubernetes version: (use `kubectl version`): 
- OS (e.g. from `/etc/os-release`):
- Kernel (e.g. `uname -a`): ubuntu
- cluster-api-provider-outscale version: v0.3.1
- cluster-api version: v1.8.4
- Install tools:
- Kubernetes Distribution:
- Kubernetes Distribution version:
pli01 added the bug label Oct 29, 2024
@pierreozoux
Contributor

I work with @pli01 and I came to the same conclusion.

I think it is linked to this issue:
#380

In my tests, when I add a public IP to the first node and/or the second node, at some point it starts to work.
I didn't manage to find the failing curl :/

@outscale-hmi, I'd love to pair program with you to debug this :)

@rouja

rouja commented Nov 5, 2024

Hello,

The bug seems to be fixed in the main branch. Is it possible to make an official release?

@outscale-hmi
Contributor

Hello,
Yes, we will release ASAP.
I still have some work in progress on the reconciliation of the LBU and some test optimization, and then we can release.
