We've been playing with Talos Linux and Cluster API to automate the management of our clusters, and are currently facing some questions/issues around the bootstrap process using the vSphere infrastructure provider.
Versions / Environment
Kubernetes: 1.27.5
Talos: 1.5.2 (OVA)
Cluster API Infrastructure: vSphere 1.8.1
Cluster API Bootstrap: Talos 0.6.2
Cluster API CP: Talos 0.5.3
VMware ESXi 7.0.3
Description
According to the Talos - VMware documentation, we have to install a custom talos-vmtools with some dedicated Talos config.
This totally makes sense; however, my concern is the following:
In order to bootstrap the cluster via Cluster API, and especially via the CACPPT controller, I need my CAPV controller to retrieve the IP address of the VM via the vCenter API. However, that IP is only reported once the VMTools have been successfully installed and configured. Unfortunately, installing the VMTools requires the Talos bootstrap to be done first, because they are deployed as a DaemonSet. This leaves us with a chicken-and-egg problem.
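To make the circular dependency concrete, here is a minimal sketch that models the steps described above as a dependency graph and checks whether any valid order exists. The step names are labels made up for this illustration, and the second graph is a hypothetical variant in which the guest tools are already running before the bootstrap; it is not something the setup above does.

```python
from graphlib import CycleError, TopologicalSorter


def bootstrap_order(depends_on):
    """Return a valid step order, or None if the steps form a cycle."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Each key depends on the steps in its set (names are illustrative only).
as_described = {
    "vm_ip_visible_in_vcenter": {"vmtools_running"},
    "talos_bootstrap": {"vm_ip_visible_in_vcenter"},
    "vmtools_running": {"talos_bootstrap"},  # DaemonSet needs a bootstrapped cluster
}

# Hypothetical variant: the tools start with the VM, before any bootstrap.
tools_in_image = {
    "vm_created": set(),
    "vmtools_running": {"vm_created"},
    "vm_ip_visible_in_vcenter": {"vmtools_running"},
    "talos_bootstrap": {"vm_ip_visible_in_vcenter"},
}

print(bootstrap_order(as_described))    # no valid order exists
print(bootstrap_order(tools_in_image))  # a linear order exists
```

The first graph has no valid ordering, which is exactly the deadlock described: each of the three steps waits on another one.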
Our current workaround is to manually bootstrap the cluster via the IP addresses provided by the DHCP. However, this is quite a pain as we wish to automate everything via GitOps since we will manage quite a lot of permanent clusters, but also some ephemeral ones.
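For reference, the manual workaround boils down to something like the sketch below, which only assembles the `talosctl bootstrap` command line for one control plane node. The IP address and the talosconfig path are placeholders, not values from our environment, and the helper name is made up for this example.

```python
import shlex


def talosctl_bootstrap_argv(node_ip, talosconfig="./talosconfig"):
    """Build the talosctl command that bootstraps etcd on one control plane node.

    node_ip is the address handed out by DHCP (looked up by hand);
    talosconfig is a placeholder path to the client configuration.
    """
    return [
        "talosctl",
        "--talosconfig", talosconfig,
        "--endpoints", node_ip,
        "--nodes", node_ip,
        "bootstrap",
    ]


# 192.0.2.10 is a documentation-range placeholder address.
print(shlex.join(talosctl_bootstrap_argv("192.0.2.10")))
```

Having to look up that address by hand for every cluster is the part that does not fit a GitOps workflow.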
Do you have any insights or recommendations on how to achieve this goal within the VMware ecosystem?
Reproduce Steps
The following steps can be performed to easily reproduce the issue:
1. Create a transient cluster that will be used to spawn the first permanent management cluster via Cluster API. The cluster can be created either directly on vSphere or with kind/k3d/...
2. Initialize the Cluster API components on the transient cluster using clusterctl, with CAPV, CABPT and CACPPT.
3. Once the VMs are created, observe that the bootstrap cannot proceed: the controller cannot reach the VMs because vCenter reports no IP addresses for them, and VMTools cannot be installed until the bootstrap has completed.
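The initialization step can be sketched as the `clusterctl init` invocation assembled below. The provider names (`vsphere`, `talos`) and the `name:version` pinning are how clusterctl is normally called; the versions are the ones listed under Versions / Environment, and the helper function itself is only for illustration.

```python
def clusterctl_init_argv(infrastructure, bootstrap, control_plane):
    """Build a clusterctl init command with pinned provider versions."""
    return [
        "clusterctl", "init",
        "--infrastructure", infrastructure,
        "--bootstrap", bootstrap,
        "--control-plane", control_plane,
    ]


# Versions taken from the environment section of this report.
argv = clusterctl_init_argv("vsphere:v1.8.1", "talos:v0.6.2", "talos:v0.5.3")
print(" ".join(argv))
```

The vSphere credentials and server settings that CAPV expects in the environment are left out here.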
Useful outputs/content
Talos console:
vSphere machine (no IP due to VMtools not being installable at this point in time):
CACPPT logs:
2023-10-20T06:56:47Z INFO reconcile TalosControlPlane {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane", "TalosControlPlane": {"name":"observability-cluster-poc","namespace":"cluster-api-system"}, "namespace": "cluster-api-system", "name": "observability-cluster-poc", "reconcileID": "be96027a-b052-4819-bd53-8215a326733f", "cluster": "observability-cluster-poc"}
2023-10-20T06:56:47Z INFO controllers.TalosControlPlane bootstrap failed, retrying in 20 seconds {"namespace": "cluster-api-system", "talosControlPlane": "observability-cluster-poc", "error": "no addresses were found for node \"observability-cluster-poc-bzpgr\""}
2023-10-20T06:56:47Z INFO controllers.TalosControlPlane attempting to set control plane status
2023-10-20T06:56:57Z INFO controllers.TalosControlPlane failed to get kubeconfig for the cluster {"error": "failed to create cluster accessor: error creating client for remote cluster \"cluster-api-system/observability-cluster-poc\": error getting rest mapping: failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://172.30.11.10:6443/api/v1?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", "errorVerbose": "failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Get \"https://172.30.11.10:6443/api/v1?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\nerror creating client for remote cluster \"cluster-api-system/observability-cluster-poc\": error getting rest mapping\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).createClient\n\t/.cache/mod/sigs.k8s.io/[email protected]/controllers/remote/cluster_cache_tracker.go:396\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).newClusterAccessor\n\t/.cache/mod/sigs.k8s.io/[email protected]/controllers/remote/cluster_cache_tracker.go:299\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).getClusterAccessor\n\t/.cache/mod/sigs.k8s.io/[email protected]/controllers/remote/cluster_cache_tracker.go:273\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).GetClient\n\t/.cache/mod/sigs.k8s.io/[email 
protected]/controllers/remote/cluster_cache_tracker.go:180\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).updateStatus\n\t/src/controllers/taloscontrolplane_controller.go:562\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile.func1\n\t/src/controllers/taloscontrolplane_controller.go:155\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile\n\t/src/controllers/taloscontrolplane_controller.go:184\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/toolchain/go/src/runtime/asm_amd64.s:1598\nfailed to create cluster accessor\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).getClusterAccessor\n\t/.cache/mod/sigs.k8s.io/[email protected]/controllers/remote/cluster_cache_tracker.go:275\nsigs.k8s.io/cluster-api/controllers/remote.(*ClusterCacheTracker).GetClient\n\t/.cache/mod/sigs.k8s.io/[email 
protected]/controllers/remote/cluster_cache_tracker.go:180\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).updateStatus\n\t/src/controllers/taloscontrolplane_controller.go:562\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile.func1\n\t/src/controllers/taloscontrolplane_controller.go:155\ngithub.com/siderolabs/cluster-api-control-plane-provider-talos/controllers.(*TalosControlPlaneReconciler).Reconcile\n\t/src/controllers/taloscontrolplane_controller.go:184\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/.cache/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/toolchain/go/src/runtime/asm_amd64.s:1598"}
2023-10-20T06:56:57Z INFO controllers.TalosControlPlane successfully updated control plane status {"namespace": "cluster-api-system", "talosControlPlane": "observability-cluster-poc", "cluster": "observability-cluster-poc"}
2023-10-20T06:56:57Z INFO reconcile TalosControlPlane {"controller": "taloscontrolplane", "controllerGroup": "controlplane.cluster.x-k8s.io", "controllerKind": "TalosControlPlane", "TalosControlPlane": {"name":"observability-cluster-poc","namespace":"cluster-api-system"}, "namespace": "cluster-api-system", "name": "observability-cluster-poc", "reconcileID": "2bb6e4b1-8a51-4c48-b463-eb6b0a915de8", "cluster": "observability-cluster-poc"}
2023-10-20T06:56:57Z INFO controllers.TalosControlPlane bootstrap failed, retrying in 20 seconds {"namespace": "cluster-api-system", "talosControlPlane": "observability-cluster-poc", "error": "no addresses were found for node \"observability-cluster-poc-bzpgr\""}
2023-10-20T06:56:57Z INFO controllers.TalosControlPlane attempting to set control plane status
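The logs above are noisy, so here is a small sketch for pulling out just the nodes the controller has no address for. It assumes the line layout seen in these logs (timestamp, level, message, optional trailing JSON object); the function names are made up for this example.

```python
import json
import re

NO_ADDRESS = re.compile(r'no addresses were found for node "([^"]+)"')


def parse_line(line):
    """Split a log line into timestamp, level, message and JSON fields."""
    head, sep, payload = line.partition(" {")
    timestamp, level, message = head.split(" ", 2)
    fields = json.loads("{" + payload) if sep else {}
    return timestamp, level, message, fields


def nodes_without_address(lines):
    """Return node names from 'no addresses were found' errors, in order."""
    nodes = []
    for line in lines:
        _, _, _, fields = parse_line(line)
        match = NO_ADDRESS.search(str(fields.get("error", "")))
        if match and match.group(1) not in nodes:
            nodes.append(match.group(1))
    return nodes


sample = [
    r'2023-10-20T06:56:47Z INFO controllers.TalosControlPlane bootstrap failed, retrying in 20 seconds {"namespace": "cluster-api-system", "talosControlPlane": "observability-cluster-poc", "error": "no addresses were found for node \"observability-cluster-poc-bzpgr\""}',
    "2023-10-20T06:56:47Z INFO controllers.TalosControlPlane attempting to set control plane status",
]
print(nodes_without_address(sample))
```

Applied to the excerpt above, this narrows the output down to the single machine whose address CAPV never received.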
Thanks in advance for your help and insights.
I found a way to deploy: build a Talos image with vmtoolsd installed by default using the Talos Image Factory, then use that image as the base template for the deployment. Please check here: https://factory.talos.dev/.
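As a rough sketch of what that could look like, the snippet below renders an Image Factory schematic that requests the guest agent extension and builds the download URL for the resulting VMware OVA. Everything specific here is an assumption to verify against the Image Factory itself: the extension name `siderolabs/vmtoolsd-guest-agent`, the URL layout, and whether the extension is published for the Talos version in use. The schematic ID is a placeholder returned by the factory once the schematic is submitted.

```python
FACTORY = "https://factory.talos.dev"


def schematic_yaml(extensions):
    """Render a schematic that requests the given official system extensions."""
    lines = [
        "customization:",
        "  systemExtensions:",
        "    officialExtensions:",
    ]
    lines += [f"      - {name}" for name in extensions]
    return "\n".join(lines) + "\n"


def ova_url(schematic_id, talos_version):
    """Build the OVA download URL (layout assumed, check it on the factory site)."""
    return f"{FACTORY}/image/{schematic_id}/{talos_version}/vmware-amd64.ova"


# Assumed extension name; confirm it exists for your Talos version.
print(schematic_yaml(["siderolabs/vmtoolsd-guest-agent"]))

# "<schematic-id>" is a placeholder for the ID the factory returns.
print(ova_url("<schematic-id>", "v1.5.2"))
```

The OVA built this way would replace the stock Talos OVA as the vSphere template that CAPV clones, so the guest agent runs as soon as the VM boots and vCenter can report the IP before the bootstrap.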