Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add providerID not set issue and fix to troubleshooting guide #6685

Merged
merged 1 commit into from
Sep 15, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions docs/content/en/docs/troubleshooting/troubleshooting.md
Original file line number Diff line number Diff line change
Expand Up @@ -439,6 +439,27 @@ Status:
Phase: Running
```

### Machine gets stuck at `Provisioned` state without `providerID` set

The VM can be created in the provider infrastructure with proper IP assigned but running `kubectl get machines -n eksa-system` indicates that the machine is in `Provisioned` state and never gets to `Running`.

Check the CAPI controller manager log with `k logs -f -n capi-system -l control-plane="controller-manager" --tail -1`:

```sh
E0218 03:41:53.126751 1 machine_controller_noderef.go:152] controllers/Machine "msg"="Failed to parse ProviderID" "error"="providerID is empty" "providerID"={} "node"="test-cluster-6ffd74bd5b-khxzr"
E0218 03:42:09.155577 1 machine_controller.go:685] controllers/Machine "msg"="Unable to retrieve machine from node" "error"="no matching Machine" "node"="test-cluster-6ffd74bd5b-khxzr"
```

When inspecting the CAPI `machine` object, you may find out that the `Node.Spec.ProviderID` is not set.
This can happen when the workload environment does not have proper network access to the underlying provider infrastructure. For example in vSphere, without the network access to vCenter endpoint, the `vsphere-cloud-controller-manager` in the workload cluster cannot set node's providerID, thus the machine will never get to `Running` state, blocking cluster provisioning from continuing.

To fix it, make sure to validate the network/firewall settings from the workload cluster to the infrastructure provider environment. Read through the `Requirements` page, especially around the networking requirements in each provider before retrying the cluster provisioning:
* [Requirements for EKS Anywhere on VMware vSphere]({{< relref "../getting-started/vsphere/vsphere-prereq" >}})
* [Network Requirements for EKS Anywhere on Bare Metal]({{< relref "../getting-started/baremetal/bare-prereq" >}})
* [Requirements for EKS Anywhere on CloudStack]({{< relref "../getting-started/cloudstack/cloudstack-prereq" >}})
* [Prerequisite Checklist for EKS Anywhere on Snow]({{< relref "../getting-started/snow/snow-getstarted/#prerequisite-checklist" >}})
* [Requirements for EKS Anywhere on Nutanix Cloud Infrastructure]({{< relref "../getting-started/nutanix/nutanix-prereq" >}})

## Bare Metal troubleshooting

### Creating new workload cluster hangs or fails
Expand Down