Add providerID not set issue and fix to troubleshooting guide
jiayiwang7 committed Sep 15, 2023
1 parent fc3f71b commit 92849cc
Showing 1 changed file with 21 additions and 0 deletions.
21 changes: 21 additions & 0 deletions docs/content/en/docs/troubleshooting/troubleshooting.md
@@ -439,6 +439,27 @@ Status:
Phase: Running
```

### Machine gets stuck at `Provisioned` state without `providerID` set

The VM is created in the provider infrastructure and is assigned an IP address, but `kubectl get machines -n eksa-system` shows the machine stuck in the `Provisioned` state; it never reaches `Running`.

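Check the CAPI controller manager log, substituting the pod name from your own cluster (list it with `kubectl get pods -n capi-system`), for example `kubectl logs -f capi-controller-manager-7f754cf76b-g92ht -n capi-system`: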

```sh
E0218 03:41:53.126751 1 machine_controller_noderef.go:152] controllers/Machine "msg"="Failed to parse ProviderID" "error"="providerID is empty" "providerID"={} "node"="test-cluster-6ffd74bd5b-khxzr"
E0218 03:42:09.155577 1 machine_controller.go:685] controllers/Machine "msg"="Unable to retrieve machine from node" "error"="no matching Machine" "node"="test-cluster-6ffd74bd5b-khxzr"
```

When inspecting the CAPI `machine` object, you may find that `Node.Spec.ProviderID` is not set.
This can happen when the workload environment lacks network access to the underlying provider infrastructure. For example, in vSphere, without network access to the vCenter endpoint, the `vsphere-cloud-controller-manager` in the workload cluster cannot set the node's providerID. The machine therefore never reaches the `Running` state, which blocks cluster provisioning.
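
To confirm which nodes are affected, you can list each node's `spec.providerID`. The sketch below is illustrative only: the helper name `nodes_missing_provider_id` is hypothetical, and it assumes your kubeconfig points at the cluster whose nodes are being provisioned. It relies on `kubectl` printing `<none>` in a `custom-columns` field that is unset.

```shell
# Hypothetical helper: prints the names of nodes that have no providerID.
# Expects "NAME PROVIDERID" rows on stdin, with a header on the first line.
nodes_missing_provider_id() {
  # Skip the header row; kubectl prints <none> for an unset field.
  awk 'NR > 1 && ($2 == "<none>" || $2 == "") { print $1 }'
}
```

Pipe the node list into it, for example `kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID | nodes_missing_provider_id`. Any node name it prints has no providerID set.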

To fix this, validate the network and firewall settings between the workload cluster and the infrastructure provider environment. Review the `Requirements` page for your provider, especially the networking requirements, before retrying cluster provisioning:
* [Requirements for EKS Anywhere on VMware vSphere]({{< relref "../getting-started/vsphere/vsphere-prereq" >}})
* [Network Requirements for EKS Anywhere on Bare Metal]({{< relref "../getting-started/baremetal/bare-prereq" >}})
* [Requirements for EKS Anywhere on CloudStack]({{< relref "../getting-started/cloudstack/cloudstack-prereq" >}})
* [Prerequisite Checklist for EKS Anywhere on Snow]({{< relref "../getting-started/snow/snow-getstarted/#prerequisite-checklist" >}})
* [Requirements for EKS Anywhere on Nutanix Cloud Infrastructure]({{< relref "../getting-started/nutanix/nutanix-prereq" >}})
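
As a quick first check before working through the requirements pages, you can probe the provider endpoint from a machine on the same network as the workload cluster nodes. This is a minimal sketch, not an official validation step: the helper name `check_endpoint`, the default port 443, and the example hostname in the usage note are assumptions. It only tests whether an HTTPS endpoint answers at all, so a passing result does not prove that every required port is open.

```shell
# Hypothetical helper: reports whether an HTTPS endpoint answers.
# Usage: check_endpoint <host> [port]   (port defaults to 443, an assumption)
check_endpoint() {
  host="$1"
  port="${2:-443}"
  # -k skips certificate verification so a self-signed certificate
  # is not mistaken for a network problem.
  if curl -sk --connect-timeout 5 --max-time 10 -o /dev/null "https://${host}:${port}"; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

For example, `check_endpoint vcenter.example.com 443` (with your own vCenter address in place of the placeholder hostname) prints `unreachable` when the connection is refused, times out, or the name does not resolve.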

## Bare Metal troubleshooting

### Creating new workload cluster hangs or fails