docs: add vsphere issue (#3916)
Co-authored-by: Lenny Chen <[email protected]>
lennessyy and lennessyy authored Sep 13, 2024
1 parent e264b1d commit 9fd9948
Showing 6 changed files with 262 additions and 14 deletions.
9 changes: 9 additions & 0 deletions docs/docs-content/enterprise-version/upgrade/upgrade-notes.md
@@ -52,6 +52,15 @@ Palette 4.0 includes the following major enhancements that require user interven

### Upgrade with VMware

:::warning

A known issue impacts all self-hosted Palette instances older than 4.4.14. Before upgrading a Palette instance with a
version older than 4.4.14, ensure that you execute a utility script to make all your cluster IDs unique in your
Persistent Volume Claim (PVC) metadata. For more information, refer to the
[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping).

:::

From the Palette system console, click the **Update version** button. Palette will be temporarily unavailable while
system services update.

@@ -9,13 +9,15 @@ keywords: ["self-hosted", "enterprise"]
---

This guide takes you through the process of upgrading a self-hosted airgap Palette instance installed on VMware vSphere.

:::warning

Before upgrading Palette to a new major version, you must first update it to the latest patch version of the latest
minor version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths) section for
details.

:::warning

Before upgrading Palette to a new major version, you must first update it to the latest minor version available. Refer
to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths) section for details.

:::

If your setup includes a PCG, you must also
@@ -8,13 +8,15 @@ tags: ["palette", "self-hosted", "vmware", "non-airgap", "upgrade"]
keywords: ["self-hosted", "enterprise"]
---

This guide takes you through the process of upgrading a self-hosted Palette instance installed on VMware vSphere.
This guide takes you through the process of upgrading a self-hosted Palette instance installed on VMware vSphere. Before
upgrading Palette to a new major version, you must first update it to the latest patch version of the latest minor
version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths) section for details.

:::warning

Before upgrading Palette to a new major version, you must first update it to the latest patch version of the latest
minor version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths) section for
details.
If you are upgrading from a Palette version that is older than 4.4.14, ensure that you have executed the utility script
to make the CNS mapping unique for the associated PVC. For more information, refer to the
[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping).

:::

230 changes: 230 additions & 0 deletions docs/docs-content/troubleshooting/enterprise-install.md
@@ -45,3 +45,233 @@ This error may occur if the self-hosted pack registry specified in the installat
After a few moments, a system profile will be created and Palette or VerteX will be able to self-link successfully. If
you continue to encounter issues, contact our support team by emailing
[[email protected]](mailto:[email protected]) so that we can provide you with further guidance.

## Scenario - Enterprise Backup Stuck

In the scenario where an enterprise backup is stuck, a restart of the management pod may resolve the issue. Use the
following steps to restart the management pod.

### Debug Steps

1. Open up a terminal session in an environment that has network access to the Kubernetes cluster. Refer to the
[Access Cluster with CLI](../clusters/cluster-management/palette-webctl.md) guide for additional guidance.

2. Identify the `mgmt` pod in the `hubble-system` namespace. Use the following command to list all pods in the
`hubble-system` namespace and filter for the `mgmt` pod.

```shell
kubectl get pods --namespace hubble-system | grep mgmt
```

```shell hideClipboard
mgmt-f7f97f4fd-lds69 1/1 Running 0 45m
```

3. Restart the `mgmt` pod by deleting it. Use the following command to delete the `mgmt` pod. Replace `<mgmt-pod-name>`
with the actual name of the `mgmt` pod that you identified in step 2.

```shell
kubectl delete pod <mgmt-pod-name> --namespace hubble-system
```

```shell hideClipboard
pod "mgmt-f7f97f4fd-lds69" deleted
```
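The lookup in step 2 and the delete in step 3 can be chained so that the pod name does not have to be copied by hand.
The following is a minimal sketch, not an official utility. The function name `first_mgmt_pod` is hypothetical, and the
sketch assumes the pod name starts with `mgmt-`, as in the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first pod whose name starts with "mgmt-".
# Reads `kubectl get pods` output on standard input.
first_mgmt_pod() {
  awk '$1 ~ /^mgmt-/ { print $1; exit }'
}

# Example usage against a live cluster (commented out so the sketch has no side effects):
# pod="$(kubectl get pods --namespace hubble-system | first_mgmt_pod)"
# [ -n "$pod" ] && kubectl delete pod "$pod" --namespace hubble-system
```

Review the pod name the helper prints before you delete anything, especially if more than one pod matches the prefix.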

## Non-unique vSphere CNS Mapping

In Palette and VerteX releases 4.4.8 and earlier, Persistent Volume Claim (PVC) metadata does not use a unique
identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere,
potentially leading to issues during node operations and upgrades.

This issue is resolved in Palette and VerteX releases starting with 4.4.14. However, upgrading to 4.4.14 will not
automatically resolve this issue. If you have self-hosted instances of Palette in your vSphere environment older than
4.4.14, you should execute the following utility script manually to make the CNS mapping unique for the associated PVC.
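If you track instance versions in a script, the "older than 4.4.14" check can be automated. The following is a sketch
under stated assumptions: the function name `needs_csi_helper` is hypothetical, the version is a plain
`MAJOR.MINOR.PATCH` string that you supply yourself, and your `sort` supports the `-V` version-sort option, as GNU
coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the given version is older than 4.4.14.
# Assumes plain MAJOR.MINOR.PATCH input and a `sort` that supports -V (version sort).
needs_csi_helper() {
  local version="$1" fixed="4.4.14"
  # Older means: not equal to the fixed version, and sorted before it.
  [ "$version" != "$fixed" ] &&
    [ "$(printf '%s\n%s\n' "$version" "$fixed" | sort -V | head -n 1)" = "$version" ]
}

# Example usage:
# if needs_csi_helper "4.4.8"; then echo "Run the utility script before upgrading."; fi
```

Version sort is used instead of a plain string comparison so that a version such as `4.10.0` is treated as newer than
`4.4.14`.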

### Debug Steps

1. Ensure your machine has network access to your self-hosted Palette instance with `kubectl`. Alternatively, establish
an SSH connection to a machine where you can access your self-hosted Palette instance with `kubectl`.

2. Log in to your self-hosted Palette instance System Console.

3. In the **Main Menu**, click **Enterprise Cluster**.

4. In the cluster details page, scroll down to the **Kubernetes Config File** field and download the kubeconfig file.

5. Issue the following command to download the utility script.

```bash
curl --output csi-helper https://software.spectrocloud.com/tools/csi-helper/csi-helper
```

6. Grant execute permission to the script.

```bash
chmod +x csi-helper
```

7. Issue the following command to execute the utility script. Replace the placeholder with the path to your kubeconfig
file.

```bash
./csi-helper --kubeconfig=<PATH_TO_KUBECONFIG>
```

8. Issue the following command to verify that the script has updated the cluster ID.

```bash
kubectl describe configmap vsphere-cloud-config --namespace=kube-system
```

If the update is successful, the cluster ID in the ConfigMap will have a unique ID assigned instead of
`spectro-mgmt/spectro-mgmt-cluster`.

```hideClipboard {12}
Name: vsphere-cloud-config
Namespace: kube-system
Labels: component=cloud-controller-manager
vsphere-cpi-infra=config
Annotations: cluster.spectrocloud.com/last-applied-hash: 17721994478134573986
Data
====
vsphere.conf:
----
[Global]
cluster-id = "896d25b9-bfac-414f-bb6f-52fd469d3a6c/spectro-mgmt-cluster"
[VirtualCenter "vcenter.spectrocloud.dev"]
insecure-flag = "true"
user = "[email protected]"
password = "************"
[Labels]
zone = "k8s-zone"
region = "k8s-region"
BinaryData
====
Events: <none>
```
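The visual check in step 8 can also be scripted. The following is a minimal sketch, not part of the utility script. The
function names `cluster_id_from_config` and `cluster_id_is_unique` are hypothetical, and the sketch assumes the
`cluster-id` line is formatted as in the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ConfigMap text on standard input and print the first cluster-id value.
cluster_id_from_config() {
  sed -n 's/^[[:space:]]*cluster-id[[:space:]]*=[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed when the cluster ID is set and is not the shared default value.
cluster_id_is_unique() {
  local id="$1"
  [ -n "$id" ] && [ "$id" != "spectro-mgmt/spectro-mgmt-cluster" ]
}

# Example usage against a live cluster (commented out so the sketch has no side effects):
# id="$(kubectl describe configmap vsphere-cloud-config --namespace=kube-system | cluster_id_from_config)"
# cluster_id_is_unique "$id" && echo "Cluster ID is unique: $id"
```

An empty result is treated as a failure, so a missing ConfigMap or an unexpected format is not mistaken for a
successful update.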

## Volume Attachment Errors in VMware Environment

If you deployed Palette in a VMware vSphere environment and are experiencing volume attachment errors for the MongoDB
pods during the upgrade process, it may be due to duplicate resources in the cluster causing resource creation errors.
Palette versions between 4.0.0 and 4.3.0 are affected by a known issue where cluster resources are not receiving unique
IDs. Use the following steps to correctly identify the issue and resolve it.

### Debug Steps

1. Open up a terminal session in an environment that has network access to the Kubernetes cluster.

2. Configure the kubectl CLI to connect to the Kubernetes cluster of your self-hosted Palette or VerteX instance. Refer
to the [Access Cluster with CLI](../clusters/cluster-management/palette-webctl.md) guide for additional guidance.
3. Verify the MongoDB pods are not starting correctly by issuing the following command.

```shell
kubectl get pods --namespace=hubble-system --selector='app=spectro,role=mongo'
```

```shell {4} hideClipboard
NAME READY STATUS RESTARTS AGE
mongo-0 2/2 Running 0 17h
mongo-1 2/2 Running 0 17h
mongo-2 0/2 ContainerCreating 0 16m
```

4. Inspect the pod that is not starting correctly. Use the following command to describe the pod. Replace `mongo-2`
with the name of the pod that is not starting.

```shell
kubectl describe pod mongo-2 --namespace=hubble-system
```

5. Review the event output for any errors. If an error related to the volume attachment is present, proceed to the next
step.

```shell hideClipboard
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 106s (x16 over 18m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-94cbb8f5-9145-4b18-9bf9-ee027b64d0c7" : volume attachment is being deleted
Warning FailedMount 21s (x4 over 16m) kubelet Unable to attach or mount volumes: unmounted volumes=[mongo-data], unattached volumes=[spectromongokey kube-api-access-sz5lz mongo-data spectromongoinit spectromongopost]: timed out waiting for the condition
```

6. The remaining steps may need to be performed on all MongoDB pods and their associated Persistent Volumes (PVs) and
Persistent Volume Claims (PVCs). Do each step sequentially for each MongoDB pod that is encountering the volume
attachment error.

:::warning

Only do the steps for one MongoDB pod at a time to prevent data loss. Wait for the pod to come up correctly before
proceeding to the next pod.

:::

7. Delete the PVC associated with the MongoDB pod. Replace `mongo-2` with the name of the pod that is not starting.

```shell
kubectl delete pvc mongo-data-mongo-2 --namespace=hubble-system
```

8. Delete the PV associated with the MongoDB pod. Use the following command to list all PVs and find the PV associated
with the MongoDB pod you started with. In this example, the PV associated with `mongo-2` is
`pvc-94cbb8f5-9145-4b18-9bf9-ee027b64d0c7`. Make a note of this name.

```shell
kubectl get pv | grep 'mongo-data-mongo-2'
```

```shell hideClipboard
pvc-94cbb8f5-9145-4b18-9bf9-ee027b64d0c7 20Gi RWO Delete Bound hubble-system/mongo-data-mongo-2 spectro-storage-class 18h
```

9. Using the PV name from the previous step, delete the PV.

```shell
kubectl delete pv pvc-94cbb8f5-9145-4b18-9bf9-ee027b64d0c7
```

:::tip

The kubectl command may hang after you issue the delete command. If it does, press `Ctrl+C` to exit the command and
proceed to the next step.

:::

10. Delete the MongoDB pod that was not starting correctly. Replace `mongo-2` with the name of the pod that is not
starting.

```shell
kubectl delete pod mongo-2 --namespace=hubble-system
```

11. Wait for the pod to come up correctly. Use the following command to verify the pod is up and available.

```shell
kubectl get pods --namespace=hubble-system --selector='app=spectro,role=mongo'
```

```shell {4} hideClipboard
NAME READY STATUS RESTARTS AGE
mongo-0 2/2 Running 0 18h
mongo-1 2/2 Running 0 18h
mongo-2 2/2 Running 0 68s
```

:::warning

Once the pod is in the **Running** status, wait for at least five minutes for the replication to complete before
proceeding with the other pods.

:::

Palette will proceed with the upgrade and attempt to upgrade the remaining MongoDB pods. Repeat the steps for each
of the MongoDB pods that are not starting correctly due to the volume attachment error.

The upgrade process will continue once all MongoDB pods are up and available. Verify the new nodes deployed
successfully by checking the status of the nodes. Log in to the
[system console](../enterprise-version/system-management/system-management.md#access-the-system-console), navigate
to the left **Main Menu**, and select **Enterprise Cluster**. The **Nodes** tab will display the status of the nodes
in the cluster.

If you continue to encounter issues, contact our support team by emailing
[[email protected]](mailto:[email protected]) so that we can provide you with further guidance.
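The pod check in steps 3 and 11 and the PV lookup in step 8 can be scripted so that the affected pod and its PV are
identified without reading the tables by eye. The following is a sketch, not an official tool. The function names
`pods_not_ready` and `pv_for_claim` are hypothetical, and the sketch assumes the default column layout of
`kubectl get pods` and `kubectl get pv`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of pods that are not fully ready and Running.
# Reads `kubectl get pods` output (NAME READY STATUS ...) on standard input.
pods_not_ready() {
  awk '$1 != "NAME" { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1 }'
}

# Hypothetical helper: print the PV name bound to a claim such as
# "hubble-system/mongo-data-mongo-2". Reads `kubectl get pv` output on standard input
# and matches the claim column exactly instead of relying on a substring match.
pv_for_claim() {
  local claim="$1"
  awk -v claim="$claim" '{ for (i = 2; i <= NF; i++) if ($i == claim) { print $1; exit } }'
}

# Example usage against a live cluster (commented out so the sketch has no side effects):
# kubectl get pods --namespace=hubble-system --selector='app=spectro,role=mongo' | pods_not_ready
# kubectl get pv | pv_for_claim "hubble-system/mongo-data-mongo-2"
```

Both helpers only read output and print names. Keep the delete commands manual and handle one MongoDB pod at a time, as
the warning in step 6 requires.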
10 changes: 6 additions & 4 deletions docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md
@@ -9,13 +9,15 @@ keywords: ["self-hosted", "vertex"]
---

This guide takes you through the process of upgrading a self-hosted airgap Palette VerteX instance installed on VMware
vSphere.
vSphere. Before upgrading Palette VerteX to a new major version, you must first update it to the latest patch version of
the latest minor version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths)
section for details.

:::warning

Before upgrading Palette VerteX to a new major version, you must first update it to the latest patch version of the
latest minor version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths) section
for details.
If you are upgrading from a Palette VerteX version that is older than 4.4.14, ensure that you have executed the utility
script to make the CNS mapping unique for the associated PVC. For more information, refer to the
[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping).

:::

9 changes: 6 additions & 3 deletions docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md
@@ -9,13 +9,16 @@ keywords: ["self-hosted", "vertex"]
---

This guide takes you through the process of upgrading a self-hosted Palette VerteX instance installed on VMware vSphere.

:::warning

Before upgrading Palette VerteX to a new major version, you must first update it to the latest patch version of the
latest minor version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supported-upgrade-paths) section
for details.

:::warning

If you are upgrading from a Palette VerteX version that is older than 4.4.14, ensure that you have executed the utility
script to make the CNS mapping unique for the associated PVC. For more information, refer to the
[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping).

:::

If your setup includes a PCG, you must also