diff --git a/docs/docs-content/automation/palette-cli/palette-cli.md b/docs/docs-content/automation/palette-cli/palette-cli.md index c740920d89..5cce0992c3 100644 --- a/docs/docs-content/automation/palette-cli/palette-cli.md +++ b/docs/docs-content/automation/palette-cli/palette-cli.md @@ -7,7 +7,7 @@ tags: ["palette-cli"] --- The Palette CLI contains various functionalities that you can use to interact with Palette and manage resources. The -Palette CLI is well suited for Continuous Delivery/Continuous Deployment (CI/CD) pipelines and recommended for +Palette CLI is well suited for Continuous Integration/Continuous Deployment (CI/CD) pipelines and recommended for automation tasks, where Terraform or direct API queries are not ideal. To get started with the Palette CLI, check out the [Install](install-palette-cli.md) guide. diff --git a/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md b/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md index 2eae950a9e..c7f5c31d61 100644 --- a/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md +++ b/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md @@ -57,7 +57,7 @@ Palette 4.0 includes the following major enhancements that require user interven A known issue impacts all self-hosted Palette instances older than 4.4.14. Before upgrading a Palette instance with version older than 4.4.14, ensure that you execute a utility script to make all your cluster IDs unique in your Persistent Volume Claim (PVC) metadata. For more information, refer to the -[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). ::: diff --git a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md index fd512b6af2..73ce1362b8 100644 --- a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md +++ b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md @@ -17,7 +17,7 @@ details. If you are upgrading from a Palette version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). ::: diff --git a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md index 4229176347..3f7c70e2a2 100644 --- a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md +++ b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md @@ -16,7 +16,7 @@ version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supporte If you are upgrading from a Palette version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping).
::: diff --git a/docs/docs-content/release-notes/known-issues.md b/docs/docs-content/release-notes/known-issues.md index 50315cdbc4..52d8f154a0 100644 --- a/docs/docs-content/release-notes/known-issues.md +++ b/docs/docs-content/release-notes/known-issues.md @@ -14,55 +14,62 @@ to review and stay informed about the status of known issues in Palette. As issu The following table lists all known issues that are currently active and affecting users. -| Description | Workaround | Publish Date | Product Component | -| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | ---------------------------- | -| For Edge airgap clusters, manifests attached to packs are not applied during cluster deployment. | Add the manifest as a layer directly instead of attaching it to a pack. For more information, refer to [Add a Manifest](../profiles/cluster-profiles/create-cluster-profiles/create-addon-profile/create-manifest-addon.md). | November 15, 2024 | Edge | -| The Certificate Authority (CA) certificate for Mutating Webhook Handler (MWH) expires after 90 days and does not get automaticallly renewed, which affects cluster health. | Access your cluster [via kubectl](../clusters/cluster-management/palette-webctl.md) and issue the commands `kubectl delete secret --namespace spectro-system stylus-webhook-tls && kubectl delete mutatingwebhookconfiguration stylus-webhook` to manually delete the MWH TLS secret and MWH configuration. The MWH will then be automatically recreated with a new certificate. | Nov 4, 2024 | Edge | -| Upgrading the RKE2 version from 1.29 to 1.30 fails due to [an upstream issue](https://github.com/rancher/rancher/issues/46726) with RKE2 and Cilium. | Refer to the [Troubleshooting section](../troubleshooting/edge.md#scenario---clusters-with-cilium-and-rke2-experiences-kubernetes-upgrade-failure) for the workaround. | October 11, 2024 | Edge | -| Clusters deployed with Microk8s cannot accept kubectl commands if the pack is added to the cluster's cluster profile. The reason behind these issues is Microk8s' lack of support for `certSANs` . This causes the Kubernetes API server to reject Spectro Proxy certificates. 
| Use the CLI flag [`--insecure-skip-tls-verify`](https://kubernetes.io/docs/reference/kubectl/kubectl/) with kubectl commands or use the [admin kubeconfig file](../clusters/cluster-management/kubeconfig.md#kubeconfig-files) to access the cluster API, as it does not use the Spectro Proxy server. This option may be limited to environments where you can access the cluster directly from a network perspective. | October 1, 2024 | Clusters, Pack | -| Deploying new [Nutanix clusters](../clusters/data-center/nutanix/nutanix.md) fails for self-hosted Palette or VerteX users on version 4.4.18 or newer. | No workaround is available. | September 26, 2024 | Clusters | -| OCI Helm registries added to Palette or VerteX before support for OCI Helm registries hosted in AWS ECR was available in Palette have an invalid API payload that is causing cluster imports to fail if the OCI Helm Registry is referenced in the cluster profile. | Log in to Palette as a tenant administrator and navigate to the left **Main Menu** . Select **Registries** and click on the **OCI Registries** tab. For each OCI registry of the Helm type, click on the **three-dot Menu** at the end of the row. Select **Edit**. To fix the invalid API payload, click on **Confirm**. Palette will automatically add the correct provider type behind the scenes to address the issue. | September 25, 2024 | Helm Registries | -| Airgap self-hosted Palette or VerteX instances cannot use the Container service in App Profiles. The required dependency, [DevSpace](https://github.com/devspace-sh/devspace), is unavailable from the Palette pack registry and is downloaded from the Internet at runtime. | Use the manifest service in an [App Profile](../profiles/app-profiles/app-profiles.md) to specify a custom container image. | September 25, 2024 | App Mode | -| If an Edge host operating a cluster in connected mode loses connection to Palette, the cluster will not auto-renew its Public Key Infrastructure (PKI) certificates. When it re-establishes the connection to Palette, the Edge host will renew the certificates if the existing certificates have less than 30 days before expiry. | No workaround available. | September 14, 2024 | Edge | -| Using the Flannel Container Network Interface (CSI) pack together with a Red Hat Enterprise Linux (RHEL)-based provider image may lead to a pod becoming stuck during deployment. This is caused by an upstream issue with Flannel that was discovered in a K3s GitHub issue. Refer to [the K3s issue page](https://github.com/k3s-io/k3s/issues/5013) for more information. | No workaround is available | September 14, 2024 | Edge | -| Palette OVA import operations fail if the VMO cluster is using a storageClass with the volume bind method `WaitForFirstConsumer`. | Refer to the [OVA Imports Fail Due To Storage Class Attribute](../troubleshooting/vmo-issues.md#scenario---ova-imports-fail-due-to-storage-class-attribute) troubleshooting guide for workaround steps. | September 13, 2024 | Palette CLI, VMO | -| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | -| Third-party binaries downloaded and used by the Palette CLI may become stale and incompatible with the CLI. 
| Refer to the [Incompatible Stale Palette CLI Binaries](../troubleshooting/automation.md#scenario---incompatible-stale-palette-cli-binaries) troubleshooting guide for workaround guidance. | September 11, 2024 | CLI | -| An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. | Check out the [Degreated Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for guidance on workaround. | September 4, 2024 | Edge | -| The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. [Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the un-schedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | -| On AWS IaaS Microk8s clusters, OS patching can get stuck and fail. | Refer to the [Troubleshooting](../troubleshooting/nodes.md#os-patch-fails-on-aws-with-microk8s-127) section for debug steps. | August 17, 2024 | Palette | -| When upgrading a self-hosted Palette instance from 4.3 to 4.4 the MongoDB pod may be stuck with the following error: `ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.` | Delete the PVC, PV and the pod manually. All resources will be recreated with the correct configuration. | August 17, 2024 | Self-Hosted Palette | -| For existing clusters that have added a new machine and all new clusters, pods may be stuck in the draining process and require manual intervention to drain the pod. | Manually delete the pod if it is stuck in the draining process. | August 17, 2024 | Palette | -| Clusters with the Virtual Machine Orchestrator (VMO) pack may experience VMs getting stuck in a continuous migration loop, as indicated by a `Migrating` or `Migration` VM status. | Review the [Virtual Machine Orchestrator (VMO) Troubleshooting](../troubleshooting/vmo-issues.md) section for workarounds. | August 1, 2024 | Virtual Machine Orchestrator | -| Palette CLI users who authenticated with the `login` command and specified a Palette console endpoint that does not contain the tenant name are encountering issues with expired JWT tokens. | Re-authenticate using your tenant URL, for example, `https://my-org.console.spectrocloud.com.` If the issue persists after re-authenticating, remove the `~/.palette/palette.yaml` file that is auto-generated by the Palette CLI. Re-authenticate with the `login` command if other commands require it. | July 25, 2024 | CLI | -| Adding new cloud providers, such as Nutanix, is currently unavailable. 
Private Cloud Gateway (PCG) deployments in new Nutanix environments fail to complete the installation. As a result, adding a new Nutanix environment to launch new host clusters is unavailable. This does not impact existing Nutanix deployments with a PCG deployed. | No workarounds are available. | July 20, 2024 | Clusters, Self-Hosted, PCG | -| Single-node Private Cloud Gateway (PCG) clusters are experiencing an issue upgrading to 4.4.11. The vSphere CSI controller pod fails to start because there are no matching affinity rules. | Check out the [vSphere Controller Pod Fails to Start in Single Node PCG Cluster](../troubleshooting/pcg.md#scenario---vsphere-controller-pod-fails-to-start-in-single-node-pcg-cluster) guide for workaround steps. | July 20, 2024 | PCG | -| When provisioning an Edge cluster, it's possible that some Operating System (OS) user credentials will be lost once the cluster is active. This is because the cloud-init stages from different sources merge during the deployment process, and sometimes, the same stages without distinct names overwrite each other. | Give each of your cloud-init stages in the OS pack and in the Edge installer **user-data** file a unique name. For more information about cloud-init stages and examples of cloud-init stages with names, refer to [Cloud-init Stages](../clusters/edge/edge-configuration/cloud-init.md). | July 17, 2024 | Edge | -| When you use a content bundle to provision a new cluster without using the local Harbor registry, it's possible for the images to be pulled from external networks instead of from the content bundle, consuming network bandwidth. If your Edge host has no connection to external networks or if it cannot locate the image on a remote registry, some pods may enter the `ImagePullBackOff` state at first, but eventually the pods will be created using images from the content bundle. | For connected clusters, you can make sure that the remote images are not reachable by the Edge host, which will stop the Palette agent from downloading the image and consuming bandwidth, and eventually the cluster will be created using images from the content bundle. For airgap clusters, the `ImagePullBackOff` error will eventually resolve on its own and there is no action to take. | July 11, 2024 | Edge | -| When you add a new VMware vSphere Edge host to an Edge cluster, the IP address may fail to be assigned to the Edge host after a reboot. | Review the [Edge Troubleshooting](../troubleshooting/edge.md) section for workarounds. | July 9, 2024 | Edge | -| When you install Palette Edge using an Edge Installer ISO with a RHEL 8 operating system on a Virtual Machine (VM) with insufficient video memory, the QR code in the registration screen does not display correctly. | Increase the video memory of your VM to 8 MB or higher. The steps to do this vary depending on the platform you use to deploy your VM. In vSphere, you can right click on the VM, click **Edit Settings** and adjust the video card memory in the **Video card** tab. | July 9, 2024 | Edge | -| Custom Certificate Authority (CA) is not supported for accessing Azure AKS clusters. Using a custom CA prevents the `spectro-proxy` pack from working correctly with Azure AKS clusters. | No workaround is available. | July 9, 2024 | Packs, Clusters | -| Manifests attached to an Infrastructure Pack, such as OS, Kubernetes, Network, or Storage, are not applied to the Edge cluster. This issue does not impact the infrastructure pack's YAML definition, which is applied to the cluster. 
| Specify custom configurations through an add-on pack or a custom manifest pack applied after the infrastructure packs. | Jul 9, 2024 | Edge, Packs | -| Clusters using Cilium and deployed to VMware environments with the VXLAN tunnel protocol may encounter an I/O timeout error. This issue is caused by the VXMNET3 adapter, which is dropping network traffic and resulting in VXLAN traffic being dropped. You can learn more about this issue in the [Cilium's GitHub issue #21801](https://github.com/cilium/cilium/issues/21801). | Review the section for workarounds. | June 27, 2024 | Packs, Clusters, Edge | -| [Sonobuoy](../clusters/cluster-management/compliance-scan.md#conformance-testing) scans fail to generate reports on airgapped Palette Edge clusters. | No workaround is available. | June 24, 2024 | Edge | -| Clusters configured with OpenID Connect (OIDC) at the Kubernetes layer encounter issues when authenticating with the [non-admin Kubeconfig file](../clusters/cluster-management/kubeconfig.md#cluster-admin). Kubeconfig files using OIDC to authenticate will not work if the SSL certificate is set at the OIDC provider level. | Use the admin Kubeconfig file to authenticate with the cluster, as it does not use OIDC to authenticate. | June 21, 2024 | Clusters | -| During the platform upgrade from Palette 4.3 to 4.4, Virtual Clusters may encounter a scenario where the pod `palette-controller-manager` is not upgraded to the newer version of Palette. The virtual cluster will continue to be operational, and this does not impact its functionality. | Refer to the [Controller Manager Pod Not Upgraded](../troubleshooting/palette-dev-engine.md#scenario---controller-manager-pod-not-upgraded) troubleshooting guide. | June 15, 2024 | Virtual Clusters | -| Edge hosts with FIPS-compliant Red Hat Enterprise Linux (RHEL) and Ubuntu Operating Systems (OS) may encounter the error where the `systemd-resolved.service` service enters the **failed** state. This prevents the nameserver from being configured, which will result in cluster deployment failure. | Refer to [TroubleShooting](../troubleshooting/edge.md#scenario---systemd-resolvedservice-enters-failed-state) for a workaround. | June 15, 2024 | Edge | -| The GKE cluster's Kubernetes pods are failing to start because the Kubernetes patch version is unavailable. This is encountered during pod restarts or node scaling operations. | Deploy a new cluster and use a GKE cluster profile that does not contain a Kubernetes pack layer with a patch version. Migrate the workloads from the existing cluster to the new cluster. This is a breaking change introduced in Palette 4.4.0 | June 15, 2024 | Packs, Clusters | -| does not support multi-node control plane clusters. The upgrade strategy, `InPlaceUpgrade`, is the only option available for use. | No workaround is available. | June 15, 2024 | Packs | -| Clusters using as the Kubernetes distribution, the control plane node fails to upgrade when using the `InPlaceUpgrade` strategy for sequential upgrades, such as upgrading from version 1.25.x to version 1.26.x and then to version 1.27.x. | Refer to the [Control Plane Node Fails to Upgrade in Sequential MicroK8s Upgrades](../troubleshooting/pack-issues.md) troubleshooting guide for resolution steps. | June 15, 2024 | Packs | -| Azure IaaS clusters are having issues with deployed load balancers and ingress deployments when using Kubernetes versions 1.29.0 and 1.29.4. Incoming connections time out as a result due to a lack of network path inside the cluster. 
Azure AKS clusters are not impacted. | Use a Kubernetes version lower than 1.29.0 | June 12, 2024 | Clusters | -| OIDC integration with Virtual Clusters is not functional. All other operations related to Virtual Clusters are operational. | No workaround is available. | Jun 11, 2024 | Virtual Clusters | -| Deploying self-hosted Palette or VerteX to a vSphere environment fails if vCenter has standalone hosts directly under a Datacenter. Persistent Volume (PV) provisioning fails due to an upstream issue with the vSphere Container Storage Interface (CSI) for all versions before v3.2.0. Palette and VerteX use the vSphere CSI version 3.1.2 internally. The issue may also occur in workload clusters deployed on vSphere using the same vSphere CSI for storage volume provisioning. | If you encounter the following error message when deploying self-hosted Palette or VerteX: `'ProvisioningFailed failed to provision volume with StorageClass "spectro-storage-class". Error: failed to fetch hosts from entity ComputeResource:domain-xyz` then use the following workaround. Remove standalone hosts directly under the Datacenter from vCenter and allow the volume provisioning to complete. After the volume is provisioned, you can add the standalone hosts back. You can also use a service account that does not have access to the standalone hosts as the user that deployed Palette. | May 21, 2024 | Self-Hosted | -| Conducting cluster node scaling operations on a cluster undergoing a backup can lead to issues and potential unresponsiveness. | To avoid this, ensure no backup operations are in progress before scaling nodes or performing other cluster operations that change the cluster state | April 14, 2024 | Clusters | -| Palette automatically creates an AWS security group for worker nodes using the format `-node`. If a security group with the same name already exists in the VPC, the cluster creation process fails. | To avoid this, ensure that no security group with the same name exists in the VPC before creating a cluster. | April 14, 2024 | Clusters | -| K3s version 1.27.7 has been marked as _Deprecated_. This version has a known issue that causes clusters to crash. | Upgrade to a newer version of K3s to avoid the issue, such as versions 1.26.12, 1.28.5, and 1.27.11. You can learn more about the issue in the [K3s GitHub issue](https://github.com/k3s-io/k3s/issues/9047) page. | April 14, 2024 | Packs, Clusters | -| When deploying a multi-node AWS EKS cluster with the Container Network Interface (CNI) , the cluster deployment fails. | A workaround is to use the AWS VPC CNI in the interim while the issue is resolved. | April 14, 2024 | Packs, Clusters | -| If a Kubernetes cluster deployed onto VMware is deleted, and later re-created with the same name, the cluster creation process fails. The issue is caused by existing resources remaining inside the PCG, or the System PCG, that are not cleaned up during the cluster deletion process. | Refer to the [VMware Resources Remain After Cluster Deletion](../troubleshooting/pcg.md#scenario---vmware-resources-remain-after-cluster-deletion) troubleshooting guide for resolution steps. | April 14, 2024 | Clusters | -| Day-2 operations related to infrastructure changes, such as modifying the node size and count, when using MicroK8s are not taking effect. | No workaround is available. 
| April 14, 2024 | Packs, Clusters | -| If a cluster that uses the Rook-Ceph pack experiences network issues, it's possible for the file mount to become and remain unavailable even after the network is restored. | This a known issue disclosed in the [Rook GitHub repository](https://github.com/rook/rook/issues/13818). To resolve this issue, refer to pack documentation. | April 14, 2024 | Packs, Edge | -| Edge clusters on Edge hosts with ARM64 processors may experience instability issues that cause cluster failures. | ARM64 support is limited to a specific set of Edge devices. Currently, Nvidia Jetson devices are supported. | April 14, 2024 | Edge | -| During the cluster provisioning process of new edge clusters, the Palette webhook pods may not always deploy successfully, causing the cluster to be stuck in the provisioning phase. This issue does not impact deployed clusters. | Review the [Palette Webhook Pods Fail to Start](../troubleshooting/edge.md#scenario---palette-webhook-pods-fail-to-start) troubleshooting guide for resolution steps. | April 14, 2024 | Edge | +| Description | Workaround | Publish Date | Product Component | +| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | ---------------------------- | +| For Edge airgap clusters, manifests attached to packs are not applied during cluster deployment. | Add the manifest as a layer directly instead of attaching it to a pack. For more information, refer to [Add a Manifest](../profiles/cluster-profiles/create-cluster-profiles/create-addon-profile/create-manifest-addon.md). | November 15, 2024 | Edge | +| In some cases, the differential editor incorrectly reports YAML differences for customizations not created by you. The issue is more common when items in a list or array are removed. Clicking the **Keep** button when non-user-generated customization is the focus causes the button to become unresponsive after the first usage. | Skip differential highlights not created by you. Click the arrow button to skip and proceed. | November 11, 2024 | Cluster Profiles | +| Palette fails to provision virtual clusters on airgapped and proxy Edge cluster groups. This error is caused by Palette incorrectly defaulting to fetch charts from an external repository, which is unreachable from these environments. | No workaround. 
| November 9, 2024 | Virtual Clusters | +| The resource limits on Palette Virtual Clusters are too low and may cause the Palette agent to experience resource exhaustion. As a result, Palette pods required for Palette operations may experience Out-of-Memory (OOM) errors. | Refer to the [Apply Host Cluster Resource Limits to Virtual Cluster](../troubleshooting/palette-dev-engine.md#scenario---apply-host-cluster-resource-limits-to-virtual-cluster) guide for workaround steps. | November 4, 2024 | Virtual Clusters | +| Palette incorrectly modifies the indentation of the pack after it is configured as a cluster profile layer. The modified indentation does not cause errors, but you may observe changes to the pack **values.yaml**. | No workaround available. | October 30, 2024 | Cluster Profiles, Pack | +| Palette does not correctly configure multiple search domains when provided during the self-hosted installation. The configuration file **resolv.conf** ends up containing incorrect values. | Connect remotely to each node in the Palette self-hosted instance and edit the **resolv.conf** configuration file. | October 17, 2024 | Self-Hosted, PCG | +| Upgrading the RKE2 version from 1.29 to 1.30 fails due to [an upstream issue](https://github.com/rancher/rancher/issues/46726) with RKE2 and Cilium. | Refer to the [Troubleshooting section](../troubleshooting/edge.md#scenario---clusters-with-cilium-and-rke2-experiences-kubernetes-upgrade-failure) for the workaround. | October 12, 2024 | Edge | +| Kubernetes clusters deployed on MAAS with Microk8s are experiencing deployment issues when using the upgrade strategy `RollingUpgrade`. This issue is affecting new cluster deployments and node provisioning. | Use the `InPlaceUpgrade` strategy to upgrade nodes deployed in MAAS. | October 12, 2024 | Clusters, Pack | +| Clusters using Microk8s and conducting backup and restore operations using Velero with [restic](https://github.com/restic/restic) are encountering restic pods going into the _CrashLoopBackOff_ state. This issue stems from an upstream problem in the Velero project. You can learn more about it in the GitHub issue [4035](https://github.com/vmware-tanzu/velero/issues/4035) page. | Refer to the Additional Details section for workaround steps. | October 12, 2024 | Clusters | +| Clusters deployed with Microk8s cannot accept kubectl commands if the pack is added to the cluster's cluster profile. The reason behind this issue is Microk8s' lack of support for `certSANs`. This causes the Kubernetes API server to reject Spectro Proxy certificates. Check out GitHub issue [114](https://github.com/canonical/cluster-api-bootstrap-provider-microk8s/issues/114) in the MicroK8s cluster-api-bootstrap-provider-microk8s repository to learn more. | Use the [admin kubeconfig file](../clusters/cluster-management/kubeconfig.md#kubeconfig-files) to access the cluster API, as it does not use the Spectro Proxy server. This option may be limited to environments where you can access the cluster directly from a network perspective. | October 1, 2024 | Clusters, Pack |
+| Deploying new [Nutanix clusters](../clusters/data-center/nutanix/nutanix.md) fails for self-hosted Palette or VerteX users on version 4.4.18 or newer. | No workaround is available. | September 26, 2024 | Clusters | +| OCI Helm registries added to Palette or VerteX before support for OCI Helm registries hosted in AWS ECR was available in Palette have an invalid API payload that is causing cluster imports to fail if the OCI Helm Registry is referenced in the cluster profile. | Log in to Palette as a tenant administrator and navigate to the left **Main Menu**. Select **Registries** and click on the **OCI Registries** tab. For each OCI registry of the Helm type, click on the **three-dot Menu** at the end of the row. Select **Edit**. To fix the invalid API payload, click on **Confirm**. Palette will automatically add the correct provider type behind the scenes to address the issue. | September 25, 2024 | Helm Registries | +| Airgap self-hosted Palette or VerteX instances cannot use the Container service in App Profiles. The required dependency, [DevSpace](https://github.com/devspace-sh/devspace), is unavailable from the Palette pack registry and is downloaded from the Internet at runtime. | Use the manifest service in an [App Profile](../profiles/app-profiles/app-profiles.md) to specify a custom container image. | September 25, 2024 | App Mode | +| If an Edge host operating a cluster in connected mode loses connection to Palette, the cluster will not auto-renew its Public Key Infrastructure (PKI) certificates. When it re-establishes the connection to Palette, the Edge host will renew the certificates if the existing certificates have less than 30 days before expiry. | No workaround available. | September 14, 2024 | Edge | +| Using the Flannel Container Network Interface (CNI) pack together with a Red Hat Enterprise Linux (RHEL)-based provider image may lead to a pod becoming stuck during deployment. This is caused by an upstream issue with Flannel that was discovered in a K3s GitHub issue. Refer to [the K3s issue page](https://github.com/k3s-io/k3s/issues/5013) for more information. | No workaround is available. | September 14, 2024 | Edge | +| Palette OVA import operations fail if the VMO cluster is using a storageClass with the volume binding mode `WaitForFirstConsumer`. | Refer to the [OVA Imports Fail Due To Storage Class Attribute](../troubleshooting/vmo-issues.md#scenario---ova-imports-fail-due-to-storage-class-attribute) troubleshooting guide for workaround steps. | September 13, 2024 | Palette CLI, VMO | +| Persistent Volume Claim (PVC) metadata does not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | +| Third-party binaries downloaded and used by the Palette CLI may become stale and incompatible with the CLI.
| Refer to the [Incompatible Stale Palette CLI Binaries](../troubleshooting/automation.md#scenario---incompatible-stale-palette-cli-binaries) troubleshooting guide for workaround guidance. | September 11, 2024 | CLI | +| An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. | Check out the [Degraded Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for workaround guidance. | September 4, 2024 | Edge | +| The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. [Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the unschedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | +| On AWS IaaS Microk8s clusters, OS patching can get stuck and fail. | Refer to the [Troubleshooting](../troubleshooting/nodes.md#os-patch-fails-on-aws-with-microk8s-127) section for debug steps. | August 17, 2024 | Palette | +| When upgrading a self-hosted Palette instance from 4.3 to 4.4, the MongoDB pod may be stuck with the following error: `ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.` | Delete the PVC, PV, and the pod manually. All resources will be recreated with the correct configuration. | August 17, 2024 | Self-Hosted Palette | +| For new clusters, and for existing clusters that have added a new machine, pods may become stuck in the draining process and require manual intervention. | Manually delete the pod if it is stuck in the draining process. | August 17, 2024 | Palette | +| Clusters with the Virtual Machine Orchestrator (VMO) pack may experience VMs getting stuck in a continuous migration loop, as indicated by a `Migrating` or `Migration` VM status. | Review the [Virtual Machine Orchestrator (VMO) Troubleshooting](../troubleshooting/vmo-issues.md) section for workarounds. | August 1, 2024 | Virtual Machine Orchestrator | +| Palette CLI users who authenticated with the `login` command and specified a Palette console endpoint that does not contain the tenant name are encountering issues with expired JWT tokens. | Re-authenticate using your tenant URL, for example, `https://my-org.console.spectrocloud.com`. If the issue persists after re-authenticating, remove the `~/.palette/palette.yaml` file that is auto-generated by the Palette CLI. Re-authenticate with the `login` command if other commands require it. | July 25, 2024 | CLI | +| Adding new cloud providers, such as Nutanix, is currently unavailable.
Private Cloud Gateway (PCG) deployments in new Nutanix environments fail to complete the installation. As a result, adding a new Nutanix environment to launch new host clusters is unavailable. This does not impact existing Nutanix deployments with a PCG deployed. | No workarounds are available. | July 20, 2024 | Clusters, Self-Hosted, PCG | +| Single-node Private Cloud Gateway (PCG) clusters are experiencing an issue upgrading to 4.4.11. The vSphere CSI controller pod fails to start because there are no matching affinity rules. | Check out the [vSphere Controller Pod Fails to Start in Single Node PCG Cluster](../troubleshooting/pcg.md#scenario---vsphere-controller-pod-fails-to-start-in-single-node-pcg-cluster) guide for workaround steps. | July 20, 2024 | PCG | +| When provisioning an Edge cluster, it's possible that some Operating System (OS) user credentials will be lost once the cluster is active. This is because the cloud-init stages from different sources merge during the deployment process, and sometimes, the same stages without distinct names overwrite each other. | Give each of your cloud-init stages in the OS pack and in the Edge installer **user-data** file a unique name. For more information about cloud-init stages and examples of cloud-init stages with names, refer to [Cloud-init Stages](../clusters/edge/edge-configuration/cloud-init.md). | July 17, 2024 | Edge | +| When you use a content bundle to provision a new cluster without using the local Harbor registry, it's possible for the images to be pulled from external networks instead of from the content bundle, consuming network bandwidth. If your Edge host has no connection to external networks or if it cannot locate the image on a remote registry, some pods may enter the `ImagePullBackOff` state at first, but eventually the pods will be created using images from the content bundle. | For connected clusters, you can make sure that the remote images are not reachable by the Edge host, which will stop the Palette agent from downloading the image and consuming bandwidth, and eventually the cluster will be created using images from the content bundle. For airgap clusters, the `ImagePullBackOff` error will eventually resolve on its own and there is no action to take. | July 11, 2024 | Edge | +| When you add a new VMware vSphere Edge host to an Edge cluster, the IP address may fail to be assigned to the Edge host after a reboot. | Review the [Edge Troubleshooting](../troubleshooting/edge.md) section for workarounds. | July 9, 2024 | Edge | +| When you install Palette Edge using an Edge Installer ISO with a RHEL 8 operating system on a Virtual Machine (VM) with insufficient video memory, the QR code in the registration screen does not display correctly. | Increase the video memory of your VM to 8 MB or higher. The steps to do this vary depending on the platform you use to deploy your VM. In vSphere, you can right-click on the VM, click **Edit Settings**, and adjust the video card memory in the **Video card** tab. | July 9, 2024 | Edge | +| Custom Certificate Authority (CA) is not supported for accessing Azure AKS clusters. Using a custom CA prevents the `spectro-proxy` pack from working correctly with Azure AKS clusters. | No workaround is available. | July 9, 2024 | Packs, Clusters | +| Manifests attached to an Infrastructure Pack, such as OS, Kubernetes, Network, or Storage, are not applied to the Edge cluster. This issue does not impact the infrastructure pack's YAML definition, which is applied to the cluster.
| Specify custom configurations through an add-on pack or a custom manifest pack applied after the infrastructure packs. | July 9, 2024 | Edge, Packs | +| Clusters using Cilium and deployed to VMware environments with the VXLAN tunnel protocol may encounter an I/O timeout error. This issue is caused by the VMXNET3 adapter, which drops network traffic and results in VXLAN traffic being dropped. You can learn more about this issue in [Cilium GitHub issue #21801](https://github.com/cilium/cilium/issues/21801). | Review the section for workarounds. | June 27, 2024 | Packs, Clusters, Edge | +| [Sonobuoy](../clusters/cluster-management/compliance-scan.md#conformance-testing) scans fail to generate reports on airgapped Palette Edge clusters. | No workaround is available. | June 24, 2024 | Edge | +| Clusters configured with OpenID Connect (OIDC) at the Kubernetes layer encounter issues when authenticating with the [non-admin Kubeconfig file](../clusters/cluster-management/kubeconfig.md#cluster-admin). Kubeconfig files using OIDC to authenticate will not work if the SSL certificate is set at the OIDC provider level. | Use the admin Kubeconfig file to authenticate with the cluster, as it does not use OIDC to authenticate. | June 21, 2024 | Clusters | +| During the platform upgrade from Palette 4.3 to 4.4, Virtual Clusters may encounter a scenario where the pod `palette-controller-manager` is not upgraded to the newer version of Palette. The virtual cluster will continue to be operational, and this does not impact its functionality. | Refer to the [Controller Manager Pod Not Upgraded](../troubleshooting/palette-dev-engine.md#scenario---controller-manager-pod-not-upgraded) troubleshooting guide. | June 15, 2024 | Virtual Clusters | +| Edge hosts with FIPS-compliant Red Hat Enterprise Linux (RHEL) and Ubuntu Operating Systems (OS) may encounter an error where the `systemd-resolved.service` service enters the **failed** state. This prevents the nameserver from being configured, which will result in cluster deployment failure. | Refer to [Troubleshooting](../troubleshooting/edge.md#scenario---systemd-resolvedservice-enters-failed-state) for a workaround. | June 15, 2024 | Edge | +| The GKE cluster's Kubernetes pods are failing to start because the Kubernetes patch version is unavailable. This is encountered during pod restarts or node scaling operations. | Deploy a new cluster and use a GKE cluster profile that does not contain a Kubernetes pack layer with a patch version. Migrate the workloads from the existing cluster to the new cluster. This is a breaking change introduced in Palette 4.4.0. | June 15, 2024 | Packs, Clusters | +| MicroK8s does not support multi-node control plane clusters. The upgrade strategy, `InPlaceUpgrade`, is the only option available for use. | No workaround is available. | June 15, 2024 | Packs | +| For clusters using MicroK8s as the Kubernetes distribution, the control plane node fails to upgrade when using the `InPlaceUpgrade` strategy for sequential upgrades, such as upgrading from version 1.25.x to version 1.26.x and then to version 1.27.x. | Refer to the [Control Plane Node Fails to Upgrade in Sequential MicroK8s Upgrades](../troubleshooting/pack-issues.md) troubleshooting guide for resolution steps. | June 15, 2024 | Packs | +| Azure IaaS clusters are having issues with deployed load balancers and ingress deployments when using Kubernetes versions 1.29.0 and 1.29.4. As a result, incoming connections time out due to a lack of a network path inside the cluster.
Azure AKS clusters are not impacted. | Use a Kubernetes version lower than 1.29.0. | June 12, 2024 | Clusters | +| OIDC integration with Virtual Clusters is not functional. All other operations related to Virtual Clusters are operational. | No workaround is available. | June 11, 2024 | Virtual Clusters | +| Deploying self-hosted Palette or VerteX to a vSphere environment fails if vCenter has standalone hosts directly under a Datacenter. Persistent Volume (PV) provisioning fails due to an upstream issue with the vSphere Container Storage Interface (CSI) for all versions before v3.2.0. Palette and VerteX use the vSphere CSI version 3.1.2 internally. The issue may also occur in workload clusters deployed on vSphere using the same vSphere CSI for storage volume provisioning. | If you encounter the following error message when deploying self-hosted Palette or VerteX: `'ProvisioningFailed failed to provision volume with StorageClass "spectro-storage-class". Error: failed to fetch hosts from entity ComputeResource:domain-xyz`, then use the following workaround. Remove standalone hosts directly under the Datacenter from vCenter and allow the volume provisioning to complete. After the volume is provisioned, you can add the standalone hosts back. Alternatively, deploy Palette with a service account that does not have access to the standalone hosts. | May 21, 2024 | Self-Hosted | +| Conducting cluster node scaling operations on a cluster undergoing a backup can lead to issues and potential unresponsiveness. | To avoid this, ensure no backup operations are in progress before scaling nodes or performing other cluster operations that change the cluster state. | April 14, 2024 | Clusters | +| Palette automatically creates an AWS security group for worker nodes using the format `-node`. If a security group with the same name already exists in the VPC, the cluster creation process fails. | To avoid this, ensure that no security group with the same name exists in the VPC before creating a cluster. | April 14, 2024 | Clusters | +| K3s version 1.27.7 has been marked as _Deprecated_. This version has a known issue that causes clusters to crash. | Upgrade to a newer version of K3s to avoid the issue, such as versions 1.26.12, 1.28.5, and 1.27.11. You can learn more about the issue in the [K3s GitHub issue](https://github.com/k3s-io/k3s/issues/9047) page. | April 14, 2024 | Packs, Clusters | +| When deploying a multi-node AWS EKS cluster with the Container Network Interface (CNI), the cluster deployment fails. | A workaround is to use the AWS VPC CNI in the interim while the issue is resolved. | April 14, 2024 | Packs, Clusters | +| If a Kubernetes cluster deployed onto VMware is deleted, and later re-created with the same name, the cluster creation process fails. The issue is caused by existing resources remaining inside the PCG, or the System PCG, that are not cleaned up during the cluster deletion process. | Refer to the [VMware Resources Remain After Cluster Deletion](../troubleshooting/pcg.md#scenario---vmware-resources-remain-after-cluster-deletion) troubleshooting guide for resolution steps. | April 14, 2024 | Clusters | +| Day-2 operations related to infrastructure changes, such as modifying the node size and count, do not take effect when using MicroK8s. | No workaround is available.
| April 14, 2024 | Packs, Clusters | +| If a cluster that uses the Rook-Ceph pack experiences network issues, it's possible for the file mount to become unavailable and remain so even after the network is restored. | This is a known issue disclosed in the [Rook GitHub repository](https://github.com/rook/rook/issues/13818). To resolve this issue, refer to the pack documentation. | April 14, 2024 | Packs, Edge | +| Edge clusters on Edge hosts with ARM64 processors may experience instability issues that cause cluster failures. | ARM64 support is limited to a specific set of Edge devices. Currently, Nvidia Jetson devices are supported. | April 14, 2024 | Edge | +| During the cluster provisioning process of new Edge clusters, the Palette webhook pods may not always deploy successfully, causing the cluster to be stuck in the provisioning phase. This issue does not impact deployed clusters. | Review the [Palette Webhook Pods Fail to Start](../troubleshooting/edge.md#scenario---palette-webhook-pods-fail-to-start) troubleshooting guide for resolution steps. | April 14, 2024 | Edge | ## Resolved Known Issues diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 9c96310dfa..40db70817f 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -10,7 +10,7 @@ tags: ["troubleshooting", "self-hosted", "palette", "vertex"] Refer to the following sections to troubleshoot errors encountered when installing an Enterprise Cluster. -## Scenario - Self-linking Error +## Scenario - Self-Linking Error When installing an Enterprise Cluster, you may encounter an error stating that the enterprise cluster is unable to self-link. Self-linking is the process of Palette or VerteX becoming aware of the Kubernetes cluster it is installed on. @@ -78,7 +78,7 @@ following steps to restart the management pod. pod "mgmt-f7f97f4fd-lds69" deleted ``` -## Non-unique vSphere CNS Mapping +## Scenario - Non-Unique vSphere CNS Mapping In Palette and VerteX releases 4.4.8 and earlier, Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, @@ -156,3 +156,57 @@ automatically resolve this issue. If you have self-hosted instances of Palette i Events: ``` + +## Scenario - "Too Many Open Files" in Cluster + +When viewing logs for Enterprise or [Private Cloud Gateway](../clusters/pcg/pcg.md) clusters, you may encounter a "too +many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase +the maximum number of file descriptors on each node in your cluster. + +### Debug Steps + +Repeat the following process for each node in your cluster. + +1. Log in to a node in your cluster. + + ```bash + ssh -i <path-to-private-key> <username>@<node-ip> + ``` + +2. Switch to `sudo` mode using the command that best fits your system and preferences. + + ```bash + sudo --login + ``` + +3. Increase the maximum number of file descriptors that the kernel can allocate system-wide. + + ```bash + echo "fs.file-max = 1000000" > /etc/sysctl.d/99-maxfiles.conf + ``` + +4. Apply the updated `sysctl` settings. The increased limit is returned. + + ```bash + sysctl -p /etc/sysctl.d/99-maxfiles.conf + ``` + + ```bash hideClipboard + fs.file-max = 1000000 + ``` + +5. Restart the `kubelet` and `containerd` services. + + ```bash + systemctl restart kubelet containerd + ``` + +6.
Confirm that the change was applied. + + ```bash + sysctl fs.file-max + ``` + + ```bash hideClipboard + fs.file-max = 1000000 + ``` diff --git a/docs/docs-content/vertex/upgrade/upgrade-notes.md b/docs/docs-content/vertex/upgrade/upgrade-notes.md index 4807e45a9f..197513df22 100644 --- a/docs/docs-content/vertex/upgrade/upgrade-notes.md +++ b/docs/docs-content/vertex/upgrade/upgrade-notes.md @@ -27,4 +27,4 @@ troubleshooting guide for resolution steps. A known issue impacts all self-hosted Palette instances older than 4.4.14. Before upgrading a Palette instance with version older than 4.4.14, ensure that you execute a utility script to make all your cluster IDs unique in your Persistent Volume Claim (PVC) metadata. For more information, refer to the -[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). diff --git a/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md b/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md index b2b4ccd348..8f7d6236c5 100644 --- a/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md +++ b/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md @@ -17,7 +17,7 @@ section for details. If you are upgrading from a Palette VerteX version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). ::: diff --git a/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md b/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md index 4c9d117c7b..7decb68be3 100644 --- a/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md +++ b/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md @@ -17,7 +17,7 @@ for details. If you are upgrading from a Palette VerteX version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). :::
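As a closing note on the "Too Many Open Files" scenario added above: the `fs.file-max` setting raises only the system-wide ceiling, while systemd also enforces a separate per-process descriptor limit on the restarted services. The following verification sketch is not part of the documented procedure; it assumes `kubelet` runs as a single node-level process that `pidof` can locate.

```bash
# System-wide ceiling configured in the Debug Steps above.
sysctl fs.file-max

# Per-process limit for the restarted kubelet. systemd applies this
# limit independently of fs.file-max, so both values are worth checking.
grep "Max open files" /proc/"$(pidof kubelet)"/limits

# Count of file descriptors kubelet currently holds, to gauge headroom.
ls /proc/"$(pidof kubelet)"/fd | wc -l
```

If the per-process limit is still low, raising it with a systemd drop-in (`LimitNOFILE=`) for the `kubelet` service is a common follow-up; the exact unit name varies by distribution.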