From 6a3dacb8b25f30e6c89b97caaa56285f50dd805b Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Tue, 10 Dec 2024 10:56:19 -0500 Subject: [PATCH 1/9] Initial commit for too many open files error --- .../troubleshooting/enterprise-install.md | 42 ++++++++++++++++++- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 9c96310dfa..3ad2269d2b 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -10,7 +10,7 @@ tags: ["troubleshooting", "self-hosted", "palette", "vertex"] Refer to the following sections to troubleshoot errors encountered when installing an Enterprise Cluster. -## Scenario - Self-linking Error +## Scenario - Self-Linking Error When installing an Enterprise Cluster, you may encounter an error stating that the enterprise cluster is unable to self-link. Self-linking is the process of Palette or VerteX becoming aware of the Kubernetes cluster it is installed on. @@ -78,7 +78,7 @@ following steps to restart the management pod. pod "mgmt-f7f97f4fd-lds69" deleted ``` -## Non-unique vSphere CNS Mapping +## Scenario - Non-Unique vSphere CNS Mapping In Palette and VerteX releases 4.4.8 and earlier, Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, @@ -156,3 +156,41 @@ automatically resolve this issue. If you have self-hosted instances of Palette i Events: ``` + +## Scenario - "Too Many Open Files" in Management Cluster + +When viewing logs for enterprise or private cloud gateway management clusters, you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. 
+ +### Debug Steps + +1. Log in to a node in your management cluster. + +```bash +ssh -i +``` + +2. Switch to `sudo` mode. + +```bash +sudo --login +``` + +3. Increase the maximum number of file descriptors that the kernel can allocate system-wide. + +```bash +echo "fs.file-max = 1000000" > /etc/sysctl.d/99-maxfiles.conf +``` + +4. Apply the updated `sysctl` settings. + +```bash +sysctl -p /etc/sysctl.d/99-maxfiles.conf +``` + +5. Restart the `kubelet` and `containerd` services. + +```bash +systemctl restart kubelet containerd +``` + +6. Repeat the above process for each node. From 40c4bebf802e1c8984355e19302be5e9a57d86d5 Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Tue, 10 Dec 2024 14:21:34 -0500 Subject: [PATCH 2/9] Additional troubleshooting tweaks; fixed incorrect CI/CD reference --- docs/docs-content/automation/palette-cli/palette-cli.md | 2 +- docs/docs-content/troubleshooting/enterprise-install.md | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/docs-content/automation/palette-cli/palette-cli.md b/docs/docs-content/automation/palette-cli/palette-cli.md index c740920d89..5cce0992c3 100644 --- a/docs/docs-content/automation/palette-cli/palette-cli.md +++ b/docs/docs-content/automation/palette-cli/palette-cli.md @@ -7,7 +7,7 @@ tags: ["palette-cli"] --- The Palette CLI contains various functionalities that you can use to interact with Palette and manage resources. The -Palette CLI is well suited for Continuous Delivery/Continuous Deployment (CI/CD) pipelines and recommended for +Palette CLI is well suited for Continuous Integration/Continuous Deployment (CI/CD) pipelines and recommended for automation tasks, where Terraform or direct API queries are not ideal. To get started with the Palette CLI, check out the [Install](install-palette-cli.md) guide. 
diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 3ad2269d2b..aeaba83826 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -159,10 +159,12 @@ automatically resolve this issue. If you have self-hosted instances of Palette i ## Scenario - "Too Many Open Files" in Management Cluster -When viewing logs for enterprise or private cloud gateway management clusters, you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. +When viewing logs for enterprise or private cloud gateway management clusters, you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. ### Debug Steps +Repeat the following process for each node in your management cluster. + 1. Log in to a node in your management cluster. ```bash @@ -191,6 +193,4 @@ sysctl -p /etc/sysctl.d/99-maxfiles.conf ```bash systemctl restart kubelet containerd -``` - -6. Repeat the above process for each node. 
+``` \ No newline at end of file From 21799b5c9fd8afc9d8744dd9bdca7da80714b1a8 Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Tue, 10 Dec 2024 14:51:52 -0500 Subject: [PATCH 3/9] Fixed broken links due to updated heading --- docs/docs-content/enterprise-version/upgrade/upgrade-notes.md | 2 +- .../enterprise-version/upgrade/upgrade-vmware/airgap.md | 2 +- .../enterprise-version/upgrade/upgrade-vmware/non-airgap.md | 2 +- docs/docs-content/release-notes/known-issues.md | 2 +- docs/docs-content/vertex/upgrade/upgrade-notes.md | 2 +- docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md | 2 +- docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md | 2 +- 7 files changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md b/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md index 2eae950a9e..c7f5c31d61 100644 --- a/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md +++ b/docs/docs-content/enterprise-version/upgrade/upgrade-notes.md @@ -57,7 +57,7 @@ Palette 4.0 includes the following major enhancements that require user interven A known issue impacts all self-hosted Palette instances older then 4.4.14. Before upgrading a Palette instance with version older than 4.4.14, ensure that you execute a utility script to make all your cluster IDs unique in your Persistent Volume Claim (PVC) metadata. For more information, refer to the -[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). 
::: diff --git a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md index fd512b6af2..73ce1362b8 100644 --- a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md +++ b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/airgap.md @@ -17,7 +17,7 @@ details. If you are upgrading from a Palette version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). ::: diff --git a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md index 4229176347..3f7c70e2a2 100644 --- a/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md +++ b/docs/docs-content/enterprise-version/upgrade/upgrade-vmware/non-airgap.md @@ -16,7 +16,7 @@ version available. Refer to the [Supported Upgrade Paths](../upgrade.md#supporte If you are upgrading from a Palette version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). 
::: diff --git a/docs/docs-content/release-notes/known-issues.md b/docs/docs-content/release-notes/known-issues.md index ae4aaf487b..6de779d4a7 100644 --- a/docs/docs-content/release-notes/known-issues.md +++ b/docs/docs-content/release-notes/known-issues.md @@ -33,7 +33,7 @@ The following table lists all known issues that are currently active and affecti | If an Edge host operating a cluster in connected mode loses connection to Palette, the cluster will not auto-renew its Public Key Infrastructure (PKI) certificates. When it re-establishes the connection to Palette, the Edge host will renew the certificates if the existing certificates have less than 30 days before expiry. | No workaround available. | September 14, 2024 | Edge | | Using the Flannel Container Network Interface (CSI) pack together with a Red Hat Enterprise Linux (RHEL)-based provider image may lead to a pod becoming stuck during deployment. This is caused by an upstream issue with Flannel that was discovered in a K3s GitHub issue. Refer to [the K3s issue page](https://github.com/k3s-io/k3s/issues/5013) for more information. | No workaround is available | September 14, 2024 | Edge | | Palette OVA import operations fail if the VMO cluster is using a storageClass with the volume bind method `WaitForFirstConsumer`. | Refer to the [OVA Imports Fail Due To Storage Class Attribute](../troubleshooting/vmo-issues.md#scenario---ova-imports-fail-due-to-storage-class-attribute) troubleshooting guide for workaround steps. | September 13, 2024 | Palette CLI, VMO | -| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping) for guidance. 
| September 13, 2024 | Self-hosted | +| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | | Third-party binaries downloaded and used by the Palette CLI may become stale and incompatible with the CLI. | Refer to the [Incompatible Stale Palette CLI Binaries](../troubleshooting/automation.md#scenario---incompatible-stale-palette-cli-binaries) troubleshooting guide for workaround guidance. | September 11, 2024 | CLI | | An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. | Check out the [Degreated Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for guidance on workaround. | September 4, 2024 | Edge | | The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. 
[Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the un-schedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | diff --git a/docs/docs-content/vertex/upgrade/upgrade-notes.md b/docs/docs-content/vertex/upgrade/upgrade-notes.md index 4807e45a9f..197513df22 100644 --- a/docs/docs-content/vertex/upgrade/upgrade-notes.md +++ b/docs/docs-content/vertex/upgrade/upgrade-notes.md @@ -27,4 +27,4 @@ troubleshooting guide for resolution steps. A known issue impacts all self-hosted Palette instances older then 4.4.14. Before upgrading an Palette instance with version older than 4.4.14, ensure that you execute a utility script to make all your cluster IDs unique in your Persistent Volume Claim (PVC) metadata. For more information, refer to the -[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting Guide](../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). diff --git a/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md b/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md index b2b4ccd348..8f7d6236c5 100644 --- a/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md +++ b/docs/docs-content/vertex/upgrade/upgrade-vmware/airgap.md @@ -17,7 +17,7 @@ section for details. If you are upgrading from a Palette VerteX version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). 
+[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). ::: diff --git a/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md b/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md index 4c9d117c7b..7decb68be3 100644 --- a/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md +++ b/docs/docs-content/vertex/upgrade/upgrade-vmware/non-airgap.md @@ -17,7 +17,7 @@ for details. If you are upgrading from a Palette VerteX version that is older than 4.4.14, ensure that you have executed the utility script to make the CNS mapping unique for the associated PVC. For more information, refer to the -[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#non-unique-vsphere-cns-mapping). +[Troubleshooting guide](../../../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping). ::: From 66793f149025c5415c0ad9fc23eaea8c9328f459 Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Tue, 10 Dec 2024 15:21:42 -0500 Subject: [PATCH 4/9] Adjusted PCG verbiage and added x-ref --- docs/docs-content/troubleshooting/enterprise-install.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index aeaba83826..1ada376720 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -159,7 +159,7 @@ automatically resolve this issue. If you have self-hosted instances of Palette i ## Scenario - "Too Many Open Files" in Management Cluster -When viewing logs for enterprise or private cloud gateway management clusters, you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. 
+When viewing logs for enterprise management clusters or management clusters using a [Private Cloud Gateway](../clusters/pcg/pcg.md), you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. ### Debug Steps From 2bb83bf4adf9797d5d40e2bb75bb52ab59d937ca Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Tue, 10 Dec 2024 16:10:01 -0500 Subject: [PATCH 5/9] Fixed code block indentation --- .../troubleshooting/enterprise-install.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 1ada376720..56a92f5f95 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -166,31 +166,31 @@ When viewing logs for enterprise management clusters or management clusters usin Repeat the following process for each node in your management cluster. 1. Log in to a node in your management cluster. - -```bash -ssh -i -``` - + + ```bash + ssh -i + ``` + 2. Switch to `sudo` mode. - -```bash -sudo --login -``` + + ```bash + sudo --login + ``` 3. Increase the maximum number of file descriptors that the kernel can allocate system-wide. - -```bash -echo "fs.file-max = 1000000" > /etc/sysctl.d/99-maxfiles.conf -``` + + ```bash + echo "fs.file-max = 1000000" > /etc/sysctl.d/99-maxfiles.conf + ``` 4. Apply the updated `sysctl` settings. -```bash -sysctl -p /etc/sysctl.d/99-maxfiles.conf -``` + ```bash + sysctl -p /etc/sysctl.d/99-maxfiles.conf + ``` 5. Restart the `kubelet` and `containerd` services. 
-```bash -systemctl restart kubelet containerd -``` \ No newline at end of file + ```bash + systemctl restart kubelet containerd + ``` \ No newline at end of file From 247723779d03aaaa74e831f4d669770f37d60d78 Mon Sep 17 00:00:00 2001 From: achuribooks Date: Wed, 11 Dec 2024 14:29:34 +0000 Subject: [PATCH 6/9] ci: auto-formatting prettier issues --- docs/docs-content/release-notes/known-issues.md | 2 +- .../troubleshooting/enterprise-install.md | 15 +++++++++------ 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/docs/docs-content/release-notes/known-issues.md b/docs/docs-content/release-notes/known-issues.md index 6de779d4a7..64af20b465 100644 --- a/docs/docs-content/release-notes/known-issues.md +++ b/docs/docs-content/release-notes/known-issues.md @@ -33,7 +33,7 @@ The following table lists all known issues that are currently active and affecti | If an Edge host operating a cluster in connected mode loses connection to Palette, the cluster will not auto-renew its Public Key Infrastructure (PKI) certificates. When it re-establishes the connection to Palette, the Edge host will renew the certificates if the existing certificates have less than 30 days before expiry. | No workaround available. | September 14, 2024 | Edge | | Using the Flannel Container Network Interface (CSI) pack together with a Red Hat Enterprise Linux (RHEL)-based provider image may lead to a pod becoming stuck during deployment. This is caused by an upstream issue with Flannel that was discovered in a K3s GitHub issue. Refer to [the K3s issue page](https://github.com/k3s-io/k3s/issues/5013) for more information. | No workaround is available | September 14, 2024 | Edge | | Palette OVA import operations fail if the VMO cluster is using a storageClass with the volume bind method `WaitForFirstConsumer`. 
| Refer to the [OVA Imports Fail Due To Storage Class Attribute](../troubleshooting/vmo-issues.md#scenario---ova-imports-fail-due-to-storage-class-attribute) troubleshooting guide for workaround steps. | September 13, 2024 | Palette CLI, VMO | -| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | +| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | | Third-party binaries downloaded and used by the Palette CLI may become stale and incompatible with the CLI. | Refer to the [Incompatible Stale Palette CLI Binaries](../troubleshooting/automation.md#scenario---incompatible-stale-palette-cli-binaries) troubleshooting guide for workaround guidance. | September 11, 2024 | CLI | | An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. 
| Check out the [Degreated Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for guidance on workaround. | September 4, 2024 | Edge | | The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. [Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the un-schedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 56a92f5f95..e369d5685b 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -159,26 +159,29 @@ automatically resolve this issue. If you have self-hosted instances of Palette i ## Scenario - "Too Many Open Files" in Management Cluster -When viewing logs for enterprise management clusters or management clusters using a [Private Cloud Gateway](../clusters/pcg/pcg.md), you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. +When viewing logs for enterprise management clusters or management clusters using a +[Private Cloud Gateway](../clusters/pcg/pcg.md), you may encounter a "too many open files" error, which prevents logs +from tailing after a certain point. 
To resolve this issue, you must increase the maximum number of file descriptors for +each node on your cluster. ### Debug Steps Repeat the following process for each node in your management cluster. 1. Log in to a node in your management cluster. - + ```bash ssh -i ``` - + 2. Switch to `sudo` mode. - + ```bash sudo --login ``` 3. Increase the maximum number of file descriptors that the kernel can allocate system-wide. - + ```bash echo "fs.file-max = 1000000" > /etc/sysctl.d/99-maxfiles.conf ``` @@ -193,4 +196,4 @@ Repeat the following process for each node in your management cluster. ```bash systemctl restart kubelet containerd - ``` \ No newline at end of file + ``` From 77f180e16221289f206bd6dc567595c2a8c9b087 Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Wed, 11 Dec 2024 12:16:45 -0500 Subject: [PATCH 7/9] Incorporating suggestions from Ben --- .../troubleshooting/enterprise-install.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 56a92f5f95..9aa0523dae 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -171,7 +171,7 @@ Repeat the following process for each node in your management cluster. ssh -i ``` -2. Switch to `sudo` mode. +2. Switch to `sudo` mode using the command that best fits your system and preferences. ```bash sudo --login @@ -193,4 +193,14 @@ Repeat the following process for each node in your management cluster. ```bash systemctl restart kubelet containerd + ``` + +6. Confirm the changes are applied. 
+ + ```bash + sysctl fs.file-max + ``` + + ```bash hideClipboard + fs.file-max = 1000000 ``` \ No newline at end of file From 4ce2eb86cb49b189b928c1d618917f3aecf0ae9d Mon Sep 17 00:00:00 2001 From: Amanda Churi Filanowski Date: Wed, 11 Dec 2024 15:43:36 -0500 Subject: [PATCH 8/9] Incorporates additional suggestions per Carolina --- .../troubleshooting/enterprise-install.md | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 2c995508cf..149227dd67 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -157,21 +157,21 @@ automatically resolve this issue. If you have self-hosted instances of Palette i Events: ``` -## Scenario - "Too Many Open Files" in Management Cluster +## Scenario - "Too Many Open Files" in Cluster -When viewing logs for enterprise management clusters or management clusters using a -[Private Cloud Gateway](../clusters/pcg/pcg.md), you may encounter a "too many open files" error, which prevents logs +When viewing logs for Enterprise or +[Private Cloud Gateway](../clusters/pcg/pcg.md) clusters, you may encounter a "too many open files" error, which prevents logs from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for each node on your cluster. ### Debug Steps -Repeat the following process for each node in your management cluster. +Repeat the following process for each node in your cluster. -1. Log in to a node in your management cluster. +1. Log in to a node in your cluster. ```bash - ssh -i + ssh -i ``` 2. Switch to `sudo` mode using the command that best fits your system and preferences. @@ -186,12 +186,16 @@ Repeat the following process for each node in your management cluster. echo "fs.file-max = 1000000" > /etc/sysctl.d/99-maxfiles.conf ``` -4. 
Apply the updated `sysctl` settings. +4. Apply the updated `sysctl` settings. The increased limit is returned. ```bash sysctl -p /etc/sysctl.d/99-maxfiles.conf ``` + ```bash hideClipboard + fs.file-max = 1000000 + ``` + 5. Restart the `kubelet` and `containerd` services. ```bash From b5f999e76871e4958560fb1b0b09c94b1c6cdf88 Mon Sep 17 00:00:00 2001 From: achuribooks Date: Wed, 11 Dec 2024 20:50:56 +0000 Subject: [PATCH 9/9] ci: auto-formatting prettier issues --- docs/docs-content/release-notes/known-issues.md | 2 +- .../troubleshooting/enterprise-install.md | 13 ++++++------- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/docs-content/release-notes/known-issues.md b/docs/docs-content/release-notes/known-issues.md index 6de779d4a7..64af20b465 100644 --- a/docs/docs-content/release-notes/known-issues.md +++ b/docs/docs-content/release-notes/known-issues.md @@ -33,7 +33,7 @@ The following table lists all known issues that are currently active and affecti | If an Edge host operating a cluster in connected mode loses connection to Palette, the cluster will not auto-renew its Public Key Infrastructure (PKI) certificates. When it re-establishes the connection to Palette, the Edge host will renew the certificates if the existing certificates have less than 30 days before expiry. | No workaround available. | September 14, 2024 | Edge | | Using the Flannel Container Network Interface (CSI) pack together with a Red Hat Enterprise Linux (RHEL)-based provider image may lead to a pod becoming stuck during deployment. This is caused by an upstream issue with Flannel that was discovered in a K3s GitHub issue. Refer to [the K3s issue page](https://github.com/k3s-io/k3s/issues/5013) for more information. | No workaround is available | September 14, 2024 | Edge | | Palette OVA import operations fail if the VMO cluster is using a storageClass with the volume bind method `WaitForFirstConsumer`. 
| Refer to the [OVA Imports Fail Due To Storage Class Attribute](../troubleshooting/vmo-issues.md#scenario---ova-imports-fail-due-to-storage-class-attribute) troubleshooting guide for workaround steps. | September 13, 2024 | Palette CLI, VMO | -| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | +| Persistent Volume Claims (PVCs) metadata do not use a unique identifier for self-hosted Palette clusters. This causes incorrect Cloud Native Storage (CNS) mappings in vSphere, potentially leading to issues during node operations and cluster upgrades. | Refer to the [Troubleshooting section](../troubleshooting/enterprise-install.md#scenario---non-unique-vsphere-cns-mapping) for guidance. | September 13, 2024 | Self-hosted | | Third-party binaries downloaded and used by the Palette CLI may become stale and incompatible with the CLI. | Refer to the [Incompatible Stale Palette CLI Binaries](../troubleshooting/automation.md#scenario---incompatible-stale-palette-cli-binaries) troubleshooting guide for workaround guidance. | September 11, 2024 | CLI | | An issue with Edge hosts using [Trusted Boot](../clusters/edge/trusted-boot/trusted-boot.md) and encrypted drives occurs when TRIM is not enabled. As a result, Solid-State Drive and Nonvolatile Memory Express drives experience degraded performance and potentially cause cluster failures. This [issue](https://github.com/kairos-io/kairos/issues/2693) stems from [Kairos](https://kairos.io/) not passing through the `--allow-discards` flag to the `systemd-cryptsetup attach` command. 
| Check out the [Degreated Performance on Disk Drives](../troubleshooting/edge.md#scenario---degreated-performance-on-disk-drives) troubleshooting guide for guidance on workaround. | September 4, 2024 | Edge | | The AWS CSI pack has a [Pod Disruption Budget](https://kubernetes.io/docs/tasks/run-application/configure-pdb/) (PDB) that allows for a maximum of one unavailable pod. This behavior causes an issue for single-node clusters as well as clusters with a single control plane node and a single worker node where the control plane lacks worker capability. [Operating System (OS) patch](../clusters/cluster-management/os-patching.md) updates may attempt to evict the CSI controller without success, resulting in the node remaining in the un-schedulable state. | If OS patching is enabled, allow the control plane nodes to have worker capability. For single-node clusters, turn off the OS patching feature. | September 4, 2024 | Cluster, Packs | diff --git a/docs/docs-content/troubleshooting/enterprise-install.md b/docs/docs-content/troubleshooting/enterprise-install.md index 149227dd67..40db70817f 100644 --- a/docs/docs-content/troubleshooting/enterprise-install.md +++ b/docs/docs-content/troubleshooting/enterprise-install.md @@ -159,10 +159,9 @@ automatically resolve this issue. If you have self-hosted instances of Palette i ## Scenario - "Too Many Open Files" in Cluster -When viewing logs for Enterprise or -[Private Cloud Gateway](../clusters/pcg/pcg.md) clusters, you may encounter a "too many open files" error, which prevents logs -from tailing after a certain point. To resolve this issue, you must increase the maximum number of file descriptors for -each node on your cluster. +When viewing logs for Enterprise or [Private Cloud Gateway](../clusters/pcg/pcg.md) clusters, you may encounter a "too +many open files" error, which prevents logs from tailing after a certain point. 
To resolve this issue, you must increase +the maximum number of file descriptors for each node on your cluster. ### Debug Steps @@ -173,9 +172,9 @@ Repeat the following process for each node in your cluster. ```bash ssh -i ``` - -2. Switch to `sudo` mode using the command that best fits your system and preferences. - + +2. Switch to `sudo` mode using the command that best fits your system and preferences. + ```bash sudo --login ```
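
The debug steps added in this series raise `fs.file-max` and then confirm the new value. As a rough companion, the sketch below reports how close a node is to that limit before and after the change. It assumes a Linux node that exposes `/proc/sys/fs/file-nr` (three fields: allocated handles, unused handles, system-wide maximum). The helper names `fd_usage_percent` and `parse_file_max` are hypothetical and are not part of Palette or of the patch.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions, not Palette tooling.

# Print integer percentage of the system-wide limit that is in use.
# Arguments mirror the three fields of /proc/sys/fs/file-nr:
#   $1 = allocated handles, $2 = unused handles (ignored), $3 = maximum.
fd_usage_percent() {
  local allocated=$1 max=$3
  echo $(( allocated * 100 / max ))
}

# Extract the numeric value from a line such as "fs.file-max = 1000000",
# which is the form printed by `sysctl fs.file-max`.
parse_file_max() {
  echo "${1##*= }"
}

# Report current usage when the kernel interface is readable on this host.
if [ -r /proc/sys/fs/file-nr ]; then
  read -r allocated unused max < /proc/sys/fs/file-nr
  echo "allocated=${allocated} max=${max} usage=$(fd_usage_percent "$allocated" "$unused" "$max")%"
fi
```

Running this on each node before step 3 and again after step 6 gives a quick view of whether the raised limit leaves enough headroom. The threshold at which to act is a judgment call for your environment.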