docs: document etcd backups (#2072)
* first draft

* add edge k3s

* etcd draft

* remove restore page

* fix broken link

* add label for RKE2

* adjust page position

* correct typo

* Apply suggestions from code review

Co-authored-by: Karl Cardenas <[email protected]>

* address review comments

* Update docs/docs-content/clusters/cluster-management/backup-restore/enable-etcd-backup.md

* remove redundant heading

---------

Co-authored-by: Lenny Chen <[email protected]>
Co-authored-by: Karl Cardenas <[email protected]>
3 people authored Feb 9, 2024
1 parent dd3fb43 commit 8573d92
Showing 3 changed files with 170 additions and 17 deletions.
@@ -14,8 +14,8 @@ group disable disk backups. You can back up all the volumes within a virtual clu
## Prerequisites

- A project or tenant backup location. Refer to the
[cluster backup and restore](../cluster-management/backup-restore/backup-restore.md#get-started) document to learn how
to configure a backup location.
[cluster backup and restore](../cluster-management/backup-restore/backup-restore.md) document to learn how to
configure a backup location.

- Cluster group modification [permissions](../../user-management/palette-rbac/palette-rbac.md).

@@ -83,8 +83,7 @@ You can validate that the disk backups are occurring by deploying a virtual clus
how to deploy Palette Virtual clusters.

3. Create a backup of your virtual cluster and include all disks. Use the
[Create a Cluster Backup](../cluster-management/backup-restore/backup-restore.md#get-started) guide for additional
guidance.
[Create a Cluster Backup](../cluster-management/backup-restore/backup-restore.md) guide for additional guidance.

4. Access the backup location's blob storage and review the backup files.

@@ -7,16 +7,34 @@ sidebar_position: 70
tags: ["clusters", "cluster management"]
---

Palette supports backup and restore capabilities for Kubernetes clusters.
Palette supports backup and restore capabilities for Kubernetes clusters. Two kinds of backups are supported: _cluster
backups_ and _etcd backups_.

A backup is a persistent state of Kubernetes resources, ranging from objects such as Pods, DaemonSets, and Services to
persistent volumes. A backup allows you to save the current state of a cluster and restore it at a later point in time
if needed. You can restore a backup to the same or a different cluster.
A cluster backup is a persistent state of Kubernetes resources, ranging from objects such as Pods, DaemonSets, and
Services to persistent volumes. A backup allows you to save the current state of a cluster and restore it at a later
point in time if needed. You can restore a backup to the same or a different cluster. You can schedule a backup of a
specific cluster or an entire [workspace](../../../workspace/workspace.md). You can also maintain multiple backups of a
cluster or workspace.

You can schedule a backup of a specific cluster or an entire [workspace](../../../workspace/workspace.md). You can also
maintain multiple backups of a cluster or workspace.
An etcd backup is a snapshot of the etcd key-value store used as the backend for all cluster information. etcd snapshots
are required to remediate data corruption problems that can occur in Kubernetes clusters. etcd backups are usually used
to restore the same cluster. etcd snapshots are usually small in size and automated backups are turned on by default.

## Get Started
## Cluster Backup vs etcd Backup

The following table offers an overview of the differences between a cluster backup and an etcd backup.

| Aspect | etcd Backup | Cluster Backup |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------- |
| **Scope** | Only backs up etcd data, which includes cluster state, configuration, and resource definition data. | Backs up entire Kubernetes cluster resources, including pods, services, deployments, and associated data in persistent volumes. |
| **Enabled by default?** | Yes | No |
| **Use case** | Restoring etcd in case of data corruption or loss. | Migrating workloads between clusters. Restoring after accidental deletion or corruption of Kubernetes resources. |
| **Restoration target** | Typically used for restoring etcd on the same cluster | Can be used to restore on the same cluster or migrate to a different cluster |
| **Operational overhead** | Restoring is manual and requires technical expertise in etcd and command-line operations. Requires an SSH connection to the cluster. | Restore can be performed from the Palette user interface. Does not require an SSH connection to the cluster. |
| **Source cluster availability** | Not required. | Required. |
| **Typical file size** | Relatively small (megabytes to low gigabytes). | Usually larger, depending on the size of cluster and volume data. |

## Cluster Backup

To get started with creating a backup, check out the
[Add a Backup Location using Static Credentials](add-backup-location-static.md) or
@@ -29,9 +47,7 @@ learn more about backup and restore actions for a workspace.

:::

<br />

## What is a Backup Location?
### Backup Locations

A backup location is an object storage, such as an AWS Simple Storage Service (S3) bucket, where you store and retrieve
the backup files. Before you create a backup, the initial step is configuring a backup location. You can configure a
@@ -46,8 +62,6 @@ object storage solutions as backup locations.

- Azure blob storage

<br />

:::info

Palette uses open-source Velero to provide backup and restore capabilities. You can learn more about Velero by checking
@@ -60,7 +74,7 @@ You can add a backup location to the same cloud account you use to deploy Kubern
account. Both authentication methods require an Identity Access Management (IAM) entity in the cloud account and access
credentials for the IAM entity.

## Backup Locations and Credentials
### Backup Locations and Credentials

Palette uses the access credentials to authenticate itself while accessing the storage bucket. Palette supports static
credentials for all cloud service providers. You can also use dynamic credentials with the backup and restore workflow.
@@ -80,6 +94,13 @@ or
[Add a Backup Location using Dynamic Credentials](/clusters/cluster-management/backup-restore/add-backup-location-dynamic)
guide.

## etcd Backups

etcd backups are enabled by default. You can edit the YAML file for a cluster's Kubernetes layer to configure the
backup frequency and the maximum number of copies to retain. Use the following resource to learn more about etcd backups:

- [Enable etcd Backups](./enable-etcd-backup.md)

## Resources

- [Add a Backup Location using Static Credentials](add-backup-location-static.md)
@@ -89,3 +110,5 @@ guide.
- [Create a Cluster Backup](create-cluster-backup.md)

- [Restore a Cluster Backup](restore-cluster-backup.md)

- [etcd Backups](./enable-etcd-backup.md)
@@ -0,0 +1,131 @@
---
sidebar_label: "Configure etcd Backup"
title: "Configure etcd Backup"
description: "Learn how to enable scheduled etcd backups."
hide_table_of_contents: false
sidebar_position: 45
tags: ["clusters", "cluster management", "backup"]
---

etcd backups are enabled by default and backups are performed on each etcd node. You can adjust the backup frequency,
the location of the backup files, and the maximum number of backup files to retain by editing the YAML file of your
cluster profile's Kubernetes layer.

## Prerequisite

- An active cluster in Palette using PXK, PXK-E, RKE2, or K3s as its Kubernetes layer.
  - The MicroK8s distribution of Kubernetes uses Dqlite instead of etcd for its data store by default, so these steps
    are not applicable to MicroK8s instances.

## Configure etcd Backup

1. Log in to [Palette](https://console.spectrocloud.com).

2. From the **Main Menu**, select **Profiles**.

3. Select the profile you use to deploy clusters, and then select the Kubernetes layer of your profile.

4. Edit the YAML file of your Kubernetes layer and add the following configurations. Depending on your Kubernetes
distribution, different configuration parameters are available.

<Tabs group="distribution">

<TabItem value="PXK" label="PXK/PXK-E">

In Palette eXtended Kubernetes (PXK), etcd backup is enabled by default and cannot be disabled. You also cannot
change the directory where the backups are stored, which is `/var/lib/etcd`.

| Parameter | Description | Default |
| --------------------------------------------------- | -------------------------------------------------------------------- | ------- |
| `kubeadmconfig.etcd.local.extraArgs.snapshot-count` | The number of committed transactions required to trigger a snapshot. | 100000 |
| `kubeadmconfig.etcd.local.extraArgs.max-snapshots` | The maximum number of etcd backups to retain. | 5 |

For example,

```yaml
kubeadmconfig:
  etcd:
    local:
      extraArgs:
        max-snapshots: 10
        snapshot-count: "10000"
```

</TabItem>

<TabItem value="rke2" label="RKE2">

Use the following parameters to configure scheduled backups.

| Parameter | Description | Default |
| -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------- |
| `cluster.config.etcd-snapshot-schedule-cron` | A cron expression that controls the time and frequency of scheduled backups. For example, the default value `0 */1 * * *` means the scheduled backup runs every hour. | `0 */1 * * *` |
| `cluster.config.etcd-snapshot-retention` | The maximum number of etcd backups to retain. | 12 |
| `cluster.config.etcd-disable-snapshots` | Controls whether or not to disable scheduled etcd snapshots. | `false` |
| `cluster.config.etcd-snapshot-dir` | Specifies the directory where the etcd snapshots are saved. | `/var/lib/rancher/rke2/server/db/snapshots` |

For example,

```yaml
cluster:
  config:
    etcd-snapshot-schedule-cron: 0 */1 * * *
    etcd-snapshot-retention: 12
    etcd-disable-snapshots: false
    etcd-snapshot-dir: /var/lib/rancher/rke2/server/db/snapshots
```

</TabItem>

<TabItem value="k3s" label="K3S">

Use the following parameters to configure scheduled backups.

| Parameter | Description | Default |
| -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ |
| `cluster.config.etcd-snapshot-schedule-cron` | A cron expression that controls the time and frequency of scheduled backups. For example, the default value `0 */1 * * *` means the scheduled backup runs every hour. | `0 */1 * * *` |
| `cluster.config.etcd-snapshot-retention` | The maximum number of etcd backups to retain. | 12 |
| `cluster.config.etcd-disable-snapshots` | Controls whether or not to disable scheduled etcd snapshots. | `false` |
| `cluster.config.etcd-snapshot-dir` | Specifies the directory where the etcd snapshots are saved. By default, this value is `/var/lib/rancher/k3s/server/db/snapshots`. | `/var/lib/rancher/k3s/server/db/snapshots` |

For example,

```yaml
cluster:
  config:
    etcd-snapshot-schedule-cron: 0 */1 * * *
    etcd-snapshot-retention: 12
    etcd-disable-snapshots: false
    etcd-snapshot-dir: /var/lib/rancher/k3s/server/db/snapshots
```

</TabItem>

</Tabs>

5. If you have not deployed a cluster, finish the cluster profile creation and deploy a cluster. For more information,
refer to [Create Cluster Profile](../../../profiles/cluster-profiles/create-cluster-profiles/) for non-Edge clusters and
[Model Cluster Profile](../../edge/site-deployment/model-profile.md) for Edge clusters.

If you are editing the profile of an active cluster, updating the profile will trigger an update to the active
cluster. We recommend that you publish a new version of the profile instead of updating a profile directly. For more
information, refer to [Update a Cluster](../cluster-updates.md).
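
When choosing values for the RKE2 and K3s schedule and retention parameters, it can help to estimate how much history
the retained snapshots cover. The following is a minimal sketch, not part of Palette or any Kubernetes distribution.
The function name `retention_window` is hypothetical, and the estimate assumes that every scheduled snapshot succeeds
and that only scheduled snapshots count toward the retention limit.

```python
from datetime import timedelta


def retention_window(interval: timedelta, retention: int) -> timedelta:
    """Approximate span of history covered by the retained scheduled snapshots.

    interval  -- time between scheduled snapshots (derived from the cron expression)
    retention -- value of etcd-snapshot-retention
    """
    if retention < 1:
        raise ValueError("retention must be at least 1")
    return interval * retention


# Defaults listed in the RKE2 and K3s tables: hourly schedule, 12 snapshots retained.
print(retention_window(timedelta(hours=1), 12))  # 12:00:00
```

With the default values, each node keeps roughly the last 12 hours of snapshots. A six-hour schedule with a retention
of 28 would cover about seven days.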

## Validate

To validate that the snapshots are successfully configured, connect to any control plane node of the cluster via SSH
and change into the directory where the etcd snapshots are saved. Confirm that the snapshots are being created in the
directory.
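
If you prefer a scripted check over inspecting the directory by hand, the following is a minimal sketch that you could
run on a control plane node. It is an illustration only, not a Palette tool. The function name `recent_snapshots` is
hypothetical, the check treats every regular file in the directory as a snapshot, and reading the directory may require
elevated privileges.

```python
import time
from pathlib import Path


def recent_snapshots(snapshot_dir: str, max_age_seconds: float) -> list[str]:
    """Return the names of files in snapshot_dir modified within max_age_seconds, newest first.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(snapshot_dir)
    if not directory.is_dir():
        return []

    now = time.time()
    # Pair each regular file with its modification time so that stat() is called once per file.
    files = [(p.stat().st_mtime, p.name) for p in directory.iterdir() if p.is_file()]
    files.sort(reverse=True)  # newest first
    return [name for mtime, name in files if now - mtime <= max_age_seconds]
```

For example, with the hourly default schedule on K3s, `recent_snapshots("/var/lib/rancher/k3s/server/db/snapshots", 2 * 3600)`
is expected to return at least one file name once the first scheduled snapshot has run. An empty list suggests that
snapshots are not being written to that directory.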

## Next Steps

If your cluster experiences data corruption issues, you can use the etcd snapshots to restore the cluster to working
conditions. Restoring a cluster may be a challenging procedure, depending on your experience and technical skill level.
The restoration also requires SSH access to the node and intimate knowledge of etcd. If you have concerns or need
assistance, contact us at [[email protected]](mailto:[email protected]) for additional guidance.

The following resources can help you understand disaster recovery for etcd:

- [Disaster recovery](https://etcd.io/docs/v3.5/op-guide/recovery/)

- [Operating etcd clusters for Kubernetes](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/)
