diff --git a/docs/build-test-envs.md b/docs/build-test-envs.md
index 462279ec..d499ab59 100644
--- a/docs/build-test-envs.md
+++ b/docs/build-test-envs.md
@@ -10,8 +10,8 @@ Take a moment to orient yourself, there are a few items to consider before moving on.
 
 ### Clone Genestack
 
-> Your local genestack repository will be transferred to the eventual launcher instance for convenience (_perfect for development_).
-See [[Getting Started|https://github.com/rackerlabs/genestack/wiki#getting-started]] for an example on how to recursively clone the repository and its submodules.
+> Your local genestack repository will be transferred to the eventual launcher instance for convenience (**perfect for development**).
+See [Getting Started](quickstart.md) for an example of how to recursively clone the repository and its submodules.
 
 ### Create a VirtualEnv
 
@@ -29,9 +29,9 @@ pip install ansible openstacksdk
 
 The openstacksdk used by the ansible playbook needs a valid configuration for your environment to stand up the test resources.
 
-An example `clouds.yaml` that could be placed in [ansible/playbooks/](../../tree/main/ansible/playbooks):
+An example `clouds.yaml`:
 
-```
+```yaml
 cache:
   auth: true
   expiration_time: 3600
@@ -50,7 +50,7 @@ clouds:
   identity_api_version: "3"
 ```
 
-See the configuration guide [[here|https://docs.openstack.org/openstacksdk/latest/user/config/configuration.html]] for more examples.
+See the configuration guide [here](https://docs.openstack.org/openstacksdk/latest/user/config/configuration.html) for more examples.
 
 ## Create a Test Environment
 
diff --git a/docs/index.md b/docs/index.md
index e4025c5b..afcfd163 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,36 +13,36 @@ to manage cloud infrastructure in the way you need it.
 
 They say a picture is worth 1000 words, so here's a picture.
 
-![Genestack Architecture Diagram](assets/images/diagram-genestack.png)
+![Genestack Architecture Diagram](../assets/images/diagram-genestack.png)
 
 ---
 
 Building our cloud future has never been this simple.
-### 0.Getting Started
+## 0. Getting Started
 
 * [Getting Started](getting-started.md)
 * [Building Virtual Environments for Testing](build-test-envs.md)
 
-### 1.Kubernetes
+## 1. Kubernetes
 
 * [Building Your Kubernetes Environment](build-k8s.md)
 * [Retrieve kube config](kube-config.md)
 
-### 2.Storage
+## 2. Storage
 
 * [Create Persistent Storage](Create-Persistent-Storage.md)
 
-### 3.Infrastructure
+## 3. Infrastructure
 
 * [Deploy Required Infrastructure](deploy-required-infrastructure.md)
 * [Deploy Prometheus](prometheus.md)
 * [Deploy Vault](vault.md)
 
-### 4.Openstack Infrastructure
+## 4. Openstack Infrastructure
 
 * [Deploy Openstack on k8s](Deploy-Openstack.md)
 
-#### Post Deployment
+## Post Deployment
 
 * [Post Deploy Operations](post-deploy-ops.md)
 * [Building Local Images](build-local-images.md)
 * [OVN Database Backup](ovn-db-backup.md)
 
-#### Upgrades
+## Upgrades
 
 * [Running Genestack Upgrade](genestack-upgrade.md)
 * [Running Kubernetes Upgrade](k8s-upgrade.md)
diff --git a/docs/ovn-db-backup.md b/docs/ovn-db-backup.md
index f0e7c8c6..248d4391 100644
--- a/docs/ovn-db-backup.md
+++ b/docs/ovn-db-backup.md
@@ -1,40 +1,28 @@
-- [Background](#background)
-- [Backup](#backup)
-- [Restoration and recovery](#restoration-and-recovery)
-  - [Recovering when a majority of OVN DB nodes work fine](#recovering-when-a-majority-of-ovn-db-nodes-work-fine)
-  - [Recovering from a majority of OVN DB node failures or a total cluster failure](#recovering-from-a-majority-of-ovn-db-node-failures-or-a-total-cluster-failure)
-    - [Trying to use _OVN_ DB files in `/etc/origin/ovn` on the _k8s_ nodes](#trying-to-use-ovn-db-files-in-etcoriginovn-on-the-k8s-nodes)
-      - [Finding the first node](#finding-the-first-node)
-      - [Trying to create a pod for `ovsdb-tool`](#trying-to-create-a-pod-for-ovsdb-tool)
-      - [`ovsdb-tool` from your Linux distribution's packaging system](#ovsdb-tool-from-your-linux-distributions-packaging-system)
-      - [Conclusion of using the OVN DB files on your _k8s_ nodes](#conclusion-of-using-the-ovn-db-files-on-your-k8s-nodes)
-    - [Full recovery](#full-recovery)
-
 # Background
 
 By default, _Genestack_ creates a pod that runs _OVN_ snapshots daily in the `kube-system` namespace where you find other centralized _OVN_ things. These get stored on a persistent storage volume associated with the `ovndb-backup` _PersistentVolumeClaim_. Snapshots older than 30 days get deleted.
 
 You should primarily follow the [Kube-OVN documentation on backup and recovery](https://kubeovn.github.io/docs/stable/en/ops/recover-db/) and consider the information here supplementary.
 
-# Backup
+## Backup
 
 A default _Genestack_ installation creates a _k8s_ _CronJob_ in the `kube-system` namespace alongside the other central OVN components that will store snapshots of the OVN NB and SB in the _PersistentVolume_ for the _PersistentVolumeClaim_ named `ovndb-backup`. Storing these on the persistent volume like this matches the conventions for _MariaDB_ in _Genestack_.
 
-You may wish to implement shipping these off of the cluster to a permanent location, as you might have cluster problems that could interfere with your ability to get these off of the _PersistentVolume_ when you need these backups.
+## Restoration and recovery
 
-# Restoration and recovery
+You may wish to implement shipping these backups off the cluster to a permanent location, as cluster problems could interfere with your ability to get them off the _PersistentVolume_ when you need them.
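+
+A minimal sketch of pulling the snapshots off the cluster by hand, assuming a hypothetical `ovn-backup-shell` pod you create that mounts the `ovndb-backup` _PersistentVolumeClaim_ at `/backup` (the pod name and mount path are assumptions; adjust them, and the destination, to match your installation):
+
+```shell
+# List pods in kube-system together with the PVCs they mount, to find one
+# that mounts the ovndb-backup PersistentVolumeClaim
+kubectl -n kube-system get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep ovndb-backup
+
+# Copy the snapshot files out of the cluster to permanent storage
+kubectl -n kube-system cp ovn-backup-shell:/backup ./ovn-db-snapshots
+```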
-## Recovering when a majority of OVN DB nodes work fine
+### Recovering when a majority of OVN DB nodes work fine
 
 If you have a majority of _k8s_ nodes running `ovn-central` working fine, you can just follow the directions in the _Kube-OVN_ documentation for kicking a node out. Things mostly work normally when you have a majority because OVSDB HA uses the Raft algorithm, which only requires a majority of the nodes for full functionality, so you don't have to do anything too strange or extreme to recover. You essentially kick the bad node out and let it recover.
 
-## Recovering from a majority of OVN DB node failures or a total cluster failure
+### Recovering from a majority of OVN DB node failures or a total cluster failure
 
 **You probably shouldn't use this section if you don't have a majority OVN DB node failure. Just kick out the minority of bad nodes as indicated above instead**. Use this section to recover from a failure of the **majority** of nodes.
 
 As a first step, you will need to get database files to run the recovery. You can try to use files on your nodes as described below, or use one of the backup snapshots.
 
-### Trying to use _OVN_ DB files in `/etc/origin/ovn` on the _k8s_ nodes
+#### Trying to use _OVN_ DB files in `/etc/origin/ovn` on the _k8s_ nodes
 
 You can use the information in this section to try to get the files to use for your recovery from your running _k8s_ nodes.
 
@@ -42,7 +30,7 @@ The _Kube-OVN_ documentation shows trying to use _OVN_ DB files from `/etc/origin/ovn` on the
 The directions in the _Kube-OVN_ documentation use `docker run` to get a working `ovsdb-tool` to try to work with the OVN DB files on the nodes, but _k8s_ installations mostly use `CRI-O`, `containerd`, or other container runtimes, so you probably can't pull the image and run it with `docker` as shown. I will cover this and some alternatives below.
 
-#### Finding the first node
+##### Finding the first node
 
 The _Kube-OVN_ documentation directs you to pick the node running the `ovn-central` pod associated with the first IP of the `NODE_IPS` environment variable. You should find the `NODE_IPS` environment variable defined on an `ovn-central` pod or the `ovn-central` _Deployment_. Assuming you can run the `kubectl` commands, the following example gets the node IPs off of the deployment:
 
@@ -60,14 +48,13 @@ k8s-controller01 Ready control-plane 3d17h v1.28.6 10.130.140.246
 root@k8s-controller01:~#
 ```
 
-
-#### Trying to create a pod for `ovsdb-tool`
+##### Trying to create a pod for `ovsdb-tool`
 
 As an alternative to `docker run`, since your _k8s_ cluster probably doesn't use _Docker_ itself, you can **possibly** try to create a pod instead of running a container directly, but you should **try it before scaling your _OVN_ replicas down to 0**, as not having `ovn-central` available should interfere with pod creation. The broken `ovn-central` might still prevent _k8s_ from creating the pod even if you haven't scaled your replicas down, however.
 
 **Read the notes below the pod manifest for edits you may need to make.**
 
-```
+```yaml
 apiVersion: v1
 kind: Pod
 metadata:
@@ -115,15 +102,15 @@ To reiterate, if you reached this step, this pod creation may not work because of your broken OVN.
 If creating this pod worked, **scale your replicas to 0**, use `ovsdb-tool` to make the files you will use for restore (both north and south DB), then jump to _Full Recovery_ as described below and in the _Kube-OVN_ documentation.
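+
+For the conversion itself, the _Kube-OVN_ recovery procedure uses `ovsdb-tool cluster-to-standalone` to turn the clustered DB files into standalone files you can restore from. A minimal sketch, assuming the pod above is named `ovsdb-tool-pod` and mounts the node's `/etc/origin/ovn` at `/etc/ovn` (both assumptions; substitute the name, namespace, and path from your manifest):
+
+```shell
+# Convert the clustered north and south DB files into standalone DB files;
+# the output file comes first, the clustered input file second
+kubectl exec -it ovsdb-tool-pod -- ovsdb-tool cluster-to-standalone /etc/ovn/ovnnb_db_standalone.db /etc/ovn/ovnnb_db.db
+kubectl exec -it ovsdb-tool-pod -- ovsdb-tool cluster-to-standalone /etc/ovn/ovnsb_db_standalone.db /etc/ovn/ovnsb_db.db
+```
+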
-#### `ovsdb-tool` from your Linux distribution's packaging system
+##### `ovsdb-tool` from your Linux distribution's packaging system
 
 As an alternative to `docker run`, which may not work on your cluster, and to pod creation, which may not work because of your broken OVN, you can try installing your distribution's package that provides `ovsdb-tool` (`openvswitch-common` on Ubuntu) if you still want to use the OVN DB files on your _k8s_ nodes instead of going to one of your snapshot backups. You risk (and will probably have) a slight version mismatch with the OVS version within your normal `ovn-central` pods. OVSDB has a stable format, so this likely will not cause any problems, but you should prefer restoring a previously saved snapshot over using an `ovsdb-tool` with a slightly mismatched version; only consider the mismatched version if you don't have other options.
 
-#### Conclusion of using the OVN DB files on your _k8s_ nodes
+##### Conclusion of using the OVN DB files on your _k8s_ nodes
 
 The entire section on using the OVN DB files from your nodes just gives you an alternative to a planned snapshot backup as a way to get files to restore the database from. From here forward, the directions converge with full recovery as described below and in the full _Kube-OVN_ documentation.
 
-### Full recovery
+#### Full recovery
 
 You start here when you have north database and south database files you want to use to run your recovery, whether you retrieved them from one of your _k8s_ nodes as described above or got them from one of your snapshots. Technically, the south database should get rebuilt with only the north database, but if you have the two that go together, you can save the time a full rebuild would take by also restoring the south DB. It also avoids relying on the ability to rebuild the south DB in case something goes wrong.
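+
+As a rough outline of the order of operations, assuming a default layout (an `ovn-central` _Deployment_ in `kube-system`, DB files in `/etc/origin/ovn` on the nodes, and 3 replicas; all of these are assumptions, and the _Kube-OVN_ recovery documentation remains the authoritative reference):
+
+```shell
+# Stop the OVN DB cluster before touching its files
+kubectl -n kube-system scale deployment ovn-central --replicas=0
+
+# On every node that ran ovn-central, move the old clustered DB files aside
+mv /etc/origin/ovn/ovnnb_db.db /etc/origin/ovn/ovnnb_db.db.bak
+mv /etc/origin/ovn/ovnsb_db.db /etc/origin/ovn/ovnsb_db.db.bak
+
+# On the first node only, put the standalone (or snapshot) files in place
+cp ovnnb_db_standalone.db /etc/origin/ovn/ovnnb_db.db
+cp ovnsb_db_standalone.db /etc/origin/ovn/ovnsb_db.db
+
+# Bring the cluster back up; the other members resync from the first node
+kubectl -n kube-system scale deployment ovn-central --replicas=3
+```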