[deploy] Fix helm config and maintenance scripts (#3404)
Update deploy_qa.yml workflow's step for removing old images
Restore pullSecretName in deploy/helm/thecombine/values.yaml
Remove one-shot maintenance script db_update_audio_type.py
Restore python3 shebang in maintenance scripts
Update kubernetes documentation, including prep for OpenTelemetry
imnasnainaec authored Oct 21, 2024
1 parent 6a9de6e commit 4078438
Showing 16 changed files with 67 additions and 92 deletions.
16 changes: 12 additions & 4 deletions .github/workflows/deploy_qa.yml
@@ -64,9 +64,6 @@ jobs:
build_component: ${{ matrix.component }}
clean_ecr_repo:
needs: build
env:
RM_PATTERN_1: \d+\.\d+\.\d+-master\.\d+
RM_PATTERN_2: \d+\.\d+\.\d+-[a-z]+\.\d+-master\.\d+
runs-on: ubuntu-latest
steps:
# See https://docs.stepsecurity.io/harden-runner/getting-started/ for instructions on
@@ -89,7 +86,18 @@
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_DEFAULT_REGION }}
- name: Remove old AWS ECR images
run: scripts/clean_aws_repo.py combine_frontend combine_backend combine_maint combine_database --keep ${{ needs.build.outputs.image_tag }} --remove "${{ env.RM_PATTERN_1 }}" "${{ env.RM_PATTERN_2 }}" --verbose
# Remove all images for previous version numbers.
# Example: for tag beginning with v1.2.5-, remove all images with tag v1.2.4-*
# Example: for tag beginning with v2.4.0-, remove all images with tag v2.3.*
run: |
TAG=${{ needs.build.outputs.image_tag }}
if [[ $TAG =~ ^v([0-9]+)\.([0-9]+)\.([0-9]+)-.* ]]; then
VA=${BASH_REMATCH[1]}; VB=${BASH_REMATCH[2]}; VC=${BASH_REMATCH[3]}
if [[ $VC > 0 ]]; then REM="v${VA}\.${VB}\.$((VC - 1))-.*"
elif [[ $VB > 0 ]]; then REM="v${VA}\.$((VB - 1))\..*"
else REM="v$((VA - 1))\..*"; fi
scripts/clean_aws_repo.py combine_frontend combine_backend combine_maint combine_database --remove "${REM}" --verbose
fi
deploy_update:
needs: build
# Only push to the QA server when built on the master branch
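
For reference, here is a standalone sketch of the pattern derivation used in the new step (the sample tags are hypothetical; the real tag comes from the build job's `image_tag` output):

```bash
# Derive the "previous version" tag pattern for a few sample tags.
for TAG in v1.2.5-qa.1 v2.4.0-master.7 v3.0.0-master.2; do
  if [[ $TAG =~ ^v([0-9]+)\.([0-9]+)\.([0-9]+)-.* ]]; then
    VA=${BASH_REMATCH[1]}; VB=${BASH_REMATCH[2]}; VC=${BASH_REMATCH[3]}
    if (( VC > 0 )); then REM="v${VA}\.${VB}\.$((VC - 1))-.*"
    elif (( VB > 0 )); then REM="v${VA}\.$((VB - 1))\..*"
    else REM="v$((VA - 1))\..*"; fi
    echo "${TAG} -> remove ${REM}"  # e.g. v1.2.5-qa.1 -> remove v1\.2\.4-.*
  fi
done
```
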
2 changes: 1 addition & 1 deletion .gitignore
@@ -33,7 +33,7 @@ scripts/*.js
!scripts/createBackendLicenses.js
!scripts/printVersion.js
!scripts/setRelease.js

setup_cluster.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
@@ -9,7 +9,9 @@ metadata:
"helm.sh/hook": post-install, post-upgrade
"helm.sh/hook-delete-policy": before-hook-creation
spec:
ttlSecondsAfterFinished: 300
# keep completed jobs for 24 hrs so that logs are
# available in case of issues
ttlSecondsAfterFinished: 86400
template:
metadata:
creationTimestamp: null
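
With the longer retention, a finished hook job's logs can still be read for a day, e.g. (illustrative; the job name placeholder and the `thecombine` namespace are assumptions):

```bash
# Completed hook jobs now linger for 24 hours, so their logs remain readable.
kubectl -n thecombine get jobs                # find the name of the completed job
kubectl -n thecombine logs job/<job-name>     # read its logs before the TTL expires
```
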
2 changes: 1 addition & 1 deletion deploy/helm/thecombine/values.yaml
@@ -37,7 +37,7 @@ global:
# Define the image registry to use (may be blank for local images)
imageRegistry: awsEcr
imageTag: "latest"
pullSecretName: "None"
pullSecretName: aws-login-credentials
# Update strategy should be "Recreate" or "Rolling Update"
updateStrategy: Recreate

2 changes: 1 addition & 1 deletion deploy/scripts/helm_utils.py
@@ -45,7 +45,7 @@ def create_secrets(

def get_installed_charts(helm_cmd: List[str], helm_namespace: str) -> List[str]:
"""Create a list of the helm charts that are already installed on the target."""
lookup_results = run_cmd(helm_cmd + ["list", "-n", helm_namespace, "-o", "yaml"])
lookup_results = run_cmd(helm_cmd + ["list", "-a", "-n", helm_namespace, "-o", "yaml"])
chart_info: List[Dict[str, str]] = yaml.safe_load(lookup_results.stdout)
chart_list: List[str] = []
for chart in chart_info:
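
For context, `helm list` reports only deployed releases by default; the added `-a` flag also includes releases in other states (e.g. failed or pending-install), presumably so a partially installed chart is still detected. An equivalent manual check might be:

```bash
# List every release in the namespace, whatever its state, as YAML.
helm list -a -n thecombine -o yaml
```
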
2 changes: 2 additions & 0 deletions deploy/scripts/setup_combine.py
@@ -176,8 +176,10 @@ def main() -> None:
chart_namespace = config["charts"][chart]["namespace"]
logging.debug(f"Namespace: {chart_namespace}")
if add_namespace(chart_namespace, kube_env.get_kubectl_cmd()):
logging.debug(f"Namespace '{chart_namespace}' created")
installed_charts: List[str] = []
else:
logging.debug(f"Namespace '{chart_namespace}' already exists")
# Get list of charts in target namespace
installed_charts = get_installed_charts(helm_cmd, chart_namespace)
logging.debug(f"Installed charts: {installed_charts}")
44 changes: 32 additions & 12 deletions docs/deploy/README.md
@@ -62,6 +62,7 @@ separate organization. The characteristics of these systems are:
- the namespace `thecombine` is created
- the TLS certificate for the server is installed in `thecombine` namespace as a `kubernetes.io/tls` secret with the
name `thecombine-app-tls`
- PersistentVolumeClaims for `backend-data`, `database-data`, and `font-data`

- The QA server has services to login to a private AWS Elastic Container Registry to run private images for _The
Combine_. In contrast, the Production server only runs public images.
@@ -284,10 +285,25 @@ setup automatically by the Ansible playbook run in the previous section.

For the Production or QA server,

1. login to the Rancher Dashboard for the Production (or QA) server. You need to have an account on the server that was
1. Login to the Rancher Dashboard for the Production (or QA) server. You need to have an account on the server that was
created by the operations group.
2. Copy your `kubectl` configuration to the clipboard and paste it into a file on your host machine, e.g.
`${HOME}/.kube/prod/config` for the production server.
3. Check that the PVCs are annotated and labeled:
- Get the full list of `<pvc>`s with `kubectl [--context <context>] -n thecombine get pvc`
- Check the content of a `<pvc>` with `kubectl [--context <context>] -n thecombine get pvc <pvc> -o yaml`
- For all of them, make sure that `metadata:` includes the following lines:
```
annotations:
meta.helm.sh/release-name: thecombine
meta.helm.sh/release-namespace: thecombine
```
and
```
labels:
app.kubernetes.io/managed-by: Helm
```
- You can edit a `<pvc>` with `kubectl [--context <context>] -n thecombine edit pvc <pvc>`
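
A quick way to check all of the PVCs at once (an illustrative sketch; it assumes your kubeconfig context already points at the target cluster):

```bash
# Print the annotations and labels of every PVC in the thecombine namespace.
for pvc in $(kubectl -n thecombine get pvc -o name); do
  echo "== ${pvc}"
  kubectl -n thecombine get "${pvc}" -o jsonpath='{.metadata.annotations}{"\n"}{.metadata.labels}{"\n"}'
done
```
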
### Setup Environment
@@ -308,6 +324,7 @@ deployments (NUC):
- COMBINE_CAPTCHA_SECRET_KEY
- COMBINE_SMTP_USERNAME
- COMBINE_SMTP_PASSWORD
- HONEYCOMB_API_KEY
You may also set the KUBECONFIG environment variable to the location of the `kubectl` configuration file. This is not
necessary if the configuration file is at `${HOME}/.kube/config`.
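
For example, a shell setup for a Production session might look like the following (the values shown are placeholders, not real secrets):

```bash
# Placeholder values -- substitute the real secrets for your deployment.
export COMBINE_CAPTCHA_SECRET_KEY="<captcha-secret>"
export COMBINE_SMTP_USERNAME="<smtp-username>"
export COMBINE_SMTP_PASSWORD="<smtp-password>"
export HONEYCOMB_API_KEY="<honeycomb-api-key>"
# Optional: only needed if your config is not at ${HOME}/.kube/config
export KUBECONFIG="${HOME}/.kube/prod/config"
```
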
@@ -343,7 +360,8 @@ If using the Docker image,

## Install Helm Charts Required by _The Combine_

This step sets up the NGINX Ingress Controller and the Certificate Manager, [cert-manager.io](https://cert-manager.io/).
This step sets up the NGINX Ingress Controller, the Certificate Manager ([cert-manager.io](https://cert-manager.io/)),
and the OpenTelemetry analytics collector.

If using the Docker image, [open the Docker image terminal](#open-docker-image-terminal) and run:

@@ -358,6 +376,10 @@ cd <COMBINE>/deploy/scripts
./setup_cluster.py
```

Note: This script is not used for the QA/Production deployments. If you need to do a completely fresh install for either
of those, you can see all the cluster setup steps by executing `setup_cluster.py` with
`--type development --debug 2> setup_cluster.log`.
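
For example, to capture the full setup sequence without touching a QA/Production cluster (this simply restates the command above; `setup_cluster.log` is ignored by git per the `.gitignore` change in this commit):

```bash
# Log every cluster setup step to setup_cluster.log for inspection.
cd <COMBINE>/deploy/scripts
./setup_cluster.py --type development --debug 2> setup_cluster.log
```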

## Install _The Combine_

This step installs _The Combine_ application itself.
@@ -397,19 +419,13 @@ Notes:

### Maintenance Scripts for Kubernetes

There are several maintenance scripts that can be run in the kubernetes cluster:

- `combine-backup-job.sh` - performs a backup of _The Combine_ database and backend files, pushes the backup to AWS S3
storage and then removes old backups keeping the latest 3 backups.
- `combine_backup.py` - just performs the backup and pushes the result to AWS S3 storage.
- `combine-clean-aws.py` - removes the oldest backups, keeping up to `max_backups`. The default for `max_backups` is 3.
- `combine_restore.py` - restores _The Combine_ database and backend files from one of the backups in AWS S3 storage.
There are several maintenance scripts that can be run in the kubernetes cluster; they are listed in
[./kubernetes_design/README.md#combine_maint-image](./kubernetes_design/README.md#combine_maint-image).

The `combine-backup-job.sh` is currently being run daily on _The Combine_ QA and Production servers as a Kubernetes
CronJob.
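
A quick way to confirm the backup CronJob is scheduled (illustrative; it assumes the job runs in the `thecombine` namespace):

```bash
# List the CronJobs and the most recent Jobs they have spawned.
kubectl -n thecombine get cronjobs
kubectl -n thecombine get jobs --sort-by=.metadata.creationTimestamp
```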

In addition to the daily backup, any of the scripts can be run on-demand using the `kubectl` command. Using the
`kubectl` command takes the form:
In addition to the daily backup, any of the scripts can be run on-demand using the `kubectl` command as follows:

```bash
kubectl [--kubeconfig=<path-to-kubernetes-file>] [-n thecombine] exec -it deployment/maintenance -- <maintenance script> <script options>
@@ -429,7 +445,7 @@ Notes:
kubectl [--kubeconfig=<path-to-kubernetes-file>] [-n thecombine] exec -it deployment/maintenance -- <maintenance scripts> --help
```

The only exception is `combine-backup-job.sh` which does not have any script options.
The exception is `combine-backup-job.sh` which does not have any script options.

- The `-n thecombine` option is not required if you set `thecombine` as the default namespace for your kubeconfig file
by running:
@@ -438,6 +454,10 @@
kubectl config set-context --current --namespace=thecombine
```

- The `maintenance/scripts/*.py` scripts begin with `#!/usr/bin/env python3` so that they can be run directly in the
`maintenance` deployment. If you need to execute one of them in a Python virtual environment `(venv)`, precede the
script name with `python`.
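
For example, an on-demand backup could be run and its options reviewed as follows (illustrative; it assumes the kubeconfig and `thecombine` namespace are set up as described above):

```bash
# Run a backup now, then list the script's options.
kubectl -n thecombine exec -it deployment/maintenance -- combine_backup.py
kubectl -n thecombine exec -it deployment/maintenance -- combine_backup.py --help
```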

### Checking Certificate Expiration

The `check_cert.py` script will print the expiration timestamp for _The Combine's_ TLS certificate.
14 changes: 8 additions & 6 deletions docs/deploy/kubernetes_design/README.md
@@ -89,13 +89,15 @@ features:
`sillsdev/aws-kubectl`)
- _Python 3_ - the maintenance scripts included in `combine_maint` are written in _Python_
- A collection of Maintenance Scripts:
- `add_user_to_proj.py` - add specified user to specified project (as harvester if role not specified)
- `combine_backup.py` - create a compressed tarball of the backend files and database contents and push it to AWS S3
storage
- `combine_restore.py` - pull a backup from AWS S3 storage and replace the database and backend files with the
contents of the backup
- `combine-clean-aws.sh` - a `bash` script to cleanup old backups from AWS S3 storage
- `combine-backup-job.sh` - a `bash` script to run the backup and then cleanup the S3 storage
- `combine-clean-aws.sh` - a `bash` script to cleanup old backups from AWS S3 storage
- `monitor.py` - monitor a set of TLS secrets for updates; when a secret is updated, it is pushed to AWS S3 storage
- `rm_project.py` - remove specified project(s) from database and all associated entries and files
- `update_cert.py` - a script to be used by the cert proxy clients on the NUCs. `update_cert.py` will update a TLS
certificate if the NUC is connected to the internet and if the certificate is ready for renewal. If these conditions
are met, it will update the certificate from AWS S3 storage.
@@ -120,7 +122,7 @@

- `ecr-cred-helper` is a one-time Job that is run to create the `aws-login-credentials` Secret. The Secret type is
`kubernetes.io/dockerconfigjson` and can be used by the deployments to pull the required images from AWS ECR.
- `ecr-cred-helper-cron` refreshes the `aws-logon-credentials` periodically. The current configuration refreshes them
- `ecr-cred-helper-cron` refreshes the `aws-login-credentials` periodically. The current configuration refreshes them
every 8 hours.

The reason for having both a one-time Job and a CronJob is so that when the cluster is first created, the pull secrets are
@@ -134,10 +136,10 @@ The following diagram shows the Kubernetes resources used to create the image pu

### Additional AWS Login Resources

| Resource | Kind | Description |
| ------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| aws-ecr-config | ConfigMap | `aws-ecr-config` defines the runtime configuration for AWS ECR logins. |
| aws-ecr-credentials | Secret | `aws-ecr-credentials` defines the access accounts and credentials to log in to the AWS ECR service. Note that these credentials may be different than the `aws-s3-credentials` |
| Resource | Kind | Description |
| ------------------- | --------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| aws-ecr-config | ConfigMap | Defines the runtime configuration for AWS ECR logins. |
| aws-ecr-credentials | Secret | Defines the access accounts and credentials to log in to the AWS ECR service. Note that these may be different than the `aws-s3-credentials`. |
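
To inspect these resources on a running cluster (illustrative; the `thecombine` namespace is an assumption):

```bash
# View the ECR login configuration and confirm the credentials Secret exists.
kubectl -n thecombine get configmap aws-ecr-config -o yaml
kubectl -n thecombine get secret aws-ecr-credentials
```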

## SSL Termination

2 changes: 1 addition & 1 deletion maintenance/scripts/add_user_to_proj.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""
Add user to a project.
2 changes: 1 addition & 1 deletion maintenance/scripts/combine_backup.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""Create a backup of TheCombine and push the file to AWS S3 service."""

import argparse
2 changes: 1 addition & 1 deletion maintenance/scripts/combine_restore.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""
Restore The Combine from a backup stored in the AWS S3 service.
59 changes: 0 additions & 59 deletions maintenance/scripts/db_update_audio_type.py

This file was deleted.

2 changes: 1 addition & 1 deletion maintenance/scripts/get_fonts.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""
Generates font support for all SIL fonts used in Mui-Language-Picker.
2 changes: 1 addition & 1 deletion maintenance/scripts/monitor.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""
Monitor TLS secrets for changes and push changes to AWS S3.
2 changes: 1 addition & 1 deletion maintenance/scripts/rm_project.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""
Remove a project and its associated data from TheCombine.
2 changes: 1 addition & 1 deletion maintenance/scripts/update_cert.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
"""
Check the expiration time of the TLS secret and update if needed.
