Secrets Vault
Secrets are stored in LHDI's HashiCorp Vault, which resides in the VA network. Secrets include credentials, tokens, and certificates for all deployment environments. Scripts and Helm configurations have been created to formalize and reproduce deployment of secrets to all LHDI environments.
Secrets for all LHDI deployment environments are stored in a single vault at https://ldx-mapi.lighthouse.va.gov/vault, which requires VA network access. Log in using OIDC with the default, empty role and you will have access to secrets engines belonging to your GitHub teams, following the security principle of least privilege.
(Context: A separate VRO Admins GitHub Team was created to limit access to secrets. By default, LHDI allows all members of VA-ABD-RRD GitHub Team to have access to a vault store, which is contrary to the principle of least privilege. There's a vault store for `va-abd-rrd` but it is unused.)
VRO and partner team secrets are stored in the `va-abd-rrd` secrets engine, with secrets being organized under the `deploy/` folder. Subfolders for each environment are used as follows:
- `default`: provides default secrets for all environments; used for the LHDI `dev` environment
- `qa`, `sandbox`, `prod-test`, `prod`: used for the respective LHDI environment and override any default secrets
  - Only differences from default secrets need to be present. As a result, there are few secrets in the `qa` environment, and there is no `dev` subfolder.
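The default/override rule can be modeled in a few lines. This is a minimal sketch, not the project's deployment script: it assumes overrides apply key by key within a secrets group (the page only says that environment folders hold differences from `default`), and the in-memory tree layout, the `resolve_secrets` function, and the example group and keys are hypothetical.

```python
from typing import Dict

# Hypothetical in-memory model of the deploy/ folder in Vault:
#   {environment folder: {secrets group: {secret key: value}}}
VaultTree = Dict[str, Dict[str, Dict[str, str]]]


def resolve_secrets(tree: VaultTree, env: str, group: str) -> Dict[str, str]:
    """Return the effective secrets for one group in one environment.

    Assumption: deploy/default supplies the baseline and deploy/<env>
    overrides it key by key. An environment with no folder of its own
    (such as dev) simply receives the defaults.
    """
    effective = dict(tree.get("default", {}).get(group, {}))  # copy, never mutate the tree
    effective.update(tree.get(env, {}).get(group, {}))
    return effective


if __name__ == "__main__":
    example: VaultTree = {
        "default": {"VRO_SECRETS_EXAMPLE": {"USERNAME": "shared-user", "PASSWORD": "nonprod-pass"}},
        "prod": {"VRO_SECRETS_EXAMPLE": {"PASSWORD": "prod-pass"}},
    }
    print(resolve_secrets(example, "dev", "VRO_SECRETS_EXAMPLE"))
    # {'USERNAME': 'shared-user', 'PASSWORD': 'nonprod-pass'}
    print(resolve_secrets(example, "prod", "VRO_SECRETS_EXAMPLE"))
    # {'USERNAME': 'shared-user', 'PASSWORD': 'prod-pass'}
```

Storing only differences keeps environment folders small, at the cost of having to read two locations to know an environment's effective value.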
Under each LHDI environment folder are additional subfolders that encapsulate secrets for each application. This unique per-environment, per-application path (e.g., `/data/deploy/{env}/{app}`) is used to reference an application's secrets from a Helm chart using the ArgoCD Vault plugin.
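A small helper makes the path convention concrete. It is a sketch under assumptions: the engine name `va-abd-rrd` and the `data/deploy/{env}/{app}` layout come from this page, while `vault_secret_path` is a hypothetical helper. The `data/` segment matches how Vault's KV version 2 secrets engine addresses secret contents over its API, which is presumably why it appears in these paths.

```python
def vault_secret_path(env: str, app: str, engine: str = "va-abd-rrd") -> str:
    """Build the Vault path for one environment/application pair.

    Layout taken from this page: <engine>/data/deploy/<env>/<app>.
    Each segment must be a single, non-empty path component.
    """
    for segment in (engine, env, app):
        if not segment or "/" in segment:
            raise ValueError(f"invalid path segment: {segment!r}")
    return f"{engine}/data/deploy/{env}/{app}"


if __name__ == "__main__":
    print(vault_secret_path("qa", "VRO_SECRETS_BIE_KAFKA"))
    # va-abd-rrd/data/deploy/qa/VRO_SECRETS_BIE_KAFKA
```

For the LHDI `dev` environment, the `default` folder plays this role, since there is no `dev` subfolder.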
VRO and platform team applications are configured via the ArgoCD applications vault repository to map Vault secrets to local keys in the Helm secrets template; those keys are then referenced when injecting environment variables into a pod in the deployment template. Using `secrets.yaml` for `svc-bie-kafka` as an example:
- Each application may have its own secrets configuration, identified by a unique name per app:
  ```yaml
  metadata:
    name: vro-secrets-bie-kafka-vault
  ```
- An application's secrets template uses a `secrets_env` variable, defined in the environment's YAML configuration (e.g., `dev.yaml`), to map to that specific set of secrets in Vault by path:
  ```
  va-abd-rrd/data/deploy/{{ .Values.global.secrets_env }}/VRO_SECRETS_BIE_KAFKA
  ```
- Secret keys in Vault, referenced by `<NAME>`, are mapped to local secret keys:
  ```yaml
  bie-kafka-rbac-username: <BIE_KAFKA_RBAC_USERNAME>
  bie-kafka-rbac-password: <BIE_KAFKA_RBAC_PASSWORD>
  ...
  ```
- Those local secret keys are then referenced in `deployment.yaml` when providing environment variables to the pod:
  ```yaml
  env:
    - name: BIE_KAFKA_RBAC_USERNAME
      valueFrom:
        secretKeyRef:
          name: vro-secrets-bie-kafka-vault
          key: bie-kafka-rbac-username
    - name: BIE_KAFKA_RBAC_PASSWORD
      ...
  ```
- Our pods then have access to environment variables containing decoded secrets, for use by the application.
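The chain above (Vault key, then local secret key, then environment variable) can be modeled end to end. This is a sketch rather than the ArgoCD Vault plugin's implementation: `render_secret_data` and `resolve_env` are hypothetical names, the placeholder pattern is simplified to `<UPPER_SNAKE_CASE>` names, and the example values are made up. Base64 appears because Kubernetes exposes a Secret's `data` values base64-encoded, while containers see the decoded values.

```python
import base64
import re
from typing import Dict, List

# Simplified placeholder syntax (assumption): <NAME> where NAME is upper snake case.
PLACEHOLDER = re.compile(r"<([A-Z0-9_]+)>")


def _substitute(raw: str, vault_values: Dict[str, str]) -> str:
    """Replace every <NAME> in raw with the Vault value for NAME."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in vault_values:
            raise KeyError(f"Vault has no key {name!r}")
        return vault_values[name]
    return PLACEHOLDER.sub(lookup, raw)


def render_secret_data(template: Dict[str, str], vault_values: Dict[str, str]) -> Dict[str, str]:
    """Produce a Secret `data` map: local key -> base64(substituted value)."""
    return {
        local_key: base64.b64encode(_substitute(raw, vault_values).encode()).decode()
        for local_key, raw in template.items()
    }


def resolve_env(env_spec: List[dict], secrets: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Follow each valueFrom.secretKeyRef and decode the referenced value."""
    env = {}
    for entry in env_spec:
        ref = entry["valueFrom"]["secretKeyRef"]
        encoded = secrets[ref["name"]][ref["key"]]
        env[entry["name"]] = base64.b64decode(encoded).decode()
    return env


if __name__ == "__main__":
    template = {
        "bie-kafka-rbac-username": "<BIE_KAFKA_RBAC_USERNAME>",
        "bie-kafka-rbac-password": "<BIE_KAFKA_RBAC_PASSWORD>",
    }
    vault = {"BIE_KAFKA_RBAC_USERNAME": "example-user", "BIE_KAFKA_RBAC_PASSWORD": "example-pass"}
    secrets = {"vro-secrets-bie-kafka-vault": render_secret_data(template, vault)}
    env_spec = [
        {"name": "BIE_KAFKA_RBAC_USERNAME",
         "valueFrom": {"secretKeyRef": {"name": "vro-secrets-bie-kafka-vault",
                                        "key": "bie-kafka-rbac-username"}}},
    ]
    print(resolve_env(env_spec, secrets))  # {'BIE_KAFKA_RBAC_USERNAME': 'example-user'}
```

Failing loudly on a missing Vault key mirrors the kind of "couldn't find key" error discussed in the troubleshooting notes, rather than silently producing an empty variable.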
Most of our applications also use a few sets of shared secrets, for example to connect to Datadog with the same credentials. Sharing them limits the effort required to update these secrets when they change.
Using Datadog as an example:
- A new chart is created under `shared/datadog`, which gets referenced by another chart as a dependency.
- In the new chart, a secrets template is added in the same way as above, but with the template name containing a variable based on the application name:
  ```
  metadata:
    name: {{ .Values.global.labels.app }}-secrets-datadog-vault
  ```
- The custom Vault secrets object in ArgoCD is then referenced in a convenience template file, `vro-lhdi-libchart/templates/_datadog.tpl`, as a named template:
  ```
  {{- define "vro.vault.datadog.envVars" -}}
  - name: DD_SITE
    valueFrom:
      secretKeyRef:
        name: {{ .Values.global.labels.app }}-secrets-datadog-vault
        key: dd-site
  - name: ...
  {{- end }}
  ```
- ...which is then referenced in an app's `deployment.yaml`:
  ```
  env:
    {{- include "vro.vault.datadog.envVars" . | nindent 12 }}
    ...
  ```
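Because the shared chart is included by many applications, its secret name is derived from each application's label. A plausible reason, not stated on this page, is that two Helm releases in the same namespace cannot both own a Secret with the same name. The sketch below mirrors the `_datadog.tpl` named template in Python; `DATADOG_KEYS` lists only `DD_SITE`/`dd-site` because the other entries are elided above, and the app labels are hypothetical.

```python
from typing import Dict, List

# Only the DD_SITE entry is shown on this page; other Datadog keys are elided.
DATADOG_KEYS: Dict[str, str] = {"DD_SITE": "dd-site"}


def datadog_secret_name(app_label: str) -> str:
    """Mirror of "{{ .Values.global.labels.app }}-secrets-datadog-vault"."""
    return f"{app_label}-secrets-datadog-vault"


def datadog_env_vars(app_label: str) -> List[dict]:
    """Mirror of the "vro.vault.datadog.envVars" named template."""
    secret = datadog_secret_name(app_label)
    return [
        {"name": env_name, "valueFrom": {"secretKeyRef": {"name": secret, "key": key}}}
        for env_name, key in DATADOG_KEYS.items()
    ]


if __name__ == "__main__":
    for app in ("app-a", "app-b"):  # hypothetical app labels
        print(datadog_env_vars(app))
```

Each app gets the same environment variable names and keys, but points at its own per-app Secret object.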
Ask a VRO Admin to add, remove, or update the secret in Vault. Securely provide the secret for each LHDI environment: at minimum, one secret value for `dev` and another for `prod`.
- If the secret is added to an existing `VRO_SECRETS_*` group, no Helm configuration changes are needed.
- If the secret is added to another group, Helm configurations should be updated to use the new secret.
With the ArgoCD Vault plugin, secrets don't seamlessly update in pods or trigger syncs in ArgoCD. A hard refresh is required.
- Symptom: a secrets update in Vault is not being reflected in the pod environment, sometimes presenting as missing environment variables or connection failures due to still using old credentials.
- Cause: ArgoCD secrets objects created with the Vault plugin maintain a cache which must be cleared. The old values will remain until they are evicted.
- Workaround: None at the moment; explorations are needed around TTL on the secrets cache as well as for other configuration items to determine if this can happen without manual intervention.
- Resolution: a hard refresh of the secrets object in ArgoCD clears its cache, and ArgoCD then prompts for a sync once it discovers the updates. Sync the secrets object, then restart the service.
- Additional diagnosis: Vault-sourced secrets are visible in Lens and can be checked to verify that secret updates have propagated.
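When checking whether an update has propagated, it helps to compare values without printing them. The sketch below is hypothetical (`stale_keys` is not part of the repository): it takes the base64-encoded `data` map of a Kubernetes Secret, as returned by `kubectl get secret <name> -o json`, plus the values expected from Vault, and reports only the names of keys that are missing or out of date.

```python
import base64
from typing import Dict, List


def stale_keys(secret_data: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Return sorted local keys whose Kubernetes value differs from Vault's.

    secret_data: a Secret's `data` map (values are base64-encoded).
    expected:    local key -> plain-text value currently in Vault.
    Only key names are returned, so no secret value is ever printed.
    """
    stale = []
    for key in sorted(expected):
        encoded = secret_data.get(key)
        if encoded is None or base64.b64decode(encoded).decode() != expected[key]:
            stale.append(key)
    return stale


if __name__ == "__main__":
    in_cluster = {
        "bie-kafka-rbac-username": base64.b64encode(b"example-user").decode(),
        "bie-kafka-rbac-password": base64.b64encode(b"old-pass").decode(),
    }
    in_vault = {"bie-kafka-rbac-username": "example-user", "bie-kafka-rbac-password": "new-pass"}
    print(stale_keys(in_cluster, in_vault))  # ['bie-kafka-rbac-password']
```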
There are circumstances where Kubernetes logs "Error: couldn't find key VRO_SECRETS_... in Secret va-abd-rrd-.../vro-secrets"; see the Slack thread for screenshots.
This occurred because a single aggregate `vro-secrets` secret was used for all `VRO_SECRETS_*` groups. That design interferes with propagating secret updates, because containers still reference the old aggregate secret:
- Symptom: Sometimes redeploying the pod works and sometimes it fails with this error.
- Current hypothesis for this inconsistent error: if other running pods reference the `vro-secrets` secret, then old versions of it may still be available and be used by new pods. This article prompted the hypothesis.
- Workaround: Restart all old pods that reference the `vro-secrets` secret. If a restart isn't sufficient, a complete shutdown of all pods may be necessary to remove all references to the old secret; then start everything back up.
- Additionally, marking the secret immutable may contribute to the use of old secrets: immutable secrets aren't expected to change, so any changes (including a destroy and re-create) are not propagated. As a result, the `vro-secrets` secret is marked mutable in `set-k8-secrets.sh`.
Now the `vro-secrets-*` secrets are individual secrets, each used by one or a very small number of containers. This reduces the number of containers that need to be shut down simultaneously to release all references to an old secret, which should reduce the probability of this problem.
To set a non-secret environment variable for a container in an LHDI environment, add it to the relevant Helm chart(s) under `helm/`. If the variable value is different for each environment, also add it to the `helm/values-for-*.yaml` files.
With that said, before adding an environment variable, please read the next section.
It is preferred to use a configuration file scoped to only the application/microservice/container (e.g., ).
An environment variable is needed when any of the following are true:
- it is a secret (username, password, token, private certificate, ...): use HashiCorp Vault, as described on this page
- it is used by multiple containers: set it in `helm/values*.yaml` files and reference it in Helm charts (under `helm/`)
- it needs to be manually changed in deployment environments: let's discuss
We should minimize unnecessary Helm configurations, which reduces DevOps maintenance and overhead as well as the number of factors that can cause VRO deployments to fail.
A Vault token is needed to access Vault. The automation (a self-hosted GitHub runner) expects the Vault token to be a Kubernetes secret named `vro-vault` in the LHDI `dev` environment. The token expires monthly. Run `scripts/set-secret-vault-token.sh "$VAULT_TOKEN"` to set the token, where `$VAULT_TOKEN` is the string copied from the Vault web UI (click "Copy token" in the drop-down menu in the upper-right corner).
Kubernetes access tokens for each cluster (i.e., non-prod and prod) are needed to deploy the secrets to the LHDI environments. The access tokens expire in 90 days. Run `scripts/set-secret-kube-config.sh` to set the `devops-kubeconfig` secret.
A GHCR secret in Kubernetes named `devops-ghcr` needs to be set for LHDI to pull images. Run `scripts/set-secret-ghcr.sh "$ENV" "$PAT"` for each LHDI environment, where `$ENV` is `dev`, `qa`, etc., and `$PAT` is a GitHub personal access token generated using the `abd-vro-machine` account. This only needs to be run once (and again whenever the PAT expires).
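For reference, an image-pull secret like `devops-ghcr` is usually a Secret of type `kubernetes.io/dockerconfigjson`, the format that `kubectl create secret docker-registry` produces. The sketch below builds such a manifest in Python. It is not necessarily what `scripts/set-secret-ghcr.sh` does, and the namespace and the use of `abd-vro-machine` as the registry username are assumptions.

```python
import base64
import json


def ghcr_pull_secret(name: str, namespace: str, username: str, pat: str) -> dict:
    """Return a kubernetes.io/dockerconfigjson Secret manifest for ghcr.io."""
    auth = base64.b64encode(f"{username}:{pat}".encode()).decode()
    docker_config = {"auths": {"ghcr.io": {"username": username, "password": pat, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # The whole Docker config JSON is stored base64-encoded under one key.
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()},
    }


if __name__ == "__main__":
    # Hypothetical namespace and placeholder token; never commit a real PAT.
    manifest = ghcr_pull_secret("devops-ghcr", "example-namespace-dev", "abd-vro-machine", "ghp_example")
    print(json.dumps(manifest, indent=2))
```

A pod would typically consume such a secret through `spec.imagePullSecrets` (e.g., `[{name: devops-ghcr}]`); because the PAT is embedded, the secret must be regenerated whenever the PAT expires.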
Vault is a centralized, secure location in the VA's network designed to hold secrets. From Vault, secrets can be quickly and consistently redeployed to various LHDI environments in case they need to be reset or rotated.
Our GitHub Action workflow starts a self-hosted runner within our LHDI `dev` environment to pull Vault secrets and set Kubernetes secrets, all within the VA's network. This is more secure than using HashiCorp's GitHub Action, which would pull Vault secrets outside the VA network and into the GitHub Action workflow environment.
The runner is a container in the `vro-set-secrets-...` Kubernetes pod. When initiated by the "Deploy secrets from Vault" GitHub Action workflow, it can deploy secrets to any LHDI environment. The pod itself is deployed to the `dev` LHDI environment because that environment doesn't require SecRel-signed images.
There have been unexplained occurrences where Kubernetes secrets have changed and caused problems. Making them immutable aims to reduce (but not entirely prevent) this problem.