Merge pull request #1095 from stackhpc/2023.1-zed-merge
2023.1: zed merge
markgoddard authored Jun 11, 2024
2 parents 9549f1b + a826dca commit 40a1526
Showing 14 changed files with 123 additions and 7 deletions.
18 changes: 15 additions & 3 deletions doc/source/configuration/monitoring.rst
@@ -126,6 +126,8 @@ depending on your configuration, you may need to set the
``kolla_enable_prometheus_ceph_mgr_exporter`` variable to ``true`` in order to
enable the ceph mgr exporter.

.. _os-capacity:

OpenStack Capacity
==================

@@ -149,9 +151,19 @@ project domain name in ``stackhpc-monitoring.yml``:
   stackhpc_os_capacity_openstack_region_name: <openstack_region_name>

Additionally, you should ensure these credentials have the correct permissions
for the exporter. If you are deploying in a cloud with internal TLS, you may be required
to disable certificate verification for the OpenStack Capacity exporter
if your certificate is not signed by a trusted CA.
for the exporter.

If you are deploying in a cloud with internal TLS, you may be required
to provide a CA certificate for the OpenStack Capacity exporter if your
certificate is not signed by a trusted CA. For example, to use a CA certificate
named ``vault.crt`` that is also added to the Kolla containers:

.. code-block:: yaml

   stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"

Alternatively, to disable certificate verification for the OpenStack Capacity
exporter:

.. code-block:: yaml
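The hunk ends before the body of this code block. As a hedged sketch, assuming the toggle is the ``stackhpc_os_capacity_openstack_verify`` variable that appears in ``etc/kayobe/stackhpc-monitoring.yml`` later in this commit (default ``true``), disabling verification would presumably look like this:

```yaml
# Assumption: placed in stackhpc-monitoring.yml. The variable name is taken
# from the defaults shown in this commit, not from the truncated doc hunk.
stackhpc_os_capacity_openstack_verify: false
```

Providing a CA certificate is generally the safer option, since disabling verification removes protection against an impersonated endpoint.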
27 changes: 27 additions & 0 deletions doc/source/configuration/release-train.rst
@@ -147,6 +147,33 @@ By default, HashiCorp images (Consul and Vault) are not synced from Docker Hub
to the local Pulp. To sync these images, set ``stackhpc_sync_hashicorp_images``
to ``true``.

Custom container images
-----------------------

A custom list of container images can be synced to the local Pulp using the
``stackhpc_pulp_repository_container_repos_extra`` and
``stackhpc_pulp_distribution_container_extra`` variables.

.. code-block:: yaml

   # List of extra container image repositories.
   stackhpc_pulp_repository_container_repos_extra:
     - name: "certbot/certbot"
       url: "https://registry-1.docker.io"
       policy: on_demand
       proxy_url: "{{ pulp_proxy_url }}"
       state: present
       include_tags: "nightly"
       required: True

   # List of extra container image distributions.
   stackhpc_pulp_distribution_container_extra:
     - name: certbot
       repository: certbot/certbot
       base_path: certbot/certbot
       state: present
       required: True

Usage
=====

2 changes: 2 additions & 0 deletions doc/source/configuration/vault.rst
@@ -241,6 +241,8 @@ Enable the required TLS variables in kayobe and kolla
   # Whether TLS is enabled for the internal API endpoints. Default is 'no'.
   kolla_enable_tls_internal: yes

See :ref:`os-capacity` for information on adding CA certificates to the trust store when deploying the OpenStack Capacity exporter.

3. Set the following in etc/kayobe/kolla/globals.yml or if environments are being used etc/kayobe/environments/$KAYOBE_ENVIRONMENT/kolla/globals.yml

.. code-block::
2 changes: 1 addition & 1 deletion doc/source/operations/secret-rotation.rst
@@ -46,7 +46,7 @@ process easier.

This was previously mitigated with a change to the StackHPC fork of
Kolla-Ansible, which has since been reverted due to an unforeseen issue. See
`here <https://github.com/stackhpc/kolla-ansible/pull/503>` for more
`here <https://github.com/stackhpc/kolla-ansible/pull/503>`__ for more
details.

#. A change to Nova, to automate :ref:`this<nova-change>` step to change the
12 changes: 12 additions & 0 deletions etc/kayobe/ansible/deploy-os-capacity-exporter.yml
@@ -27,6 +27,7 @@
      delegate_to: localhost
      register: credential
      when: stackhpc_enable_os_capacity
      changed_when: false

    - name: Set facts for admin credentials
      ansible.builtin.set_fact:
@@ -43,6 +44,16 @@
        src: templates/os_capacity-clouds.yml.j2
        dest: /opt/kayobe/os-capacity/clouds.yaml
      when: stackhpc_enable_os_capacity
      register: clouds_yaml_result

    - name: Copy CA certificate to OpenStack Capacity nodes
      ansible.builtin.copy:
        src: "{{ stackhpc_os_capacity_openstack_cacert }}"
        dest: /opt/kayobe/os-capacity/cacert.pem
      when:
        - stackhpc_enable_os_capacity
        - stackhpc_os_capacity_openstack_cacert | length > 0
      register: cacert_result

    - name: Ensure os_capacity container is running
      community.docker.docker_container:
@@ -56,6 +67,7 @@
            source: /opt/kayobe/os-capacity/
            target: /etc/openstack/
        network_mode: host
        restart: "{{ clouds_yaml_result is changed or cacert_result is changed }}"
        restart_policy: unless-stopped
      become: true
      when: stackhpc_enable_os_capacity
3 changes: 3 additions & 0 deletions etc/kayobe/ansible/templates/os_capacity-clouds.yml.j2
@@ -10,6 +10,9 @@ clouds:
    interface: "internal"
    identity_api_version: 3
    auth_type: "password"
{% if stackhpc_os_capacity_openstack_cacert | length > 0 %}
    cacert: /etc/openstack/cacert.pem
{% endif %}
{% if not stackhpc_os_capacity_openstack_verify | bool %}
    verify: False
{% endif %}
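For illustration, with ``stackhpc_os_capacity_openstack_cacert`` set to a non-empty path and ``stackhpc_os_capacity_openstack_verify`` left at ``true``, the tail of the rendered ``clouds.yaml`` would be expected to look roughly like the fragment below. Only keys visible in this hunk are shown; the cloud name and ``auth`` section lie outside the hunk and are omitted rather than guessed.

```yaml
# Sketch of the rendered fragment; keys above this point are not in the diff.
interface: "internal"
identity_api_version: 3
auth_type: "password"
# Path as seen inside the container: the playbook copies the certificate to
# /opt/kayobe/os-capacity/ and mounts that directory at /etc/openstack/.
cacert: /etc/openstack/cacert.pem
# No "verify" key is emitted because verification is still enabled.
```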
3 changes: 3 additions & 0 deletions etc/kayobe/environments/ci-multinode/stackhpc-monitoring.yml
@@ -0,0 +1,3 @@
---
# Path to a CA certificate file to trust in the OpenStack Capacity exporter.
stackhpc_os_capacity_openstack_cacert: "{{ kayobe_env_config_path }}/kolla/certificates/ca/vault.crt"
1 change: 0 additions & 1 deletion etc/kayobe/kolla.yml
@@ -420,7 +420,6 @@ kolla_build_blocks:
ARG prometheus_url=https://github.com/prometheus/prometheus/releases/download/v${prometheus_version}/prometheus-${prometheus_version}.linux-{{debian_arch}}.tar.gz
{% endraw %}
# Dict mapping image customization variable names to their values.
# Each variable takes the form:
# <image name>_<customization>_<operation>
18 changes: 18 additions & 0 deletions etc/kayobe/kolla/config/prometheus/system.rules
@@ -24,6 +24,24 @@ groups:
      summary: "Prometheus exporter at {{ $labels.instance }} reports low memory"
      description: "Available memory is {{ $value }} GiB."

  - alert: LowSwapSpace
    expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_warning_threshold_ratio }}{% raw %}
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Swap space at {{ $labels.instance }} reports low memory"
      description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."

  - alert: LowSwapSpace
    expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < {% endraw %}{{ alertmanager_node_free_swap_critical_threshold_ratio }}{% raw %}
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Swap space at {{ $labels.instance }} reports low memory"
      description: "Available swap space is {{ $value | humanizePercentage }}. Running out of swap space causes OOM Kills."

  - alert: HostOomKillDetected
    expr: increase(node_vmstat_oom_kill[5m]) > 0
    for: 5m
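With the default thresholds set in ``etc/kayobe/stackhpc-monitoring.yml`` in this commit (0.25 and 0.1), the two templated rules should render roughly as follows. This is a sketch of the expected output with annotations omitted, not output captured from a deployment.

```yaml
# Warning: less than 25% of swap has been free for 1 minute.
- alert: LowSwapSpace
  expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < 0.25
  for: 1m
  labels:
    severity: warning

# Critical: less than 10% of swap has been free for 1 minute.
- alert: LowSwapSpace
  expr: (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes) < 0.1
  for: 1m
  labels:
    severity: critical
```

Because the critical threshold is lower than the warning threshold, a host below 10% free swap satisfies both expressions.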
12 changes: 10 additions & 2 deletions etc/kayobe/pulp.yml
@@ -652,14 +652,22 @@ stackhpc_pulp_distribution_container_hashicorp:
    state: present
    required: "{{ stackhpc_sync_hashicorp_images | bool }}"

# List of extra container image repositories.
stackhpc_pulp_repository_container_repos_extra: []

# List of extra container image distributions.
stackhpc_pulp_distribution_container_extra: []

# List of container image repositories.
stackhpc_pulp_repository_container_repos: >-
  {{ (stackhpc_pulp_repository_container_repos_kolla +
  stackhpc_pulp_repository_container_repos_ceph +
  stackhpc_pulp_repository_container_repos_hashicorp) | selectattr('required') }}
  stackhpc_pulp_repository_container_repos_hashicorp +
  stackhpc_pulp_repository_container_repos_extra) | selectattr('required') }}

# List of container image distributions.
stackhpc_pulp_distribution_container: >-
  {{ (stackhpc_pulp_distribution_container_kolla +
  stackhpc_pulp_distribution_container_ceph +
  stackhpc_pulp_distribution_container_hashicorp) | selectattr('required') }}
  stackhpc_pulp_distribution_container_hashicorp +
  stackhpc_pulp_distribution_container_extra) | selectattr('required') }}
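Because the extra lists are concatenated with the others before ``selectattr('required')`` is applied, an extra entry is only synced when its ``required`` key is truthy. A minimal sketch, assuming a hypothetical ``sync_certbot_images`` variable (not defined in this repository) and following the pattern used above for the HashiCorp images:

```yaml
# Hypothetical toggle, introduced here for illustration only.
sync_certbot_images: false

stackhpc_pulp_repository_container_repos_extra:
  - name: "certbot/certbot"
    url: "https://registry-1.docker.io"
    policy: on_demand
    proxy_url: "{{ pulp_proxy_url }}"
    state: present
    include_tags: "nightly"
    # Dropped by selectattr('required') while the toggle is false.
    required: "{{ sync_certbot_images | bool }}"
```

The matching entry in ``stackhpc_pulp_distribution_container_extra`` would need the same ``required`` expression so that the repository and distribution lists stay in step.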
9 changes: 9 additions & 0 deletions etc/kayobe/stackhpc-monitoring.yml
@@ -12,6 +12,12 @@ alertmanager_low_memory_threshold_gib: 5
# link. Change to false to disable this alert.
alertmanager_warn_network_bond_single_link: true

# Threshold to trigger a LowSwapSpace alert on swap space depletion (ratio).
# When the ratio of free swap space is lower than each of these values, warning
# and critical alerts will be triggered respectively.
alertmanager_node_free_swap_warning_threshold_ratio: 0.25
alertmanager_node_free_swap_critical_threshold_ratio: 0.1

###############################################################################
# Exporter configuration

@@ -20,6 +26,9 @@ alertmanager_warn_network_bond_single_link: true
# targets being templated during deployment.
stackhpc_enable_os_capacity: true

# Path to a CA certificate file to trust in the OpenStack Capacity exporter.
stackhpc_os_capacity_openstack_cacert: ""

# Whether TLS certificate verification is enabled for the OpenStack Capacity
# exporter during Keystone authentication.
stackhpc_os_capacity_openstack_verify: true
@@ -0,0 +1,13 @@
---
features:
  - |
    Added two alerts (warning and critical) that are triggered when the ratio
    of (free_swap_space / total_swap_space) is below thresholds.
    Each threshold can be modified by altering the value of
    ``alertmanager_node_free_swap_warning_threshold_ratio`` and
    ``alertmanager_node_free_swap_critical_threshold_ratio``.
    Currently this solution has the limitation of being a one-size-fits-all
    policy. This can cause unwanted alerts for hosts which utilise swap heavily.
    Therefore it is recommended to tune the thresholds or apply silence rules
    as needed.
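As a sketch of the tuning suggested in this note, a deployment whose hosts use swap heavily might lower both ratios in ``etc/kayobe/stackhpc-monitoring.yml``. The values below are illustrative assumptions, not recommendations.

```yaml
# Illustrative values only: warn below 15% free swap, critical below 5%.
alertmanager_node_free_swap_warning_threshold_ratio: 0.15
alertmanager_node_free_swap_critical_threshold_ratio: 0.05
```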
4 changes: 4 additions & 0 deletions releasenotes/notes/os-capacity-cacert-8b800b22d84ae0b1.yaml
@@ -0,0 +1,4 @@
---
features:
  - |
    Adds support for providing a CA certificate for the OpenStack Capacity exporter.
6 changes: 6 additions & 0 deletions releasenotes/notes/pulp-container-extra-9379806192900d22.yaml
@@ -0,0 +1,6 @@
---
features:
  - |
    Allows synchronising a custom list of container images to Pulp using the
    ``stackhpc_pulp_repository_container_repos_extra`` and
    ``stackhpc_pulp_distribution_container_extra`` variables.
