From b82609ea2bddfe81e9915f340130657ef4c9487b Mon Sep 17 00:00:00 2001
From: Ilja Rotar
Date: Wed, 12 Jun 2024 15:46:39 +0200
Subject: [PATCH 1/3] logging and monitoring

---
 docs/src/installation/monitoring.md | 82 ++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/docs/src/installation/monitoring.md b/docs/src/installation/monitoring.md
index 57f19cfb53..360a85f5c6 100644
--- a/docs/src/installation/monitoring.md
+++ b/docs/src/installation/monitoring.md
@@ -1,3 +1,83 @@
 # Monitoring the metal-stack
 
-We are currently working on providing the sources of our monitoring deployment for public usage. Please come back later.
+## Logging
+
+Logs are being collected by
+[Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/) and pushed
+to a [Loki](https://grafana.com/docs/loki/latest/) instance running in the
+control plane. Loki is deployed in
+[monolithic mode](https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/)
+and with storage type `'filesystem'`. You can find all logging related
+configuration parameters for the control plane in the control plane's
+[logging role](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/logging/README.md).
+
+In the partitions, Promtail is deployed inside a systemd-managed Docker
+container. Configuration parameters can be found in the partition's
+[promtail role](https://github.com/metal-stack/metal-roles/blob/master/partition/roles/promtail/README.md).
+Which hosts Promtail collects from can be configured via the
+`prometheus_promtail_targets` variable.
+
+## Monitoring
+
+For monitoring we deploy the
+[kube-prometheus-stack](https://github.com/prometheus-operator/kube-prometheus)
+and a [Thanos](https://thanos.io/tip/thanos/getting-started.md/) instance in the
+control plane. Metrics for the control plane are supplied by
+
+- `metal-metrics-exporter`
+- `rethinkdb-exporter`
+- `event-exporter`
+- `gardener-metrics-exporter`
+
+To query and visualize logs, metrics and alerts we deploy several Grafana
+dashboards to the control plane:
+
+- `grafana-dashboard-alertmanager`
+- `grafana-dashboard-machine-capacity`
+- `grafana-dashboard-metal-api`
+- `grafana-dashboard-rethinkdb`
+- `grafana-dashboard-sonic-exporter`
+
+and also some Gardener-related dashboards:
+
+- `grafana-dashboard-gardener-overview`
+- `grafana-dashboard-shoot-cluster`
+- `grafana-dashboard-shoot-customizations`
+- `grafana-dashboard-shoot-details`
+- `grafana-dashboard-shoot-states`
+
+The following `ServiceMonitors` are also deployed:
+
+- `gardener-metrics-exporter`
+- `ipam-db`
+- `masterdata-api`
+- `masterdata-db`
+- `metal-api`
+- `metal-db`
+- `rethinkdb-exporter`
+- `metal-metrics-exporter`
+
+All monitoring related configuration parameters for the control plane can be
+found in the control plane's
+[monitoring role](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/monitoring/README.md).
+
+Partition metrics are supplied by
+
+- `node-exporter`
+- `blackbox-exporter`
+- `ipmi-exporter`
+- `sonic-exporter`
+- `metal-core`
+- `frr-exporter`
+
+and scraped by Prometheus. For each of these exporters, the target hosts can be
+defined by
+
+- `prometheus_node_exporter_targets`
+- `prometheus_blackbox_exporter_targets`
+- `prometheus_frr_exporter_targets`
+- `prometheus_sonic_exporter_targets`
+- `prometheus_metal_core_targets`
+- `prometheus_frr_exporter_targets`
+
+## Alerting

From f84209be15f6f5a440224775eede0821030dad71 Mon Sep 17 00:00:00 2001
From: Ilja Rotar
Date: Thu, 13 Jun 2024 09:42:16 +0200
Subject: [PATCH 2/3] alerts

---
 docs/src/installation/monitoring.md | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/docs/src/installation/monitoring.md b/docs/src/installation/monitoring.md
index 360a85f5c6..d7c3dbd78b 100644
--- a/docs/src/installation/monitoring.md
+++ b/docs/src/installation/monitoring.md
@@ -9,12 +9,13 @@ control plane. Loki is deployed in
 [monolithic mode](https://grafana.com/docs/loki/latest/setup/install/helm/install-monolithic/)
 and with storage type `'filesystem'`. You can find all logging related
 configuration parameters for the control plane in the control plane's
-[logging role](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/logging/README.md).
+[logging](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/logging/README.md)
+role.
 
 In the partitions, Promtail is deployed inside a systemd-managed Docker
 container. Configuration parameters can be found in the partition's
-[promtail role](https://github.com/metal-stack/metal-roles/blob/master/partition/roles/promtail/README.md).
-Which hosts Promtail collects from can be configured via the
+[promtail](https://github.com/metal-stack/metal-roles/blob/master/partition/roles/promtail/README.md)
+role. Which hosts Promtail collects from can be configured via the
 `prometheus_promtail_targets` variable.
 
 ## Monitoring
@@ -59,7 +60,8 @@ The following `ServiceMonitors` are also deployed:
 
 All monitoring related configuration parameters for the control plane can be
 found in the control plane's
-[monitoring role](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/monitoring/README.md).
+[monitoring](https://github.com/metal-stack/metal-roles/blob/master/control-plane/roles/monitoring/README.md)
+role.
 
 Partition metrics are supplied by
 
@@ -81,3 +83,12 @@ defined by
 - `prometheus_frr_exporter_targets`
 
 ## Alerting
+
+In addition to Grafana, alerts can optionally be sent to a
+[Slack](https://slack.com/) channel. For this to work, at least a valid
+`monitoring_slack_api_url` and a `monitoring_slack_notification_channel` must be
+specified. For further configuration parameters refer to the
+[monitoring](https://github.com/metal-stack/metal-roles/tree/master/control-plane/roles/monitoring)
+role. Alerting rules are defined in the
+[rules](https://github.com/metal-stack/metal-roles/tree/master/partition/roles/monitoring/prometheus/files/rules)
+directory of the partition's prometheus role.

From 14bc6a9ebc1f2647bcfe7fb043af1e1ba8e62a84 Mon Sep 17 00:00:00 2001
From: Ilja Rotar
Date: Fri, 14 Jun 2024 11:44:30 +0200
Subject: [PATCH 3/3] monitoring overview

---
 docs/src/installation/monitoring-stack.svg | 4 ++++
 docs/src/installation/monitoring.md        | 4 ++++
 2 files changed, 8 insertions(+)
 create mode 100644 docs/src/installation/monitoring-stack.svg

diff --git a/docs/src/installation/monitoring-stack.svg b/docs/src/installation/monitoring-stack.svg
new file mode 100644
index 0000000000..bea2fd359e
--- /dev/null
+++ b/docs/src/installation/monitoring-stack.svg
@@ -0,0 +1,4 @@
+
+
+
+
[Figure: monitoring-stack.svg. The SVG markup was lost in extraction; only its text labels survive. Metal Partition: Management Servers, Switches, Machines, BMC, Promtail, Prometheus, Exporters (node_exporter, ipmi_exporter, blackbox_exporter, sonic_exporter). Metal Control Plane: Promtail, Loki (filesystem), Exporters (gardener-metrics-exporter, metal-metrics-exporter, event-exporter, rethinkdb-exporter), ServiceMonitors (gardener-metrics-exporter, ipam-db, masterdata-api, masterdata-db, metal-db, rethinkdb-exporter, metal-metrics-exporter, metal-api), kube-prometheus (prometheus-operator, node_exporter, blackbox_exporter, prometheus-adapter, Grafana, kube-state-metrics, Prometheus, alertmanager), Grafana Dashboards (alertmanager, sonic-exporter, rethinkdb, metal-api, machine-capacity), Gardener Dashboards (shoot-states, shoot-details, shoot-customizations, shoot-cluster, gardener-overview), Thanos. Also labelled: GCS.]
\ No newline at end of file
diff --git a/docs/src/installation/monitoring.md b/docs/src/installation/monitoring.md
index d7c3dbd78b..5b588f5ca7 100644
--- a/docs/src/installation/monitoring.md
+++ b/docs/src/installation/monitoring.md
@@ -1,5 +1,9 @@
 # Monitoring the metal-stack
 
+## Overview
+
+![Monitoring Stack](monitoring-stack.svg)
+
 ## Logging
 
 Logs are being collected by
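
Review note on the Alerting section added in patch 2/3: the text names two variables that must be set for Slack notifications but gives no example. A minimal sketch of what that might look like in an Ansible variables file follows. Only the two variable names come from the patch. The values are placeholders, and the assumptions that the URL is a Slack incoming-webhook URL and that the variables belong with the control plane's monitoring role settings are mine, not confirmed by the patch.

```yaml
# Sketch only, not taken from metal-roles.
# Variable names are from the Alerting section; both values are placeholders.

# Assumed to be a Slack incoming-webhook URL (placeholder, not a real hook).
monitoring_slack_api_url: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"

# Hypothetical channel name; use the channel that should receive the alerts.
monitoring_slack_notification_channel: "#metal-stack-alerts"
```

The monitoring role's README linked in the patch remains the reference for the accepted values and for any further alerting parameters.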