diff --git a/docs/configuration/uds-monitoring-metrics.md b/docs/configuration/uds-monitoring-metrics.md
index bad32a6fa..909d5ad5f 100644
--- a/docs/configuration/uds-monitoring-metrics.md
+++ b/docs/configuration/uds-monitoring-metrics.md
@@ -6,15 +6,17 @@ weight: 1
 
 UDS Core leverages Pepr to handle setup of Prometheus scraping metrics endpoints, with the particular configuration necessary to work in a STRICT mTLS (Istio) environment. We handle this via a default scrapeClass in prometheus to add the istio certs. When a monitor needs to be exempt from that tlsConfig a mutation is performed to leverage a plain scrape class without istio certs.
 
-## Mutations
+## TLS Configuration Setup
 
-Note: The below implementation has been deprecated in favor of a default `scrapeClass` with the file-based `tlsConfig` required for istio mTLS in prometheus automatically, supplemented with a mutation of `scrapeClass: exempt` that exempts monitors from the `tlsConfig` required for istio if the destination namespace is not istio injected (e.g. kube-system), unless the `uds/skip-sm-mutate` annotation is specified. The mutation behavior stated in the paragraph immediately below this section will be removed in a later release.
+Generally it is beneficial to use service and pod monitor resources from existing Helm charts where possible, as these may provide more advanced configuration and options. The UDS monitoring setup ensures that all monitoring resources use a default [`scrapeClass`](https://github.com/prometheus-operator/prometheus-operator/blob/v0.75.1/Documentation/api.md#monitoring.coreos.com/v1.ScrapeClass) configured in Prometheus to handle the necessary `tlsConfig` setup for metrics to work in STRICT Istio mTLS environments (the `scheme` is also mutated to `https` on individual monitor endpoints; see [this doc](https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings) for details). This setup is the default configuration, but individual monitors can opt out of it in three different ways:
 
-All service monitors are mutated to set the scrape scheme to HTTPS and set the TLS Config to what is required for Istio mTLS scraping (see [this doc](https://istio.io/latest/docs/ops/integrations/prometheus/#tls-settings) for details). Beyond this, no other fields are mutated. Supporting existing service monitors is useful since some charts include service monitors by default with more advanced configurations, and it is in our best interest to enable those and use them where possible.
+1. If the service or pod monitor targets namespaces that are not Istio injected (ex: `kube-system`), Pepr will detect this and mutate these monitors to use an `exempt` scrape class that does not have the Istio certs. Assumptions are made about STRICT mTLS here for simplicity, based on the `istio-injection` namespace label. Without this assumption we would need to query `PeerAuthentication` or other resources to determine the exact workload mTLS posture.
+1. Individual monitors can explicitly set the `exempt` scrape class to opt out of the Istio certificate configuration. This should typically only be done if your service exposes metrics on a PERMISSIVE mTLS port.
+1. If setting a `scrapeClass` is not an option due to lack of configuration in a Helm chart, or for other reasons, monitors can use the `uds/skip-mutate` annotation (with any value) to have Pepr mutate the `exempt` scrape class onto the monitor, as shown in the sketch below.
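+
+As a quick illustration of the second and third opt-outs, the sketch below shows what each looks like on a monitor resource (the names, namespace, and labels here are placeholders, not real UDS resources):
+
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: my-app-metrics # hypothetical monitor name
+  namespace: my-app # hypothetical namespace
+spec:
+  scrapeClass: exempt # option 2: explicitly opt out of the Istio cert config
+  selector:
+    matchLabels:
+      app: my-app
+  endpoints:
+    - port: metrics
+---
+apiVersion: monitoring.coreos.com/v1
+kind: PodMonitor
+metadata:
+  name: my-app-pod-metrics # hypothetical monitor name
+  namespace: my-app
+  annotations:
+    uds/skip-mutate: "true" # option 3: Pepr mutates scrapeClass to exempt
+spec:
+  selector:
+    matchLabels:
+      app: my-app
+  podMetricsEndpoints:
+    - port: metrics
+```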
-Assumptions are made about STRICT mTLS here for simplicity, based on the `istio-injection` namespace label. Without making these assumptions we would need to query `PeerAuthentication` resources or another resource to determine the exact workload mTLS posture.
-
-Note: This mutation is the default behavior for all service monitors but can be skipped using the annotation key `uds/skip-sm-mutate` (with any value). Skipping this mutation should only be done if your service exposes metrics on a PERMISSIVE mTLS port.
+{{% alert-note %}}
+There is deprecated functionality in Pepr that mutates `tlsConfig` onto individual service monitors, rather than using the scrape class approach. This has been kept in the current code temporarily to prevent any metrics downtime during the switch to `scrapeClass`. In a future release this behavior will be removed to reduce the complexity of the setup and required mutations.
+{{% /alert-note %}}
 
 ## Package CR `monitor` field
 
@@ -24,6 +26,7 @@ UDS Core also supports generating `ServiceMonitors` and/or `PodMonitors` from th
 ...
 spec:
   monitor:
+    # Example Service Monitor
     - selector: # Selector for the service to monitor
         app: foobar
       portName: metrics # Name of the port to monitor
@@ -40,17 +43,30 @@ spec:
           name: "example-secret"
           optional: false
           type: "Bearer"
+    # Example Pod Monitor
+    - portName: metrics # Name of the port on the pod to monitor
+      targetPort: 1234 # Corresponding target port on the pod/container (for network policy)
+      selector: # Selector for pod(s) to monitor; note: pod monitors also support `podSelector`, and both options behave the same
+        app: barfoo
+      kind: PodMonitor
+      # Optional properties depending on your application
+      description: "Metrics" # Add to customize the pod monitor name
+      path: "/mymetrics" # Add if metrics are exposed on a different path than "/metrics"
+      authorization: # Add if authorization is required for the metrics endpoint
+        credentials:
+          key: "example-key"
+          name: "example-secret"
+          optional: false
+          type: "Bearer"
 ```
 
-This config is used to generate service monitors and corresponding network policies to setup scraping for your applications. The `ServiceMonitor`s will go through the mutation process to add `tlsConfig` and `scheme` to work in an istio environment.
-
-This spec intentionally does not support all options available with a `ServiceMonitor`. While we may add additional fields in the future, we do not want to simply rebuild the `ServiceMonitor` spec since mutations are already available to handle Istio specifics. The current subset of spec options is based on the bare minimum necessary to craft resources.
+This config is used to generate service or pod monitors and corresponding network policies to set up scraping for your applications. The TLS configuration described above also applies to these generated monitors: they use the default scrape class unless the target namespace is not Istio injected.
 
-NOTE: While this is a rather verbose spec, each of the above fields are strictly required to craft the necessary service monitor and network policy resources.
+This spec intentionally does not support all options available with a `PodMonitor` or `ServiceMonitor`. While we may add additional fields in the future, we do not want to simply rebuild these specs since we are handling the complexities of Istio mTLS metrics. The current subset of spec options is based on the common needs seen in most environments.
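+
+For a rough picture of the network access that gets generated, the sketch below is a plain-Kubernetes `NetworkPolicy` equivalent of the policy created for the pod monitor example above (the resource name and the Prometheus namespace/pod labels are assumptions for this sketch; the operator builds these policies internally from its own `Allow` spec):
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-barfoo-metrics # hypothetical name; generated names may differ
+  namespace: my-app # the Package CR's namespace (placeholder)
+spec:
+  podSelector:
+    matchLabels:
+      app: barfoo # the monitor's selector from the example above
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: monitoring
+          podSelector:
+            matchLabels:
+              app: prometheus # scrapes originate from the Prometheus pods
+      ports:
+        - port: 1234 # the monitor's targetPort from the example above
+```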
 
 ## Notes on Alternative Approaches
 
-In coming up with this feature a few alternative approaches were considered but not chosen due to issues with each one. The current spec provides the best balance of a simplified interface compared to the `ServiceMonitor` spec, and a faster/easier reconciliation loop.
+When designing this feature for the `ServiceMonitor` use case, a few alternative approaches were considered but not chosen due to issues with each one. The current spec provides the best balance of a simplified interface compared to the `ServiceMonitor` spec, and a faster/easier reconciliation loop.
 
 ### Generation based on service lookup
 
diff --git a/src/pepr/operator/controllers/monitoring/pod-monitor.ts b/src/pepr/operator/controllers/monitoring/pod-monitor.ts
index 655015562..d3f033898 100644
--- a/src/pepr/operator/controllers/monitoring/pod-monitor.ts
+++ b/src/pepr/operator/controllers/monitoring/pod-monitor.ts
@@ -58,7 +58,7 @@ export function generatePodMonitor(
   generation: string,
   ownerRefs: V1OwnerReference[],
 ) {
-  const { selector, portName } = monitor;
+  const { selector, podSelector, portName } = monitor;
   const name = generateMonitorName(pkgName, monitor);
   const payload: PrometheusPodMonitor = {
     metadata: {
@@ -79,7 +79,7 @@ export function generatePodMonitor(
       },
     ],
     selector: {
-      matchLabels: selector,
+      matchLabels: podSelector ?? selector,
     },
   },
 };
diff --git a/src/pepr/operator/controllers/network/policies.ts b/src/pepr/operator/controllers/network/policies.ts
index 4cb341fa1..8534f028f 100644
--- a/src/pepr/operator/controllers/network/policies.ts
+++ b/src/pepr/operator/controllers/network/policies.ts
@@ -98,13 +98,13 @@ export async function networkPolicies(pkg: UDSPackage, namespace: string) {
     policies.push(keycloakGeneratedPolicy);
   }
 
-  // Generate NetworkPolicies for any ServiceMonitors that are generated
+  // Generate NetworkPolicies for any monitors that are generated
   const monitorList = pkg.spec?.monitor ?? [];
-  // Iterate over each ServiceMonitor
+  // Iterate over each monitor
   for (const monitor of monitorList) {
     const { selector, targetPort, podSelector } = monitor;
-    // Create the NetworkPolicy for the ServiceMonitor
+    // Create the NetworkPolicy for the monitor
     const policy: Allow = {
       direction: Direction.Ingress,
       selector: podSelector ?? selector,
@@ -113,7 +113,7 @@ export async function networkPolicies(pkg: UDSPackage, namespace: string) {
         app: "prometheus",
       },
       port: targetPort,
-      // Use the targetPort and selector to generate a description for the ServiceMonitor derived policies
+      // Use the targetPort and selector to generate a description for the monitoring derived policies
       description: `${targetPort}-${Object.values(selector)} Metrics`,
     };
     // Generate the policy
diff --git a/src/pepr/operator/crd/index.ts b/src/pepr/operator/crd/index.ts
index d92e14d2a..285c6c904 100644
--- a/src/pepr/operator/crd/index.ts
+++ b/src/pepr/operator/crd/index.ts
@@ -34,6 +34,7 @@ export {
 } from "./generated/istio/serviceentry-v1beta1";
 
 export {
+  PodMetricsEndpoint as PodMonitorEndpoint,
   Scheme as PodMonitorScheme,
   PodMonitor as PrometheusPodMonitor,
 } from "./generated/prometheus/podmonitor-v1";
diff --git a/src/pepr/prometheus/index.ts b/src/pepr/prometheus/index.ts
index 04886748f..1f69b1664 100644
--- a/src/pepr/prometheus/index.ts
+++ b/src/pepr/prometheus/index.ts
@@ -1,6 +1,9 @@
 import { Capability, K8s, kind } from "pepr";
 import { Component, setupLogger } from "../logger";
 import {
+  PodMonitorEndpoint,
+  PodMonitorScheme,
+  PrometheusPodMonitor,
   PrometheusServiceMonitor,
   ServiceMonitorEndpoint,
   ServiceMonitorScheme,
@@ -17,60 +20,81 @@ export const prometheus = new Capability({
 const { When } = prometheus;
 
 /**
- * Mutate a service monitor to enable mTLS metrics
+ * Mutate a service monitor to exclude it from mTLS metrics with `exempt` scrapeClass
  */
 When(PrometheusServiceMonitor)
   .IsCreatedOrUpdated()
   .Mutate(async sm => {
-    // Provide an opt-out of mutation to handle complicated scenarios
-    if (sm.Raw.metadata?.annotations?.["uds/skip-sm-mutate"]) {
+    if (sm.Raw.spec === undefined) {
+      return;
+    }
+
+    // Add an exempt scrape class if explicitly opted out via annotation OR targeting a non-istio-injected namespace
+    if (
+      sm.Raw.metadata?.annotations?.["uds/skip-mutate"] ||
+      sm.Raw.metadata?.annotations?.["uds/skip-sm-mutate"] ||
+      !(await isIstioInjected(sm))
+    ) {
       log.info(
         `Mutating scrapeClass to exempt ServiceMonitor ${sm.Raw.metadata?.name} from default scrapeClass mTLS config`,
       );
-      if (sm.Raw.spec === undefined) {
-        return;
-      }
       sm.Raw.spec.scrapeClass = "exempt";
       return;
-    }
-
-    // This assumes istio-injection == strict mTLS due to complexity around mTLS lookup
-    if (await isIstioInjected(sm)) {
-      if (sm.Raw.spec?.endpoints === undefined) {
-        return;
-      }
-      /**
-       * Patching ServiceMonitor tlsConfig is deprecated in favor of default scrapeClass with tls config
-       * this mutation will be removed in favor of a mutation to opt-out of the default scrapeClass in the future
-       */
+    } else {
       log.info(`Patching service monitor ${sm.Raw.metadata?.name} for mTLS metrics`);
+      // Note: this tlsConfig patch is deprecated in favor of a default scrape class for both service and pod monitors
      const tlsConfig = {
        caFile: "/etc/prom-certs/root-cert.pem",
        certFile: "/etc/prom-certs/cert-chain.pem",
        keyFile: "/etc/prom-certs/key.pem",
        insecureSkipVerify: true,
      };
-      const endpoints: ServiceMonitorEndpoint[] = sm.Raw.spec.endpoints;
+      const endpoints: ServiceMonitorEndpoint[] = sm.Raw.spec.endpoints || [];
       endpoints.forEach(endpoint => {
         endpoint.scheme = ServiceMonitorScheme.HTTPS;
         endpoint.tlsConfig = tlsConfig;
       });
       sm.Raw.spec.endpoints = endpoints;
-    } else {
+    }
+  });
+
+/**
+ * Mutate a pod monitor to exclude it from mTLS metrics with `exempt` scrapeClass
+ */
+When(PrometheusPodMonitor)
+  .IsCreatedOrUpdated()
+  .Mutate(async pm => {
+    if (pm.Raw.spec === undefined) {
+      return;
+    }
+
+    // Add an exempt scrape class if explicitly opted out via annotation OR targeting a non-istio-injected namespace
+    if (pm.Raw.metadata?.annotations?.["uds/skip-mutate"] || !(await isIstioInjected(pm))) {
       log.info(
-        `Mutating scrapeClass to exempt ServiceMonitor ${sm.Raw.metadata?.name} from default scrapeClass mTLS config`,
+        `Mutating scrapeClass to exempt PodMonitor ${pm.Raw.metadata?.name} from default scrapeClass mTLS config`,
       );
-      if (sm.Raw.spec === undefined) {
-        return;
-      }
-      sm.Raw.spec.scrapeClass = "exempt";
+      pm.Raw.spec.scrapeClass = "exempt";
+      return;
+    } else {
+      log.info(`Patching pod monitor ${pm.Raw.metadata?.name} for mTLS metrics`);
+      const endpoints: PodMonitorEndpoint[] = pm.Raw.spec.podMetricsEndpoints || [];
+      endpoints.forEach(endpoint => {
+        endpoint.scheme = PodMonitorScheme.HTTPS;
+      });
+      pm.Raw.spec.podMetricsEndpoints = endpoints;
     }
   });
 
-async function isIstioInjected(sm: PrometheusServiceMonitor) {
-  const namespaces = sm.Raw.spec?.namespaceSelector?.matchNames || [sm.Raw.metadata?.namespace] || [
-    "default",
-  ];
+// This assumes istio-injection == strict mTLS due to complexity around mTLS lookup
+async function isIstioInjected(monitor: PrometheusServiceMonitor | PrometheusPodMonitor) {
+  // If the monitor allows any namespace, assume istio injection
+  if (monitor.Raw.spec?.namespaceSelector?.any) {
+    return true;
+  }
+
+  // Fall back to the monitor's own namespace (or "default") when no matchNames are specified
+  const namespaces = monitor.Raw.spec?.namespaceSelector?.matchNames || [
+    monitor.Raw.metadata?.namespace ?? "default",
+  ];
 
   for (const ns of namespaces) {
     const namespace = await K8s(kind.Namespace).Get(ns);
diff --git a/src/prometheus-stack/chart/templates/istio-monitor.yaml b/src/prometheus-stack/chart/templates/istio-monitor.yaml
index 1311f4658..f2871b10b 100644
--- a/src/prometheus-stack/chart/templates/istio-monitor.yaml
+++ b/src/prometheus-stack/chart/templates/istio-monitor.yaml
@@ -4,8 +4,9 @@ kind: PodMonitor
 metadata:
   name: envoy-stats-monitor
   namespace: istio-system
+  annotations:
+    uds/skip-mutate: "true"
 spec:
-  scrapeClass: exempt
   selector:
     matchExpressions:
     - {key: istio-prometheus-ignore, operator: DoesNotExist}
diff --git a/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml b/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml
index 60c3bb615..29f2827c2 100644
--- a/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml
+++ b/src/prometheus-stack/chart/templates/prometheus-pod-monitor.yaml
@@ -4,8 +4,9 @@ kind: PodMonitor
 metadata:
   name: prometheus-pod-monitor
   namespace: monitoring
+  annotations:
+    uds/skip-mutate: "true"
 spec:
-  scrapeClass: exempt
   selector:
     matchLabels:
       app: prometheus
diff --git a/src/test/app-tenant.yaml b/src/test/app-tenant.yaml
index a16e89349..3eb203b99 100644
--- a/src/test/app-tenant.yaml
+++ b/src/test/app-tenant.yaml
@@ -3,43 +3,6 @@ kind: Namespace
 metadata:
   name: test-tenant-app
 ---
-apiVersion: v1
-kind: Secret
-metadata:
-  name: example-secret
-  namespace: test-tenant-app
-type: Opaque
-data:
-  example-key: ZXhhbXBsZS1rZXk=
----
-apiVersion: monitoring.coreos.com/v1
-kind: PodMonitor
-metadata:
-  name: httpbin-pod-monitor-default-scrape
-  namespace: test-tenant-app
-spec:
-  podMetricsEndpoints:
-    - path: /metrics
-      port: service
-  scrapeClass: istio-certs
-  selector:
-    matchLabels:
-      app: httpbin
----
-apiVersion: monitoring.coreos.com/v1
-kind: PodMonitor
-metadata:
-  name: httpbin-pod-monitor-no-tls-config
-  namespace: test-tenant-app
-spec:
-  podMetricsEndpoints:
-    - path: /metrics
-      port: service
-  scrapeClass: exempt
-  selector:
-    matchLabels:
-      app: httpbin
----
 apiVersion: uds.dev/v1alpha1
 kind: Package
 metadata:
@@ -60,47 +23,6 @@ spec:
     gateway: tenant
     host: demo-8081
     port: 8081
-  monitor:
-    - selector:
-        app: httpbin
-      targetPort: 3000
-      portName: service
-      description: Pod Monitor
-      kind: PodMonitor
-    - selector:
-        app: httpbin
-      targetPort: 3000
-      portName: service
-      description: Service Monitor Explicit
-      kind: ServiceMonitor
-    - selector:
-        app: httpbin
-      targetPort: 3000
-      portName: service
-      description: Service Monitor Default
-    - portName: "http"
-      selector:
-        app: "example"
-      targetPort: 8080
-      kind: "PodMonitor"
-      authorization:
-        credentials:
-          key: "example-key"
-          name: "example-secret"
-          optional: false
-          type: "Bearer"
-      description: Pod Monitor with Authorization
-    - portName: "http"
-      selector:
-        app: "example"
-      targetPort: 8080
-      authorization:
-        credentials:
-          key: "example-key"
-          name: "example-secret"
-          optional: false
-          type: "Bearer"
-      description: Service Monitor with Authorization
 ---
 apiVersion: v1
 kind: Service
diff --git a/src/test/chart/Chart.yaml b/src/test/chart/Chart.yaml
index 5cabd1306..288986028 100644
--- a/src/test/chart/Chart.yaml
+++ b/src/test/chart/Chart.yaml
@@ -1,5 +1,5 @@
 apiVersion: v2
-name: exempted-app
+name: uds-podinfo-config
 description: A Helm chart for testing an exempted-app
 type: application
 version: 0.1.0
diff --git a/src/test/chart/templates/package.yaml b/src/test/chart/templates/package.yaml
new file mode 100644
index 000000000..1bb8d8e76
--- /dev/null
+++ b/src/test/chart/templates/package.yaml
@@ -0,0 +1,27 @@
+apiVersion: uds.dev/v1alpha1
+kind: Package
+metadata:
+  name: podinfo
+  namespace: podinfo
+spec:
+  monitor:
+    - selector:
+        app.kubernetes.io/name: podinfo
+      targetPort: 9898
+      portName: http
+      description: "podmonitor"
+      kind: PodMonitor
+    - selector:
+        app.kubernetes.io/name: podinfo
+      targetPort: 9898
+      portName: http
+      description: "svcmonitor"
+      kind: ServiceMonitor
+  network:
+    expose:
+      - service: podinfo
+        selector:
+          app.kubernetes.io/name: podinfo
+        gateway: tenant
+        host: podinfo
+        port: 9898
diff --git a/src/test/tasks.yaml b/src/test/tasks.yaml
index 18bd46153..385728f6a 100644
--- a/src/test/tasks.yaml
+++ b/src/test/tasks.yaml
@@ -88,6 +88,7 @@ tasks:
           exit 1
           ;;
         esac
+
    - description: Verify podinfo is healthy
      wait:
        cluster:
@@ -96,5 +97,27 @@
          namespace: podinfo
          condition: Ready
 
+    - description: Verify podinfo package CR is ready
+      wait:
+        cluster:
+          kind: Package
+          name: podinfo
+          namespace: podinfo
+          condition: "'{.status.phase}'=Ready"
+
+    - description: Verify podinfo podmonitor exists
+      wait:
+        cluster:
+          kind: PodMonitor
+          name: podinfo-podmonitor
+          namespace: podinfo
+
+    - description: Verify podinfo servicemonitor exists
+      wait:
+        cluster:
+          kind: ServiceMonitor
+          name: podinfo-svcmonitor
+          namespace: podinfo
+
    - description: Remove the test resources
      cmd: "uds zarf package remove build/zarf-package-uds-core-test-apps-*.zst --confirm --no-progress"
diff --git a/src/test/zarf.yaml b/src/test/zarf.yaml
index e38657758..b98f98bf7 100644
--- a/src/test/zarf.yaml
+++ b/src/test/zarf.yaml
@@ -24,8 +24,8 @@ components:
   - name: podinfo
     required: true
     charts:
-      - name: exempted-app
-        namespace: exempted-app
+      - name: uds-podinfo-config
+        namespace: podinfo
         localPath: ./chart
         version: 0.1.0
       - name: podinfo