Commit
docs: update remote write usage
swiatekm committed Sep 22, 2023
1 parent f4679bd commit 1e3303e
Showing 4 changed files with 58 additions and 176 deletions.
93 changes: 5 additions & 88 deletions docs/collecting-application-metrics.md
@@ -9,7 +9,7 @@ There are two major sections:

## Scraping metrics

This section describes how to scrape metrics from your applications. Several scenarios has been covered:
This section describes how to scrape metrics from your applications. The following scenarios are covered:

- [Application metrics are exposed (one endpoint scenario)](#application-metrics-are-exposed-one-endpoint-scenario)
- [Application metrics are exposed (multiple endpoints scenario)](#application-metrics-are-exposed-multiple-enpoints-scenario)
@@ -48,14 +48,6 @@ sumologic:
endpoints:
- port: "<port name or number>"
path: <metrics path>
relabelings:
## Sets _sumo_forward_ label to true
- sourceLabels: [__name__]
separator: ;
regex: (.*)
targetLabel: _sumo_forward_
replacement: "true"
action: replace
namespaceSelector:
matchNames:
- <namespace>
@@ -67,35 +59,8 @@ sumologic:

**Note** For advanced serviceMonitor configuration, please see the [Prometheus documentation][prometheus_service_monitors].

> **Note** If you not set `_sumo_forward_` label you will have to configure `additionalRemoteWrite`:
>
> ```yaml
> kube-prometheus-stack:
> prometheus:
> prometheusSpec:
> additionalRemoteWrite:
> ## This is required to keep default configuration. It's copy of values.yaml content
> - url: http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888/prometheus.metrics.applications.custom
> remoteTimeout: 5s
> writeRelabelConfigs:
> - action: keep
> regex: ^true$
> sourceLabels: [_sumo_forward_]
> - action: labeldrop
> regex: _sumo_forward_
> ## This is your custom remoteWrite configuration
> - url: http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888/prometheus.metrics.<custom endpoint name>
> writeRelabelConfigs:
> - action: keep
> regex: <metric1>|<metric2>|...
> sourceLabels: [__name__]
> ```
>
> We recommend using a regex validator, for example [https://regex101.com/]

[prometheus_service_monitors]:
https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.ServiceMonitor
[https://regex101.com/]: https://regex101.com/

#### Example

@@ -169,24 +134,8 @@ sumologic:
endpoints:
- port: some-port
path: /metrics
relabelings:
## Sets _sumo_forward_ label to true
- sourceLabels: [__name__]
separator: ;
regex: (.*)
targetLabel: _sumo_forward_
replacement: "true"
action: replace
- port: another-port
path: /custom-endpoint
relabelings:
## Sets _sumo_forward_ label to true
- sourceLabels: [__name__]
separator: ;
regex: (.*)
targetLabel: _sumo_forward_
replacement: "true"
action: replace
namespaceSelector:
matchNames:
- my-custom-app-namespace
@@ -197,8 +146,8 @@ sumologic:

### Application metrics are not exposed

In case you want to scrape metrics from application which do not expose them, you can use telegraf operator. It will scrape metrics
according to configuration and expose them on port `9273` so Prometheus will be able to scrape them.
In case you want to scrape metrics from an application which does not expose a Prometheus endpoint, you can use the telegraf operator. It
will scrape metrics according to its configuration and expose them on port `9273`, so Prometheus will be able to scrape them.

For example, to expose metrics from an nginx Pod, you can use the following annotations:

@@ -214,10 +163,10 @@ annotations:
`sumologic-prometheus` defines the way the telegraf operator will expose the metrics. They are going to be exposed in Prometheus format on
port `9273` at the `/metrics` path.

**NOTE** If you apply annotations on Pod which is subject of other object, e.g. DaemonSet, it won't take affect. In such case, the
**NOTE** If you apply annotations on a Pod which is owned by another object, e.g. a DaemonSet, they won't take effect. In such cases, the
annotations should be added to the Pod specification in the DaemonSet template.
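
For instance, a minimal sketch of where the annotations belong in a DaemonSet manifest (the DaemonSet name, labels, and the telegraf input below are hypothetical and only for illustration):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-app # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        ## The annotations go on the Pod template, not on the DaemonSet metadata
        telegraf.influxdata.com/class: sumologic-prometheus
        telegraf.influxdata.com/inputs: |+
          [[inputs.nginx]]
            urls = ["http://localhost/nginx_status"]
    spec:
      containers:
        - name: my-app
          image: nginx
```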

After restart, the Pod should have additional `telegraf` container.
After restart, the Pod should have an additional `telegraf` container.
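
For example, you can verify this with a command like the following (Pod name and namespace are placeholders):

```sh
# List the containers in the Pod; the output should include `telegraf` next to your application container
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'
```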

To scrape and forward exposed metrics to Sumo Logic, please follow one of the following scenarios:

@@ -369,7 +318,6 @@ If you do not see your metrics in Sumo Logic, please check the following stages:
- [Pod is visible in Prometheus targets](#pod-is-visible-in-prometheus-targets)
- [There is no target for serviceMonitor](#there-is-no-target-for-servicemonitor)
- [Pod is not visible in target for custom serviceMonitor](#pod-is-not-visible-in-target-for-custom-servicemonitor)
- [Check if Prometheus knows how to send metrics to Sumo Logic](#check-if-prometheus-knows-how-to-send-metrics-to-sumo-logic)

### Check if metrics are in Prometheus

@@ -514,34 +462,3 @@ $ kubectl -n "${NAMESPACE}" describe prometheus

If you don't see the Pod you are expecting for your serviceMonitor, but the serviceMonitor is in the Prometheus targets, please verify that
the `selector` and `namespaceSelector` in the `additionalServiceMonitors` configuration match your Pod's namespace and labels.
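
For example, you can compare the labels and namespaces with `kubectl` (resource names below are placeholders):

```sh
# Labels on the Pods and on the Service that exposes them
kubectl get pods -n <namespace> --show-labels
kubectl get services -n <namespace> --show-labels
```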

### Check if Prometheus knows how to send metrics to Sumo Logic

If metrics are visible in Prometheus, but you cannot see them in Sumo Logic, please check if Prometheus knows how to send it to Sumo Logic
Metatada StatefulSet.

Go to the [http://localhost:8000/config](http://localhost:8000/config) and verify if your metric definition is added to any `remote_write`
section. It most likely will be covered by:

```yaml
- url: http://collection-sumologic-remote-write-proxy.sumologic.svc.cluster.local.:9888/prometheus.metrics.applications.custom
remote_timeout: 5s
write_relabel_configs:
- source_labels: [_sumo_forward_]
separator: ;
regex: ^true$
replacement: $1
action: keep
- separator: ;
regex: _sumo_forward_
replacement: $1
action: labeldrop
```

If there is no `remote_write` for your metric definition, you can add one using `additionalRemoteWrite` what has been described in
[Application metrics are exposed (multiple enpoints scenario)](#application-metrics-are-exposed-multiple-enpoints-scenario) section.

However if you can see `remote_write` which matches your metrics and metrics are in Prometheus, we recommend to look at the Prometheus,
Prometheus Operator and OpenTelemetry Metrics Collector Pod logs.

If the issue won't be solved, please create an issue or contact with our Customer Support.
28 changes: 3 additions & 25 deletions docs/collecting-kubernetes-metrics.md
@@ -11,20 +11,16 @@ By default, we collect selected metrics from the following Kubernetes components
- `Kube State Metrics` configured with `kube-prometheus-stack.kube-state-metrics.prometheus.monitor`
- `Prometheus Node Exporter` configured with `kube-prometheus-stack.prometheus-node-exporter.prometheus.monitor`

If you want to forward additional metric from one of these services, you need to make two configuration changes:

- edit corresponding Service Monitor configuration. Service Monitor tells Prometheus which metrics it should take from the service
- ensure that the new metric is forwarded to metadata Pod, by adding new (or editing existing) Remote Write to
`kube-prometheus-stack.prometheus.prometheusSpec.additionalRemoteWrite`
If you want to forward an additional metric from one of these services, you need to edit the corresponding Service Monitor definition. The
Service Monitor tells Prometheus which metrics it should collect from the service.

## Example

Let's consider the following example:

In addition to all the metrics we send by default from cAdvisor, you also want to forward `container_blkio_device_usage_total`.

You need to modify `kube-prometheus-stack.kubelet.serviceMonitor.cAdvisorMetricRelabelings` to include `container_blkio_device_usage_total`
in regex, and also to add `container_blkio_device_usage_total` to `kube-prometheus-stack.prometheus.prometheusSpec.additionalRemoteWrite`.
You need to modify `kube-prometheus-stack.kubelet.serviceMonitor.cAdvisorMetricRelabelings` to include `container_blkio_device_usage_total`.

```yaml
kube-prometheus-stack:
@@ -42,24 +38,6 @@ kube-prometheus-stack:
regex: POD
- action: labeldrop
regex: (id|name)
prometheus:
prometheusSpec:
additionalRemoteWrite:
## This is required to keep default configuration. It's copy of values.yaml content
- url: http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888/prometheus.metrics.applications.custom
remoteTimeout: 5s
writeRelabelConfigs:
- action: keep
regex: ^true$
sourceLabels: [_sumo_forward_]
- action: labeldrop
regex: _sumo_forward_
## This is your custom remoteWrite configuration
- url: http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888/prometheus.metrics.custom_kubernetes_metrics
writeRelabelConfigs:
- action: keep
regex: container_blkio_device_usage_total
sourceLabels: [__name__]
```
**Note:** You can use the method described in
54 changes: 20 additions & 34 deletions docs/prometheus.md
@@ -2,29 +2,28 @@

Prometheus is a crucial part of the metrics pipeline. It is also a complicated and powerful tool. In Kubernetes specifically, it's often
managed by Prometheus Operator and a set of custom resources. It's possible that you already have some part of the K8s Prometheus stack
already installed, and would like to make use of it. This document describes how to deal with all the possible cases.
installed, and would like to make use of it. This document describes how to deal with all the possible cases.

**NOTE:** In this document we assume that `${NAMESPACE}` represents the namespace in which the Sumo Logic Kubernetes Collection is going to be
installed.

<!-- TOC -->

- [No Prometheus in the cluster](#no-prometheus-in-the-cluster)
- [Prometheus Operator in the cluster](#prometheus-operator-in-the-cluster)
- [Custom Resource Definition compatibility](#custom-resource-definition-compatibility)
- [Installing Sumo Logic Prometheus Operator side by side with existing Operator](#installing-sumo-logic-prometheus-operator-side-by-side-with-existing-operator)
- [Set Sumo Logic Prometheus Operator to observe installation namespace](#set-sumo-logic-prometheus-operator-to-observe-installation-namespace)
- [Using existing Operator to create Sumo Logic Prometheus instance](#using-existing-operator-to-create-sumo-logic-prometheus-instance)
- [Disable Sumo Logic Prometheus Operator](#disable-sumo-logic-prometheus-operator)
- [Prepare Sumo Logic Configuration to work with existing Operator](#prepare-sumo-logic-configuration-to-work-with-existing-operator)
- [Using existing Kube Prometheus Stack](#using-existing-kube-prometheus-stack)
- [Build Prometheus Configuration](#build-prometheus-configuration)
- [Using a load balancing proxy for Prometheus remote write](#using-a-load-balancing-proxy-for-prometheus-remote-write)
- [Horizontal Scaling (Sharding)](#horizontal-scaling-sharding)
- [Troubleshooting](#troubleshooting)
- [UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com"](#upgrade-failed-failed-to-create-resource-internal-error-occurred-failed-calling-webhook-prometheusrulemutatemonitoringcoreoscom)
- [Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Prometheus.spec)](#error-unable-to-build-kubernetes-objects-from-release-manifest-error-validating--error-validating-data-validationerrorprometheusspec)
<!-- /TOC -->
- [Prometheus](#prometheus)
- [No Prometheus in the cluster](#no-prometheus-in-the-cluster)
- [Prometheus Operator in the cluster](#prometheus-operator-in-the-cluster)
- [Custom Resource Definition compatibility](#custom-resource-definition-compatibility)
- [Installing Sumo Logic Prometheus Operator side by side with existing Operator](#installing-sumo-logic-prometheus-operator-side-by-side-with-existing-operator)
- [Set Sumo Logic Prometheus Operator to observe installation namespace](#set-sumo-logic-prometheus-operator-to-observe-installation-namespace)
- [Using existing Operator to create Sumo Logic Prometheus instance](#using-existing-operator-to-create-sumo-logic-prometheus-instance)
- [Disable Sumo Logic Prometheus Operator](#disable-sumo-logic-prometheus-operator)
- [Prepare Sumo Logic Configuration to work with existing Operator](#prepare-sumo-logic-configuration-to-work-with-existing-operator)
- [Using existing Kube Prometheus Stack](#using-existing-kube-prometheus-stack)
- [Build Prometheus Configuration](#build-prometheus-configuration)
- [Horizontal Scaling (Sharding)](#horizontal-scaling-sharding)
- [Troubleshooting](#troubleshooting)
- [UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com"](#upgrade-failed-failed-to-create-resource-internal-error-occurred-failed-calling-webhook-prometheusrulemutatemonitoringcoreoscom)
- [Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Prometheus.spec)](#error-unable-to-build-kubernetes-objects-from-release-manifest-error-validating--error-validating-data-validationerrorprometheusspec)

## No Prometheus in the cluster

@@ -235,21 +234,19 @@ are correctly added to your Kube Prometheus Stack configuration:

- ServiceMonitors configuration:

- `sumologic.metrics.ServiceMonitors` to `prometheus.additionalServiceMonitors`
- `sumologic.metrics.ServiceMonitors` and `sumologic.metrics.additionalServiceMonitors` to `prometheus.additionalServiceMonitors`

- RemoteWrite configuration:

- `kube-prometheus-stack.prometheus.prometheusSpec.remoteWrite` to `prometheus.prometheusSpec.remoteWrite` or
`prometheus.prometheusSpec.additionalRemoteWrite`
- `kube-prometheus-stack.prometheus.prometheusSpec.additionalRemoteWrite` to `prometheus.prometheusSpec.remoteWrite` or
`prometheus.prometheusSpec.additionalRemoteWrite`

**Note:** `kube-prometheus-stack.prometheus.prometheusSpec.remoteWrite` and
`kube-prometheus-stack.prometheus.prometheusSpec.additionalRemoteWrite` are used to generate the list of endpoints in the Metadata Pod, so
ensure that:

- they are always in sync with the current configuration
- url is always starting with `http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888`
- url always starts with `http://$(METADATA_METRICS_SVC).$(NAMESPACE).svc.cluster.local.:9888`

Alternatively, you can list endpoints in `metadata.metrics.config.additionalEndpoints`:

@@ -258,8 +255,7 @@
metrics:
config:
additionalEndpoints:
- /prometheus.metrics.state
- /prometheus.metrics.controller-manager
- /prometheus.metrics
# - ...
```

@@ -285,7 +281,7 @@ are correctly added to your Kube Prometheus Stack configuration:
value: ${METADATA}
```

where `${METADATA}` is content of `metadataMetrics` key from `sumologic-configmap` Config Map within `${NAMESPACE}`:
where `${METADATA}` is the content of the `metadataMetrics` key from the `sumologic-configmap` ConfigMap within `${NAMESPACE}`:

```yaml
apiVersion: v1
@@ -338,20 +334,10 @@ prometheus:
# values copied from kube-prometheus-stack.prometheus.prometheusSpec.containers
additionalRemoteWrite:
# values copied from kube-prometheus-stack.prometheus.prometheusSpec.remoteWrite
# values copied from kube-prometheus-stack.prometheus.prometheusSpec.additionalRemoteWrite
```

The Prometheus configuration is ready. Apply the changes to the cluster.
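
For example, if the Kube Prometheus Stack is managed with Helm, applying the changes could look roughly like this (the release name, namespace, and values file below are assumptions, not part of this guide):

```sh
helm upgrade <your-prometheus-release> prometheus-community/kube-prometheus-stack \
  --namespace <prometheus-namespace> \
  -f prometheus-values.yaml
```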

## Using a load balancing proxy for Prometheus remote write

In environments with a high volume of metrics (problems may start appearing around 30k samples per second), the above mitigations may not be
sufficient. It is possible to remedy the problem by sharding Prometheus itself, but that can be complicated to set up and require manual
intervention to scale.

A simpler alternative is to put a HTTP load balancer between Prometheus and the metrics metadata Service. This is enabled in `values.yaml`
via the `sumologic.metrics.remoteWriteProxy.enabled` key.

## Horizontal Scaling (Sharding)

Horizontal scaling, also known as sharding, is supported by setting up a configuration parameter which allows running several Prometheus