feat(metrics)!: use otel by default
swiatekm committed Sep 21, 2023
1 parent d71a1da commit 543a433
Showing 21 changed files with 162 additions and 122 deletions.
1 change: 1 addition & 0 deletions .changelog/3284.breaking.txt
@@ -0,0 +1 @@
feat(metrics)!: use otel by default
2 changes: 1 addition & 1 deletion deploy/helm/sumologic/Chart.yaml
@@ -35,4 +35,4 @@ dependencies:
- name: opentelemetry-operator
version: 0.35.0
repository: https://open-telemetry.github.io/opentelemetry-helm-charts
condition: opentelemetry-operator.enabled
condition: opentelemetry-operator.enabled,sumologic.metrics.collector.otelcol.enabled
6 changes: 4 additions & 2 deletions deploy/helm/sumologic/README.md

Large diffs are not rendered by default.

9 changes: 5 additions & 4 deletions deploy/helm/sumologic/values.yaml
@@ -445,9 +445,8 @@ sumologic:
collector:
### Otel metrics collector. Replaces Prometheus.
## To enable, you need opentelemetry-operator enabled as well.
## Stability: Beta.
otelcol:
enabled: false
enabled: true

## Default scrape interval
scrapeInterval: 30s
@@ -543,7 +542,7 @@ sumologic:
## This is an advanced feature, enable only if you're experiencing performance
## issues with metrics metadata enrichment.
remoteWriteProxy:
enabled: true
enabled: false
config:
## Increase this if you've increased samples_per_send in Prometheus to prevent nginx
## from spilling proxied request bodies to disk
@@ -950,6 +949,7 @@ kube-prometheus-stack:
enabled: false
defaultDashboardsEnabled: false
prometheusOperator:
enabled: false
## Labels to add to the operator pod
podLabels: {}
## Annotations to add to the operator pod
@@ -1105,6 +1105,7 @@ kube-prometheus-stack:
regex: (?:node_load1|node_load5|node_load15|node_cpu_seconds_total|node_disk_io_time_weighted_seconds_total|node_disk_io_time_seconds_total|node_vmstat_pgpgin|node_vmstat_pgpgout|node_memory_MemFree_bytes|node_memory_Cached_bytes|node_memory_Buffers_bytes|node_memory_MemTotal_bytes|node_network_receive_drop_total|node_network_transmit_drop_total|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_filesystem_avail_bytes|node_filesystem_size_bytes)
sourceLabels: [__name__]
prometheus:
enabled: false
additionalServiceMonitors: []
prometheusSpec:
## Prometheus default scrape interval, default from upstream Kube Prometheus Stack Helm chart
@@ -2377,7 +2378,7 @@ tailing-sidecar-operator:
## Configure OpenTelemetry Operator - Instrumentation
## ref: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-operator
opentelemetry-operator:
enabled: false
enabled: true

## Specific for Sumo Logic chart - Instrumentation resource creation
instrumentationJobImage:
44 changes: 43 additions & 1 deletion docs/v4-migration-doc.md
@@ -6,9 +6,11 @@
- [Important changes](#important-changes)
- [OpenTelemetry Collector](#opentelemetry-collector)
- [Drop Prometheus recording rule metrics](#drop-prometheus-recording-rule-metrics)
- [OpenTelemetry Collector for metrics collection](#opentelemetry-collector-for-metrics-collection)
- [How to upgrade](#how-to-upgrade)
- [Requirements](#requirements)
- [Metrics migration](#metrics-migration)
- [Convert Prometheus remote writes to otel metrics filters](#convert-prometheus-remote-writes-to-otel-metrics-filters)
- [Removing support for Fluent Bit and Fluentd](#removing-support-for-fluent-bit-and-fluentd)
- [Configuration Migration](#configuration-migration)
- [Switch to OTLP sources](#switch-to-otlp-sources)
@@ -37,6 +39,13 @@ format. Please check [Solution Overview][solution-overview] and see below for de
OpenTelemetry can't collect Prometheus recording rule metrics. The new version therefore stops collecting recording rule metrics and updates
will be made to the Kubernetes App to remove these metrics.

### OpenTelemetry Collector for metrics collection

By default, the OpenTelemetry Collector is now used for metrics collection instead of Prometheus. For the majority of use cases, this will
be a transparent change without any need for manual configuration changes. OpenTelemetry Collector will continue to collect the same default
metrics as Prometheus did previously, and will support the same mechanisms for collecting custom application metrics. Any exceptions will be
called out in the migration guide below.

## How to upgrade

### Requirements
@@ -53,7 +62,35 @@ export HELM_RELEASE_NAME=...

### Metrics migration

:construction:
If you don't have metrics collection enabled, skip straight to the [next major section](#switch-to-otlp-sources).

#### Convert Prometheus remote writes to otel metrics filters

**When?**: If you have custom remote writes defined in `kube-prometheus-stack.prometheus.additionalServiceMonitors`

When using Prometheus for metrics collection in v3, we relied on remote writes for filtering forwarded metrics. Otel, which is the default
in v4, does not support remote writes, so we've moved this functionality to Otel processors, or ServiceMonitors if it can be done there.

There are several scenarios here, depending on the exact use case:

1. You're collecting different [Kubernetes metrics][kubernetes_metrics_v3] than what the Chart provides by default. You've modified the
existing ServiceMonitor for these metrics, and added a remote write as instructed by the documentation.

You can safely delete the added remote write definition. No further action is required.

1. As above, but you're also doing some additional data transformation via relabelling rules in the remote write definition.

You'll need to either move the relabelling rules into the ServiceMonitor itself, or [add an equivalent filter
processor][otel_metrics_filter] rule to Otel.

1. You're collecting custom application metrics by adding a [`prometheus.io/scrape` annotation][application_metrics_annotation]. You don't
need to filter these metrics.

No action is needed.

1. As above, but you also have a remote write definition to filter these metrics.

You'll need to delete the remote write definition and [add an equivalent filter processor][otel_metrics_filter] rule to Otel.

### Removing support for Fluent Bit and Fluentd

@@ -162,3 +199,8 @@ require additional action.

Some Kubernetes objects, for example statefulsets, have a tight (63 characters) limit for their names. Because of that, we truncate the
prefix that is attached to the names. In particular, the value under key `fullnameOverride` will be truncated to 22 characters.

[application_metrics_annotation]: ./collecting-application-metrics.md#application-metrics-are-exposed-one-endpoint-scenario
[kubernetes_metrics_v3]:
https://github.com/SumoLogic/sumologic-kubernetes-collection/blob/release-v3/docs/collecting-kubernetes-metrics.md#collecting-kubernetes-metrics
[otel_metrics_filter]: ./collecting-application-metrics.md#filtering-metrics
2 changes: 2 additions & 0 deletions tests/helm/prometheus_test.go
@@ -24,6 +24,7 @@ func TestServiceMonitors(t *testing.T) {
ValuesYaml: `
kube-prometheus-stack:
prometheus:
enabled: true
additionalServiceMonitors:
- name: collection-sumologic-fluentd-logs-test
additionalLabels:
@@ -55,6 +56,7 @@ kube-prometheus-stack:
ValuesYaml: `
kube-prometheus-stack:
prometheus:
enabled: true
additionalServiceMonitors:
- name: collection-sumologic-fluentd-logs-test
additionalLabels:
@@ -1,6 +0,0 @@
sumologic:
metrics:
enabled: true
collector:
otelcol:
enabled: true
@@ -1,9 +1,7 @@
sumologic:
metrics:
enabled: true
collector:
otelcol:
enabled: true
scrapeInterval: 60s
autoscaling:
enabled: true
6 changes: 6 additions & 0 deletions tests/helm/utils.go
@@ -17,6 +17,7 @@ import (
"github.com/gruntwork-io/go-commons/files"
"github.com/gruntwork-io/terratest/modules/helm"
"github.com/gruntwork-io/terratest/modules/logger"
otoperator "github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1"
prometheus "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
"github.com/stretchr/testify/require"
"gopkg.in/yaml.v3"
@@ -123,6 +124,11 @@ func UnmarshalMultipleK8sObjectsFromYaml(yamlDocs string) (objects []runtime.Obj
return objects, err
}

err = otoperator.AddToScheme(scheme)
if err != nil {
return objects, err
}

decoder := serializer.NewCodecFactory(scheme).UniversalDeserializer()
multidocReader := utilyaml.NewYAMLReader(bufio.NewReader(bytes.NewReader([]byte(yamlDocs))))

13 changes: 9 additions & 4 deletions tests/integration/helm_ot_default_namespaceoverride_test.go
@@ -17,16 +17,21 @@ type ctxKey string

func Test_Helm_Default_OT_NamespaceOverride(t *testing.T) {

expectedMetrics := internal.DefaultExpectedMetrics
// we have tracing enabled, so check tracing-specific metrics
expectedMetrics = append(expectedMetrics, internal.TracingOtelcolMetrics...)
expectedMetrics := []string{}
// defaults without otel metrics collector metrics, but with Prometheus metrics
expectedMetricsGroups := make([][]string, len(internal.DefaultExpectedMetricsGroups))
copy(expectedMetricsGroups, internal.DefaultExpectedMetricsGroups)
expectedMetricsGroups = append(expectedMetricsGroups, internal.PrometheusMetrics, internal.DefaultOtelcolMetrics, internal.LogsOtelcolMetrics, internal.TracingOtelcolMetrics)
for _, metrics := range expectedMetricsGroups {
expectedMetrics = append(expectedMetrics, metrics...)
}

installChecks := []featureCheck{
CheckSumologicSecret(15),
CheckOtelcolMetadataLogsInstall,
CheckOtelcolMetadataMetricsInstall,
CheckOtelcolEventsInstall,
CheckPrometheusInstall,
CheckOtelcolMetricsCollectorInstall,
CheckOtelcolLogsCollectorInstall,
CheckTracesInstall,
}
6 changes: 3 additions & 3 deletions tests/integration/helm_ot_default_test.go
@@ -20,16 +20,16 @@ func Test_Helm_Default_OT(t *testing.T) {
CheckOtelcolMetadataLogsInstall,
CheckOtelcolMetadataMetricsInstall,
CheckOtelcolEventsInstall,
CheckPrometheusInstall,
CheckOtelcolMetricsCollectorInstall,
CheckOtelcolLogsCollectorInstall,
CheckTracesInstall,
}

featInstall := GetInstallFeature(installChecks)

featMetrics := GetMetricsFeature(expectedMetrics, Prometheus)
featMetrics := GetMetricsFeature(expectedMetrics, Otelcol)

featTelegrafMetrics := GetTelegrafMetricsFeature(internal.DefaultExpectedNginxAnnotatedMetrics, Prometheus, true)
featTelegrafMetrics := GetTelegrafMetricsFeature(internal.DefaultExpectedNginxAnnotatedMetrics, Otelcol, true)

featLogs := GetLogsFeature()

55 changes: 0 additions & 55 deletions tests/integration/helm_ot_metrics_test.go

This file was deleted.

8 changes: 5 additions & 3 deletions tests/integration/helm_otc_fips_metadata_installation_test.go
@@ -20,14 +20,16 @@ func Test_Helm_Default_OT_FIPS(t *testing.T) {
CheckOtelcolMetadataLogsInstall,
CheckOtelcolMetadataMetricsInstall,
CheckOtelcolEventsInstall,
CheckPrometheusInstall,
CheckOtelcolMetricsCollectorInstall,
CheckOtelcolLogsCollectorInstall,
CheckTracesInstall,
}

featInstall := GetInstallFeature(installChecks)

featMetrics := GetMetricsFeature(expectedMetrics, Prometheus)
featMetrics := GetMetricsFeature(expectedMetrics, Otelcol)

featTelegrafMetrics := GetTelegrafMetricsFeature(internal.DefaultExpectedNginxAnnotatedMetrics, Otelcol, true)

featLogs := GetLogsFeature()

@@ -37,5 +39,5 @@ func Test_Helm_Default_OT_FIPS(t *testing.T) {

featTraces := GetTracesFeature()

testenv.Test(t, featInstall, featMetrics, featLogs, featMultilineLogs, featEvents, featTraces)
testenv.Test(t, featInstall, featMetrics, featTelegrafMetrics, featLogs, featMultilineLogs, featEvents, featTraces)
}
15 changes: 9 additions & 6 deletions tests/integration/helm_otlp_test.go
@@ -10,7 +10,6 @@ import (
)

func Test_Helm_OTLP(t *testing.T) {

expectedMetrics := internal.DefaultExpectedMetrics
// we have tracing enabled, so check tracing-specific metrics
expectedMetrics = append(expectedMetrics, internal.TracingOtelcolMetrics...)
@@ -20,20 +19,24 @@ func Test_Helm_OTLP(t *testing.T) {
CheckOtelcolMetadataLogsInstall,
CheckOtelcolMetadataMetricsInstall,
CheckOtelcolEventsInstall,
CheckPrometheusInstall,
CheckOtelcolMetricsCollectorInstall,
CheckOtelcolLogsCollectorInstall,
CheckTracesInstall,
}

featInstall := GetInstallFeature(installChecks)

featLogs := GetLogsFeature()
featMetrics := GetMetricsFeature(expectedMetrics, Otelcol)

featMetrics := GetMetricsFeature(expectedMetrics, Prometheus)
featTelegrafMetrics := GetTelegrafMetricsFeature(internal.DefaultExpectedNginxAnnotatedMetrics, Otelcol, true)

featTraces := GetTracesFeature()
featLogs := GetLogsFeature()

featMultilineLogs := GetMultipleMultilineLogsFeature()

featEvents := GetEventsFeature()

testenv.Test(t, featInstall, featLogs, featMetrics, featEvents, featTraces)
featTraces := GetTracesFeature()

testenv.Test(t, featInstall, featMetrics, featTelegrafMetrics, featLogs, featMultilineLogs, featEvents, featTraces)
}
35 changes: 35 additions & 0 deletions tests/integration/helm_prometheus_metrics_test.go
@@ -0,0 +1,35 @@
//go:build onlylatest
// +build onlylatest

package integration

import (
"testing"

"github.com/SumoLogic/sumologic-kubernetes-collection/tests/integration/internal"
)

func Test_Helm_Prometheus_Metrics(t *testing.T) {
expectedMetrics := []string{}
// defaults without otel metrics collector metrics, but with Prometheus metrics
expectedMetricsGroups := make([][]string, len(internal.DefaultExpectedMetricsGroups))
copy(expectedMetricsGroups, internal.DefaultExpectedMetricsGroups)
expectedMetricsGroups = append(expectedMetricsGroups, internal.PrometheusMetrics, internal.DefaultOtelcolMetrics)
for _, metrics := range expectedMetricsGroups {
expectedMetrics = append(expectedMetrics, metrics...)
}

installChecks := []featureCheck{
CheckSumologicSecret(9),
CheckOtelcolMetadataMetricsInstall,
CheckPrometheusInstall,
}

featInstall := GetInstallFeature(installChecks)

featMetrics := GetMetricsFeature(expectedMetrics, Prometheus)

featTelegrafMetrics := GetTelegrafMetricsFeature(internal.DefaultExpectedNginxAnnotatedMetrics, Prometheus, true)

testenv.Test(t, featInstall, featMetrics, featTelegrafMetrics)
}
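
The test above copies `internal.DefaultExpectedMetricsGroups` with `make` and `copy` before appending to it. A likely reason, stated here as an assumption rather than the author's documented intent, is to avoid writing into the backing array of a shared package-level slice. The following is a minimal standalone sketch of that pattern; `withExtra`, `flatten`, and the metric names are hypothetical and not part of the repository.

```go
package main

import "fmt"

// withExtra returns base followed by extra without ever writing into
// base's backing array. Hypothetical helper mirroring the make/copy/append
// sequence used in the test.
func withExtra(base [][]string, extra ...[]string) [][]string {
	groups := make([][]string, len(base))
	copy(groups, base)
	return append(groups, extra...)
}

// flatten concatenates all groups into one list of metric names.
func flatten(groups [][]string) []string {
	out := []string{}
	for _, g := range groups {
		out = append(out, g...)
	}
	return out
}

func main() {
	// shared has spare capacity, so a plain append reuses its backing array.
	shared := make([][]string, 1, 4)
	shared[0] = []string{"metric_default"}

	a := append(shared, []string{"metric_a"})
	b := append(shared, []string{"metric_b"})
	// a and b alias the same array, so b's append overwrote a[1].
	fmt.Println(a[1][0], b[1][0]) // prints: metric_b metric_b

	c := withExtra(shared, []string{"metric_a"})
	d := withExtra(shared, []string{"metric_b"})
	// Each result has its own backing array.
	fmt.Println(c[1][0], d[1][0]) // prints: metric_a metric_b

	fmt.Println(flatten(d)) // prints: [metric_default metric_b]
}
```

Whether a plain `append` actually aliases depends on the spare capacity of the source slice, which is why copying first is the safer default when several tests derive their expectations from the same base list.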
3 changes: 1 addition & 2 deletions tests/integration/internal/constants.go
@@ -380,7 +380,6 @@ var (
CoreDNSMetrics,
CAdvisorMetrics,
NodeExporterMetrics,
PrometheusMetrics,
OtherMetrics,
AdditionalNodeExporterMetrics,
}
@@ -414,7 +413,7 @@ func InitializeConstants() error {
}

DefaultExpectedMetrics = []string{}
metricsGroupsWithOtelcol := append(DefaultExpectedMetricsGroups, DefaultOtelcolMetrics, LogsOtelcolMetrics)
metricsGroupsWithOtelcol := append(DefaultExpectedMetricsGroups, DefaultOtelcolMetrics, LogsOtelcolMetrics, MetricsCollectorOtelcolMetrics)
for _, metrics := range metricsGroupsWithOtelcol {
DefaultExpectedMetrics = append(DefaultExpectedMetrics, metrics...)
}