Component #1823

Merged
merged 48 commits into from
Oct 15, 2024
Changes from 35 commits
a5cf26d
Checkin the networking items.
mattdurham Sep 19, 2024
9e84aee
Fix for config updating and tests.
mattdurham Sep 20, 2024
1cf1e7a
Update internal/component/prometheus/remote/queue/network/loop.go
mattdurham Oct 1, 2024
5c49e9e
Update internal/component/prometheus/remote/queue/network/loop.go
mattdurham Oct 1, 2024
02a41e0
pr feedback
mattdurham Oct 1, 2024
8ab6a26
Merge branch 'wal_network' of github.com:grafana/alloy into wal_network
mattdurham Oct 1, 2024
a638c1a
pr feedback
mattdurham Oct 1, 2024
5abe271
simplify stats
mattdurham Oct 1, 2024
2d0eb00
simplify stats
mattdurham Oct 2, 2024
46e1764
Initial push.
mattdurham Oct 3, 2024
1e12b5e
dev.new-wal merge
mattdurham Oct 3, 2024
254dc4c
docs and some renaming
mattdurham Oct 3, 2024
e06efff
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
992c703
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
0546707
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
e75e34f
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
0ea1f71
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
9a90c2b
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
f04edb4
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 4, 2024
0d99288
Changes and testing.
mattdurham Oct 8, 2024
ed64bc3
Update docs.
mattdurham Oct 8, 2024
98bc887
Update docs.
mattdurham Oct 8, 2024
2e32ce6
Fix race conditions in unit tests.
mattdurham Oct 8, 2024
e1aaa9f
Tweaking unit tests.
mattdurham Oct 8, 2024
38b15a1
lower threshold more.
mattdurham Oct 8, 2024
6f9a820
lower threshold more.
mattdurham Oct 8, 2024
c78ea1d
Fix deadlock in manager tests.
mattdurham Oct 8, 2024
c6239d1
rollback to previous
mattdurham Oct 8, 2024
3bb04d4
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 9, 2024
6713554
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 9, 2024
d5568d9
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 9, 2024
d8cd012
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 9, 2024
42fbdd9
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 9, 2024
d5eb26e
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 9, 2024
0c9e755
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 10, 2024
ce0ecb0
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 11, 2024
872de53
Docs PR feedback
mattdurham Oct 11, 2024
0506c37
Merge remote-tracking branch 'origin/wal_component' into wal_component
mattdurham Oct 11, 2024
db5bd6a
Update docs/sources/reference/components/prometheus/prometheus.remote…
mattdurham Oct 11, 2024
3ee51b3
PR feedback
mattdurham Oct 11, 2024
98872f8
Merge remote-tracking branch 'origin/wal_component' into wal_component
mattdurham Oct 11, 2024
bd2d083
PR feedback
mattdurham Oct 11, 2024
b2d5cab
PR feedback
mattdurham Oct 11, 2024
57d81b0
PR feedback
mattdurham Oct 15, 2024
21952d8
Fix typo
mattdurham Oct 15, 2024
5966b6d
Fix typo
mattdurham Oct 15, 2024
561bfeb
Fix bug.
mattdurham Oct 15, 2024
9dc25f1
Fix docs
mattdurham Oct 15, 2024
2 changes: 1 addition & 1 deletion Makefile
@@ -141,7 +141,7 @@ lint: alloylint
# final command runs tests for all other submodules.
test:
$(GO_ENV) go test $(GO_FLAGS) -race $(shell go list ./... | grep -v /integration-tests/)
- $(GO_ENV) go test $(GO_FLAGS) ./internal/static/integrations/node_exporter ./internal/static/logs ./internal/component/otelcol/processor/tail_sampling ./internal/component/loki/source/file ./internal/component/loki/source/docker ./internal/component/prometheus/remote/queue/serialization
+ $(GO_ENV) go test $(GO_FLAGS) ./internal/static/integrations/node_exporter ./internal/static/logs ./internal/component/otelcol/processor/tail_sampling ./internal/component/loki/source/file ./internal/component/loki/source/docker ./internal/component/prometheus/remote/queue/serialization ./internal/component/prometheus/remote/queue/network
$(GO_ENV) find . -name go.mod -not -path "./go.mod" -execdir go test -race ./... \;

test-packages:
1 change: 1 addition & 0 deletions docs/sources/reference/compatibility/_index.md
@@ -174,6 +174,7 @@ The following components, grouped by namespace, _export_ Prometheus `MetricsRece

{{< collapse title="prometheus" >}}
- [prometheus.relabel](../components/prometheus/prometheus.relabel)
+ - [prometheus.remote.queue](../components/prometheus/prometheus.remote.queue)
- [prometheus.remote_write](../components/prometheus/prometheus.remote_write)
{{< /collapse >}}

Contributor

It'd be nice to add a bit more info on how prometheus.remote.queue is different from prometheus.remote_write, and when to use what component.

@@ -0,0 +1,296 @@
---
canonical: https://grafana.com/docs/alloy/latest/reference/components/prometheus/prometheus.remote.queue/
description: Learn about prometheus.remote.queue
title: prometheus.remote.queue
---


<span class="badge docs-labels__stage docs-labels__item">Experimental</span>

# prometheus.remote.queue

`prometheus.remote.queue` collects metrics sent from other components into a
Write-Ahead Log (WAL) and forwards them over the network to a series of
user-supplied endpoints. Metrics are sent over the network using the
[Prometheus Remote Write protocol][remote_write-spec].

You can specify multiple `prometheus.remote.queue` components by giving them different labels.

You should consider everything here extremely experimental and highly subject to change.

[remote_write-spec]: https://docs.google.com/document/d/1LPhVRSFkGNSuU1fBd81ulhsCPR4hkSZyyBj1SZ8fWOM/edit



## Usage

```alloy
prometheus.remote.queue "LABEL" {
  endpoint "default" {
    url = REMOTE_WRITE_URL

    ...
  }

  ...
}
```

## Arguments

The following arguments are supported:

Name | Type | Description | Default | Required
---- | ---- | ----------- | ------- | --------
`ttl` | `duration` | How long the timestamp of a signal is valid before the signal is discarded. | `2h` | no

## Blocks

The following blocks are supported inside the definition of
`prometheus.remote.queue`:

Hierarchy | Block | Description | Required
--------- | ----- | ----------- | --------
serialization | [serialization][] | Configuration for serializing and writing to disk | no
endpoint | [endpoint][] | Location to send metrics to. | no
endpoint > basic_auth | [basic_auth][] | Configure basic_auth for authenticating to the endpoint. | no

The `>` symbol indicates deeper levels of nesting. For example, `endpoint >
basic_auth` refers to a `basic_auth` block defined inside an
`endpoint` block.

[endpoint]: #endpoint-block
[basic_auth]: #basic_auth-block
[serialization]: #serialization-block

### serialization block

The `serialization` block describes how often and at what limits to write to disk. Serialization settings
are shared across all `endpoint` blocks.

The following arguments are supported:

Name | Type | Description | Default | Required
---- | ---- | ----------- | ------- | --------
`max_signals_to_batch` | `uint` | The maximum number of signals before they are batched to disk. | `10,000` | no
`batch_frequency` | `duration` | How often to batch signals to disk if `max_signals_to_batch` is not reached. | | no
Contributor

Why do we use signals as a name? maybe it should be samples? or is this to encompass histograms samples too? I'm not sure if that's a right name. In current remote_write we call this 'metrics' which makes sense although it's not very precise: https://grafana.com/docs/alloy/next/reference/components/prometheus/prometheus.remote_write/#wal-block

Collaborator Author

Subsequent PR will enable metadata storage, so I wanted it to not be just metrics. Sample is to narrow, open to better names here.

Contributor

maybe 'entries'? that would be a generic name for things inside a queue.



### endpoint block

The `endpoint` block describes a single location to send metrics to. Multiple
`endpoint` blocks can be provided to send metrics to multiple locations. Each
`endpoint` will have its own WAL folder.

The following arguments are supported:

Name | Type | Description | Default | Required
---- | ---- | ----------- | ------- | --------
`url` | `string` | Full URL to send metrics to. | | yes
`name` | `string` | Optional name to identify the endpoint in metrics. | | no
`write_timeout` | `duration` | Timeout for requests made to the URL. | `30s` | no
`retry_backoff` | `duration` | How long to wait between retries. | `1s` | no
`max_retry_backoff_attempts` | `uint` | Maximum number of retries before dropping the batch. | `1s` | no
`batch_count` | `uint` | How many series to queue in each queue. | `1,000` | no
`flush_frequency` | `duration` | How long to wait before sending if `batch_count` is not reached. | `1s` | no
`queue_count` | `uint` | How many concurrent batches to write. | `10` | no
`external_labels` | `map(string)` | Labels to add to metrics sent over the network. | | no
Contributor

Suggested change
`external_labels` | `map(string)` | Labels to add to metrics sent over the network. | | no
`extra_labels` | `map(string)` | Labels to add to metrics sent over the network. | | no

not sure why external?

Collaborator Author

Its how remote write defines it, so its easier to grok. I dont have a strong preference though

Contributor

ok, I'll leave it up to you, hard to tell whether it's worth using better phrasing with a cost of potentially tripping some users who got used to old names
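
The `batch_count` and `flush_frequency` arguments of the `endpoint` block together decide when a queue sends what it has buffered. The following Go sketch illustrates one plausible reading of that rule; it is not taken from the Alloy source. The function name `shouldFlush` is hypothetical, and the choice to never send an empty batch is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// shouldFlush is a hypothetical helper, not part of Alloy.
// It reports whether a queue should send its buffered series now:
// either the buffer has reached batchCount, or flushFrequency has
// elapsed since the last send while something is still pending.
// Assumption: an empty buffer is never sent.
func shouldFlush(pending, batchCount int, sinceLastFlush, flushFrequency time.Duration) bool {
	if pending == 0 {
		return false
	}
	if pending >= batchCount {
		return true
	}
	return sinceLastFlush >= flushFrequency
}

func main() {
	const batchCount = 1000           // documented default for batch_count
	const flushFrequency = time.Second // documented default for flush_frequency

	// Full batch: send immediately.
	fmt.Println(shouldFlush(1000, batchCount, 0, flushFrequency))
	// Partial batch, timer not yet elapsed: keep buffering.
	fmt.Println(shouldFlush(10, batchCount, 500*time.Millisecond, flushFrequency))
	// Partial batch, timer elapsed: send what is there.
	fmt.Println(shouldFlush(10, batchCount, time.Second, flushFrequency))
}
```

Under this reading, a low-traffic endpoint still sends about once per `flush_frequency`, while a busy endpoint sends as soon as each batch fills.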


### basic_auth block

{{< docs/shared lookup="reference/components/basic-auth-block.md" source="alloy" version="<ALLOY_VERSION>" >}}


## Exported fields

The following fields are exported and can be referenced by other components:

Name | Type | Description
---- | ---- | -----------
`receiver` | `MetricsReceiver` | A value that other components can use to send metrics to.

## Component health

`prometheus.remote.queue` is only reported as unhealthy if given an invalid
configuration. In those cases, exported fields are kept at their last healthy
values.

## Debug information

`prometheus.remote.queue` does not expose any component-specific debug
information.

## Debug metrics

The following metrics are provided for backward compatibility.
They generally behave the same, but there are likely edge cases where they differ.

* `prometheus_remote_write_wal_storage_created_series_total` (counter): Total number of created
series appended to the WAL.
* `prometheus_remote_write_wal_storage_removed_series_total` (counter): Total number of series
removed from the WAL.
* `prometheus_remote_write_wal_samples_appended_total` (counter): Total number of samples
appended to the WAL.
* `prometheus_remote_write_wal_exemplars_appended_total` (counter): Total number of exemplars
appended to the WAL.
* `prometheus_remote_storage_samples_total` (counter): Total number of samples
sent to remote storage.
* `prometheus_remote_storage_exemplars_total` (counter): Total number of
exemplars sent to remote storage.
* `prometheus_remote_storage_metadata_total` (counter): Total number of
metadata entries sent to remote storage.
* `prometheus_remote_storage_samples_failed_total` (counter): Total number of
samples that failed to send to remote storage due to non-recoverable errors.
* `prometheus_remote_storage_exemplars_failed_total` (counter): Total number of
exemplars that failed to send to remote storage due to non-recoverable errors.
* `prometheus_remote_storage_metadata_failed_total` (counter): Total number of
metadata entries that failed to send to remote storage due to
non-recoverable errors.
* `prometheus_remote_storage_samples_retries_total` (counter): Total number of
samples that failed to send to remote storage but were retried due to
recoverable errors.
* `prometheus_remote_storage_exemplars_retried_total` (counter): Total number of
exemplars that failed to send to remote storage but were retried due to
recoverable errors.
* `prometheus_remote_storage_metadata_retried_total` (counter): Total number of
metadata entries that failed to send to remote storage but were retried due
to recoverable errors.
* `prometheus_remote_storage_samples_dropped_total` (counter): Total number of
samples which were dropped after being read from the WAL before being sent to
remote_write because of an unknown reference ID.
* `prometheus_remote_storage_exemplars_dropped_total` (counter): Total number
of exemplars that were dropped after being read from the WAL before being
sent to remote_write because of an unknown reference ID.
* `prometheus_remote_storage_enqueue_retries_total` (counter): Total number of
times enqueue has failed because a shard's queue was full.
* `prometheus_remote_storage_sent_batch_duration_seconds` (histogram): Duration
of send calls to remote storage.
* `prometheus_remote_storage_queue_highest_sent_timestamp_seconds` (gauge):
Unix timestamp of the latest WAL sample successfully sent by a queue.
* `prometheus_remote_storage_samples_pending` (gauge): The number of samples
pending in shards to be sent to remote storage.
* `prometheus_remote_storage_exemplars_pending` (gauge): The number of
exemplars pending in shards to be sent to remote storage.
* `prometheus_remote_storage_samples_in_total` (counter): Samples read into
remote storage.
* `prometheus_remote_storage_exemplars_in_total` (counter): Exemplars read into
remote storage.

The following metrics are new to `prometheus.remote.queue` and are highly subject to change.

* `alloy_queue_series_serializer_incoming_signals` (counter): Total number of series written to serialization.
* `alloy_queue_metadata_serializer_incoming_signals` (counter): Total number of metadata written to serialization.
* `alloy_queue_series_serializer_incoming_timestamp_seconds` (gauge): Highest timestamp of incoming series.
* `alloy_queue_series_serializer_errors` (gauge): Number of errors for series written to serializer.
* `alloy_queue_metadata_serializer_errors` (gauge): Number of errors for metadata written to serializer.
* `alloy_queue_series_network_timestamp_seconds` (gauge): Highest timestamp written to an endpoint.
* `alloy_queue_series_network_sent` (counter): Number of series sent successfully.
* `alloy_queue_metadata_network_sent` (counter): Number of metadata sent successfully.
* `alloy_queue_network_series_failed` (counter): Number of series failed.
* `alloy_queue_network_metadata_failed` (counter): Number of metadata failed.
* `alloy_queue_network_series_retried` (counter): Number of series retried due to network issues.
* `alloy_queue_network_metadata_retried` (counter): Number of metadata retried due to network issues.
* `alloy_queue_network_series_retried_429` (counter): Number of series retried due to status code 429.
* `alloy_queue_network_metadata_retried_429` (counter): Number of metadata retried due to status code 429.
* `alloy_queue_network_series_retried_5xx` (counter): Number of series retried due to status code 5xx.
* `alloy_queue_network_metadata_retried_5xx` (counter): Number of metadata retried due to status code 5xx.
* `alloy_queue_network_series_network_duration_seconds` (histogram): Duration writing series to endpoint.
* `alloy_queue_network_metadata_network_duration_seconds` (histogram): Duration writing metadata to endpoint.
* `alloy_queue_network_series_network_errors` (counter): Number of errors writing series to network.
* `alloy_queue_network_metadata_network_errors` (counter): Number of errors writing metadata to network.

## Examples

The following examples show you how to create `prometheus.remote.queue` components that send metrics to different destinations.

### Send metrics to a local Mimir instance

You can create a `prometheus.remote.queue` component that sends your metrics to a local Mimir instance:

```alloy
prometheus.remote.queue "staging" {
  // Send metrics to a locally running Mimir.
  endpoint "mimir" {
    url = "http://mimir:9009/api/v1/push"

    basic_auth {
      username = "example-user"
      password = "example-password"
    }
  }
}

// Configure a prometheus.scrape component to send metrics to the
// prometheus.remote.queue component.
prometheus.scrape "demo" {
  targets = [
    // Collect metrics from the default HTTP listen address.
    {"__address__" = "127.0.0.1:12345"},
  ]
  forward_to = [prometheus.remote.queue.staging.receiver]
}

```

## TODO Metadata settings

## Technical details

`prometheus.remote.queue` uses [snappy][] for compression.
`prometheus.remote.queue` sends native histograms by default.
Any labels that start with `__` will be removed before sending to the endpoint.

### Data retention

Data is written to disk in blocks utilizing [snappy][] compression. These blocks are read on startup and resent if they are still within the TTL.
Any data that has not been written to disk, or that is in the network queues is lost if {{< param "PRODUCT_NAME" >}} is restarted.
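
The TTL check applied when blocks are read back can be pictured with a short Go sketch. This is an illustration only, not the Alloy implementation: the function name `withinTTL` is hypothetical, timestamps are assumed to be Unix milliseconds, and treating an age exactly equal to the TTL as still valid is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// withinTTL is a hypothetical helper, not part of Alloy.
// It reports whether a signal stamped at sampleMs (Unix milliseconds)
// is still young enough to be sent at nowMs, given the configured ttl.
// Assumption: an age exactly equal to ttl still counts as valid.
func withinTTL(sampleMs, nowMs int64, ttl time.Duration) bool {
	age := time.Duration(nowMs-sampleMs) * time.Millisecond
	return age <= ttl
}

func main() {
	ttl := 2 * time.Hour // documented default for the ttl argument
	now := time.Now().UnixMilli()

	fresh := now - (30 * time.Minute).Milliseconds()
	stale := now - (3 * time.Hour).Milliseconds()

	fmt.Println(withinTTL(fresh, now, ttl)) // 30 minutes old: resent
	fmt.Println(withinTTL(stale, now, ttl)) // 3 hours old: discarded
}
```

With the default `ttl` of `2h`, a restart that lasts longer than two hours would therefore leave nothing on disk worth resending.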

### Retries

`prometheus.remote.queue` will retry sending data if the following errors or HTTP status codes are returned:

* Network errors.
* HTTP 429 errors.
* HTTP 5XX errors.

`prometheus.remote.queue` will not retry sending data if any other unsuccessful status codes are returned.
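
The retry rules above amount to a small classification function. The Go sketch below restates them; it is not the Alloy implementation, and the names `shouldRetry` and `errExampleNetwork` are hypothetical. It assumes a transport-level failure is reported as a non-nil error with no usable status code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errExampleNetwork stands in for any transport-level failure,
// such as a refused connection. It is a placeholder for illustration.
var errExampleNetwork = errors.New("connection refused")

// shouldRetry is a hypothetical helper, not part of Alloy.
// It reports whether a failed send is retried under the documented rules:
// network errors, HTTP 429, and HTTP 5XX are retried; nothing else is.
// statusCode is only consulted when err is nil.
func shouldRetry(statusCode int, err error) bool {
	if err != nil {
		return true // network error: no response was received
	}
	if statusCode == http.StatusTooManyRequests {
		return true // HTTP 429
	}
	return statusCode >= 500 && statusCode <= 599 // HTTP 5XX
}

func main() {
	fmt.Println(shouldRetry(0, errExampleNetwork))               // true
	fmt.Println(shouldRetry(http.StatusTooManyRequests, nil))     // true
	fmt.Println(shouldRetry(http.StatusServiceUnavailable, nil))  // true
	fmt.Println(shouldRetry(http.StatusBadRequest, nil))          // false
}
```

A 400-class response other than 429 usually means the payload itself was rejected, so sending the same data again would not help.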

### Memory

`prometheus.remote.queue` is meant to be memory efficient.
You can adjust the `max_signals_to_batch`, `queue_count`, and `batch_count` to control how much memory is used.
A higher `max_signals_to_batch` allows for more efficient disk compression.
A higher `queue_count` allows more concurrent writes, and a higher `batch_count` allows more data to be sent at one time.
This can allow greater throughput at the cost of more memory on both {{< param "PRODUCT_NAME" >}} and the endpoint.
The defaults are suitable for most common usages.
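
As a rough illustration of how these settings scale, the Go sketch below multiplies `queue_count` by `batch_count` to estimate how many series one endpoint could hold in its network queues at once. The function name `maxBufferedSeries` is hypothetical, and the estimate assumes each queue holds at most one full batch. It says nothing about bytes, since series size varies.

```go
package main

import "fmt"

// maxBufferedSeries is a hypothetical helper, not part of Alloy.
// It estimates the number of series one endpoint can hold in its
// network queues, assuming each of the queueCount queues buffers
// at most batchCount series.
func maxBufferedSeries(queueCount, batchCount uint) uint {
	return queueCount * batchCount
}

func main() {
	// Documented defaults: queue_count = 10, batch_count = 1,000.
	fmt.Println(maxBufferedSeries(10, 1000))
	// Doubling both settings quadruples the estimate.
	fmt.Println(maxBufferedSeries(20, 2000))
}
```

Because data in the network queues is lost on restart, the same product also gives a rough sense of how many series are at risk per endpoint under this assumption.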

## Compatible components
Contributor

This section looks a bit weird. Usually the comments are before the section heading. Also, the same section is listed twice :) It'd be nice to regenerate it.


`prometheus.remote.queue` has exports that can be consumed by the following components:

- Components that consume [Prometheus `MetricsReceiver`](../../../compatibility/#prometheus-metricsreceiver-consumers)

{{< admonition type="note" >}}
Connecting some components may not be sensible or components may require further configuration to make the connection work correctly.
Refer to the linked documentation for more details.
{{< /admonition >}}

<!-- END GENERATED COMPATIBLE COMPONENTS -->

[snappy]: https://en.wikipedia.org/wiki/Snappy_(compression)
[WAL block]: #wal-block
[Stop]: ../../../../set-up/run/
[run]: ../../../cli/run/
<!-- START GENERATED COMPATIBLE COMPONENTS -->

## Compatible components

`prometheus.remote.queue` has exports that can be consumed by the following components:

- Components that consume [Prometheus `MetricsReceiver`](../../../compatibility/#prometheus-metricsreceiver-consumers)

{{< admonition type="note" >}}
Connecting some components may not be sensible or components may require further configuration to make the connection work correctly.
Refer to the linked documentation for more details.
{{< /admonition >}}

<!-- END GENERATED COMPATIBLE COMPONENTS -->
3 changes: 2 additions & 1 deletion internal/component/all/all.go
@@ -81,10 +81,10 @@ import (
_ "github.com/grafana/alloy/internal/component/otelcol/processor/attributes" // Import otelcol.processor.attributes
_ "github.com/grafana/alloy/internal/component/otelcol/processor/batch" // Import otelcol.processor.batch
_ "github.com/grafana/alloy/internal/component/otelcol/processor/deltatocumulative" // Import otelcol.processor.deltatocumulative
- _ "github.com/grafana/alloy/internal/component/otelcol/processor/interval" // Import otelcol.processor.interval
_ "github.com/grafana/alloy/internal/component/otelcol/processor/discovery" // Import otelcol.processor.discovery
_ "github.com/grafana/alloy/internal/component/otelcol/processor/filter" // Import otelcol.processor.filter
_ "github.com/grafana/alloy/internal/component/otelcol/processor/groupbyattrs" // Import otelcol.processor.groupbyattrs
+ _ "github.com/grafana/alloy/internal/component/otelcol/processor/interval" // Import otelcol.processor.interval
_ "github.com/grafana/alloy/internal/component/otelcol/processor/k8sattributes" // Import otelcol.processor.k8sattributes
_ "github.com/grafana/alloy/internal/component/otelcol/processor/memorylimiter" // Import otelcol.processor.memory_limiter
_ "github.com/grafana/alloy/internal/component/otelcol/processor/probabilistic_sampler" // Import otelcol.processor.probabilistic_sampler
@@ -134,6 +134,7 @@ import (
_ "github.com/grafana/alloy/internal/component/prometheus/operator/servicemonitors" // Import prometheus.operator.servicemonitors
_ "github.com/grafana/alloy/internal/component/prometheus/receive_http" // Import prometheus.receive_http
_ "github.com/grafana/alloy/internal/component/prometheus/relabel" // Import prometheus.relabel
+ _ "github.com/grafana/alloy/internal/component/prometheus/remote/queue" // Import prometheus.remote.queue
_ "github.com/grafana/alloy/internal/component/prometheus/remotewrite" // Import prometheus.remote_write
_ "github.com/grafana/alloy/internal/component/prometheus/scrape" // Import prometheus.scrape
_ "github.com/grafana/alloy/internal/component/pyroscope/ebpf" // Import pyroscope.ebpf