Flow: Add component for multi-tenant remote_write support #521

Open
tpaschalis opened this issue Oct 27, 2022 · 11 comments · May be fixed by grafana/agent#2583
Labels: enhancement (New feature or request), proposal (A proposal for new functionality)

Comments

tpaschalis (Member) commented Oct 27, 2022

The Prometheus remote_write protocol doesn't have the notion of multi-tenancy itself. For this reason, different backends use various methods to enable multi-tenancy, most often an X-Scope-OrgID header or labels with a special meaning.

When using a prometheus.remote_write component, the Prometheus queue_manager reads WAL segments sequentially and enqueues metrics opportunistically to be batched off as remote_write requests. This behaviour offers no fine-grained control at the request level. Our current suggestion to users is that they add an extra header to their endpoint block, but these per-endpoint headers are static, and we don't support write_relabel_config filtering in Flow yet.
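
For reference, the static workaround looks roughly like the sketch below; the URL and tenant ID are placeholders, and everything sent through this component ends up under the same tenant:

```river
// A minimal sketch of the current static approach: one remote_write component
// per tenant, each with a hard-coded X-Scope-OrgID header.
// The URL and tenant ID are placeholders.
prometheus.remote_write "tenant_a" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push"

    headers = {
      "X-Scope-OrgID" = "tenant-a",
    }
  }
}
```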

The new component would act as a remote_write middleware, receiving metrics from upstream components, extracting a given label (say tenant), and batching timeseries grouped by this label value.

It would then send discrete remote_write requests for these batches while also adding the correct X-Scope-OrgID header for each one.
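
Purely as an illustration of the idea (the component name and arguments below are hypothetical, not a proposed design), its configuration could end up looking something like:

```river
// Hypothetical names only -- nothing here is an agreed-upon design.
prometheus.tenant_fanout "default" {
  // Label whose value is used as the X-Scope-OrgID header for each batch.
  tenant_label = "tenant"

  endpoint {
    url = "https://mimir.example.com/api/v1/push"
  }
}

// Upstream components would forward their samples to
// prometheus.tenant_fanout.default.receiver.
```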

Notes

  • This was inspired by the conversation in Pipeline support for metrics (agent#1821) and existing solutions like cortex-tenant.
  • I don't believe it would be easy to try and shoehorn this behaviour into the current prometheus.remote_write component.
  • Using this component would entail some performance penalty which should be measured and explicitly documented.
tpaschalis added the proposal label on Oct 27, 2022
oscni commented Oct 27, 2022

Just to add to the great summary above: an example of where this is needed is when you scrape metrics from a Kubernetes cluster and want the metrics from different namespaces to end up in different tenants/headers.

akselleirv commented

Thank you @tpaschalis for creating the issue.

I have not looked into Flow too much, but an important configuration option would be to handle this X-Scope-OrgID logic in a centralized way. Let's say I'm collecting the scrape jobs from the Prometheus CRDs; then I want to add this write_relabel_config step for all of these jobs.

My current approach is to use Kyverno to add this relabel step to all the ServiceMonitor CRDs. Could the Grafana Agent offer an option to do the same? For example, I'd have one component that collects all the ServiceMonitors, and as part of that component I'd want to add a relabel of a pod label into a time series label.

Would this be feasible?

tpaschalis (Member, Author) commented Nov 2, 2022

Hey @akselleirv, apologies for the belated response. What you're describing would not actually be possible with Flow until we have Operator components that can read the Prometheus CRDs.

I suppose that your current approach is to have duplicated remote_write definitions for each tenant with the correct headers in each, plus a write_relabel_config rule that drops all metrics except those with the correct tenant label?
The good news is that this is something that could work in the 'static' Agent mode and the Operator, but it would basically be syntactic sugar around the same approach, as built-in multi-tenancy cannot be implemented without changes to the Prometheus remote_write protocol itself.
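
In Flow terms, and assuming the prometheus.relabel component for the filtering step, that per-tenant duplication would look roughly like this sketch (one such pair per tenant; labels, URL and tenant ID are placeholders):

```river
// Rough sketch of the per-tenant duplication workaround: a keep-filter plus a
// dedicated remote_write for each tenant. Repeat the pair for tenant-b, etc.
prometheus.relabel "only_tenant_a" {
  forward_to = [prometheus.remote_write.tenant_a.receiver]

  rule {
    source_labels = ["tenant"]
    regex         = "tenant-a"
    action        = "keep"
  }
}

prometheus.remote_write "tenant_a" {
  endpoint {
    url = "https://mimir.example.com/api/v1/push"

    headers = {
      "X-Scope-OrgID" = "tenant-a",
    }
  }
}
```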

Could you open a separate issue describing your desired approach around the Operator to kickstart this conversation and put some more eyes on it? I think it would be worthwhile to discuss your proposal from scratch.

ptodev self-assigned this on Nov 3, 2022
akselleirv commented

Hello @tpaschalis, sorry for not responding to you sooner. I saw that this proposal has been added to the upcoming release, which I'm very grateful for. If Prometheus CRD support is also added to Flow, then it would satisfy my requirements and I could replace the existing setup with Flow.

At the moment I'm using a modified version of the cortex-tenant proxy, which allows for some additional logic. Currently I'm adding the time series to an open tenant if the pod has not been labeled with privateMetrics=true. If it has, then I'm adding the time series to the tenant specified by another pod label.

Would it be possible to have this kind of logic in this component? This would be similar to the feature found in the Promtail tenant stage.
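
For comparison, the Promtail tenant stage mentioned above picks the tenant for logs from a label; in Flow that corresponds roughly to a loki.process stage like the sketch below (the label name is just an example):

```river
// Logs-side analogue for comparison: route each log entry to the tenant named
// by one of its labels. The label name here is only an example.
loki.process "tenant_router" {
  forward_to = [loki.write.default.receiver]

  stage.tenant {
    label = "namespace"
  }
}
```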

ptodev linked a pull request Dec 2, 2022 that will close this issue
rfratto added the enhancement label on Jan 19, 2023
mattdurham (Collaborator) commented

This is accepted based on the concept but not the specific implementation.

ptodev (Contributor) commented May 18, 2023

The OpenTelemetry Collector's Loki exporter seems to support remote writing using different headers based on label hints.

Also, today on the community Slack someone asked about using different headers based on label hints, but for otelcol.exporter.otlp.

adberger commented

Any updates on this?

LilWatson commented

+1

Since there seems to be no progress - is there any alternative for dynamic (based on metric label) tenant headers?

wildum (Contributor) commented Nov 4, 2024

Moving this proposal to active because of the number of requests for it. @mattdurham, let's discuss when you have time whether it's something we can bring into the remote_write component or the remote_queue one.

Nachtfalkeaw commented

I would like to see a way to attach tenant information at the component level in Alloy. I have one Alloy instance with several different components active, but I want to send metrics from one component to tenantA and the metrics from other components to tenantB.

Maybe a component could take a parameter like "tenant = tenantA" that is then forwarded to the prometheus.remote_write component; if the sending component has the value set, remote_write would dynamically apply it for the relevant data.

If an additional label in the metrics is supposed to indicate where to send them, we should make it possible to choose which label it is, because a hardcoded "tenant" label may conflict with existing labels in the metrics.

ptodev (Contributor) commented Nov 25, 2024

I wonder if we could use the proposed foreach block for such a feature. The foreach would create a new sub-pipeline for every member of an array. We would also need a new type of component which looks at all the metric samples, gets the tenant label, and outputs an array containing all tenant strings. That array could be what the foreach iterates on, and each sub-pipeline could filter out the samples which don't apply to its tenant.
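
A very rough, speculative sketch of that shape (the foreach syntax and the tenant-listing component are hypothetical here):

```river
// Speculative sketch only: foreach was still a proposal at this point, and
// prometheus.tenants is a made-up component that would export the list of
// tenant label values seen in incoming samples.
foreach "per_tenant" {
  collection = prometheus.tenants.default.values
  var        = "tenant"

  template {
    // Each sub-pipeline would also need a relabel step that keeps only the
    // samples belonging to its tenant (omitted here for brevity).
    prometheus.remote_write "out" {
      endpoint {
        url = "https://mimir.example.com/api/v1/push"

        headers = {
          "X-Scope-OrgID" = tenant,
        }
      }
    }
  }
}
```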
