-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for configuring Grafana Mimir via PrometheusRule CRDs #2604
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is off to a great start, and I'm really excited to have this component.
We'll want a CHANGELOG entry and reference documentation before we merge. Writing documentation is usually the most time consuming part of new components since we try to explain as much as users need to know in the reference docs.
I added a few tests including some integration tests for the core event loop, but please let me know if there are more tests I should add. Notably, some of the scaffolding code is currently untested.
This will be a judgment call by you; if you think what you have tested is enough to have confidence in the correctness of the code (and any untested code is obviously correct), then you don't need to add any other tests :)
f9f15a6
to
ac615ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 I'm happy with this. We still need a few things before it can be merged:
- Changelog entry
- Documentation
Other than that, looks great, thank you for working on this!
Client copied from cortextool
- Use newer ruler API urls - Remove usage of ioutil - Rename Cortex to Mimir
- Also filter resources in the informer to avoid unecessary events and allocations
Only namespaces matching the expected naming convention are reconciled
- This allows multiple agents to manage groups of namespaces without conflicting
- Remove methods copied from cortextool that are unused here
Use go-kit/log instead
Ok, @rfratto I've added a first pass at reference documentation based on some of the other docs I found. I think at this point it should be good. Let me know if anything else needs fixing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 fantastic. I'll leave the final doc review to @karengermond, but once that's done, we'll get this merged!
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, one extra thing :)
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, aside from minor edits.
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
docs/sources/flow/reference/components/mimir.rules.kubernetes.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Robert Fratto <[email protected]> Co-authored-by: Karen Germond <[email protected]>
@rfratto All comments have been addressed 👍 |
Hi there! Congrats on the work, I can't wait to run this in production 🙂. I've tried it on my lab, it worked great. Two things: 1 : I ran into an issue where the component would not boot correctly. cf. screenshot, it's neither healthy nor unhealthy, it did not finish booting. The issue was that the service account I used for Grafana Agent missed some capabilities. I used the one mentioned in the documentation (https://grafana.com/docs/grafana-cloud/kubernetes-monitoring/other-methods/k8s-agent-metrics/#deploy-grafana-agent-resources). Here is the diff for it to work: apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: grafana-agent
rules:
+ - apiGroups: # This group is needed to list PrometheusRules
+ - "monitoring.coreos.com"
+ resources:
+ - prometheusrules
+ verbs:
+ - '*'
- apiGroups:
- ""
resources:
- nodes
- nodes/proxy
- nodes/metrics
- services
- endpoints
- pods
- events
+ - namespaces # This was missing to list namespaces
(...) This call blocks when the service account has not enough privileges: https://github.com/grafana/agent/blob/main/component/mimir/rules/kubernetes/rules.go#L304. I believe that we should:
wdyt @rfratto @Logiraptor ? 2: I'm down to work on the AlertManager CRD integration. I believe that the approach could be largely similar to what has been done for PrometheusRules. wdyt? Shall I open a dedicated issue for that? On mimir and/or agent? |
Definitely both great ideas 👍 It might also be worth mentioning that the CRDs need to be installed in the cluster, though maybe that's obvious.
Contributions here would be fantastic :) Creating an issue in the agent repo to track sounds good to me. |
Gotcha, I popped an issue in the agent: grafana/alloy#504. I'll try to open a PR in the coming days |
@Logiraptor Can you provide an example creating this integration? I moved from Prometheus to Prometheus-agent and I need a way to make Mimir use my cluster prometheus rules and use mimir alert-manager to send alerts |
@rafilkmp3 Have you checked the docs? https://grafana.com/docs/agent/latest/flow/reference/components/mimir.rules.kubernetes/#example That should help point you in the right direction |
@Logiraptor Yes but I can't find a way to make this work using grafana-agent-operator this is my configuration already with additional rbac
Can you help me know where to add this section ?
|
The agent operator doesn't support grafana agent flow as far as I know. It's really two separate modes of using the agent. I'd have to defer to someone on the agent team to help from here, as this was a cross-team hackathon project for me and I'm not actually working on the agent full time. Pinging @rfratto to help! |
Unfortunately, Grafana Agent Operator is focused around static mode, which doesn't support PrometheusRule CRDs. Grafana Agent Flow is currently the only Grafana Agent variant which supports reading PrometheusRule CRDs, and we don't have any plans right now to add support for it to static mode or the static mode operator. |
PR Description
This PR adds a new flow component:
mimir.rules.kubernetes
. This was my hackathon project for the December Grafana Labs hackathon. (I'm on the Mimir squad). Here's a list of items I managed to get done this week:Which issue(s) this PR fixes
Fixes #1544
Fixes grafana/mimir#2133
Related to #523 and #1986
Notes to the Reviewer
I added a few tests including some integration tests for the core event loop, but please let me know if there are more tests I should add. Notably, some of the scaffolding code is currently untested.Also missing as of now is documentation. Happy to contribute there, if you have any pointers?PR Checklist