Add new alert rules for throttling #509

Merged

Conversation

gabrielcocenza
Member

@gabrielcocenza gabrielcocenza commented Nov 28, 2024

  • If OpenSearch is throttling, this alert signals that optimizations are necessary, such as scaling the number of nodes or changing query and indexing patterns

- It's recommended that OpenSearch runs with at least 3 nodes to
  have high availability
- If OpenSearch is throttling, this is an alert that optimizations
  are necessary like scaling the number of nodes or changing
  queries and indexing patterns
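
For illustration, a throttling rule of the kind described above might look like the sketch below. It follows the quoted-key style of the rules file in this PR. The metric name `opensearch_indices_indexing_is_throttled_bool`, the alert name, the `for` duration, and the severity are assumptions, not taken from the merged change; check the exporter's `/metrics` output for the names it actually exposes.

```yaml
# Hypothetical sketch, not the rule merged in this PR.
"groups":
- "name": "opensearch-throttling-sketch"   # assumed group name
  "rules":
  - "alert": "OpenSearchThrottling"        # assumed alert name
    # Assumed metric: a gauge that is 1 while indexing is throttled.
    "expr": "opensearch_indices_indexing_is_throttled_bool > 0"
    # Wait before firing so short bursts do not page anyone.
    "for": "5m"
    "labels":
      "severity": "warning"
    "annotations":
      "summary": "OpenSearch is throttling indexing"
      "description": "Consider scaling the number of nodes or changing query and indexing patterns."
```

A non-critical severity seems to fit here, since throttling points to a need for tuning or scaling rather than an outage.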
zmraul
zmraul previously approved these changes Dec 2, 2024
@@ -105,3 +105,23 @@
"for": "1m"
"labels":
"severity": "alert"

- "alert": "OpenSearchFewNodes"
Contributor

In my opinion this alert is not very relevant. High-availability requirements should be explained in the docs or in the deployment prerequisites, rather than triggering an alert once integrated with observability.

Member Author

I kind of agree. However, the generated metrics cannot confirm that a node is lost, apart from checking whether the `up` metric is working or not. I created this as a warning, just to trigger something non-critical so that operators know the current cluster is not operating as expected.
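
As a rough sketch of the `up`-based check mentioned above: Prometheus records `up` as 1 when a scrape of a target succeeds and 0 when it fails, so a rule on it can flag an unreachable node without assuming any minimum cluster size. The job label value `opensearch`, the alert name, and the durations are hypothetical and would need to match the real scrape configuration.

```yaml
# Hypothetical sketch, not part of this PR.
"groups":
- "name": "opensearch-scrape-sketch"      # assumed group name
  "rules":
  - "alert": "OpenSearchScrapeFailed"     # assumed alert name
    # `up` is set by Prometheus itself for every scrape target;
    # the job label value here is an assumption.
    "expr": "up{job=\"opensearch\"} == 0"
    "for": "5m"
    "labels":
      "severity": "warning"
    "annotations":
      "summary": "OpenSearch target {{ $labels.instance }} cannot be scraped"
```

Note that a failed scrape only shows that the metrics endpoint is unreachable; it does not prove that the node has left the cluster.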

Member Author

I've removed the node count alert. We can add it in the future if necessary.

Contributor

@phvalguima phvalguima left a comment

Hey @gabrielcocenza, unfortunately we cannot limit ourselves to 3+ nodes. The 1-node and 2-node cluster scenarios are also supposed to work. On the other hand, the throttling alert is a really good one!

src/alert_rules/prometheus/prometheus_alerts.yaml (outdated)
@gabrielcocenza gabrielcocenza changed the title Add new alert rules for throttling and number of nodes Add new alert rules for throttling Dec 11, 2024
@gabrielcocenza gabrielcocenza merged commit 4416292 into canonical:2/edge Dec 12, 2024
34 of 41 checks passed
phvalguima pushed a commit that referenced this pull request Dec 13, 2024
- If OpenSearch is throttling, this is an alert that optimizations are
necessary like scaling the number of nodes or changing queries and
indexing patterns
3 participants