Separate alerts when cluster is in yellow state #506

Merged
merged 5 commits into from
Dec 4, 2024
Changes from 1 commit
18 changes: 14 additions & 4 deletions src/alert_rules/prometheus/prometheus_alerts.yaml
@@ -11,16 +11,26 @@
   "labels":
     "severity": "critical"

-- "alert": "OpenSearchClusterYellow"
+- "alert": "OpenSearchClusterYellowTemp"
   "annotations":
-    "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Some cluster replicas shards are not allocated."
-    "summary": "Cluster health status is YELLOW"
+    "message": "Cluster {{ $labels.cluster }} health status has been YELLOW for at least 20m. Cluster might be under heavy load."
+    "summary": "Cluster health status is temporarily YELLOW"
   "expr": |
-    sum by (cluster) (opensearch_cluster_status == 1)
+    sum by (cluster, instance) (opensearch_cluster_shards_number{type=~"relocating|initializing"}) > 0 and on(cluster, instance) opensearch_cluster_status == 1
   "for": "20m"
   "labels":
     "severity": "warning"

+- "alert": "OpenSearchClusterYellow"
+  "annotations":
+    "message": "Cluster {{ $labels.cluster }} health status has been YELLOW with some replica shards unassigned."
+    "summary": "Number of nodes in the cluster might be too low. Consider scaling the application to ensure that it has enough nodes to host all shards."
+  "expr": |
+    sum by (cluster, instance) (opensearch_cluster_shards_number{type="unassigned"}) > 0 and on(cluster, instance) opensearch_cluster_status == 1
+  "for": "10m"
+  "labels":
+    "severity": "warning"
+
 - "alert": "OpenSearchBulkRequestsRejectionJumps"
   "annotations":
     "message": "High Bulk Rejection Ratio at {{ $labels.node }} node in {{ $labels.cluster }} cluster. This node may not be keeping up with the indexing speed."
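
The split between the two YELLOW alerts could be exercised with a `promtool test rules` unit test. The following is only a sketch under assumptions: it assumes `prometheus_alerts.yaml` can be loaded by promtool as a complete rule file (with a top-level `groups` key), and the cluster name `demo`, the instance `node-0`, and all sample values are hypothetical. The metric names, label names, `for` durations, and annotation texts are taken from the diff above.

```yaml
# Hypothetical promtool unit test for the two YELLOW alerts.
# Assumption: prometheus_alerts.yaml is a complete rule file with a "groups" key.
rule_files:
  - prometheus_alerts.yaml

evaluation_interval: 1m

tests:
  # Case 1: YELLOW with unassigned shards and nothing relocating or initializing.
  - interval: 1m
    input_series:
      - series: 'opensearch_cluster_status{cluster="demo", instance="node-0"}'
        values: '1+0x30'   # status stays at 1 (yellow) for 30 minutes
      - series: 'opensearch_cluster_shards_number{cluster="demo", instance="node-0", type="unassigned"}'
        values: '2+0x30'   # two unassigned shards throughout
      - series: 'opensearch_cluster_shards_number{cluster="demo", instance="node-0", type="relocating"}'
        values: '0+0x30'
      - series: 'opensearch_cluster_shards_number{cluster="demo", instance="node-0", type="initializing"}'
        values: '0+0x30'
    alert_rule_test:
      # After the 10m "for" window the persistent alert should be firing.
      - eval_time: 15m
        alertname: OpenSearchClusterYellow
        exp_alerts:
          - exp_labels:
              severity: warning
              cluster: demo
              instance: node-0
            exp_annotations:
              message: "Cluster demo health status has been YELLOW with some replica shards unassigned."
              summary: "Number of nodes in the cluster might be too low. Consider scaling the application to ensure that it has enough nodes to host all shards."
      # No shards are moving, so the temporary alert should stay silent.
      - eval_time: 25m
        alertname: OpenSearchClusterYellowTemp
        exp_alerts: []

  # Case 2: YELLOW while shards are relocating and none are unassigned.
  - interval: 1m
    input_series:
      - series: 'opensearch_cluster_status{cluster="demo", instance="node-0"}'
        values: '1+0x30'
      - series: 'opensearch_cluster_shards_number{cluster="demo", instance="node-0", type="relocating"}'
        values: '3+0x30'   # three shards relocating throughout
      - series: 'opensearch_cluster_shards_number{cluster="demo", instance="node-0", type="unassigned"}'
        values: '0+0x30'
    alert_rule_test:
      # After the 20m "for" window the temporary alert should be firing.
      - eval_time: 25m
        alertname: OpenSearchClusterYellowTemp
        exp_alerts:
          - exp_labels:
              severity: warning
              cluster: demo
              instance: node-0
            exp_annotations:
              message: "Cluster demo health status has been YELLOW for at least 20m. Cluster might be under heavy load."
              summary: "Cluster health status is temporarily YELLOW"
      - eval_time: 25m
        alertname: OpenSearchClusterYellow
        exp_alerts: []
```

If the file were saved as, say, `test_yellow_alerts.yaml` next to the rule file (a hypothetical name), it could be run with `promtool test rules test_yellow_alerts.yaml`. Because both expressions aggregate with `sum by (cluster, instance)`, the firing alerts are expected to carry only the `cluster` and `instance` labels plus `severity`.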