This issue is a continuation of issue #220 and PR #373.
Based on the discussion in the comments of that issue and PR, figure out which metrics, time ranges, etc. are best for creating meaningful alert rules and dashboards.
To avoid being misled by oscillations of node_filesystem_avail_bytes, we can combine it with loki_distributor_bytes_received_total: extrapolate Loki's ingestion rate over a time horizon and compare the result with the available disk space.
For example, here are a 72h and a 20m prediction:
# 72h prediction, using an 18h window
sum (rate(loki_distributor_bytes_received_total[18h])*60*60*72) > bool sum(node_filesystem_avail_bytes)
# 20m prediction, using a 5m window
sum (rate(loki_distributor_bytes_received_total[5m])*60*20) > bool sum(node_filesystem_avail_bytes)
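The arithmetic behind these two expressions can be sketched in Python: `rate()` gives the average bytes per second over the window, multiplying by the horizon in seconds (60*60*72 or 60*20) turns that into the bytes expected to arrive over the horizon, and `> bool` yields 1 or 0 instead of filtering. This is a simplified model, assuming samples are (timestamp, counter value) pairs; the function names are hypothetical, and unlike PromQL's `rate()` the sketch does not extrapolate to the window edges.

```python
from typing import Sequence, Tuple

# (unix timestamp in seconds, counter value in bytes)
Sample = Tuple[float, float]


def per_second_rate(samples: Sequence[Sample]) -> float:
    """Approximate PromQL rate(): total counter increase / elapsed time.

    A drop in value is treated as a counter reset, so the new value counts
    as an increase from zero. No extrapolation to window edges is done.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def will_fill(samples: Sequence[Sample], horizon_seconds: float, avail_bytes: float) -> int:
    """Mirror `rate(...) * horizon > bool avail`: 1 if projected ingest exceeds free space."""
    projected = per_second_rate(samples) * horizon_seconds
    return 1 if projected > avail_bytes else 0


# A 5m window sampled every 60s while ingesting 1 MiB/s
samples = [(t, t * 1024 * 1024) for t in range(0, 301, 60)]
print(will_fill(samples, 60 * 20, 2 * 1024**3))  # 20m prediction vs 2 GiB free
print(will_fill(samples, 60 * 20, 1 * 1024**3))  # 20m prediction vs 1 GiB free
```

With 1 MiB/s, the 20m projection is 1200 MiB, which fits in 2 GiB of free space but not in 1 GiB.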
No more false positives on startup.
The received byte count is uncompressed, so the estimate is conservative. Later on, we could add a scaling factor to account for compression.
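One way such a scaling factor could work is to divide the projected uncompressed bytes by an expected compression ratio before comparing with free space; in PromQL that would amount to dividing the left-hand side by a constant. The sketch below assumes a hypothetical `compression_ratio` parameter (uncompressed size / compressed size), and the ratio of 4.0 in the usage lines is purely illustrative, not a measured value for Loki.

```python
def projected_disk_bytes(
    rate_bytes_per_second: float,
    horizon_seconds: float,
    compression_ratio: float = 1.0,
) -> float:
    """Estimate the on-disk bytes written over a horizon.

    compression_ratio is a hypothetical tuning knob. The default of 1.0
    keeps the current conservative, uncompressed estimate.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return rate_bytes_per_second * horizon_seconds / compression_ratio


# Same 1 MiB/s ingest rate over a 72h horizon
uncompressed = projected_disk_bytes(1024 * 1024, 60 * 60 * 72)
# Illustrative ratio only; a real value would have to be measured
compressed = projected_disk_bytes(1024 * 1024, 60 * 60 * 72, compression_ratio=4.0)
```

Keeping the default at 1.0 means the alert stays conservative until a ratio has actually been measured.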
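We still need to replace the "sum" over node_filesystem_avail_bytes with something more correct: it currently adds up free space across every filesystem the node exporter reports, not just the one Loki writes to.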
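To illustrate why summing is misleading: node_filesystem_avail_bytes exposes one series per filesystem (labelled with `device`, `fstype` and `mountpoint`), so a large root disk can hide a nearly full data disk. This minimal sketch assumes Loki keeps its data on a single hypothetical mountpoint `/loki`; all mountpoints and sizes are made up for illustration. A possible PromQL counterpart would select that series with a label matcher instead of summing.

```python
from typing import Mapping

GiB = 1024**3

# Hypothetical free space per mountpoint, one entry per filesystem series
AVAIL_BY_MOUNTPOINT = {
    "/": 50 * GiB,
    "/boot": 1 * GiB,
    "/loki": 2 * GiB,  # hypothetical mountpoint holding Loki's data
}


def avail_for_loki(avail_by_mountpoint: Mapping[str, int], loki_mountpoint: str = "/loki") -> int:
    """Free bytes on the filesystem Loki actually writes to."""
    return avail_by_mountpoint[loki_mountpoint]


def exceeds_free_space(projected_bytes: float, avail_bytes: float) -> bool:
    """True if the projected ingest would not fit in the available space."""
    return projected_bytes > avail_bytes


projected = 5 * GiB  # e.g. bytes expected over the prediction horizon
summed = sum(AVAIL_BY_MOUNTPOINT.values())  # 53 GiB across all filesystems
print(exceeds_free_space(projected, summed))  # False: the sum hides the problem
print(exceeds_free_space(projected, avail_for_loki(AVAIL_BY_MOUNTPOINT)))  # True
```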
lucabello changed the title from "Alert rules and Dashboard panles about log growth rate in Grafana" to "Alert rules and Dashboard panels about log growth rate in Grafana" on Jul 17, 2024.
Enhancement Proposal
For alert rules, this post is useful: https://www.robustperception.io/reduce-noise-from-disk-space-alerts/