I either can't initialize all the counters in code because I have no control over it, or the counters have such high cardinality that initializing all of them when the application starts is not feasible.
Scenario
Let's assume that I want an SLO on `mymetric_request_count{rpc="myendpoint"}`. When the application starts, this counter is not initialized, so it is not reported on the application's `/metrics` endpoint. It is only reported once the application has received traffic on the `myendpoint` RPC endpoint.
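To make the lazy-initialization behavior concrete, here is a minimal in-process model of a labeled counter. `LazyCounterVec` and its methods are hypothetical names, not the API of any real client library; the sketch only assumes what the scenario describes, namely that a label combination becomes a series on first use. The `init` method stands in for mitigation a) below.

```python
class LazyCounterVec:
    """Hypothetical model of a labeled counter (not a real client-library API).

    A child series exists only after it is first incremented or explicitly
    initialized, so an idle label combination is absent from the exposition.
    """

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self.children = {}  # tuple of label values -> current count

    def inc(self, *label_values, amount=1.0):
        # Creates the child on first use, then increments it.
        key = tuple(label_values)
        self.children[key] = self.children.get(key, 0.0) + amount

    def init(self, *label_values):
        # Pre-create the child at 0 without counting anything.
        self.children.setdefault(tuple(label_values), 0.0)

    def expose(self):
        # Render existing children in a simplified text exposition format.
        lines = []
        for values, count in sorted(self.children.items()):
            pairs = ",".join(f'{k}="{v}"' for k, v in zip(self.label_names, values))
            lines.append(f"{self.name}{{{pairs}}} {count:g}")
        return "\n".join(lines)


c = LazyCounterVec("mymetric_request_count", ["rpc", "code"])
print(repr(c.expose()))  # '' right after startup: nothing to scrape
c.inc("myendpoint", "OK")
print(c.expose())  # mymetric_request_count{rpc="myendpoint",code="OK"} 1
```

Until the first request arrives, a scrape of this model returns nothing for `mymetric_request_count`, which is exactly the gap the SLO rules then see.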
Problem
As a result, no data is reported and the `SLOMetricAbsent` alert fires. I can disable the absent alert via a property on the SLO, but the burn rate expressions still evaluate to no data.
Pyrra generates the following `absent` rule. Because the counter is not initialized, the metric is not reported after an application restart, so the absent alert fires (as described, this can be disabled). The `for` duration is calculated from the window and is currently not configurable.
Pyrra also generates the following recording rule (one of many):
```yaml
- expr: |
    sum by (rpc) (rate(mymetric_request_count{rpc="myendpoint", code=~"Internal|UNKNOWN"}[5m]))
    /
    sum by (rpc) (rate(mymetric_request_count{rpc="myendpoint"}[5m]))
```
Known mitigations
a) Initialize the counters in the code (which in my scenario I can't)
b) Create recording rules for the error and total metrics and pass them into the SLO, since the SLO only accepts vector selectors. These recording rules contain workarounds for the uninitialized counters.
For example, the following recording rule reports `0` if the application is up but the counter is not initialized:
```
mymetric_request_count{rpc="myendpoint", code=~"Internal|UNKNOWN"} or up{job="myjob"} * 0
```
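The `or up{job="myjob"} * 0` fallback relies on how PromQL set operators match label sets. The sketch below models that behavior under the same dict-of-frozensets assumption as a plain Python illustration: matching ignores `__name__`, multiplying by a scalar drops the metric name, and `or` keeps every left-hand sample plus right-hand samples whose label set has no match on the left. The `instance="app:8080"` label is a hypothetical value added for illustration.

```python
NAME = "__name__"

def match_key(labels):
    # Vector matching compares label sets without the metric name.
    return frozenset((k, v) for k, v in labels if k != NAME)

def times_zero(vector):
    # Arithmetic with a scalar keeps the labels but drops the metric name.
    return {match_key(labels): value * 0 for labels, value in vector.items()}

def or_(lhs, rhs):
    # All lhs samples, plus rhs samples whose label set has no match in lhs.
    seen = {match_key(labels) for labels in lhs}
    out = dict(lhs)
    for labels, value in rhs.items():
        if match_key(labels) not in seen:
            out[labels] = value
    return out

target = {("job", "myjob"), ("instance", "app:8080")}  # hypothetical instance label
up = {frozenset(target | {(NAME, "up")}): 1.0}
print(or_({}, times_zero(up)))  # counter absent: only the fallback 0 sample remains
```

In this model the fallback sample carries only `job` and `instance`, so it never matches a counter series that also has `rpc` and `code`; once traffic arrives, `or` keeps both the real series and the 0 sample.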
This recording rule only helps with low-cardinality metrics, where I know all label values in advance. If I have high-cardinality or potentially unknown label values (imagine customer IDs), the `or` clause must be added dynamically, not in advance.
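The cardinality limit becomes visible when the fallback is generated for each known label value. The following sketch builds one `or` branch per value with PromQL's `label_replace`; `fallback_clause` is a hypothetical helper, and `otherendpoint` is an invented label value used only for illustration.

```python
def fallback_clause(job, label, values):
    """Hypothetical helper: one `or` branch per label value known in advance.

    label_replace(v, dst, replacement, "", "") sets dst on every sample of v,
    because the empty regex matches the empty value of the missing source label.
    """
    branch = 'label_replace(up{{job="{job}"}} * 0, "{label}", "{value}", "", "")'
    return " ".join(
        "or " + branch.format(job=job, label=label, value=v) for v in values
    )

selector = 'mymetric_request_count{code=~"Internal|UNKNOWN"}'
print(selector + " " + fallback_clause("myjob", "rpc", ["myendpoint", "otherendpoint"]))
```

The clause grows with every value, and a value that first appears at runtime (such as a new customer ID) has no branch at all, which is why this approach only fits labels whose values are known up front.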
Request / Question
Are you interested in adding support for a customizable or hardcoded `or` clause to help with uninitialized Prometheus counters?
This could be:
- `or up{job="myjob"} * 0`
- `or vector(0)` (always reports 0, even if the application is not running)
These could be set via a toggle on the SLO or via fully customizable PromQL (which I know you try to avoid, to keep things simple).
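A rule generator could implement the toggle idea by appending the selected branch to each generated expression. This is only a sketch of the request: `Fallback` and `with_fallback` are hypothetical names and do not correspond to any existing Pyrra setting or code.

```python
from enum import Enum

class Fallback(Enum):
    """Hypothetical SLO toggle sketching the request; not a Pyrra setting."""
    NONE = "none"
    UP = "up"          # or up{job="<job>"} * 0
    VECTOR = "vector"  # or vector(0)

def with_fallback(expr, mode, job=None):
    """Append the selected fallback branch to a generated PromQL expression."""
    if mode is Fallback.NONE:
        return expr
    if mode is Fallback.UP:
        if not job:
            raise ValueError("the up-based fallback needs a job label value")
        return f'{expr} or up{{job="{job}"}} * 0'
    return f"{expr} or vector(0)"

print(with_fallback('mymetric_request_count{rpc="myendpoint"}', Fallback.UP, job="myjob"))
# mymetric_request_count{rpc="myendpoint"} or up{job="myjob"} * 0
```

No parentheses are needed around the appended branch because `*` binds tighter than `or` in PromQL, so it parses as `expr or (up{...} * 0)`.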
So far, this pattern is only used in the `pyrra_availability` metric (link).
fstr changed the title from "Improve support for applications with low traffic" to "Support for applications with low traffic / uninitialized counters" on Aug 30, 2024.