-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(metrics): remove actor_id
label from back-pressure metrics based on metrics level
#18213
refactor(metrics): remove actor_id
label from back-pressure metrics based on metrics level
#18213
Conversation
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
let dispatcher_count = register_guarded_int_gauge_vec_with_registry!( | ||
"stream_dispatcher_count", | ||
"Total number of dispatchers", | ||
&["fragment_id", "downstream_fragment_id"], | ||
registry | ||
) | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is exactly equal to the parallelism (i.e. number of actors) of this fragment, no need to introduce a new metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right as we currently don't support multiple-edges between two fragments. But is there existing metrics for "parallelism"? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is there existing metrics for "parallelism"? 👀
We can retrieve this from metrics actor_info
of which labels are "actor_id", "fragment_id", "compute_node"
, but is this also an actor-level metrics with high cardinality? 🥵🥵
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Let's introduce a new metrics for "parallelism (i.e. number of actors) of this fragment"
Meanwhile, for actor_info
, I recommend that we can keep it as is now and focus on other actor-level metrics.
…-in-backpressure-metrics Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Can we verify that the amount of metrics ingested by prometheus is in fact less than before, after this PR? |
Signed-off-by: Bugen Zhao <[email protected]>
Sure. This is what the metrics look like when
While the following is under
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Signed-off-by: Bugen Zhao <[email protected]>
… based on metrics level (#18213) Signed-off-by: Bugen Zhao <[email protected]>
…ure metrics based on metrics level (#18213) (#18330) Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao [email protected]I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
See background in #18108. Resolve #18062.
Following the idea described in #18108 (comment), i.e., split
avg
intosum
andcount
, this PR reduces the cardinality of the back-pressure metricsstream_actor_output_buffer_blocking_duration_ns
by masking theactor_id
label if the metrics level is less verbose thanDebug
. This is achieved by utilizingRelabeledMetricVec
.Tested locally and the results are the same.
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.