-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(metrics): add internal latency of actors, MVs and sinks #19639
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Should we cherry-pick?
@@ -744,6 +744,26 @@ def section_streaming(outer_panels): | |||
), | |||
], | |||
), | |||
panels.timeseries_latency( | |||
"Latency of Materialize Views & Sinks", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be easier to interpret than the epoch one. Should we add it to dev dashboard also? (Actually I've never checked user dashboard before... 🤪)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dev dashboard shows epoch instead of latency. The latency is actually calculated by timestamp(<epoch_metrics>) - <epoch_metrics>
, which is not very accurate. Thus I think, with a deeper understanding of RisingWave, showing epoch would be better.
"The current epoch that the Materialize Executors are processing. If an MV's epoch is far behind the others, " | ||
"it's very likely to be the performance bottleneck", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if an upstream MV is blocked, then all downstream MVs will have epoch lag? (Therefore we can't locate the root cause from this metrics alone) 🤔
Besides, backpressure may also affect upstream MV's epoch...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found you once raised the idea to show the epoch on DAG. 🤔
Do you mean check-pick to 2.1? Normally it will be shipped at 2.2, which sounds good to me |
Yes to 2.1. Since this may help troubleshooting incl. oncall, I feel it may be good to make it available eaelier... |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR resolves #18114, and also introduced an actor-level metrics for #19619.
New panel in DEV dashabord:
New panel in USER dashabord:
Note there are 2 metrics for each sink - "output" means reading from log store, while "enqueue" means writing into log store.
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.