Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per actor metrics: should be cleaned when the actor is dropped or moved. #9492

Closed
Tracked by #6855
fuyufjh opened this issue Apr 26, 2023 · 10 comments
Closed
Tracked by #6855
Assignees
Labels
help wanted Issues that need help from contributors
Milestone

Comments

@fuyufjh
Copy link
Member

fuyufjh commented Apr 26, 2023

Porblem

The leaked actor memory not only consumes extra memory but also affect the metrics. As the screenshot shows, the Actor 24 has already been dropped, but the metrics still exist.

image

Cause

The is caused by the design of MetricVec (actually a hashmap of labels -> single metrics) in the Prometheus client library.

For example,

pub agg_cached_keys: GenericGaugeVec<AtomicI64>,

which is actually backed by MetricVec

/// The underlying implementation for [`GaugeVec`] and [`IntGaugeVec`].
pub type GenericGaugeVec<P> = MetricVec<GaugeVecBuilder<P>>;

When call it with with_label_values, a new key (label) will be created in that hashmap e.g.

this.metrics
    .agg_cached_keys
    .with_label_values(&[&table_id_str, &actor_id_str])
    .set(vars.agg_group_cache.len() as i64);

But it's never been removed.

Solution

Similar to LockGuard, one solution I can tell is to wrap the MetricVec varaibles e.g. agg_cached_keys within a handler object, and remove itselves' key (label) from MetricVec's hashmap when being dropped.

@github-actions github-actions bot added this to the release-0.20 milestone Apr 26, 2023
@fuyufjh fuyufjh added good first issue Good for newcomers help wanted Issues that need help from contributors labels Apr 26, 2023
@fuyufjh
Copy link
Member Author

fuyufjh commented May 9, 2023

Quite annoying when investigating problems... 🥲 Hope to be fixed

@MrCroxx
Copy link
Contributor

MrCroxx commented May 10, 2023

IMO, the streaming actor metrics imitate how the batch task metrics are cleaned (with customized Collector). Related issues: #3832 #4844 #5742. What are your opinions? @fuyufjh @ZENOTME

@MrCroxx
Copy link
Contributor

MrCroxx commented May 10, 2023

And currently StreamingMetrics contains metrics of task level, actor level, executor level, etc, which can be split into multiple metrics for better lifetime management.

@fuyufjh
Copy link
Member Author

fuyufjh commented May 10, 2023

IMO, the streaming actor metrics imitate how the batch task metrics are cleaned (with customized Collector). Related issues: #3832 #4844 #5742. What are your opinions? @fuyufjh @ZENOTME

Sounds good to me. Similar to the solution that I imagined before (described in the PR's desciption) i.e. using something to hold the lifetime of these actor-level metrics

@ZENOTME
Copy link
Contributor

ZENOTME commented May 10, 2023

IMO, the streaming actor metrics imitate how the batch task metrics are cleaned (with customized Collector). Related issues: #3832 #4844 #5742. What are your opinions? @fuyufjh @ZENOTME

+1. I think it's the same problem in batch and it solved in #9378.

@MrCroxx
Copy link
Contributor

MrCroxx commented May 10, 2023

One thing that makes streaming metrics harder to clean is that the labels of the streaming metrics are not the same. 🤣 Let me try to register/collect them with as less modifications as possible.

@fuyufjh
Copy link
Member Author

fuyufjh commented Jul 14, 2023

@MrCroxx Any further updates?

@fuyufjh fuyufjh removed the good first issue Good for newcomers label Aug 8, 2023
@fuyufjh fuyufjh modified the milestones: release-1.0, release-1.2 Aug 8, 2023
@MrCroxx
Copy link
Contributor

MrCroxx commented Aug 8, 2023

Worked on the new file cache engine before. Let me get back to this PR these days.

@lmatz
Copy link
Contributor

lmatz commented Aug 18, 2023

SCR-20230818-oj4
The example above is an aggregation.
Als have a case for join.

@fuyufjh
Copy link
Member Author

fuyufjh commented Sep 11, 2023

@MrCroxx any updates? 👀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Issues that need help from contributors
Projects
None yet
Development

No branches or pull requests

4 participants