feat(meta): optimize table stats report performance #15401

Merged
merged 5 commits into main from wallace/metrics
Mar 7, 2024

Conversation

@Little-Wallace Little-Wallace commented Mar 4, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

close #15153

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Little-Wallace <[email protected]>
@Li0k Li0k requested review from Li0k, hzxa21 and zwang28 March 4, 2024 06:09
table_stats_change: &PbTableStatsMap,
) {
for (table_id, stats) in table_stats_change {
if stats.total_key_size == 0 && stats.total_value_size == 0 && stats.total_key_count == 0 {
Contributor

Do we need to check for zero after purge_prost_table_stats?


// apply version delta before we persist this change. If it causes panic we can
// recover to a correct state after restarting meta-node.
current_version.apply_version_delta(&version_delta);
if purge_prost_table_stats(&mut version_stats.table_stats, &current_version) {
Contributor

IMO, this metric is just an approximation, so we don't need to update it every time we report a compact task. We can reduce the overhead by reporting it less frequently. What do you think?

Contributor Author

This PR only reports the tables that have updates.
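For readers following the thread, here is a minimal sketch of what "only report the tables that have updates" could look like. It is not the code from this PR: `TableStats` and `apply_changes` are hypothetical stand-ins, with field names borrowed from the snippet under review, and the real change operates on the protobuf stats map rather than a plain `HashMap`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-table stats delta. The field names mirror
/// the ones checked in the reviewed snippet.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct TableStats {
    pub total_key_size: i64,
    pub total_value_size: i64,
    pub total_key_count: i64,
}

impl TableStats {
    /// True when the delta would not change any counter.
    pub fn is_zero(&self) -> bool {
        self.total_key_size == 0 && self.total_value_size == 0 && self.total_key_count == 0
    }
}

/// Applies `changes` to `totals` and returns the sorted ids of the tables
/// whose totals actually changed. Only those ids would need a metric update.
pub fn apply_changes(
    totals: &mut HashMap<u32, TableStats>,
    changes: &HashMap<u32, TableStats>,
) -> Vec<u32> {
    let mut touched = Vec::new();
    for (table_id, delta) in changes {
        // Skip all-zero deltas so untouched tables cost nothing.
        if delta.is_zero() {
            continue;
        }
        let entry = totals.entry(*table_id).or_default();
        entry.total_key_size += delta.total_key_size;
        entry.total_value_size += delta.total_value_size;
        entry.total_key_count += delta.total_key_count;
        touched.push(*table_id);
    }
    touched.sort_unstable();
    touched
}
```

With this shape, the caller would refresh gauges only for the returned ids instead of walking every table on each report.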

Signed-off-by: Little-Wallace <[email protected]>
@hzxa21 hzxa21 left a comment

There are three different pieces of logic that report the table stats metrics, in three different places:

  1. init: get_local_table_stats
  2. report_compact_task: trigger_local_table_stat
  3. commit_epoch: trigger_table_stat

This looks ad-hoc and not clean. Some ideas to improve:

  • Maintain local table stats (with table_id -> Int instead of IntGauge) under versioning
  • Update the local table stats under version write lock (which is already acquired) in report_compact_task and commit_epoch. We can do it incrementally and no reset is needed.
  • Report local table stats to prometheus in HummockTimerEvent::Report
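The three suggestions above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the merged implementation: `Versioning`, `GaugeSink`, `apply_deltas`, and `report` are hypothetical names, a std `RwLock` stands in for the version lock, and a `HashMap` stands in for the Prometheus gauges.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the state guarded by the version lock.
#[derive(Default)]
pub struct Versioning {
    /// table_id -> total size, kept as a plain integer instead of a gauge.
    pub local_table_size: HashMap<u32, i64>,
}

/// Hypothetical metrics sink. In a real system this would wrap a Prometheus
/// gauge vector; here it is a trait so the sketch needs only the standard library.
pub trait GaugeSink {
    fn set(&mut self, table_id: u32, value: i64);
}

/// A HashMap works as a trivial sink for demonstration purposes.
impl GaugeSink for HashMap<u32, i64> {
    fn set(&mut self, table_id: u32, value: i64) {
        self.insert(table_id, value);
    }
}

/// Intended for paths that already hold the write lock (the review names
/// report_compact_task and commit_epoch). Deltas are applied incrementally,
/// so no reset of the map is needed.
pub fn apply_deltas(versioning: &mut Versioning, deltas: &[(u32, i64)]) {
    for &(table_id, delta) in deltas {
        if delta != 0 {
            *versioning.local_table_size.entry(table_id).or_insert(0) += delta;
        }
    }
}

/// Intended for a periodic timer event: take the read lock briefly and
/// publish the current totals to the sink.
pub fn report(versioning: &RwLock<Versioning>, sink: &mut dyn GaugeSink) {
    let guard = versioning.read().unwrap();
    for (&table_id, &size) in &guard.local_table_size {
        sink.set(table_id, size);
    }
}
```

The design point is that the hot paths only touch an in-memory integer map under a lock they already hold, while the comparatively expensive metric updates happen once per timer tick.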

Signed-off-by: Little-Wallace <[email protected]>
@Little-Wallace

There are three different pieces of logic that report the table stats metrics, in three different places:

  1. init: get_local_table_stats
  2. report_compact_task: trigger_local_table_stat
  3. commit_epoch: trigger_table_stat

This looks ad-hoc and not clean. Some ideas to improve:

  • Maintain local table stats (with table_id -> Int instead of IntGauge) under versioning
  • Update the local table stats under version write lock (which is already acquired) in report_compact_task and commit_epoch. We can do it incrementally and no reset is needed.
  • Report local table stats to prometheus in HummockTimerEvent::Report

I have refactored the code to maintain local table stats (with table_id -> Int instead of IntGauge) under versioning.

@hzxa21

hzxa21 commented Mar 6, 2024

There are three different pieces of logic that report the table stats metrics, in three different places:

  1. init: get_local_table_stats
  2. report_compact_task: trigger_local_table_stat
  3. commit_epoch: trigger_table_stat

This looks ad-hoc and not clean. Some ideas to improve:

  • Maintain local table stats (with table_id -> Int instead of IntGauge) under versioning
  • Update the local table stats under version write lock (which is already acquired) in report_compact_task and commit_epoch. We can do it incrementally and no reset is needed.
  • Report local table stats to prometheus in HummockTimerEvent::Report

I have refactored the code to maintain local table stats (with table_id -> Int instead of IntGauge) under versioning.

Did you forget to push?

@hzxa21 hzxa21 left a comment

Rest LGTM

@@ -265,7 +265,7 @@ async fn test_hummock_compaction_task() {
// Finish the task and succeed.

assert!(hummock_manager
- .report_compact_task(compact_task.task_id, TaskStatus::Success, vec![], None)
+ .report_compact_task(compact_task.task_id, TaskStatus::Success, vec![], None,)
Collaborator

Remove the trailing comma at the end.

@@ -1972,7 +1972,7 @@ async fn test_compaction_task_expiration_due_to_split_group() {
let version_1 = hummock_manager.get_current_version().await;
// compaction_task.task_status = TaskStatus::Success.into();
assert!(!hummock_manager
- .report_compact_task(compaction_task.task_id, TaskStatus::Success, vec![], None)
+ .report_compact_task(compaction_task.task_id, TaskStatus::Success, vec![], None,)
Collaborator

Remove the trailing comma at the end.

@Little-Wallace Little-Wallace enabled auto-merge March 7, 2024 04:40
@Little-Wallace Little-Wallace added this pull request to the merge queue Mar 7, 2024
Merged via the queue into main with commit 6a8fa2b Mar 7, 2024
26 of 27 checks passed
@Little-Wallace Little-Wallace deleted the wallace/metrics branch March 7, 2024 05:11
Li0k pushed a commit that referenced this pull request Mar 7, 2024
zwang28 pushed a commit that referenced this pull request Mar 18, 2024
zwang28 added a commit that referenced this pull request Mar 26, 2024
Successfully merging this pull request may close these issues.

bug(meta): Hummock's operations consume a lot of meta cpu
3 participants