Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: LabelGuardedCounter metrics report incorrect value #13948

Closed
fuyufjh opened this issue Dec 12, 2023 · 5 comments · Fixed by #13994
Closed

bug: LabelGuardedCounter metrics report incorrect value #13948

fuyufjh opened this issue Dec 12, 2023 · 5 comments · Fixed by #13994
Assignees
Labels
type/bug Something isn't working
Milestone

Comments

@fuyufjh
Copy link
Member

fuyufjh commented Dec 12, 2023

Describe the bug

Related #13080

Screenshot 2023-12-12 at 20 04 00

Error message/log

No response

To Reproduce

No response

Expected behavior

No response

How did you deploy RisingWave?

No response

The version of RisingWave

main

Additional context

No response

@fuyufjh fuyufjh added the type/bug Something isn't working label Dec 12, 2023
@github-actions github-actions bot added this to the release-1.6 milestone Dec 12, 2023
@hzxa21
Copy link
Collaborator

hzxa21 commented Dec 13, 2023

cc @wenym1

@wenym1
Copy link
Contributor

wenym1 commented Dec 13, 2023

Is there any query to reproduce the bug?

@wenym1 wenym1 self-assigned this Dec 13, 2023
@fuyufjh
Copy link
Member Author

fuyufjh commented Dec 13, 2023

Is there any query to reproduce the bug?

Try this:

CREATE TABLE t1 (k int, v1 struct<v2 int, v3 double>, t1 timestamp, c1 varchar) 
WITH (
     connector = 'datagen',
  
     fields.k.kind = 'sequence',
     fields.k.start = '1',
  
     fields.v1.v2.kind = 'random',
     fields.v1.v2.min = '-10',
     fields.v1.v2.max = '10',
     fields.v1.v2.seed = '1',
  
     fields.v1.v3.kind = 'random',
     fields.v1.v3.min = '15',
     fields.v1.v3.max = '55',
     fields.v1.v3.seed = '1',
  
     fields.t1.kind = 'random',
     fields.t1.max_past = '2h 37min',
     fields.t1.max_past_mode = 'relative',
     fields.t1.seed = '3',

     fields.z1.kind = 'random',
     fields.z1.max_past = '2h 37min',
     fields.z1.max_past_mode = 'relative',
     fields.z1.seed = '3',

     fields.c1.kind = 'random',
     fields.c1.length = '16',
     fields.c1.seed = '3',
  
     datagen.rows.per.second = '10'
 ) FORMAT PLAIN ENCODE JSON;


create materialized view mv1 as select (v1).v2, count(k) from t1 group by 1;

And then observe the actor_out_record_cnt from Grafana.

@fuyufjh
Copy link
Member Author

fuyufjh commented Dec 13, 2023

I am suspecting

.remove_label_values(&labels.each_ref().map(|s| s.as_str()))

might incorrectly deleted metrics from the inner Counter, thus, some inc_by() was actually lost.

Didn't fully understand the logic around uncollected_removed_labels...

@wenym1
Copy link
Contributor

wenym1 commented Dec 13, 2023

I am suspecting

risingwave/src/common/src/metrics/guarded_metrics.rs

Line 189 in e8f4a8a

.remove_label_values(&labels.each_ref().map(|s| s.as_str()))
might incorrectly deleted metrics from the inner Counter, thus, some inc_by() was actually lost.

Didn't fully understand the logic around uncollected_removed_labels...

The logic before using uncollected_removed_labels is, when the label is not used anywhere, we call remove_label_values on the unused label immediately, which may cause a loss of metrics because the label is removed before being collected.

After introducing uncollected_removed_labels, the unused label will be first added to this list, and on collect, we first call inner.collect() to collect metric of all labels, and then we call remove_label_values to really remove the label. In this way, we can avoid losing the metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants