Global error handler cleanup - Metrics SDK #2185
Changes from all commits
704b848
b8cb6af
a0b6eee
acf97fa
7b48f14
ac61b79
a42d516
dbaa7f5
73fca4d
ee5c5f5
3aa97cf
de54afe
e62de04
bda9faa
4a34922
0087c24
115e73f
56682a4
7e48cbf
d81b374
4b09c92
949930e
6dcc9eb
4b42fe8
@@ -12,12 +12,11 @@ use std::ops::{Add, AddAssign, Sub};
 use std::sync::atomic::{AtomicBool, AtomicI64, AtomicU64, AtomicUsize, Ordering};
 use std::sync::{Arc, RwLock};

-use aggregate::is_under_cardinality_limit;
+use aggregate::{cardinality_limit, is_under_cardinality_limit};
 pub(crate) use aggregate::{AggregateBuilder, ComputeAggregation, Measure};
 pub(crate) use exponential_histogram::{EXPO_MAX_SCALE, EXPO_MIN_SCALE};
 use once_cell::sync::Lazy;
-use opentelemetry::metrics::MetricsError;
-use opentelemetry::{global, otel_warn, KeyValue};
+use opentelemetry::{otel_warn, KeyValue};

 use crate::metrics::AttributeSet;

@@ -146,9 +145,10 @@ impl<AU: AtomicallyUpdate<T>, T: Number, O: Operation> ValueMap<AU, T, O> {
             let new_tracker = AU::new_atomic_tracker(self.buckets_count);
             O::update_tracker(&new_tracker, measurement, index);
             trackers.insert(STREAM_OVERFLOW_ATTRIBUTES.clone(), Arc::new(new_tracker));
-            global::handle_error(MetricsError::Other("Warning: Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged.".into()));
-            otel_warn!( name: "ValueMap.measure",
-                message = "Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged."
+            //TODO - include name of meter, instrument
+            otel_warn!( name: "MetricCardinalityLimitReached",
+                message = format!("{}", "Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric will not be logged until next collect."),

Review comment: What is the need for

+                cardinality_limit = cardinality_limit() as u64,
             );
         }
     }
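The warning text in this hunk says an overflow is logged once and then suppressed until the next collect. A minimal sketch of that log-once-per-cycle pattern, using only the standard library, might look like the following. The names `OverflowGate`, `should_log`, and `reset_on_collect` are hypothetical and are not part of the OpenTelemetry SDK; the actual `ValueMap` logic may differ.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical helper (not an SDK type): lets one overflow warning
/// through per collection cycle.
pub struct OverflowGate {
    logged: AtomicBool,
}

impl OverflowGate {
    pub const fn new() -> Self {
        Self { logged: AtomicBool::new(false) }
    }

    /// Returns true only for the first caller since the last reset.
    pub fn should_log(&self) -> bool {
        // `swap` returns the previous value; `false` means nobody has logged yet.
        !self.logged.swap(true, Ordering::Relaxed)
    }

    /// Call at the end of a collect so the next overflow is reported again.
    pub fn reset_on_collect(&self) {
        self.logged.store(false, Ordering::Relaxed);
    }
}
```

A caller would wrap the warning in `if gate.should_log() { ... }` on the overflow path and call `reset_on_collect()` once per collection.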
@@ -4,8 +4,8 @@ use std::{
 };

 use opentelemetry::{
-    global,
     metrics::{MetricsError, Result},
+    otel_debug,
 };

 use super::{
@@ -76,10 +76,11 @@ impl MetricReader for ManualReader {
             // Only register once. If producer is already set, do nothing.
             if inner.sdk_producer.is_none() {
                 inner.sdk_producer = Some(pipeline);
-            } else {
-                global::handle_error(MetricsError::Config(
-                    "duplicate reader registration, did not register manual reader".into(),
-                ))
+            } else {
+                otel_debug!(
+                    name: "ManualReader.RegisterPipeline.DuplicateRegistration",

Review comment: I think it's better to leave implementation details out of the event names. We might change this method name, or rename pipeline to something else, in the future.
Suggested change
Review comment: We should come up with a convention that all these events would follow. For example, something like this:
Review comment: We are already adding the crate name in the macro. Following that naming, the second level could be the struct / method in this module.
Review comment: Internal modules could still be moved around or refactored, so I think we should rely on:

+                    error = "The pipeline is already registered to the Reader. Registering pipeline multiple times is not allowed."

Review comment: Should we use

+                );
             }
         });
     }
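The hunk for `ManualReader` keeps the first registered pipeline and only reports a duplicate registration. A small sketch of that register-once behaviour is shown below, assuming a hypothetical `OnceRegistry` type that is not part of the SDK. It returns a `bool` so the caller can decide how, and at which level, to report the duplicate.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a reader that accepts a single producer.
pub struct OnceRegistry<P> {
    producer: Mutex<Option<P>>,
}

impl<P> OnceRegistry<P> {
    pub fn new() -> Self {
        Self { producer: Mutex::new(None) }
    }

    /// Stores `producer` on the first call and returns true. Later calls
    /// leave the stored value untouched and return false.
    pub fn register(&self, producer: P) -> bool {
        let mut slot = self.producer.lock().unwrap();
        if slot.is_none() {
            *slot = Some(producer);
            true
        } else {
            false
        }
    }
}
```

With this shape, the logging call moves to the call site, for example emitting a debug event only when `register` returns `false`.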
Review comment: I don't know the inner workings well enough to give a strong opinion, but unless this is an auto-recoverable error, this can flood the error log.

Review comment: As far as I understand this part of the code, this error occurs with a restrictive max_size configuration, while the application is recording measurements whose values are farther apart than max_size allows. The error would be logged whenever such a faulty measurement is recorded. If these faulty measurements are infrequent, the error log won't be flooded; otherwise it can be. Either some kind of throttling or simply a flag to log only once needs to be added. Let me know what you suggest; otherwise I can keep a TODO to revisit.

Review comment: Unless we are 100% sure this cannot cause flooding of logs, let's remove the log from here and leave a TODO to add logging once we understand more.
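For reference, the throttling option raised in this thread could be sketched as a simple time-based gate. This is only an illustration using the standard library: `LogThrottle`, `allow_at`, and `allow` are hypothetical names, not SDK APIs, and the thread does not settle on any particular approach.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical throttle (not an SDK type): allows at most one log
/// per `interval`.
pub struct LogThrottle {
    interval: Duration,
    last: Mutex<Option<Instant>>,
}

impl LogThrottle {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last: Mutex::new(None) }
    }

    /// Returns true if a log may be emitted at time `now`.
    /// Taking `now` as a parameter keeps the logic testable.
    pub fn allow_at(&self, now: Instant) -> bool {
        let mut last = self.last.lock().unwrap();
        match *last {
            // A log was emitted too recently: suppress this one.
            Some(prev) if now.duration_since(prev) < self.interval => false,
            // First log, or the interval has elapsed: record and allow.
            _ => {
                *last = Some(now);
                true
            }
        }
    }

    /// Convenience wrapper that uses the current time.
    pub fn allow(&self) -> bool {
        self.allow_at(Instant::now())
    }
}
```

A mutex keeps the sketch short; on a hot measurement path an atomic timestamp or a log-once flag would likely be cheaper.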