Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global error handler cleanup - Metrics SDK #2185

Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
704b848
initial commit
lalitb Oct 8, 2024
b8cb6af
change name for exponential histogram
lalitb Oct 8, 2024
a0b6eee
Merge branch 'main' into global-error-handler-cleanup-metrics-sdk
lalitb Oct 9, 2024
acf97fa
rever changes
lalitb Oct 9, 2024
7b48f14
Update opentelemetry-sdk/src/metrics/internal/mod.rs
lalitb Oct 9, 2024
ac61b79
convert otel_error to otel_warn for cardinality limit
lalitb Oct 9, 2024
a42d516
log both existing and new instrument values
lalitb Oct 9, 2024
dbaa7f5
Merge branch 'main' into global-error-handler-cleanup-metrics-sdk
lalitb Oct 9, 2024
73fca4d
build error
lalitb Oct 9, 2024
ee5c5f5
Merge branch 'global-error-handler-cleanup-metrics-sdk' of github.com…
lalitb Oct 9, 2024
3aa97cf
remove formatting in wrapper macros
lalitb Oct 10, 2024
de54afe
Merge branch 'main' into global-error-handler-cleanup-metrics-sdk
lalitb Oct 23, 2024
e62de04
Merge branch 'main' into global-error-handler-cleanup-metrics-sdk
lalitb Oct 23, 2024
bda9faa
review comments
lalitb Oct 23, 2024
4a34922
resolve conflict
lalitb Oct 23, 2024
0087c24
lint error
lalitb Oct 23, 2024
115e73f
add comment for exponential histogram
lalitb Oct 23, 2024
56682a4
fix
lalitb Oct 23, 2024
7e48cbf
Update opentelemetry-sdk/src/metrics/internal/mod.rs
lalitb Oct 23, 2024
d81b374
use message for otel_warn
lalitb Oct 23, 2024
4b09c92
merge conflict
lalitb Oct 23, 2024
949930e
Update opentelemetry-sdk/src/metrics/meter.rs
lalitb Oct 23, 2024
6dcc9eb
Update opentelemetry-sdk/src/metrics/internal/mod.rs
lalitb Oct 23, 2024
4b42fe8
Update opentelemetry-sdk/src/metrics/meter.rs
lalitb Oct 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions opentelemetry-sdk/src/metrics/internal/exponential_histogram.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use std::{collections::HashMap, f64::consts::LOG2_E, sync::Mutex, time::SystemTime};

use once_cell::sync::Lazy;
use opentelemetry::{metrics::MetricsError, KeyValue};
use opentelemetry::{otel_error, KeyValue};

use crate::{
metrics::data::{self, Aggregation, Temporality},
Expand Down Expand Up @@ -100,9 +100,15 @@
if (self.scale - scale_delta as i8) < EXPO_MIN_SCALE {
// With a scale of -10 there is only two buckets for the whole range of f64 values.
// This can only happen if there is a max size of 1.
opentelemetry::global::handle_error(MetricsError::Other(
"exponential histogram scale underflow".into(),
));
otel_error!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the inner workings enough to give a strong opinion - but unless this is a auto recoverable error, this can flood the error log.

Copy link
Member Author

@lalitb lalitb Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As what I can understand this part of code, this error occurs with restrictive max_size configuration, while the application is recording measurements with values that are far apart than what allowed by max_size. And error would be logged whenever the faulty measurement is recorded. If these faulty measurements are not frequent, the error log won't be flooded, else it can. Again, either some kind of throttling or simply flag to log only once need to be added. Let me know what you suggest, else I can keep TODO to revisit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we are 100% sure this cannot cause flooding of logs, lets remove the log from here, and leave a TODO to add logging once we understand more.

name: "ExponentialHistogramDataPoint.Scale.Underflow",
current_scale = self.scale,
scale_delta = scale_delta,
max_size = self.max_size,
min_scale = EXPO_MIN_SCALE,
record_min_max = self.record_min_max,
record_sum = self.record_sum
);

Check warning on line 111 in opentelemetry-sdk/src/metrics/internal/exponential_histogram.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/internal/exponential_histogram.rs#L103-L111

Added lines #L103 - L111 were not covered by tests
return;
}
// Downscale
Expand Down
7 changes: 3 additions & 4 deletions opentelemetry-sdk/src/metrics/internal/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
pub(crate) use exponential_histogram::{EXPO_MAX_SCALE, EXPO_MIN_SCALE};
use once_cell::sync::Lazy;
use opentelemetry::metrics::MetricsError;
use opentelemetry::{global, otel_warn, KeyValue};
use opentelemetry::{otel_error, KeyValue};

use crate::metrics::AttributeSet;

Expand Down Expand Up @@ -146,9 +146,8 @@
let new_tracker = AU::new_atomic_tracker(self.buckets_count);
O::update_tracker(&new_tracker, measurement, index);
trackers.insert(STREAM_OVERFLOW_ATTRIBUTES.clone(), Arc::new(new_tracker));
global::handle_error(MetricsError::Other("Warning: Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged.".into()));
otel_warn!( name: "ValueMap.measure",
message = "Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged."
otel_error!( name: "ValueMap.Measure.Overflow",
lalitb marked this conversation as resolved.
Show resolved Hide resolved
error = MetricsError::Other("Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged.".into())

Check warning on line 150 in opentelemetry-sdk/src/metrics/internal/mod.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/internal/mod.rs#L150

Added line #L150 was not covered by tests
);
}
}
Expand Down
6 changes: 2 additions & 4 deletions opentelemetry-sdk/src/metrics/manual_reader.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@
};

use opentelemetry::{
global,
metrics::{MetricsError, Result},
otel_error,
};

use super::{
Expand Down Expand Up @@ -84,9 +84,7 @@
if inner.sdk_producer.is_none() {
inner.sdk_producer = Some(pipeline);
} else {
global::handle_error(MetricsError::Config(
"duplicate reader registration, did not register manual reader".into(),
))
otel_error!(name: "ManualReader.RegisterPipeline.DuplicateRegistration");

Check warning on line 87 in opentelemetry-sdk/src/metrics/manual_reader.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/manual_reader.rs#L87

Added line #L87 was not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info/debug only. Even if a user gets this message, they won't know what to do. Its helpful to us only.

}
});
}
Expand Down
27 changes: 15 additions & 12 deletions opentelemetry-sdk/src/metrics/meter.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@
use std::{borrow::Cow, sync::Arc};

use opentelemetry::{
global,
metrics::{
noop::{NoopAsyncInstrument, NoopSyncInstrument},
AsyncInstrumentBuilder, Counter, Gauge, Histogram, HistogramBuilder, InstrumentBuilder,
InstrumentProvider, MetricsError, ObservableCounter, ObservableGauge,
ObservableUpDownCounter, Result, UpDownCounter,
},
otel_error,
};

use crate::instrumentation::Scope;
Expand Down Expand Up @@ -74,7 +74,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateCounter.ValidationError", error = err);
return Ok(Counter::new(Arc::new(NoopSyncInstrument::new())));
}

Expand All @@ -90,7 +90,7 @@
{
Ok(counter) => Ok(counter),
Err(err) => {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateCounter.Error", error = err);

Check warning on line 93 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L93

Added line #L93 was not covered by tests
Ok(Counter::new(Arc::new(NoopSyncInstrument::new())))
}
}
Expand All @@ -106,7 +106,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateObservableCounter.ValidationError", error = err);
return Ok(ObservableCounter::new(Arc::new(NoopAsyncInstrument::new())));
}

Expand All @@ -119,6 +119,7 @@
)?;

if ms.is_empty() {
otel_error!(name: "SdkMeter.CreateObservableCounter.Error", error = MetricsError::Other("no measures found".into()));

Check warning on line 122 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L122

Added line #L122 was not covered by tests
return Ok(ObservableCounter::new(Arc::new(NoopAsyncInstrument::new())));
}

Expand All @@ -143,7 +144,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateObservableUpDownCounter.ValidationError", error = err);
return Ok(ObservableUpDownCounter::new(Arc::new(
NoopAsyncInstrument::new(),
)));
Expand All @@ -158,6 +159,7 @@
)?;

if ms.is_empty() {
otel_error!(name: "SdkMeter.CreateObservableUpDownCounter.Error", error = MetricsError::Other("no measures found".into()));

Check warning on line 162 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L162

Added line #L162 was not covered by tests
return Ok(ObservableUpDownCounter::new(Arc::new(
NoopAsyncInstrument::new(),
)));
Expand All @@ -184,7 +186,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateObservableGauge.ValidationError", error = err);
return Ok(ObservableGauge::new(Arc::new(NoopAsyncInstrument::new())));
}

Expand All @@ -197,6 +199,7 @@
)?;

if ms.is_empty() {
otel_error!(name: "SdkMeter.CreateObservableGauge.Error", error = MetricsError::Other("no measures found".into()));

Check warning on line 202 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L202

Added line #L202 was not covered by tests
return Ok(ObservableGauge::new(Arc::new(NoopAsyncInstrument::new())));
}

Expand All @@ -221,7 +224,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateUpDownCounter.ValidationError", error = err);
return Ok(UpDownCounter::new(Arc::new(NoopSyncInstrument::new())));
}

Expand All @@ -237,7 +240,7 @@
{
Ok(updown_counter) => Ok(updown_counter),
Err(err) => {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateUpDownCounter.Error", error = err);

Check warning on line 243 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L243

Added line #L243 was not covered by tests
Ok(UpDownCounter::new(Arc::new(NoopSyncInstrument::new())))
}
}
Expand All @@ -253,7 +256,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateGauge.ValidationError", error = err);
return Ok(Gauge::new(Arc::new(NoopSyncInstrument::new())));
}

Expand All @@ -269,7 +272,7 @@
{
Ok(gauge) => Ok(gauge),
Err(err) => {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateGauge.Error", error = err);

Check warning on line 275 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L275

Added line #L275 was not covered by tests
Ok(Gauge::new(Arc::new(NoopSyncInstrument::new())))
}
}
Expand All @@ -285,7 +288,7 @@
{
let validation_result = validate_instrument_config(builder.name.as_ref(), &builder.unit);
if let Err(err) = validation_result {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateHistogram.ValidationError", error = err);
return Ok(Histogram::new(Arc::new(NoopSyncInstrument::new())));
}

Expand All @@ -301,7 +304,7 @@
{
Ok(histogram) => Ok(histogram),
Err(err) => {
global::handle_error(err);
otel_error!(name: "SdkMeter.CreateHistogram.Error", error = err);

Check warning on line 307 in opentelemetry-sdk/src/metrics/meter.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter.rs#L307

Added line #L307 was not covered by tests
Ok(Histogram::new(Arc::new(NoopSyncInstrument::new())))
}
}
Expand Down
5 changes: 2 additions & 3 deletions opentelemetry-sdk/src/metrics/meter_provider.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,8 @@
};

use opentelemetry::{
global,
metrics::{noop::NoopMeter, Meter, MeterProvider, MetricsError, Result},
KeyValue,
otel_error, KeyValue,
};

use crate::{instrumentation::Scope, Resource};
Expand Down Expand Up @@ -137,7 +136,7 @@
// shutdown(), then we don't need to call shutdown again.
if !self.is_shutdown.load(Ordering::Relaxed) {
if let Err(err) = self.shutdown() {
global::handle_error(err);
otel_error!(name: "SdkMeterProvider.Drop.ShutdownError", error = err);

Check warning on line 139 in opentelemetry-sdk/src/metrics/meter_provider.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/meter_provider.rs#L139

Added line #L139 was not covered by tests
}
}
}
Expand Down
19 changes: 8 additions & 11 deletions opentelemetry-sdk/src/metrics/pipeline.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,8 @@
};

use opentelemetry::{
global,
metrics::{MetricsError, Result},
KeyValue,
otel_warn, KeyValue,
};

use crate::{
Expand Down Expand Up @@ -414,15 +413,13 @@
if existing == id {
return;
}

global::handle_error(MetricsError::Other(format!(
"duplicate metric stream definitions, names: ({} and {}), descriptions: ({} and {}), kinds: ({:?} and {:?}), units: ({:?} and {:?}), and numbers: ({} and {})",
existing.name, id.name,
existing.description, id.description,
existing.kind, id.kind,
existing.unit, id.unit,
existing.number, id.number,
)))
otel_warn!(name: "InstrumentSync.AddSync.DuplicateMetricStreamDefinitions",
cijothomas marked this conversation as resolved.
Show resolved Hide resolved
name = id.name.clone(),
description = id.description.clone(),
kind = id.kind,
unit = id.unit.clone(),
number = id.number.clone(),

Check warning on line 421 in opentelemetry-sdk/src/metrics/pipeline.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/pipeline.rs#L416-L421

Added lines #L416 - L421 were not covered by tests
);
}
}
}
Expand Down
28 changes: 17 additions & 11 deletions opentelemetry-sdk/src/metrics/view.rs
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
use super::instrument::{Instrument, Stream};
use glob::Pattern;
use opentelemetry::{
global,
metrics::{MetricsError, Result},
otel_error,
};

fn empty_view(_inst: &Instrument) -> Option<Stream> {
Expand Down Expand Up @@ -102,19 +102,23 @@
/// ```
pub fn new_view(criteria: Instrument, mask: Stream) -> Result<Box<dyn View>> {
if criteria.is_empty() {
global::handle_error(MetricsError::Config(format!(
"no criteria provided, dropping view. mask: {mask:?}"
)));
otel_error!(name: "CreateView.ValidationError", error = MetricsError::Config(
"no criteria provided, dropping view".to_string()),

Check warning on line 106 in opentelemetry-sdk/src/metrics/view.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/view.rs#L105-L106

Added lines #L105 - L106 were not covered by tests
criteria = criteria,
mask = mask
);
return Ok(Box::new(empty_view));
}
let contains_wildcard = criteria.name.contains(['*', '?']);
let err_msg_criteria = criteria.clone();

let match_fn: Box<dyn Fn(&Instrument) -> bool + Send + Sync> = if contains_wildcard {
if mask.name != "" {
global::handle_error(MetricsError::Config(format!(
"name replacement for multiple instruments, dropping view, criteria: {criteria:?}, mask: {mask:?}"
)));
otel_error!(name: "CreateView.ValidationError", error = MetricsError::Config(
"name replacement for wildcard instrument, dropping view".to_string()),

Check warning on line 118 in opentelemetry-sdk/src/metrics/view.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/view.rs#L117-L118

Added lines #L117 - L118 were not covered by tests
criteria = criteria,
mask = mask
);
return Ok(Box::new(empty_view));
}

Expand All @@ -138,10 +142,12 @@
match ma.validate() {
Ok(_) => agg = Some(ma.clone()),
Err(err) => {
global::handle_error(MetricsError::Other(format!(
"{}, proceeding as if view did not exist. criteria: {:?}, mask: {:?}",
err, err_msg_criteria, mask
)));
otel_error!(name: "CreateView.ValidationError",
error = MetricsError::Config("invalid aggregation, dropping view".to_string()),

Check warning on line 146 in opentelemetry-sdk/src/metrics/view.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/metrics/view.rs#L146

Added line #L146 was not covered by tests
criteria = err_msg_criteria,
mask = mask,
error = err
);
return Ok(Box::new(empty_view));
}
}
Expand Down
45 changes: 43 additions & 2 deletions opentelemetry/src/global/internal_logging.rs
Original file line number Diff line number Diff line change
Expand Up @@ -50,11 +50,29 @@ macro_rules! otel_warn {
{
tracing::warn!(name: $name, target: env!("CARGO_PKG_NAME"), "");
}
#[cfg(not(feature = "internal-logs"))]
cijothomas marked this conversation as resolved.
Show resolved Hide resolved
{
#[allow(unused_variables)]
{

}
}
};
(name: $name:expr, $($key:ident = $value:expr),+ $(,)?) => {
#[cfg(feature = "internal-logs")]
{
tracing::warn!(name: $name, target: env!("CARGO_PKG_NAME"), $($key = $value),+, "");
tracing::warn!(name: $name,
target: env!("CARGO_PKG_NAME"),
$($key = {
$crate::format_value!($value)
}),+,
""
) }
#[cfg(not(feature = "internal-logs"))]
{
{
let _ = ($name, $($value),+);
}
}
};
}
Expand Down Expand Up @@ -104,11 +122,34 @@ macro_rules! otel_error {
{
tracing::error!(name: $name, target: env!("CARGO_PKG_NAME"), "");
}
#[allow(unused_variables)]
{

}
};
(name: $name:expr, $($key:ident = $value:expr),+ $(,)?) => {
#[cfg(feature = "internal-logs")]
{
tracing::error!(name: $name, target: env!("CARGO_PKG_NAME"), $($key = $value),+, "");
tracing::error!(name: $name,
target: env!("CARGO_PKG_NAME"),
$($key = {
$crate::format_value!($value)
}),+,
""
) }
#[cfg(not(feature = "internal-logs"))]
{
{
let _ = ($name, $($value),+);
}
}
};
}

/// Helper macro to format a value
#[macro_export]
macro_rules! format_value {
($value:expr) => {{
format!("{:?}", $value)
}};
}
Loading