Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid redundant shutdown in TracerProvider::drop when already shut down #2197

Merged
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
32938be
initial commit
lalitb Oct 11, 2024
01c970b
Merge branch 'main' into tracer-provider-drop-shutdown-check
cijothomas Oct 11, 2024
c08c055
user reusable shutdown
lalitb Oct 11, 2024
82d2598
Merge branch 'main' into tracer-provider-drop-shutdown-check
lalitb Oct 14, 2024
f270dcd
restructure TracerProviderInner::Drop, TracerProvider::Shutdown, and …
lalitb Oct 14, 2024
c8f2166
Merge branch 'tracer-provider-drop-shutdown-check' of github.com:lali…
lalitb Oct 14, 2024
1a3dd67
Update opentelemetry-sdk/src/trace/provider.rs
lalitb Oct 16, 2024
36012dc
Update opentelemetry-sdk/src/trace/provider.rs
lalitb Oct 16, 2024
0d7d1ab
fix unit test
lalitb Oct 17, 2024
01234c5
Merge branch 'main' into tracer-provider-drop-shutdown-check
lalitb Oct 17, 2024
2279f3a
update doc
lalitb Oct 17, 2024
293120b
Merge branch 'tracer-provider-drop-shutdown-check' of github.com:lali…
lalitb Oct 17, 2024
21fa84e
update to atomics
lalitb Oct 17, 2024
6ea4a63
Update opentelemetry-sdk/src/trace/provider.rs
lalitb Oct 18, 2024
3dbb66e
Merge branch 'main' into tracer-provider-drop-shutdown-check
lalitb Oct 18, 2024
05b8f59
review comment
lalitb Oct 22, 2024
74bf489
review comments
lalitb Oct 22, 2024
556c529
rename AlreadyShutdown to TracerProviderAlreadyShutdown
lalitb Oct 22, 2024
80c97f7
fix
lalitb Oct 22, 2024
d3a557f
review comments
lalitb Oct 22, 2024
14a42cc
add test for noop shutdown
lalitb Oct 22, 2024
2e5899b
Merge branch 'main' into tracer-provider-drop-shutdown-check
lalitb Oct 22, 2024
1272312
Merge branch 'main' into tracer-provider-drop-shutdown-check
cijothomas Oct 23, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
269 changes: 233 additions & 36 deletions opentelemetry-sdk/src/trace/provider.rs
Original file line number Diff line number Diff line change
@@ -1,13 +1,67 @@
//! # Trace Provider SDK
//!
//! ## Tracer Creation
//!
//! New [`Tracer`] instances are always created through a [`TracerProvider`].
//!
//! All configuration objects and extension points (span processors,
//! propagators) are provided by the [`TracerProvider`]. [`Tracer`] instances do
//! not duplicate this data to avoid that different [`Tracer`] instances
//! of the [`TracerProvider`] have different versions of these data.
/// # Trace Provider SDK
///
/// The `TracerProvider` handles the creation and management of [`Tracer`] instances and coordinates
/// span processing. It serves as the central configuration point for tracing, ensuring consistency
/// across all [`Tracer`] instances it creates.
///
/// ## Tracer Creation
///
/// New [`Tracer`] instances are always created through a `TracerProvider`. These `Tracer`s share
/// a common configuration, which includes the [`Resource`], span processors, sampling strategies,
/// and span limits. This avoids the need for each `Tracer` to maintain its own version of these
/// configurations, ensuring uniform behavior across all instances.
///
/// ## Cloning and Shutdown
///
/// The `TracerProvider` is designed to be lightweight and clonable. Cloning a `TracerProvider`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think TracerProvider is lightweight. It is pretty heavy, and we expect user to create it only once. It is correct to mention cloning is cheap as it is just creating a new ref.

Copy link
Member Author

@lalitb lalitb Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Have updated the comment to remove the lightweight reference.

/// creates a new reference to the same provider, not a new instance. Dropping the last reference
/// to the `TracerProvider` will automatically trigger its shutdown. During shutdown, the provider
/// will flush all remaining spans, ensuring they are passed to the configured processors.
/// Users can also manually trigger shutdown using the [`shutdown`](TracerProvider::shutdown)
/// method, which will ensure the same behavior.
///
/// Once shut down, the `TracerProvider` transitions into a disabled state. In this state, further
/// operations on its associated `Tracer` instances will result in no-ops, ensuring that no spans
/// are processed or exported after shutdown.
///
/// ## Span Processing and Force Flush
///
/// The `TracerProvider` manages the lifecycle of span processors, which are responsible for
/// collecting, processing, and exporting spans. The [`force_flush`](TracerProvider::force_flush) method
/// invoked at any time will trigger an immediate flush of all pending spans (if any) to the exporters.
/// This will block the user thread till all the spans are passed to exporters.
///
/// # Examples
///
/// ```
/// use opentelemetry::global;
/// use opentelemetry_sdk::trace::TracerProvider;
/// use opentelemetry::trace::Tracer;
///
/// fn init_tracing() -> TracerProvider {
/// let provider = TracerProvider::default();
///
/// // Set the provider to be used globally
/// let _ = global::set_tracer_provider(provider.clone());
///
/// provider
/// }
///
/// fn main() {
/// let provider = init_tracing();
///
/// // create tracer..
/// let tracer = global::tracer("example/client");
///
/// // create span...
/// let span = tracer
/// .span_builder("test_span")
/// .start(&tracer);
///
/// // Explicitly shut down the provider
/// provider.shutdown();
/// }
/// ```
use crate::runtime::RuntimeChannel;
use crate::trace::{
BatchSpanProcessor, Config, RandomIdGenerator, Sampler, SimpleSpanProcessor, SpanLimits, Tracer,
Expand All @@ -16,7 +70,7 @@
use crate::{InstrumentationLibrary, Resource};
use once_cell::sync::{Lazy, OnceCell};
use opentelemetry::trace::TraceError;
use opentelemetry::{global, trace::TraceResult};
use opentelemetry::{otel_debug, trace::TraceResult};
use std::borrow::Cow;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
Expand All @@ -36,36 +90,60 @@
span_limits: SpanLimits::default(),
resource: Cow::Owned(Resource::empty()),
},
is_shutdown: AtomicBool::new(true),
Copy link
Contributor

@utpilla utpilla Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not added in this PR but why have we initialized the no-op tracer provider as an already shut down provider?

I know this wouldn't make much difference in functionality but semantically it would be weird if I call shutdown on the global provider and get an error saying it has already been shut down.

Copy link
Member Author

@lalitb lalitb Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

global shutdown_tracer_provider naming is bit confusing. It doesn't invoke shutdown, only decrements the TracerProviderInner reference. Once all the tracers are dropped, the shutdown will get invoked through Drop. Probably release_tracer_provider is better name :)

And the user will never get hold of NOOP_TRACER_PROVIDER instance, or can invoke shutdown on it directly or indirectly. It's just internally used to create the tracer instance after shutdown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so then we could initialize no-op tracer provider with is_shutdown as false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so then we could initialize no-op tracer provider with is_shutdown as false?

Making it false fails this test -


And this test make me realise user can access it through tracer.provider(), and can try to invoke shutdown on it. But feel free to raise an issue if there are any improvements here.

}),
is_shutdown: Arc::new(AtomicBool::new(true)),
});

/// TracerProvider inner type
#[derive(Debug)]
pub(crate) struct TracerProviderInner {
processors: Vec<Box<dyn SpanProcessor>>,
config: crate::trace::Config,
is_shutdown: AtomicBool,
}

impl Drop for TracerProviderInner {
fn drop(&mut self) {
for processor in &mut self.processors {
impl TracerProviderInner {
/// Crate-private shutdown method to be called both from explicit shutdown
/// and from Drop when the last reference is released.
pub(crate) fn shutdown(&self) -> Vec<TraceError> {
let mut errs = vec![];
for processor in &self.processors {
if let Err(err) = processor.shutdown() {
global::handle_error(err);
// Log at debug level because:
// - The error is also returned to the user for handling (if applicable)
// - Or the error occurs during `TracerProviderInner::Drop` as part of telemetry shutdown,
// which is non-actionable by the user
otel_debug!(name: "TracerProvider.Drop.ShutdownError",
error = format!("{err}"));

Check warning on line 117 in opentelemetry-sdk/src/trace/provider.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/trace/provider.rs#L117

Added line #L117 was not covered by tests
errs.push(err);
}
}
errs
}
}

impl Drop for TracerProviderInner {
fn drop(&mut self) {
if !self.is_shutdown.load(Ordering::Relaxed) {
let _ = self.shutdown(); // errors are handled within shutdown
} else {
otel_debug!(
name: "TracerProvider.Drop.AlreadyShutdown"
);
}
}
}

/// Creator and registry of named [`Tracer`] instances.
///
/// `TracerProvider` is lightweight container holding pointers to `SpanProcessor` and other components.
/// Cloning and dropping them will not stop the span processing. To stop span processing, users
/// must either call `shutdown` method explicitly, or drop every clone of `TracerProvider`.
/// `TracerProvider` is a lightweight container holding pointers to `SpanProcessor` and other components.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not introduced in this PR, but advertising TracerProvider as lightweight is incorrect, and can lead to users repeatedly creating them, instead of doing it once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the comment to remove the lightweight reference.

/// Cloning a `TracerProvider` instance and dropping it will not stop span processing. To stop span processing, users
/// must either call the `shutdown` method explicitly or allow the last reference to the `TracerProvider`
/// to be dropped. When the last reference is dropped, the shutdown process will be automatically triggered
/// to ensure proper cleanup.
#[derive(Clone, Debug)]
pub struct TracerProvider {
inner: Arc<TracerProviderInner>,
is_shutdown: Arc<AtomicBool>,
}

impl Default for TracerProvider {
Expand All @@ -79,7 +157,6 @@
pub(crate) fn new(inner: TracerProviderInner) -> Self {
TracerProvider {
inner: Arc::new(inner),
is_shutdown: Arc::new(AtomicBool::new(false)),
}
}

Expand All @@ -101,7 +178,7 @@
/// true if the provider has been shutdown
/// Don't start span or export spans when provider is shutdown
pub(crate) fn is_shutdown(&self) -> bool {
self.is_shutdown.load(Ordering::Relaxed)
self.inner.is_shutdown.load(Ordering::Relaxed)
}

/// Force flush all remaining spans in span processors and return results.
Expand Down Expand Up @@ -153,28 +230,20 @@
/// Note that shut down doesn't means the TracerProvider has dropped
pub fn shutdown(&self) -> TraceResult<()> {
if self
.inner
.is_shutdown
.compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
.is_ok()
{
// propagate the shutdown signal to processors
// it's up to the processor to properly block new spans after shutdown
let mut errs = vec![];
for processor in &self.inner.processors {
if let Err(err) = processor.shutdown() {
errs.push(err);
}
}

let errs = self.inner.shutdown();
if errs.is_empty() {
Ok(())
} else {
Err(TraceError::Other(format!("{errs:?}").into()))
}
} else {
Err(TraceError::Other(
"tracer provider already shut down".into(),
))
Err(TraceError::AlreadyShutdown)
}
}
}
Expand Down Expand Up @@ -215,7 +284,7 @@
}

fn library_tracer(&self, library: Arc<InstrumentationLibrary>) -> Self::Tracer {
if self.is_shutdown.load(Ordering::Relaxed) {
if self.inner.is_shutdown.load(Ordering::Relaxed) {
return Tracer::new(library, NOOP_TRACER_PROVIDER.clone());
}
Tracer::new(library, self.clone())
Expand Down Expand Up @@ -292,7 +361,12 @@
p.set_resource(config.resource.as_ref());
}

TracerProvider::new(TracerProviderInner { processors, config })
let is_shutdown = AtomicBool::new(false);
TracerProvider::new(TracerProviderInner {
processors,
config,
is_shutdown,
})
}
}

Expand Down Expand Up @@ -391,6 +465,7 @@
Box::from(TestSpanProcessor::new(false)),
],
config: Default::default(),
is_shutdown: AtomicBool::new(false),
});

let results = tracer_provider.force_flush();
Expand Down Expand Up @@ -534,6 +609,7 @@
let tracer_provider = super::TracerProvider::new(TracerProviderInner {
processors: vec![Box::from(processor)],
config: Default::default(),
is_shutdown: AtomicBool::new(false),
});

let test_tracer_1 = tracer_provider.tracer("test1");
Expand All @@ -554,14 +630,135 @@

// after shutdown we should get noop tracer
let noop_tracer = tracer_provider.tracer("noop");

// noop tracer cannot start anything
let _ = noop_tracer.start("test");
assert!(assert_handle.started_span_count(2));
// noop tracer's tracer provider should be shutdown
assert!(noop_tracer.provider().is_shutdown.load(Ordering::SeqCst));
assert!(noop_tracer.provider().is_shutdown());

// existing tracer becomes noops after shutdown
let _ = test_tracer_1.start("test");
assert!(assert_handle.started_span_count(2));
}

#[derive(Debug)]
struct CountingShutdownProcessor {
shutdown_count: Arc<AtomicU32>,
flush_called: Arc<AtomicBool>,
utpilla marked this conversation as resolved.
Show resolved Hide resolved
}

impl CountingShutdownProcessor {
fn new(shutdown_count: Arc<AtomicU32>, flush_called: Arc<AtomicBool>) -> Self {
CountingShutdownProcessor {
shutdown_count,
flush_called,
}
}
}

impl SpanProcessor for CountingShutdownProcessor {
fn on_start(&self, _span: &mut Span, _cx: &Context) {
// No operation needed for this processor
}

fn on_end(&self, _span: SpanData) {
// No operation needed for this processor
}

fn force_flush(&self) -> TraceResult<()> {
self.flush_called.store(true, Ordering::SeqCst);
Ok(())
}

Check warning on line 672 in opentelemetry-sdk/src/trace/provider.rs

View check run for this annotation

Codecov / codecov/patch

opentelemetry-sdk/src/trace/provider.rs#L669-L672

Added lines #L669 - L672 were not covered by tests

fn shutdown(&self) -> TraceResult<()> {
self.shutdown_count.fetch_add(1, Ordering::SeqCst);
Ok(())
}
}

#[test]
fn drop_test_with_multiple_providers() {
let shutdown_count = Arc::new(AtomicU32::new(0));
let flush_called = Arc::new(AtomicBool::new(false));
utpilla marked this conversation as resolved.
Show resolved Hide resolved

{
// Create a shared TracerProviderInner and use it across multiple providers
let shared_inner = Arc::new(TracerProviderInner {
processors: vec![Box::new(CountingShutdownProcessor::new(
shutdown_count.clone(),
flush_called.clone(),
))],
config: Config::default(),
is_shutdown: AtomicBool::new(false),
});

{
let tracer_provider1 = super::TracerProvider {
inner: shared_inner.clone(),
};
let tracer_provider2 = super::TracerProvider {
inner: shared_inner.clone(),
};

let tracer1 = tracer_provider1.tracer("test-tracer1");
let tracer2 = tracer_provider2.tracer("test-tracer2");

let _span1 = tracer1.start("span1");
let _span2 = tracer2.start("span2");

// TracerProviderInner should not be dropped yet, since both providers and `shared_inner`
// are still holding a reference.
}
// At this point, both `tracer_provider1` and `tracer_provider2` are dropped,
// but `shared_inner` still holds a reference, so `TracerProviderInner` is NOT dropped yet.
assert_eq!(shutdown_count.load(Ordering::SeqCst), 0);
}
// Verify shutdown was called during the drop of the shared TracerProviderInner
assert_eq!(shutdown_count.load(Ordering::SeqCst), 1);
// Verify flush was not called during drop
assert!(!flush_called.load(Ordering::SeqCst));
}

#[test]
fn drop_after_shutdown_test_with_multiple_providers() {
let shutdown_count = Arc::new(AtomicU32::new(0));
let flush_called = Arc::new(AtomicBool::new(false));

// Create a shared TracerProviderInner and use it across multiple providers
let shared_inner = Arc::new(TracerProviderInner {
processors: vec![Box::new(CountingShutdownProcessor::new(
shutdown_count.clone(),
flush_called.clone(),
))],
config: Config::default(),
is_shutdown: AtomicBool::new(false),
});

// Create a scope to test behavior when providers are dropped
{
let tracer_provider1 = super::TracerProvider {
inner: shared_inner.clone(),
};
let tracer_provider2 = super::TracerProvider {
inner: shared_inner.clone(),
};

// Explicitly shut down the tracer provider
let shutdown_result = tracer_provider1.shutdown();
assert!(shutdown_result.is_ok());

// Verify that shutdown was called exactly once
assert_eq!(shutdown_count.load(Ordering::SeqCst), 1);

utpilla marked this conversation as resolved.
Show resolved Hide resolved
// TracerProvider2 should observe the shutdown state but not trigger another shutdown
let shutdown_result2 = tracer_provider2.shutdown();
assert!(shutdown_result2.is_err());

utpilla marked this conversation as resolved.
Show resolved Hide resolved
// Both tracer providers will be dropped at the end of this scope
}

// Verify that shutdown was only called once, even after drop
assert_eq!(shutdown_count.load(Ordering::SeqCst), 1);
utpilla marked this conversation as resolved.
Show resolved Hide resolved
}
}
4 changes: 4 additions & 0 deletions opentelemetry/src/trace/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,10 @@ pub enum TraceError {
#[error("Exporting timed out after {} seconds", .0.as_secs())]
ExportTimedOut(time::Duration),

/// already shutdown error
#[error("TracerProvider already shutdown")]
AlreadyShutdown,
utpilla marked this conversation as resolved.
Show resolved Hide resolved

/// Other errors propagated from trace SDK that weren't covered above
#[error(transparent)]
Other(#[from] Box<dyn std::error::Error + Send + Sync + 'static>),
Expand Down
Loading