
Migrate BatchExporterProcessor to async #5838

Draft · wants to merge 5 commits into base: main

Conversation

verdie-g

Fixes #5778

Changes

This makes BatchExporterProcessor async to avoid having dedicated threads that are idle 99% of the time.

I first tried to keep the implementation close to the original synchronous one, where only the ExportProc method was allowed to call BaseExporter.ExportAsync, but it was an unreviewable deadlock fest. This proposed implementation allows any method (Export, Flush, Shutdown) to also call ExportAsync, but always limits it to a single concurrent export. This greatly simplifies the code and only took me 30 minutes to implement.

It is very important that we first discuss how BaseExporter.ExportAsync will be implemented on all available exporters before merging this change; otherwise, this is going to push synchronous jobs onto the thread pool, which will make the situation worse than before.

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@github-actions github-actions bot added the pkg:OpenTelemetry Issues related to OpenTelemetry NuGet package label Sep 16, 2024
@cijothomas
Member

Can you describe the overall idea of how this will be implemented? This is core functionality, and there cannot be any breaking changes at all. It is intentional that the current design spins up its own thread rather than relying on async tasks/the thread pool.
Also, what are the benefits of this change?

@verdie-g
Author

verdie-g commented Sep 17, 2024

The exporter loop is kept but now runs on the thread pool using a simple Task.Run. Previously, the Export, Flush, and Shutdown operations could speed up exporting by disabling the delay between two exports. In this change, it's done differently to simplify the code: the three operations can call the BaseExporter.ExportAsync method directly, but only one export runs at a time. BaseExporter.ExportAsync is a new method that by default does async-over-sync by calling Export, just to avoid a breaking change. Though, it is important that this method gets overridden by the different implementations (Zipkin, OTEL) to avoid pushing synchronous operations on the thread pool.
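
For illustration, here is a minimal sketch of what that default async-over-sync fallback could look like. The exact names and whether the final API takes a CancellationToken are assumptions, not the merged design:

public abstract class BaseExporter<T>
    where T : class
{
    // Existing synchronous contract stays as-is.
    public abstract ExportResult Export(in Batch<T> batch);

    // Hypothetical default: wraps the synchronous Export so existing exporters keep
    // compiling and behaving the same. Exporters that do real I/O should override this
    // with a genuinely asynchronous export to avoid blocking a thread-pool thread.
    public virtual Task<ExportResult> ExportAsync(Batch<T> batch)
    {
        return Task.FromResult(this.Export(in batch));
    }
}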

The idea of a thread pool is to avoid the cost of creating/destroying a thread as well as the cost of the resources associated with the thread. But most importantly, it avoids having to perform a context switch to yield execution to another task. In an ideal application, the thread pool has as many threads as CPU cores, and each thread never leaves its core. Each context switch impacts the throughput of the thread pool. If only a few extra threads are used, it's fine, but if all libraries start spawning dedicated threads for their background jobs, that can be an issue. Here, 3 threads are spawned, and maybe a 4th one will be added for the profiling signal. I won't have any figures to share, but the OTEL library could become a better citizen in the .NET ecosystem by using the thread pool.

It is also important to note that, currently, only part of the work done by the export is performed on the dedicated thread. When HttpClient, or any other network class, is used, the thread pool is still involved.

Additionally, this will make it easier to offer an asynchronous flush method in the future. I believe a user request for that will appear some day, since you never want to block the thread pool, and currently flush does.

BONUS1: we get to remove the synchronousSendSupportedByCurrentPlatform.
BONUS2: might fix #2816

/// </summary>
/// <param name="batch">Batch of telemetry objects to export.</param>
/// <returns>Result of the export operation.</returns>
public virtual Task<ExportResult> ExportAsync(Batch<T> batch)
Member

A related discussion on this topic we had in OTel Rust community : open-telemetry/opentelemetry-rust#2027
OTel Rust only has an async API for the exporters. This is causing perf issues when the exporter does not benefit from an async API (like ETW exporters, which have no async needs as they don't talk to disk/network, etc.).

We are also considering adding both options in OTel Rust, with one calling the other automatically!
Just shared as these are similar, though in diff. languages!

Author

The allocation overhead for calling ExportAsync instead of Export should just be the Task allocation. AFAIK, state machines are kept on the stack while the methods return synchronously. If a method needs to yield the execution because of a blocking operation (e.g. reading a socket), then the state machine will be moved to the heap.

In this short snippet we can see that the state machine is a struct and when the async method doesn't return synchronously (i.e. in the if (!awaiter.IsCompleted) branch), AwaitUnsafeOnCompleted is called which boxes the state machine.

So I'm not very concerned about the performance impact for synchronous exporters, especially since the method should only be called every X seconds.

One thing that could be done is to use ValueTask to avoid the Task allocation. It is used, for example, in Stream.ReadAsync(Memory). Though ValueTask has more constraints than Task, so it can be a tough decision to make. That's why I always use Task, and it has never been a bottleneck. More reading from our savior Stephen Toub: https://devblogs.microsoft.com/dotnet/understanding-the-whys-whats-and-whens-of-valuetask.
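
To illustrate the ValueTask option (purely a sketch, assuming the same hypothetical ExportAsync discussed above), a synchronously-completing exporter would not allocate a Task at all:

// Sketch only: a ValueTask-based variant of the hypothetical ExportAsync.
// When the export completes synchronously (e.g. an in-memory exporter), the result is
// wrapped in the ValueTask struct itself and no Task is allocated; only genuinely
// asynchronous overrides pay for an allocation.
public virtual ValueTask<ExportResult> ExportAsync(Batch<T> batch)
{
    return new ValueTask<ExportResult>(this.Export(in batch));
}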

Member

So I'm not very concerned about the performance impact for synchronous exporters, especially since the method should only be called every X seconds.

The export method is called for every Log and every Span when using sync exporters like ETW (they don't do any batching).

Author

It could make sense to use ValueTask in that case. Could you give me a pointer to the ETW exporter? I can't seem to find it.

Member

Author

In this exporter, it will continue to call the synchronous export, so there won't be any performance change here.

/// </summary>
/// <param name="batch">Batch of telemetry objects to export.</param>
/// <returns>Result of the export operation.</returns>
public virtual Task<ExportResult> ExportAsync(Batch<T> batch)
Member

If we are going to do this (and that is a big IF), it MUST be:

public virtual Task<ExportResult> ExportAsync(Batch<T> batch, CancellationToken cancellationToken)

Because this is an opportunity to clean up a mistake 😄 That cancellationToken should be built from https://github.com/open-telemetry/opentelemetry-dotnet/blob/1c01770882bb5113ff308b2b4398347fab6e0404/src/OpenTelemetry/BatchExportProcessor.cs#L24C27-L24C54.

I would also like to see:

public virtual ValueTask<ExportResult> 

There are only 2 ExportResult values. We could have 2 cached instances of Task<ExportResult>. But I think ValueTask is better for implementations which may always execute synchronously. Imagine an exporter writing to a Stream. It could be async when hooked up to something like a NetworkStream, but could also always run synchronously when hooked up to a MemoryStream.
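
For reference, a sketch of the cached-task idea (illustrative only, using the CancellationToken parameter suggested above; not the SDK's actual API):

// Since ExportResult only has Success and Failure, a Task-returning ExportAsync that
// completes synchronously can reuse two cached Task instances and never allocate.
private static readonly Task<ExportResult> CompletedSuccessTask = Task.FromResult(ExportResult.Success);
private static readonly Task<ExportResult> CompletedFailureTask = Task.FromResult(ExportResult.Failure);

public virtual Task<ExportResult> ExportAsync(Batch<T> batch, CancellationToken cancellationToken)
{
    cancellationToken.ThrowIfCancellationRequested();
    return this.Export(in batch) == ExportResult.Success
        ? CompletedSuccessTask
        : CompletedFailureTask;
}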

Author
@verdie-g Sep 18, 2024

Good idea for the CancellationToken!

Regarding ValueTask, given the constraints of that type, I find it a hard decision to make. I've briefly discussed it here: #5838 (comment). I think I'll be able to give more info once I've implemented ExportAsync for most exporters.

Member
@CodeBlanch left a comment

I would love to have ExportAsync, but I think in general this may be the wrong direction. Our telemetry pipeline is not asynchronous. It is synchronous. And the specification says it must be so. Because of that, somewhere we're always going to run into having to jump from sync to async and back to sync. Async only works when everything is async. The telemetry pipeline being synchronous is a good thing. We don't want to block real work being done on a thread because it tried to write some telemetry and the SDK has to wait on a busy thread pool.

Here are some questions:

  • How will SimpleExporterProcessor work with ExportAsync? Are we expecting SimpleExporterProcessor to continue calling Export or will it call ExportAsync? The latter will probably be a non-starter. We can't tie up two threads for our telemetry pipelines.

  • Depending on that answer, can we see some of the existing exporters (maybe start with Otlp) updated as a demonstration? We require Export be implemented via abstract. Authors will have to code something synchronous. If we offer ExportAsync how does that help with this? I don't want to see this...

public override ExportResult Export(in Batch<T> batch)
{
    return this.ExportAsync(batch).Result;
}

Because that will tie up two threads 👎 Or lead to an infinite loop in the event ExportAsync is not overridden 🤣 Also, how do we prevent either of these things from happening?

What I'm trying to figure out is, if I'm an exporter author, and I have to do a synchronous version anyway, why would I bother with an asynchronous version?

@verdie-g
Author

It is synchronous. And the specification says it must be so.

I don't think that's true. Here is a quote from the spec:

Depending on the implementation the result of the export may be returned to the Processor not in the return value of the call to Export() but in a language specific way for signaling completion of an asynchronous task.

Because of that, somewhere we're always going to run into having to jump from sync to async and back to sync.

I've started implementing the ExportAsync method across the exporters and found cases where we need to synchronously wait on a Task, but I don't think I found a case where a synchronous operation occurs after an asynchronous one. Actually, I'm not sure what you mean. Could you clarify that point?

The telemetry pipeline being synchronous is a good thing. We don't want to block real work being done on a thread because it tried to write some telemetry and the SDK has to wait on a busy thread pool.

When a context switch occurs to work on the OTEL thread, it actually prevents a thread pool thread from doing its job. Using the thread pool and async exports would greatly reduce that overhead.

How will SimpleExporterProcessor work with ExportAsync? Are we expecting SimpleExporterProcessor to continue calling Export or will it call ExportAsync?

Either should be fine. If a network exporter is used with SimpleExporterProcessor, there are obviously no guarantees about performance, so I don't think ExportAsync will change anything here.

It is also worth noting that because of the synchronous export API, some exporters need to do some sync-over-async on some platforms.

Depending on that answer, can we see some of the existing exporters (maybe start with Otlp) updated as a demonstration? We require Export be implemented via abstract. Authors will have to code something synchronous. If we offer ExportAsync how does that help with this? I don't want to see this...

For OTLP, the Export method will indeed be doing sync-over-async on the ExportAsync. But it does not matter because only ExportAsync would be called.

What I'm trying to figure out is, if I'm an exporter author, and I have to do a synchronous version anyway, why would I bother with an asynchronous version?

I think it's very similar to how the Stream API works in .NET. If we focus on the read API, by default you only need to implement Read(byte[] buffer, int offset, int count). Other methods, such as ReadByte or ReadAsync, will fall back on Read. The stream will work fine, but you'll get a performance hit until you implement the other methods.

Here is a very concrete example dotutils/streamutils#2 where some benchmarks showed that this Stream was a performance bottleneck, so extra Read methods were implemented to fix that.

In the case of ExportAsync it will be the same. Though, we should expect far fewer implementations of BaseExporter in the wild than implementations of Stream. Also, most of the BaseExporter implementations are under our control in the OTEL repositories.

Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Sep 26, 2024
@verdie-g
Author

Here is what the implementations of BaseExporter.ExportAsync could look like: verdie-g@02eb450

@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Sep 28, 2024
@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Oct 14, 2024
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 21, 2024
@verdie-g
Author

Kind of awkward that the PR gets automatically closed when it's waiting for a review 🤔 @CodeBlanch @cijothomas could you have a look at it again? The TL;DR is that you don't create threads in .NET. Some platforms don't even support it. And OTEL, which will probably become the most downloaded NuGet package, creates 3-4 of them when it could use the thread pool.

@CodeBlanch
Member

Sorry @verdie-g the short version is this isn't a priority right now. Sorry! That doesn't mean we don't ❤️ the passion and the contribution. We really do, we've just got a lot to get done before we release 1.10.0 stable next month. This is a big change so I'm not going to consider it for inclusion this late in the cycle. We can come back to it for the next release though.

this.exporterThread.Start();
this.exportTask = Task.CompletedTask;
this.exporterTaskCancellation = new CancellationTokenSource();
this.exporterTask = Task.Run(this.ExporterProc);
Member

I think we shouldn't change the default behavior here. If there are scenarios where a thread pool must be leveraged, then it can be an optional, opt-in behavior - either a modification to BatchExportProcessor (UseDedicatedThread=false) or a new BatchExportProcessorThreadPool.

Having said that, I won't have the bandwidth to do a detailed review at the moment, but I believe the first step is to get some guidance from the maintainers about which direction should be taken:

  1. Provide an opt-in feature flag to BatchExportProcessor so it can use a dedicated thread or the thread pool.
  2. Make a new BatchExportProcessor to do the thread pool one - either in this repo OR in the contrib repo.
  3. Something else.

Author

That's a good idea. I'll do some experiments.

Member
@cijothomas left a comment

Left a comment about keeping current behavior as-is.

@madskonradsen

It would be amazing to have this merged at some point. It's a bit frustrating that OTEL doesn't even work with Blazor because of missing threading support in the browser :)

@alexaka1

@CodeBlanch Any chance this can be reopened now?

@Kielek
Contributor

Kielek commented Nov 21, 2024

Reopening to bring @CodeBlanch's attention ;)

@Kielek Kielek reopened this Nov 21, 2024
@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Nov 22, 2024
@verdie-g
Author

I've added a boolean to the BatchExportProcessor constructor so that, by default, the behavior doesn't change. Tell me what you think and I'll add the tests.

I first tried to create a new async implementation of BaseExportProcessor, but that would cause a breaking change for code inheriting from that one instead of BatchExportProcessor.

Alternatively, we could also use AppContext.TryGetSwitch to have a global flag instead of a per-instance one.
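
For illustration, the AppContext alternative could look roughly like this (the switch name is made up for the example):

// Hypothetical sketch of a global opt-in flag read via AppContext.
private static readonly bool UseThreadPoolExporter =
    AppContext.TryGetSwitch("OpenTelemetry.BatchExportProcessor.UseThreadPool", out bool enabled) && enabled;

// An application would then opt in process-wide at startup, e.g.:
// AppContext.SetSwitch("OpenTelemetry.BatchExportProcessor.UseThreadPool", true);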

Labels: pkg:OpenTelemetry (Issues related to OpenTelemetry NuGet package)

Successfully merging this pull request may close these issues: Async BatchExportProcessor

6 participants