-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make otel scheduler sync #2262
Make otel scheduler sync #2262
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OTel contract states that Start must return quickly, so I suppose sync would be fine. Do we know if the Collector does this async or sync? It'd be best to just do what it does, just to be on the safe side.
Regarding the issue with the batch processor - does it happen because something is wrong on the OTel side, or because #2027 maybe doesn't work for some edge case?
It might be good for @thampiotr to opine here, given that this PR reverts changes in #2027. |
It's done sync in the otel collector. The service starts everything sync (https://github.com/open-telemetry/opentelemetry-collector/blob/main/service/service.go#L230) and it's called sync here (https://github.com/open-telemetry/opentelemetry-collector/blob/main/otelcol/collector.go#L228)
Piotr and I had a discussion about it here: https://raintank-corp.slack.com/archives/C02GSU8SHBN/p1733846257348439 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a changelog entry? If it's a problem end users could see, then mentioning the error in the changelog could indicate to them that they shouldn't see it again.
It doesn't revert all the changes, it's reusing the approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I like it more. Thanks for fixing @wildum 🚀
* Don't start components until Run is called * Update consumers after stopping the component * Minor fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments. Do we have sufficient test coverage? Cause it's a bit hard to ensure there are no bugs by just reading 😅
internal/component/otelcol/internal/scheduler/scheduler_test.go
Outdated
Show resolved
Hide resolved
func (cs *Scheduler) Schedule(h otelcomponent.Host, cc ...otelcomponent.Component) { | ||
// Schedule() completely overrides the set of previously running components. | ||
// Components which have been removed since the last call to Schedule will be stopped. | ||
func (cs *Scheduler) Schedule(ctx context.Context, updateConsumers func(), h otelcomponent.Host, cc ...otelcomponent.Component) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document updateConsumers
?
Also I wonder if we should use the "consumer" terminology in Scheduler
... before these changes it was somewhat decoupled from what you schedule - just some component that and onPause/onResume callbacks, but now the comments and the updateConsumers
variable name talk directly about scheduling the consumers. Maybe we should go all-in and make this a ConsumerScheduler
?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I documented the updateConsumers function
It's true that the scheduler functionality is a bit extended with the fact that it calls a function that's expected to set consumers. But its primary purpose is still to schedule the otel components. I find ConsumerScheduler a bit confusing because it missing the "component" part in the name. I would prefer to keep Scheduler. I think that in the context of otel, the fact that it also expects a function to update consumers is ok
Co-authored-by: Piotr <[email protected]>
We have tests at the scheduler and at the lazyconsumer level + the batch processor "Test_Update" is really good at spotting race conditions (thanks for adding all those tests when you did the change last time btw). Changes to the components themselves are really small, I think that we are ok |
The fix in #2027 did not address all the race conditions that could happen with the scheduler. We noticed it when getting rid of our Otel fork brought back the flakiness of a batch processor test.
With this change, the scheduling of otel components is not an async operation anymore. This simplifies the logic and ensures that the components are started before they can start consuming.