-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEBUG-3182 Rework DI loading #4239
base: master
Are you sure you want to change the base?
Conversation
Datadog ReportBranch report: ✅ 0 Failed, 22117 Passed, 1476 Skipped, 5m 7.35s Total Time |
# DI code tracker is instantiated globally before the regular set of | ||
# components is created, but the code tracker needs to call out to the | ||
# "current" DI component to perform instrumentation when application | ||
# code is loaded. Because this call may happen prior to Datadog | ||
# components having been initialized, we maintain the "current component" | ||
# which contains a reference to the most recently instantiated | ||
# DI::Component. This way, if a DI component hasn't been instantiated, | ||
# we do not try to reference Datadog.components. | ||
# In other words, this method exists so that we never attempt to call | ||
# Datadog.components from the code tracker. | ||
def current_component | ||
LOCK.synchronize do | ||
@current_components&.last | ||
end | ||
end | ||
|
||
# To avoid potential races with DI::Component being added and removed, | ||
# we maintain a list of the components. Normally the list should contain | ||
# either zero or one component depending on whether DI is enabled in | ||
# Datadog configuration. However, if a new instance of DI::Component | ||
# is created while the previous instance is still running, we are | ||
# guaranteed to not end up with no component when one is running. | ||
def add_current_component(component) | ||
LOCK.synchronize do | ||
@current_components ||= [] | ||
@current_components << component | ||
end | ||
end | ||
|
||
def remove_current_component(component) | ||
LOCK.synchronize do | ||
@current_components&.delete(component) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your use-case well, you can maybe replace this with something along the lines of
components = Datadog.send(:components, allow_initialization: false)
di = components.di if components
This allows you to access the components without triggering initialization. This is something we use in for instance Datadog::Tracing.correlation
to allow the API to be used, but not cause initialization as a side-effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check defined?(Datadog.components)
before proceeding as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defined? check is supposed to be thread-safe
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4239 +/- ##
=======================================
Coverage 97.75% 97.76%
=======================================
Files 1353 1355 +2
Lines 82392 82418 +26
Branches 4224 4228 +4
=======================================
+ Hits 80541 80573 +32
+ Misses 1851 1845 -6 ☔ View full report in Codecov by Sentry. |
If I can offer some advice on this, is as much as possible don't expose any API that's expected to be used directly by customers in DI. Specifically, since DI is mostly about "turn it on, then go to the datadog UX to clicky clicky", it's worth taking advantage of this by not allowing too much configuration/fiddling/and manual operations. This is somewhat similar to profiling: it's supported to be -- turn it on, then go look at the results. There's one or two public APIs but very very little surface, which has made it much easier to move fast and improve things without needing to walk on egg shells (or needing to put extra work or add extra overhead to comply with some specific public APIs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few notes!
module Datadog | ||
# Namespace for Datadog dynamic instrumentation. | ||
# | ||
# @api private | ||
module DI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider actually putting this under DI::Base
or something like that? Specifically, I always find it mega confusing if I'm looking at datadog/foo
and I open datadog/foo.rb
and I'm missing code -- and I might not ever realize it.
I think as much as possible, keeping the 1:1 between namespace and file helps a lot of not getting lost in the codebase.
# Activates code tracking. Normally this method should be called | ||
# when the application starts. If instrumenting third-party code, | ||
# code tracking needs to be enabled before the third-party libraries | ||
# are loaded. If you definitely will not be instrumenting | ||
# third-party libraries, activating tracking after third-party libraries | ||
# have been loaded may improve lookup performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you definitely will not be instrumenting third-party libraries...
This seems like like a potential support very sharp edge? E.g. person A onboards app X, then perhaps even leaves the company, then person B tries to use DI with X, and has a bad experience.
I would advise being careful about adding these kinds of sharp edges, at least until you have a very specific customer that really really wants this, and in that case you could still document it in a more "this is probably not something you want to ever enable, as it will break etc etc" kind of scary way.
# We do not have Datadog logger potentially because DI code tracker is | ||
# being loaded early in application boot process and the rest of datadog | ||
# wasn't loaded yet. Output to standard error. | ||
warn("Failed to activate code tracking for DI: #{exc.class}: #{exc}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since in this mode there'll be no extra info in the log line, consider adding a "[DATADOG]"
prefix or something like that. As a team member on some service, I might not know "what DI" is and that datadog is instrumenting my app, so it's useful to have something to quickly identify where this message is coming from.
# DI code tracker is instantiated globally before the regular set of | ||
# components is created, but the code tracker needs to call out to the | ||
# "current" DI component to perform instrumentation when application | ||
# code is loaded. Because this call may happen prior to Datadog | ||
# components having been initialized, we maintain the "current component" | ||
# which contains a reference to the most recently instantiated | ||
# DI::Component. This way, if a DI component hasn't been instantiated, | ||
# we do not try to reference Datadog.components. | ||
# In other words, this method exists so that we never attempt to call | ||
# Datadog.components from the code tracker. | ||
def current_component | ||
LOCK.synchronize do | ||
@current_components&.last | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my note below on how this can be replaced with a simpler approach :)
What does this PR do?
This PR separates the part of DI that is required for code tracker (which is the code tracker itself and the DI.current_component group of methods, plus exception classes) to be self-contained. Currently in master loading
datadog/di/init
would bring indatadog/di
as well which is too much (for example, that would cause DI to attempt to load ActiveRecord contrib, which is definitely not the right time for it).Motivation:
Ensuring DI code tracker is loaded early without causing the rest of dd-trace-rb to be loaded early.
Change log entry
Yes: Improved early loading mechanism of dynamic instrumentation (
datadog/di/init
).Additional Notes:
While this PR changes the behavior of
datadog/di/init
, this behavior has not yet been documented anywhere and should not be used by customers.How to test the change?
Integration tests specifically for
datadog/di/init
will be added in a follow-up PR. The existing unit tests verify that basic DI functionality continues to operate correctly.