Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEBUG-3182 Rework DI loading #4239

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

DEBUG-3182 Rework DI loading #4239

wants to merge 14 commits into from

Conversation

p-datadog
Copy link
Member

What does this PR do?

This PR separates the part of DI that is required for code tracker (which is the code tracker itself and the DI.current_component group of methods, plus exception classes) to be self-contained. Currently in master loading datadog/di/init would bring in datadog/di as well which is too much (for example, that would cause DI to attempt to load ActiveRecord contrib, which is definitely not the right time for it).

Motivation:

Ensuring DI code tracker is loaded early without causing the rest of dd-trace-rb to be loaded early.

Change log entry

Yes: Improved early loading mechanism of dynamic instrumentation (datadog/di/init).

Additional Notes:

While this PR changes the behavior of datadog/di/init, this behavior has not yet been documented anywhere and should not be used by customers.

How to test the change?

Integration tests specifically for datadog/di/init will be added in a follow-up PR. The existing unit tests verify that basic DI functionality continues to operate correctly.

@p-datadog p-datadog marked this pull request as ready for review December 19, 2024 14:52
@p-datadog p-datadog requested a review from a team as a code owner December 19, 2024 14:52
@datadog-datadog-prod-us1
Copy link
Contributor

datadog-datadog-prod-us1 bot commented Dec 19, 2024

Datadog Report

Branch report: di-loading
Commit report: a9f892f
Test service: dd-trace-rb

✅ 0 Failed, 22117 Passed, 1476 Skipped, 5m 7.35s Total Time

@pr-commenter
Copy link

pr-commenter bot commented Dec 19, 2024

Benchmarks

Benchmark execution time: 2025-01-03 21:34:39

Comparing candidate commit a9f892f in PR branch di-loading with baseline commit f3e66ea in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 31 metrics, 2 unstable metrics.

Comment on lines +80 to +113
# DI code tracker is instantiated globally before the regular set of
# components is created, but the code tracker needs to call out to the
# "current" DI component to perform instrumentation when application
# code is loaded. Because this call may happen prior to Datadog
# components having been initialized, we maintain the "current component"
# which contains a reference to the most recently instantiated
# DI::Component. This way, if a DI component hasn't been instantiated,
# we do not try to reference Datadog.components.
# In other words, this method exists so that we never attempt to call
# Datadog.components from the code tracker.
def current_component
LOCK.synchronize do
@current_components&.last
end
end

# To avoid potential races with DI::Component being added and removed,
# we maintain a list of the components. Normally the list should contain
# either zero or one component depending on whether DI is enabled in
# Datadog configuration. However, if a new instance of DI::Component
# is created while the previous instance is still running, we are
# guaranteed to not end up with no component when one is running.
def add_current_component(component)
LOCK.synchronize do
@current_components ||= []
@current_components << component
end
end

def remove_current_component(component)
LOCK.synchronize do
@current_components&.delete(component)
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your use-case well, you can maybe replace this with something along the lines of

        components = Datadog.send(:components, allow_initialization: false)
        di = components.di if components

This allows you to access the components without triggering initialization. This is something we use in for instance Datadog::Tracing.correlation to allow the API to be used, but not cause initialization as a side-effect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check defined?(Datadog.components) before proceeding as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defined? check is supposed to be thread-safe

@codecov-commenter
Copy link

codecov-commenter commented Dec 24, 2024

Codecov Report

Attention: Patch coverage is 79.24528% with 11 lines in your changes missing coverage. Please review.

Project coverage is 97.76%. Comparing base (6fc4c8e) to head (a9f892f).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
lib/datadog/di/base.rb 71.42% 8 Missing ⚠️
spec/datadog/di/init_spec.rb 87.50% 2 Missing ⚠️
lib/datadog/di/code_tracker.rb 50.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4239   +/-   ##
=======================================
  Coverage   97.75%   97.76%           
=======================================
  Files        1353     1355    +2     
  Lines       82392    82418   +26     
  Branches     4224     4228    +4     
=======================================
+ Hits        80541    80573   +32     
+ Misses       1851     1845    -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ivoanjo
Copy link
Member

ivoanjo commented Jan 6, 2025

While this PR changes the behavior of datadog/di/init, this behavior has not yet been documented anywhere and should not be used by customers.

If I can offer some advice on this, is as much as possible don't expose any API that's expected to be used directly by customers in DI. Specifically, since DI is mostly about "turn it on, then go to the datadog UX to clicky clicky", it's worth taking advantage of this by not allowing too much configuration/fiddling/and manual operations.

This is somewhat similar to profiling: it's supported to be -- turn it on, then go look at the results. There's one or two public APIs but very very little surface, which has made it much easier to move fast and improve things without needing to walk on egg shells (or needing to put extra work or add extra overhead to comply with some specific public APIs)

Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few notes!

Comment on lines +13 to +17
module Datadog
# Namespace for Datadog dynamic instrumentation.
#
# @api private
module DI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider actually putting this under DI::Base or something like that? Specifically, I always find it mega confusing if I'm looking at datadog/foo and I open datadog/foo.rb and I'm missing code -- and I might not ever realize it.

I think as much as possible, keeping the 1:1 between namespace and file helps a lot of not getting lost in the codebase.

Comment on lines +23 to +28
# Activates code tracking. Normally this method should be called
# when the application starts. If instrumenting third-party code,
# code tracking needs to be enabled before the third-party libraries
# are loaded. If you definitely will not be instrumenting
# third-party libraries, activating tracking after third-party libraries
# have been loaded may improve lookup performance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you definitely will not be instrumenting third-party libraries...

This seems like like a potential support very sharp edge? E.g. person A onboards app X, then perhaps even leaves the company, then person B tries to use DI with X, and has a bad experience.

I would advise being careful about adding these kinds of sharp edges, at least until you have a very specific customer that really really wants this, and in that case you could still document it in a more "this is probably not something you want to ever enable, as it will break etc etc" kind of scary way.

# We do not have Datadog logger potentially because DI code tracker is
# being loaded early in application boot process and the rest of datadog
# wasn't loaded yet. Output to standard error.
warn("Failed to activate code tracking for DI: #{exc.class}: #{exc}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since in this mode there'll be no extra info in the log line, consider adding a "[DATADOG]" prefix or something like that. As a team member on some service, I might not know "what DI" is and that datadog is instrumenting my app, so it's useful to have something to quickly identify where this message is coming from.

Comment on lines +80 to +94
# DI code tracker is instantiated globally before the regular set of
# components is created, but the code tracker needs to call out to the
# "current" DI component to perform instrumentation when application
# code is loaded. Because this call may happen prior to Datadog
# components having been initialized, we maintain the "current component"
# which contains a reference to the most recently instantiated
# DI::Component. This way, if a DI component hasn't been instantiated,
# we do not try to reference Datadog.components.
# In other words, this method exists so that we never attempt to call
# Datadog.components from the code tracker.
def current_component
LOCK.synchronize do
@current_components&.last
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my note below on how this can be replaced with a simpler approach :)

@ivoanjo ivoanjo added this to the 2.9.0 milestone Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants