DEBUG-3182 Rework DI loading #4239

p-datadog · 2024-12-19T14:52:27Z

What does this PR do?

This PR separates the part of DI that is required for code tracker (which is the code tracker itself and the DI.current_component group of methods, plus exception classes) to be self-contained. Currently in master loading datadog/di/init would bring in datadog/di as well which is too much (for example, that would cause DI to attempt to load ActiveRecord contrib, which is definitely not the right time for it).

Motivation:

Ensuring DI code tracker is loaded early without causing the rest of dd-trace-rb to be loaded early.

Change log entry

Yes: Improved early loading mechanism of dynamic instrumentation (datadog/di/init).

Additional Notes:

While this PR changes the behavior of datadog/di/init, this behavior has not yet been documented anywhere and should not be used by customers.

How to test the change?

Integration tests specifically for datadog/di/init will be added in a follow-up PR. The existing unit tests verify that basic DI functionality continues to operate correctly.

lib/datadog/di/code_tracker.rb

datadog-datadog-prod-us1 · 2024-12-19T15:12:48Z

Datadog Report

Branch report: di-loading
Commit report: a9f892f
Test service: dd-trace-rb

✅ 0 Failed, 22117 Passed, 1476 Skipped, 5m 7.35s Total Time

pr-commenter · 2024-12-19T15:13:44Z

Benchmarks

Benchmark execution time: 2025-01-03 21:34:39

Comparing candidate commit a9f892f in PR branch di-loading with baseline commit f3e66ea in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 31 metrics, 2 unstable metrics.

ivoanjo · 2024-12-19T17:12:14Z

lib/datadog/di/base.rb

+      # DI code tracker is instantiated globally before the regular set of
+      # components is created, but the code tracker needs to call out to the
+      # "current" DI component to perform instrumentation when application
+      # code is loaded. Because this call may happen prior to Datadog
+      # components having been initialized, we maintain the "current component"
+      # which contains a reference to the most recently instantiated
+      # DI::Component. This way, if a DI component hasn't been instantiated,
+      # we do not try to reference Datadog.components.
+      # In other words, this method exists so that we never attempt to call
+      # Datadog.components from the code tracker.
+      def current_component
+        LOCK.synchronize do
+          @current_components&.last
+        end
+      end
+
+      # To avoid potential races with DI::Component being added and removed,
+      # we maintain a list of the components. Normally the list should contain
+      # either zero or one component depending on whether DI is enabled in
+      # Datadog configuration. However, if a new instance of DI::Component
+      # is created while the previous instance is still running, we are
+      # guaranteed to not end up with no component when one is running.
+      def add_current_component(component)
+        LOCK.synchronize do
+          @current_components ||= []
+          @current_components << component
+        end
+      end
+
+      def remove_current_component(component)
+        LOCK.synchronize do
+          @current_components&.delete(component)
+        end
+      end


If I understand your use-case well, you can maybe replace this with something along the lines of

components = Datadog.send(:components, allow_initialization: false) di = components.di if components

This allows you to access the components without triggering initialization. This is something we use in for instance Datadog::Tracing.correlation to allow the API to be used, but not cause initialization as a side-effect.

Check defined?(Datadog.components) before proceeding as above

defined? check is supposed to be thread-safe

codecov-commenter · 2024-12-24T15:40:39Z

Codecov Report

Attention: Patch coverage is 79.24528% with 11 lines in your changes missing coverage. Please review.

Project coverage is 97.76%. Comparing base (6fc4c8e) to head (a9f892f).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
lib/datadog/di/base.rb	71.42%	8 Missing ⚠️
spec/datadog/di/init_spec.rb	87.50%	2 Missing ⚠️
lib/datadog/di/code_tracker.rb	50.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #4239   +/-   ##
=======================================
  Coverage   97.75%   97.76%           
=======================================
  Files        1353     1355    +2     
  Lines       82392    82418   +26     
  Branches     4224     4228    +4     
=======================================
+ Hits        80541    80573   +32     
+ Misses       1851     1845    -6

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ivoanjo · 2025-01-06T14:36:53Z

While this PR changes the behavior of datadog/di/init, this behavior has not yet been documented anywhere and should not be used by customers.

If I can offer some advice on this, is as much as possible don't expose any API that's expected to be used directly by customers in DI. Specifically, since DI is mostly about "turn it on, then go to the datadog UX to clicky clicky", it's worth taking advantage of this by not allowing too much configuration/fiddling/and manual operations.

This is somewhat similar to profiling: it's supported to be -- turn it on, then go look at the results. There's one or two public APIs but very very little surface, which has made it much easier to move fast and improve things without needing to walk on egg shells (or needing to put extra work or add extra overhead to comply with some specific public APIs)

ivoanjo

Left a few notes!

ivoanjo · 2025-01-06T14:41:48Z

lib/datadog/di/base.rb

+module Datadog
+  # Namespace for Datadog dynamic instrumentation.
+  #
+  # @api private
+  module DI


Consider actually putting this under DI::Base or something like that? Specifically, I always find it mega confusing if I'm looking at datadog/foo and I open datadog/foo.rb and I'm missing code -- and I might not ever realize it.

I think as much as possible, keeping the 1:1 between namespace and file helps a lot of not getting lost in the codebase.

ivoanjo · 2025-01-06T14:48:49Z

lib/datadog/di/base.rb

+      # Activates code tracking. Normally this method should be called
+      # when the application starts. If instrumenting third-party code,
+      # code tracking needs to be enabled before the third-party libraries
+      # are loaded. If you definitely will not be instrumenting
+      # third-party libraries, activating tracking after third-party libraries
+      # have been loaded may improve lookup performance.


If you definitely will not be instrumenting third-party libraries...

This seems like like a potential support very sharp edge? E.g. person A onboards app X, then perhaps even leaves the company, then person B tries to use DI with X, and has a bad experience.

I would advise being careful about adding these kinds of sharp edges, at least until you have a very specific customer that really really wants this, and in that case you could still document it in a more "this is probably not something you want to ever enable, as it will break etc etc" kind of scary way.

ivoanjo · 2025-01-06T14:53:25Z

lib/datadog/di/base.rb

+            # We do not have Datadog logger potentially because DI code tracker is
+            # being loaded early in application boot process and the rest of datadog
+            # wasn't loaded yet. Output to standard error.
+            warn("Failed to activate code tracking for DI: #{exc.class}: #{exc}")


Minor: Since in this mode there'll be no extra info in the log line, consider adding a "[DATADOG]" prefix or something like that. As a team member on some service, I might not know "what DI" is and that datadog is instrumenting my app, so it's useful to have something to quickly identify where this message is coming from.

ivoanjo · 2025-01-06T14:54:40Z

lib/datadog/di/base.rb

+      # DI code tracker is instantiated globally before the regular set of
+      # components is created, but the code tracker needs to call out to the
+      # "current" DI component to perform instrumentation when application
+      # code is loaded. Because this call may happen prior to Datadog
+      # components having been initialized, we maintain the "current component"
+      # which contains a reference to the most recently instantiated
+      # DI::Component. This way, if a DI component hasn't been instantiated,
+      # we do not try to reference Datadog.components.
+      # In other words, this method exists so that we never attempt to call
+      # Datadog.components from the code tracker.
+      def current_component
+        LOCK.synchronize do
+          @current_components&.last
+        end
+      end


See my note below on how this can be replaced with a simpler approach :)

p added 5 commits December 18, 2024 12:07

Rework DI loading

fa252bc

add note

3579a0c

note purpose of current_component

ad35c7d

move current_component

d15e60a

current_component will always be defined

f87e9b1

p-datadog marked this pull request as ready for review December 19, 2024 14:52

p-datadog requested a review from a team as a code owner December 19, 2024 14:52

datadog-datadog-prod-us1 bot reviewed Dec 19, 2024

View reviewed changes

lib/datadog/di/code_tracker.rb Show resolved Hide resolved

standard

31d7a9b

ivoanjo reviewed Dec 19, 2024

View reviewed changes

p and others added 2 commits December 24, 2024 10:17

types

4eeb858

Merge branch 'master' into di-loading

e80940a

p and others added 6 commits December 24, 2024 11:48

integration test for DI loading

49a1728

add exit status assertions to execution spec

f62fcb2

typo fix

5b1a9e0

rubocop

4cc2d79

test that code tracker maintains mappings after datadog is loaded

d3b854c

Merge branch 'master' into di-loading

a9f892f

ivoanjo reviewed Jan 6, 2025

View reviewed changes

ivoanjo added this to the 2.9.0 milestone Jan 6, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEBUG-3182 Rework DI loading #4239

DEBUG-3182 Rework DI loading #4239

p-datadog commented Dec 19, 2024

datadog-datadog-prod-us1 bot commented Dec 19, 2024 •

edited

Loading

pr-commenter bot commented Dec 19, 2024 •

edited

Loading

ivoanjo Dec 19, 2024

p-datadog Jan 6, 2025

p-datadog Jan 6, 2025

codecov-commenter commented Dec 24, 2024 •

edited

Loading

ivoanjo commented Jan 6, 2025

ivoanjo left a comment

ivoanjo Jan 6, 2025

ivoanjo Jan 6, 2025

ivoanjo Jan 6, 2025

ivoanjo Jan 6, 2025

DEBUG-3182 Rework DI loading #4239

Are you sure you want to change the base?

DEBUG-3182 Rework DI loading #4239

Conversation

p-datadog commented Dec 19, 2024

datadog-datadog-prod-us1 bot commented Dec 19, 2024 • edited Loading

Datadog Report

pr-commenter bot commented Dec 19, 2024 • edited Loading

Benchmarks

ivoanjo Dec 19, 2024

Choose a reason for hiding this comment

p-datadog Jan 6, 2025

Choose a reason for hiding this comment

p-datadog Jan 6, 2025

Choose a reason for hiding this comment

codecov-commenter commented Dec 24, 2024 • edited Loading

Codecov Report

ivoanjo commented Jan 6, 2025

ivoanjo left a comment

Choose a reason for hiding this comment

ivoanjo Jan 6, 2025

Choose a reason for hiding this comment

ivoanjo Jan 6, 2025

Choose a reason for hiding this comment

ivoanjo Jan 6, 2025

Choose a reason for hiding this comment

ivoanjo Jan 6, 2025

Choose a reason for hiding this comment

datadog-datadog-prod-us1 bot commented Dec 19, 2024 •

edited

Loading

pr-commenter bot commented Dec 19, 2024 •

edited

Loading

codecov-commenter commented Dec 24, 2024 •

edited

Loading