Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEBUG-3210 DI: change logging to be appropriate for customer inspection #4230

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion lib/datadog/di/code_tracker.rb
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ def start
# before release.
if component = DI.current_component # steep:ignore
raise if component.settings.dynamic_instrumentation.internal.propagate_all_exceptions
component.logger.warn("Unhandled exception in script_compiled trace point: #{exc.class}: #{exc}")
component.log_warn_internal("Unhandled exception in script_compiled trace point: #{exc.class}: #{exc}")
component.telemetry&.report(exc, description: "Unhandled exception in script_compiled trace point")
# TODO test this path
else
Expand Down
10 changes: 7 additions & 3 deletions lib/datadog/di/component.rb
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# frozen_string_literal: true

require_relative 'logging'

module Datadog
module DI
# Component for dynamic instrumentation.
Expand All @@ -13,12 +15,14 @@ module DI
# intalled tracepoints and so on. Component will clean up all
# resources and installed tracepoints upon shutdown.
class Component
include Logging

class << self
def build(settings, agent_settings, telemetry: nil)
return unless settings.respond_to?(:dynamic_instrumentation) && settings.dynamic_instrumentation.enabled

unless settings.respond_to?(:remote) && settings.remote.enabled
Datadog.logger.debug("Dynamic Instrumentation could not be enabled because Remote Configuration Management is not available. To enable Remote Configuration, see https://docs.datadoghq.com/agent/remote_config")
Datadog.logger.warn("Dynamic Instrumentation could not be enabled because Remote Configuration Management is not available. To enable Remote Configuration, see https://docs.datadoghq.com/agent/remote_config")
return
end

Expand Down Expand Up @@ -53,12 +57,12 @@ def environment_supported?(settings)
# TODO add tests?
unless settings.dynamic_instrumentation.internal.development
if Datadog::Core::Environment::Execution.development?
Datadog.logger.debug("Not enabling dynamic instrumentation because we are in development environment")
Datadog.logger.warn("Not enabling dynamic instrumentation because we are in development environment")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since this is a customer-visible string, I'd suggest not using the "we" since it's a bit confusing. Here's a potential alternative

Suggested change
Datadog.logger.warn("Not enabling dynamic instrumentation because we are in development environment")
Datadog.logger.warn("Development environment detected; not enabling dynamic instrumentation")

return false
end
end
if RUBY_ENGINE != 'ruby' || RUBY_VERSION < '2.6'
Datadog.logger.debug("Not enabling dynamic instrumentation because of unsupported Ruby version")
Datadog.logger.warn("Not enabling dynamic instrumentation because of unsupported Ruby version")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, I suggest being a bit more actionable, e.g.

Suggested change
Datadog.logger.warn("Not enabling dynamic instrumentation because of unsupported Ruby version")
Datadog.logger.warn("Cannot enable dynamic instrumentation: Ruby 2.6+ is required")

return false
end
true
Expand Down
9 changes: 9 additions & 0 deletions lib/datadog/di/configuration/settings.rb
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,15 @@ def self.add_settings!(base)
o.default false
end

# Increases logging verbosity level from 'debug' to 'warn'
# for internal DI diagnostics, i.e. log messages that
# should not be written to customers' logs in production but
# that are useful when developing DI.
option :verbose_logging do |o|
o.type :bool
o.default false
end

# Minimum interval, in seconds, between probe status and
# snapshot submissions to the agent. Probe notifier worker will
# batch together payloads submitted during each interval.
Expand Down
11 changes: 7 additions & 4 deletions lib/datadog/di/instrumenter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
# rubocop:disable Lint/AssignmentInCondition

require 'benchmark'
require_relative 'logging'

module Datadog
module DI
Expand Down Expand Up @@ -54,6 +55,8 @@ module DI
#
# @api private
class Instrumenter
include Logging

def initialize(settings, serializer, logger, code_tracker: nil, telemetry: nil)
@settings = settings
@serializer = serializer
Expand Down Expand Up @@ -305,13 +308,13 @@ def hook_line(probe, &block)
end
rescue => exc
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions
logger.warn("Unhandled exception in line trace point: #{exc.class}: #{exc}")
log_warn_internal("Unhandled exception in line trace point: #{exc.class}: #{exc}")
telemetry&.report(exc, description: "Unhandled exception in line trace point")
# TODO test this path
end
rescue => exc
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions
logger.warn("Unhandled exception in line trace point: #{exc.class}: #{exc}")
log_warn_internal("Unhandled exception in line trace point: #{exc.class}: #{exc}")
telemetry&.report(exc, description: "Unhandled exception in line trace point")
Comment on lines 310 to 318
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to expose a block-based version of these methods. E.g. right now the whole string is built, even if the logger is going to throw it away.

In other parts of the codebase we often do Datadog.logger.debug do ... end or similar, and that version tends to be cheaper since debug logging is often off so it nopes out without building the string. (This is a common trick for logging libraries)

# TODO test this path
end
Expand Down Expand Up @@ -357,7 +360,7 @@ def hook(probe, &block)
hook_line(probe, &block)
else
# TODO add test coverage for this path
logger.warn("Unknown probe type to hook: #{probe}")
log_warn_internal("Unknown probe type to hook: #{probe}")
end
end

Expand All @@ -368,7 +371,7 @@ def unhook(probe)
unhook_line(probe)
else
# TODO add test coverage for this path
logger.warn("Unknown probe type to unhook: #{probe}")
log_warn_internal("Unknown probe type to unhook: #{probe}")
end
end

Expand Down
30 changes: 30 additions & 0 deletions lib/datadog/di/logging.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# frozen_string_literal: true

module Datadog
module DI
# Logging for DI
module Logging
Comment on lines +3 to +6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Since every class that uses Logging ends up needing to get passed a logger object anyway, I wonder if it would make sense to turn this module into an actual class that would have a logger as an instance variable, and then you'd always call methods on this class instead of the logger directly.

This would also solve the potential (small) sharp edge of "oops I included Logging but this class doesn't already have a logger" -- since you know that if you have the object, then you have all that you need.

# Logs an internal warning message.
# When verbose logging is enabled, the logging happens at warn level.
# When verbose logging is disabled, the logging happens at debug level
# (which is how the rest of the library is reporting its internal
# warnings/errors).
def log_warn_internal(msg)
if settings.dynamic_instrumentation.internal.verbose_logging
logger.warn(msg)
else
logger.debug(msg)
end
end

# Logs an internal informational message.
# When verbose logging is enabled, the logging happens at info level.
# When verbose logging is disabled, nothing is logged.
def log_info_internal(msg)
if settings.dynamic_instrumentation.internal.verbose_logging
logger.info(msg)
end
end
Comment on lines +20 to +27
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: This is a bit unexpected to me; why not log it at debug level as well? E.g. it seems weird that if I enable debug logging with DD_TRACE_DEBUG=true, there's still some logging that won't show up?

end
end
end
11 changes: 7 additions & 4 deletions lib/datadog/di/probe_manager.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
# rubocop:disable Lint/AssignmentInCondition

require 'monitor'
require_relative 'logging'

module Datadog
module DI
Expand All @@ -15,6 +16,8 @@ module DI
#
# @api private
class ProbeManager
include Logging

def initialize(settings, instrumenter, probe_notification_builder,
probe_notifier_worker, logger, telemetry: nil)
@settings = settings
Expand All @@ -32,7 +35,7 @@ def initialize(settings, instrumenter, probe_notification_builder,
install_pending_method_probes(tp.self)
rescue => exc
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions
logger.warn("Unhandled exception in definition trace point: #{exc.class}: #{exc}")
log_warn_internal("Unhandled exception in definition trace point: #{exc.class}: #{exc}")
telemetry&.report(exc, description: "Unhandled exception in definition trace point")
# TODO test this path
end
Expand Down Expand Up @@ -120,7 +123,7 @@ def add_probe(probe)
# In "propagate all exceptions" mode we will try to instrument again.
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions

logger.warn("Error processing probe configuration: #{exc.class}: #{exc}")
log_warn_internal("Error processing probe configuration: #{exc.class}: #{exc}")
telemetry&.report(exc, description: "Error processing probe configuration")
# TODO report probe as failed to agent since we won't attempt to
# install it again.
Expand Down Expand Up @@ -160,7 +163,7 @@ def remove_other_probes(probe_ids)
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions
# Silence all exceptions?
# TODO should we propagate here and rescue upstream?
logger.warn("Error removing probe #{probe.id}: #{exc.class}: #{exc}")
log_warn_internal("Error removing probe #{probe.id}: #{exc.class}: #{exc}")
telemetry&.report(exc, description: "Error removing probe")
end
end
Expand Down Expand Up @@ -190,7 +193,7 @@ def remove_other_probes(probe_ids)
rescue => exc
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions

logger.warn("Error installing probe after class is defined: #{exc.class}: #{exc}")
log_warn_internal("Error installing probe after class is defined: #{exc.class}: #{exc}")
telemetry&.report(exc, description: "Error installing probe after class is defined")
end
end
Expand Down
11 changes: 7 additions & 4 deletions lib/datadog/di/probe_notifier_worker.rb
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# frozen_string_literal: true

require_relative '../core/semaphore'
require_relative 'logging'

module Datadog
module DI
Expand All @@ -23,6 +24,8 @@ module DI
#
# @api private
class ProbeNotifierWorker
include Logging

def initialize(settings, transport, logger, telemetry: nil)
@settings = settings
@telemetry = telemetry
Expand Down Expand Up @@ -77,7 +80,7 @@ def start
rescue => exc
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions

logger.warn("Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})")
log_warn_internal("Error in probe notifier worker: #{exc.class}: #{exc} (at #{exc.backtrace.first})")
telemetry&.report(exc, description: "Error in probe notifier worker")
end
@lock.synchronize do
Expand Down Expand Up @@ -185,7 +188,7 @@ def io_in_progress?
queue = send("#{event_type}_queue")
# TODO determine a suitable limit via testing/benchmarking
if queue.length > 100
logger.warn("#{self.class.name}: dropping #{event_type} because queue is full")
log_warn_internal("#{self.class.name}: dropping #{event_type} because queue is full")
else
queue << event
end
Expand Down Expand Up @@ -242,7 +245,7 @@ def set_sleep_remaining
end
rescue => exc
raise if settings.dynamic_instrumentation.internal.propagate_all_exceptions
logger.warn("failed to send #{event_name}: #{exc.class}: #{exc} (at #{exc.backtrace.first})")
log_warn_internal("failed to send #{event_name}: #{exc.class}: #{exc} (at #{exc.backtrace.first})")
# Should we report this error to telemetry? Most likely failure
# to send is due to a network issue, and trying to send a
# telemetry message would also fail.
Expand All @@ -253,7 +256,7 @@ def set_sleep_remaining
# Normally the queue should only be consumed in this method,
# however if anyone consumes it elsewhere we don't want to block
# while consuming it here. Rescue ThreadError and return.
logger.warn("Unexpected #{event_name} queue underflow - consumed elsewhere?")
log_warn_internal("Unexpected #{event_name} queue underflow - consumed elsewhere?")
telemetry&.report(exc, description: "Unexpected #{event_name} queue underflow")
ensure
@lock.synchronize do
Expand Down
8 changes: 4 additions & 4 deletions lib/datadog/di/remote.rb
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ def receivers(telemetry)
probe = ProbeBuilder.build_from_remote_config(probe_spec)
payload = component.probe_notification_builder.build_received(probe)
component.probe_notifier_worker.add_status(payload)
component.logger.info("Received probe from RC: #{probe.type} #{probe.location}")
component.log_info_internal("Received probe from RC: #{probe.type} #{probe.location}")

begin
# TODO test exception capture
Expand All @@ -60,7 +60,7 @@ def receivers(telemetry)
rescue => exc
raise if component.settings.dynamic_instrumentation.internal.propagate_all_exceptions

component.logger.warn("Unhandled exception adding probe in DI remote receiver: #{exc.class}: #{exc}")
component.log_warn_internal("Unhandled exception adding probe in DI remote receiver: #{exc.class}: #{exc}")
component.telemetry&.report(exc, description: "Unhandled exception adding probe in DI remote receiver")

# If a probe fails to install, we will mark the content
Expand All @@ -81,7 +81,7 @@ def receivers(telemetry)
rescue => exc
raise if component.settings.dynamic_instrumentation.internal.propagate_all_exceptions

component.logger.warn("Unhandled exception handling probe in DI remote receiver: #{exc.class}: #{exc}")
component.log_warn_internal("Unhandled exception handling probe in DI remote receiver: #{exc.class}: #{exc}")
component.telemetry&.report(exc, description: "Unhandled exception handling probe in DI remote receiver")

content.errored("Error applying dynamic instrumentation configuration: #{exc.class.name} #{exc.message}: #{Array(exc.backtrace).join("\n")}")
Expand All @@ -95,7 +95,7 @@ def receivers(telemetry)
rescue => exc
raise if component.settings.dynamic_instrumentation.internal.propagate_all_exceptions

component.logger.warn("Unhandled exception removing probes in DI remote receiver: #{exc.class}: #{exc}")
component.log_warn_internal("Unhandled exception removing probes in DI remote receiver: #{exc.class}: #{exc}")
component.telemetry&.report(exc, description: "Unhandled exception removing probes in DI remote receiver")
end
end
Expand Down
1 change: 1 addition & 0 deletions spec/datadog/di/instrumenter_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
mock_settings_for_di do |settings|
allow(settings.dynamic_instrumentation).to receive(:enabled).and_return(true)
allow(settings.dynamic_instrumentation.internal).to receive(:untargeted_trace_points).and_return(false)
allow(settings.dynamic_instrumentation.internal).to receive(:verbose_logging).and_return(true)
allow(settings.dynamic_instrumentation).to receive(:max_capture_depth).and_return(2)
allow(settings.dynamic_instrumentation).to receive(:max_capture_attribute_count).and_return(2)
allow(settings.dynamic_instrumentation).to receive(:max_capture_string_length).and_return(100)
Expand Down
1 change: 1 addition & 0 deletions spec/datadog/di/probe_manager_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ class ProbeManagerSpecTestClass; end
mock_settings_for_di do |settings|
allow(settings.dynamic_instrumentation).to receive(:enabled).and_return(true)
allow(settings.dynamic_instrumentation.internal).to receive(:propagate_all_exceptions).and_return(false)
allow(settings.dynamic_instrumentation.internal).to receive(:verbose_logging).and_return(true)
end

let(:instrumenter) do
Expand Down
25 changes: 9 additions & 16 deletions spec/datadog/di/remote_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,7 @@
mock_settings_for_di do |settings|
allow(settings.dynamic_instrumentation).to receive(:enabled).and_return(true)
allow(settings.dynamic_instrumentation.internal).to receive(:propagate_all_exceptions).and_return(false)
allow(settings.dynamic_instrumentation.internal).to receive(:verbose_logging).and_return(true)
end

let(:serializer) do
Expand Down Expand Up @@ -144,8 +145,7 @@
end

it 'calls probe manager to add a probe' do
expect(component).to receive(:logger).and_return(logger)
expect(logger).to receive(:info) do |message|
expect(component).to receive(:log_info_internal) do |message|
expect(message).to match(/Received probe/)
end

Expand All @@ -162,15 +162,13 @@
context 'probe addition raises an exception' do
it 'logs warning and consumes the exception' do
expect(component).to receive(:telemetry).and_return(telemetry)
expect(component).to receive(:logger).and_return(logger)
expect(logger).to receive(:info) do |message|
expect(component).to receive(:log_info_internal) do |message|
expect(message).to match(/Received probe/)
end

expect(logger).to receive(:warn) do |msg|
expect(component).to receive(:log_warn_internal) do |msg|
expect(msg).to match(/Unhandled exception.*Runtime error from test/)
end
expect(component).to receive(:logger).and_return(logger)
expect(telemetry).to receive(:report)

expect(probe_manager).to receive(:add_probe).and_raise("Runtime error from test")
Expand All @@ -186,17 +184,15 @@

it 'calls probe manager to remove stale probes' do
allow(component).to receive(:telemetry)
expect(component).to receive(:logger).and_return(logger)
expect(logger).to receive(:info) do |message|
expect(component).to receive(:log_info_internal) do |message|
expect(message).to match(/Received probe/)
end

expect(logger).to receive(:warn) do |msg|
expect(component).to receive(:log_warn_internal) do |msg|
expect(msg).to match(/Unhandled exception.*Runtime error from test/)
end

allow(probe_manager).to receive(:add_probe).and_raise("Runtime error from test")
expect(component).to receive(:logger).and_return(logger)
expect(component).to receive(:probe_notification_builder).and_return(probe_notification_builder)
expect(probe_notification_builder).to receive(:build_received)
expect(component).to receive(:probe_notifier_worker).and_return(probe_notifier_worker)
Expand All @@ -209,27 +205,24 @@
context 'probe removal raises an exception' do
it 'logs warning and consumes the exception' do
expect(component).to receive(:telemetry).and_return(telemetry).at_least(:once)
expect(component).to receive(:logger).and_return(logger)
expect(logger).to receive(:info) do |message|
expect(component).to receive(:log_info_internal) do |message|
expect(message).to match(/Received probe/)
end

expect(logger).to receive(:warn) do |msg|
expect(component).to receive(:log_warn_internal) do |msg|
expect(msg).to match(/Unhandled exception.*Runtime error 1 from test/)
end
expect(telemetry).to receive(:report)

allow(probe_manager).to receive(:add_probe).and_raise("Runtime error 1 from test")
expect(component).to receive(:logger).and_return(logger)
expect(component).to receive(:probe_notification_builder).and_return(probe_notification_builder)
expect(probe_notification_builder).to receive(:build_received)
expect(component).to receive(:probe_notifier_worker).and_return(probe_notifier_worker)
expect(probe_notifier_worker).to receive(:add_status)

expect(logger).to receive(:warn) do |msg|
expect(component).to receive(:log_warn_internal) do |msg|
expect(msg).to match(/Unhandled exception.*Runtime error 2 from test/)
end
expect(component).to receive(:logger).and_return(logger)
expect(telemetry).to receive(:report)

expect(probe_manager).to receive(:remove_other_probes).with(['11']).and_raise("Runtime error 2 from test")
Expand Down
Loading