add span monitor process to optionally monitor the spans in a process #84

tsloughter · 2020-07-09T14:51:23Z

A span can optionally set monitor to true at the time it is created.
This results in a monitor on the process and if that process exits
for any reason all spans started in that process will be ended.

tsloughter · 2020-07-09T14:52:26Z

This is partly wrong. I realized right before opening the PR that this shouldn't be done for the current process when start_inactive_span is used. This has to be done for the process it becomes active in.

Not sure yet how I want to fix this.

tsloughter · 2020-07-09T14:55:28Z

Two questions I have are:

Should this be only per-span. Right now it will end all spans in the process no matter which span includes the monitor => true option. I could instead have the monitor only end spans that explicitly include the monitor option and leave the others open.
Maybe if the solution of having it end all spans (as it is in this PR currently) in the monitored process is agreed on it should then be a call like ot_tracer:monitor_spans instead of being an option on span start?

A span can optionally set `monitor` to true at the time it is created. This results in a monitor on the process and if that process exits for any reason all spans started in that process will be ended.

bryannaegele · 2020-07-09T16:42:21Z

It makes sense that you'd want to close all spans in the monitored process since the process dying is fatal to all of them.

tsloughter · 2020-07-09T17:00:58Z

@bryannaegele my thought too. But people do/want weird things.

I should probably change it to be a function called to monitor all spans in a process instead of adding it to the span start opts then...

garthk · 2020-07-09T23:21:05Z

I think I'd want to add this to my tracer config to cover every span it starts. I'd also want a way to turn the exit details into extra span attributes, though the latter'd also apply to any with_child_span construct taking a function argument and catching exceptions.

davydog187 · 2020-08-31T15:44:24Z

@tsloughter is there any remaining work left on this PR?

I'm seeing this manifest when using Task.async_stream

One of the tasks times out with no child span reported

tsloughter · 2020-09-01T23:00:53Z

@davydog187 oh, hm, I think I just wanted to offer a function to start monitoring in a process (instead of just being an option to start_span).

Maybe a config value so it could be turned on globally but we probably will want a pool of monitoring processes before supporting that so it'll come in a separate PR.

tsloughter · 2020-09-25T16:02:35Z

So I want to finish this up and can't decide what the best interface is.

An issue is how to handle inactive spans. I think this means that when attaching a context the active span in that context has to have its pid updated in the ets table...

derekkraan · 2021-10-28T12:04:59Z

I implemented (by peeking at this PR) this in a project I'm working on. In the end I went with the following interface:
OpenTelemetry.Span.monitor/1. This is nice because it allows us to do something like:

OpenTelemetry.Tracer.with_span "my_span" do
  OpenTelemetry.Span.monitor(OpenTelemetry.Tracer.current_span_ctx())
end

Here's the code I ended up with, for posterity:

defmodule OT.Monitor do                                                                        
  use GenServer                                                                                
                                                                                               
  def start_link(_arg) do                                                                      
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)                                    
  end                                                                                          
                                                                                               
  def init(nil) do                                                                             
    _table_id = :ets.new(__MODULE__, [:bag, :public, {:write_concurrency, true}, :named_table])
    {:ok, nil}                                                                                 
  end                                                                                          
                                                                                               
  def handle_call({:monitor, pid}, _from, state) do                                            
    Process.monitor(pid)                                                                       
    {:reply, :ok, state}                                                                       
  end                                                                                          
                                                                                               
  def handle_info({:DOWN, _ref, :process, pid, :normal}, state) do                             
    :ets.take(__MODULE__, pid)                                                                 
    |> Enum.each(fn {_pid, ctx} ->                                                             
      _span_ctx = OpenTelemetry.Tracer.set_current_span(ctx)                                   
      _ = OpenTelemetry.Tracer.end_span()                                                      
    end)                                                                                       
                                                                                               
    {:noreply, state}                                                                          
  end                                                                                          
                                                                                               
  def handle_info({:DOWN, _ref, :process, pid, {:shutdown, _}}, state) do                      
    :ets.take(__MODULE__, pid)                                                                 
    |> Enum.each(fn {_pid, ctx} ->                                                             
      _span_ctx = OpenTelemetry.Tracer.set_current_span(ctx)                                   
      _ = OpenTelemetry.Tracer.end_span()                                                      
    end)                                                                                       
                                                                                               
    {:noreply, state}                                                                          
  end                                                                                          
                                                                                               
  def handle_info({:DOWN, _ref, :process, pid, reason}, state) do                              
    :ets.take(__MODULE__, pid)                                                                 
    |> Enum.each(fn {_pid, ctx} ->                                                             
      _span_ctx = OpenTelemetry.Tracer.set_current_span(ctx)                                   
      _ = OpenTelemetry.Tracer.add_event("Process died", [{"reason", inspect(reason)}])        
      _ = OpenTelemetry.Tracer.end_span()                                                      
    end)                                                                                       
                                                                                               
    {:noreply, state}                                                                          
  end                                                                                          
                                                                                               
  def monitor(span_ctx) do                                                                     
    if Application.fetch_env!(:opentelemetry, :enabled) do                                     
      # monitor first, because the monitor is necessary to clean the ets table.                
      :ok = GenServer.call(__MODULE__, {:monitor, self()})                                     
      true = :ets.insert(__MODULE__, {self(), span_ctx})                                       
    end                                                                                        
  end                                                                                          
end

hkrutzer · 2022-05-03T12:45:50Z

I think this functionality is pretty critical. Personally I would want to have this for as many spans as possible, and definitely for active spans. So that means I would prefer not to set it as an option for every span, but rather as a config value, but I'll take any version of this feature I can get 🙂

tsloughter · 2022-05-03T16:15:03Z

If we either make span processors not a pain or extend them to allow additional functionality attached to them then I think we could easily allow it to be configured for all spans instead of an option per-span.

fcevado · 2022-09-02T13:33:22Z

@tsloughter i was thinking if spans actually shoudn't live in a external process, that monitor the process starting the spans, in case of the process dying or raises, it closes all remaining open spans with error status.
I know it would be a big change on otel sdk architecture, should i start a discussion about that?

tsloughter · 2022-09-02T16:58:10Z

@fcevado that could work. There are other tracing libs that do just that, trying to remember which one... Ah, it is Tapper. And I think New Relic or AppSignal actually use (or used) a separate process for each trace -- which I wouldn't think would scale well, at least compared to Tapper or our use of ETS.

What I'd hope to be able to experiment with is alternate SDKs. Technically it should be possible for an SDK to be created that does make all span operations messages to another process. No user code should have to change, just which SDK they include in their project.

tsloughter · 2023-08-10T11:11:29Z

Closing for #602

tsloughter requested a review from a team July 9, 2020 14:51

add span monitor process

f4a197c

A span can optionally set `monitor` to true at the time it is created. This results in a monitor on the process and if that process exits for any reason all spans started in that process will be ended.

tsloughter force-pushed the monitor-span branch from 27e6989 to f4a197c Compare July 9, 2020 15:05

tsloughter changed the base branch from master to main February 11, 2021 19:38

derekkraan mentioned this pull request Oct 28, 2021

Add opentelemetry integration to Oban open-telemetry/opentelemetry-erlang-contrib#6

Merged

23 tasks

bryannaegele mentioned this pull request Sep 15, 2022

Add OpentelemetryMonitor to contrib open-telemetry/opentelemetry-erlang-contrib#109

Closed

tsloughter closed this Aug 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add span monitor process to optionally monitor the spans in a process #84

add span monitor process to optionally monitor the spans in a process #84

tsloughter commented Jul 9, 2020

tsloughter commented Jul 9, 2020

tsloughter commented Jul 9, 2020

bryannaegele commented Jul 9, 2020

tsloughter commented Jul 9, 2020

garthk commented Jul 9, 2020

davydog187 commented Aug 31, 2020

tsloughter commented Sep 1, 2020

tsloughter commented Sep 25, 2020

derekkraan commented Oct 28, 2021 •

edited

Loading

hkrutzer commented May 3, 2022

tsloughter commented May 3, 2022

fcevado commented Sep 2, 2022

tsloughter commented Sep 2, 2022

tsloughter commented Aug 10, 2023

add span monitor process to optionally monitor the spans in a process #84

add span monitor process to optionally monitor the spans in a process #84

Conversation

tsloughter commented Jul 9, 2020

tsloughter commented Jul 9, 2020

tsloughter commented Jul 9, 2020

bryannaegele commented Jul 9, 2020

tsloughter commented Jul 9, 2020

garthk commented Jul 9, 2020

davydog187 commented Aug 31, 2020

tsloughter commented Sep 1, 2020

tsloughter commented Sep 25, 2020

derekkraan commented Oct 28, 2021 • edited Loading

hkrutzer commented May 3, 2022

tsloughter commented May 3, 2022

fcevado commented Sep 2, 2022

tsloughter commented Sep 2, 2022

tsloughter commented Aug 10, 2023

derekkraan commented Oct 28, 2021 •

edited

Loading