Tracer support #20
base: master
Conversation
populating the response stream must happen in a thread, otherwise the stream is consumed in Entrypoint.handle_result and will not be available to any extensions later (i.e. in worker_result)
test that we're returning requests/responses in streams
self.container.spawn_managed_thread(
    send_response, identifier="send_response"
)
Send response had to be deferred to a thread, otherwise the response iterator was exhausted before worker_result was called in any DependencyProviders.
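To illustrate the underlying issue (a standalone sketch, not the PR code): a generator can only be consumed once, so whichever component iterates it first exhausts it for everyone else:

def responses():
    yield "first"
    yield "second"

stream = responses()
print(list(stream))  # ['first', 'second'] -- the entrypoint consumes the stream
print(list(stream))  # [] -- a later consumer such as worker_result sees nothing

Deferring the send to a managed thread avoids draining the iterator before worker_result runs.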
Matt, a few questions from my first read-through:
class GrpcTracer(Tracer):
Is there a need for a custom tracer dependency provider? The adapter instance has worker_ctx available, so it should be able to do all the extra logic required by gRPC tracing.
Having a custom Tracer dependency provider would require setting separate dependency providers for each entrypoint type, or sub-classing them. Instead, nameko_tracer.Tracer can be used as a dependency on the service class for all adapters, and then only a config entry is required to make extra adapters available when info specific to an entrypoint type is needed:
# config.yaml
TRACER:
    ADAPTERS:
        "nameko_grpc.entrypoint.Grpc": "nameko_grpc.tracer.GrpcEntrypointAdapter"
The config above tells nameko_tracer.Tracer that the GrpcEntrypointAdapter adapter should be used for tracing Grpc entrypoints.
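On the service side the dependency is then declared once, whatever mix of entrypoint types is in play. A minimal sketch (the service name is made up):

from nameko_tracer import Tracer


class ExampleService:
    name = "example"

    # a single Tracer dependency covers all entrypoint types; the
    # ADAPTERS config entry selects the adapter per entrypoint class
    tracer = Tracer()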
If possible, I would suggest moving the gRPC-specific info gathering into the process method of GrpcEntrypointAdapter (https://github.com/mattbennett/nameko-grpc/pull/20/files#diff-ce5ef4103ac91cebceb781d1db019e0cR230) and removing the tracer subclass, along the lines of the sketch below.
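A rough sketch of what that could look like, assuming nameko-tracer adapters extend logging.LoggerAdapter via nameko_tracer.adapters.DefaultAdapter and receive the worker context in the log record extra (both are assumptions about nameko-tracer internals):

from nameko_tracer.adapters import DefaultAdapter


class GrpcEntrypointAdapter(DefaultAdapter):
    def process(self, message, kwargs):
        # let the default adapter build the standard trace data first
        message, kwargs = super().process(message, kwargs)
        # then attach gRPC-specific details; "grpc_context" is a
        # hypothetical trace field, not an established nameko-tracer key
        worker_ctx = kwargs["extra"]["worker_ctx"]
        kwargs["extra"]["grpc_context"] = dict(worker_ctx.context_data)
        return message, kwargs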
The difficulty is with streaming requests and responses. Unlike normal Nameko entrypoints, a streaming method yields multiple times. Similarly for streaming requests: the request object is iterated over, yielding multiple items. I think each of these items should result in its own trace.
For a UNARY_STREAM RPC method (i.e. unary request, streaming response) the adapter is called once for the request, once for the "top-level" response (where the result is actually just a generator), and once for every message in the response stream.
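For illustration, a unary-stream entrypoint in nameko-grpc looks roughly like this (service, stub and message names are made up):

from nameko_grpc.entrypoint import Grpc

from example_pb2 import ExampleReply  # hypothetical generated message
from example_pb2_grpc import exampleStub  # hypothetical generated stub

grpc = Grpc.implementing(exampleStub)


class ExampleService:
    name = "example"

    @grpc
    def unary_stream(self, request, context):
        # each yield is one message in the response stream -- with the
        # GrpcTracer subclass, each one produces its own trace
        for seqno in range(request.count):
            yield ExampleReply(message=request.value, seqno=seqno)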
Maybe nameko-tracer could support streaming requests/responses as a first-class thing, and then we could drop the subclass here.
Matt, you are right, streaming makes it more difficult! Maybe the tracer should support it 🤔 😄
A streaming entrypoint yields multiple times, but still inside one worker: one entrypoint firing.
If we stick to tracing the worker/entrypoint request and response, then the request trace can include just the metadata sent, and the response trace the combined information of all responses (and also of all requests in the case of a stream-unary or stream-stream connection).
I can also see that "sub-tracing" each chunk/message can sometimes be useful. The question is what the real use-cases are, and which scenarios streaming is used for the most. Is it consuming a data flow, getting what is available before later items are generated? Is it chunking big data sets over the wire? Or is it intended to implement a long-lasting communication channel between client and server, as the stream-stream scenario allows? All of these seem to be valid use-cases.
If it is just streaming or chunking bigger data, I would say the best fit is probably a normal request trace for initiating the call and a single response trace logging results after all responses have been received and all requests sent. The notable work is really done at the end of the entrypoint.
If it is used for some bidirectional messaging implementation, it is true that sub-tracing each message may be a better fit.
I quite like having straightforward tracing of worker start and worker end events: really, one entrypoint, two traces. It is not immediately clear that iterating over gRPC requests produces multiple traces; the same goes for yielding multiple times (although this is clearer, since multiple yield statements are present or the yield is in a loop).
Also, the pairing of caller worker ID and called entrypoint worker ID in the trace stack suddenly gets a bit more complicated, as you can get multiple request or response traces per worker ID, and these ID pairs (caller to called worker IDs) are no longer unique. (Not a big issue, but worth noting.)
When thinking about logging and debugging failures, in a scenario where one sends two successful responses but fails on generating or sending the third: is it better to have two good traces and one error trace, or a single trace saying that the entrypoint failed on its third response? The same goes for streaming a request inside an entrypoint; it is the entrypoint that fails at the end.
Hard to say .)
I just had a look at gRPC OpenTracing, trying to understand what the best practice might be, or what the standard for tracing gRPC streams is.
It seems that a span contains all request and response parts, and that one gRPC call results in one span of a trace, no matter how many messages its streams carried. The finished span is then added to a tracer buffer after the whole remote procedure call has been handled.
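In other words, roughly (an illustrative sketch of that convention using the opentracing Python API, not the actual grpc-opentracing code):

import opentracing

tracer = opentracing.global_tracer()  # no-op tracer unless one is registered


def handler(request):
    # stand-in for a streaming RPC method
    yield from ["a", "b", "c"]


def send(message):
    print("sending", message)


# one span per RPC, no matter how many messages are streamed
span = tracer.start_span("ExampleService/unary_stream")
try:
    for response in handler(None):
        send(response)  # individual messages do not get spans of their own
finally:
    span.finish()  # the finished span is buffered once the whole call is done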
If nameko-grpc supports simple entrypoint/worker-level tracing, I think it should cover most of the cases .)
response_stream.close(error)
...
self.container.spawn_managed_thread(
    send_response, identifier="send_response"
)
s/send_response/grpc_send_response?
Yes, that is better. Actually there are lots of places where the identifier passed is quite sloppy. Even grpc_send_response is a bit useless -- it'd be nicer if it used the call stack, so you could differentiate multiple threads for concurrent workers.
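For example, something like this might work at the spawn site (a sketch; it assumes worker_ctx is in scope there, and leans on nameko's WorkerContext.call_id):

# derive the thread identifier from the worker's call id, so threads
# spawned for concurrent workers can be told apart
identifier = "{}.send_response".format(worker_ctx.call_id)
self.container.spawn_managed_thread(send_response, identifier=identifier)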
clean_call_args(extra)
clean_response(extra)
clean_response_status(extra)
This whole module is super-nicely organised and documented! .)
Adds support for https://github.com/Overseas-Student-Living/nameko-tracer, including streaming requests and responses.