-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-30397 Review existing server spans to match expected conventions #18056
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1915,11 +1915,16 @@ int CHttpRequest::receive(IMultiException *me) | |
return 0; | ||
} | ||
|
||
ISpan * CHttpRequest::createServerSpan() | ||
ISpan * CHttpRequest::createServerSpan(const char * serviceName, const char * methodName) | ||
{ | ||
//MORE: The previous code would be better off querying httpHeaders... | ||
StringBuffer spanName; | ||
spanName.append(serviceName); | ||
if (!isEmptyString(methodName)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @afishbeck depending on how the service function is called, sometimes the methodName is not known at this point (part of the soap envelope?) How can it be cleanly extracted? May require some refactoring, so a different PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes this timing issue related to when we know the name comes up sometimes. You could open a JIRA to refactor to always get the name at the beginning. I actually did similar refactoring in roxie. One idea would be to do a quick pull parser extraction of the name even if it came from the SOAP/XML/JSON content of the message body. Generally the pull parser would usually only have to walk a few tags to find what it needs. It could then abandon the parsing until the body is fully parsed later. Another way would be to always parse the body much sooner. Early on we decided to avoid extra overhead in case we rejected the request for other reasons.. for example security. We save a little time and memory, or perhaps there may occasionally be huge requests. I think Roxie parses pretty early but isn't as close to the front lines as ESP. Either way would be good in the sense that it would let us refactor some other code that has to wait until the method is known. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See QueryNameExtractor in roxie. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. HPCC-31013 created. |
||
spanName.append("/").append(methodName); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's prob fine, but I wonder if the / delimiter could cause problems downstream. (prometheous, grafana, elastic will likely use this value in UI widgets). I didn't find specific rules on span nomenclature or even attribute naming rules which excludes the slash, but I'd consider another delimiter There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One of the examples on the open telemetry recommendations for span name conventions (https://github.com/open-telemetry/opentelemetry-specification/blob/v1.26.0/specification/trace/api.md#span) uses a / : "get_account/{accountId}" - so I think it can be considered safe. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough. |
||
spanName.toLowerCase(); | ||
Owned<IProperties> httpHeaders = getHeadersAsProperties(m_headers); | ||
return queryTraceManager().createServerSpan("HTTPRequest", httpHeaders, SpanFlags::EnsureGlobalId); | ||
return queryTraceManager().createServerSpan(spanName, httpHeaders, SpanFlags::EnsureGlobalId); | ||
} | ||
|
||
void CHttpRequest::annotateSpan(const char * key, const char * value) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2001,7 +2001,7 @@ int CWsEclBinding::submitWsEclWorkunit(IEspContext & context, WsEclWuInfo &wsinf | |
|
||
Owned<ISpan> clientSpan; | ||
ISpan * activeSpan = context.queryActiveSpan(); | ||
clientSpan.setown(activeSpan->createClientSpan("wsecl/SubmitWorkunit")); | ||
clientSpan.setown(activeSpan->createClientSpan("run_workunit")); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is where the idea of PRODUCER spans gets interesting. I guess the producer span would be created here? or would we do it in workunit.dll when a workunit is created? Then even the most basic eclagent/roxie/hthor job would be represented by a CONSUMER span? And thor jobs may create multiple consumer spans. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, this is where the producer span would be. I'm not sure that I agree it makes sense to differentiate a consumer span from a server span though. In general the service does not know whether the parent span is waiting for it or not. It may make sense with an internal producer/consumer model. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ghalliday just wanted to make a note to retain this thought. The reality could be different then the theory. BUT I think the reason for a consumer span rather than a server span is that a server span should show up as a sequential child within the timeframe of the span of its parent. If a THOR job itermitantly / asynchronously works on a workunit the spans would not correlate to the original parent at all which isn't what I think is expected of server spans.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What I am not sure about in this model is that the consumer does not really know whether the caller is waiting for it or not. Does a webservice know if the caller is waiting for it? On reflection maybe it does. What happens if a server span sends an acknowledgement to the client before it finishes processing? Is it require to create an internal producer/consumer span if that happens? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ghalliday, regarding "Does a webservice know if the caller is waiting for it?", I would think so in the case of simple synchronous HTTP. But what about the async or even asymmetric case? Such as where a web server does HTTP/2 Push, or Server-Sent Events? That seems to match the PRODUCER pattern. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. WsECL specifically lets the caller decide if they want to have it return immediately giving them the WUID so they can retrieve the result, or will wait for the workunit to complete. I'd consider:
|
||
Owned<IProperties> httpHeaders = ::getClientHeaders(clientSpan); | ||
recordTraceDebugOptions(workunit, httpHeaders); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3769,6 +3769,11 @@ void WsWuHelpers::submitWsWorkunit(IEspContext& context, IConstWorkUnit* cw, con | |
} | ||
} | ||
|
||
ISpan * activeSpan = context.queryActiveSpan(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @afishbeck if you submitted a workunit the globalids etc. where not preserved in the workunit. They are now. Should this code be commoned up with the code in ws_ecl_service.cpp? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, when ws_ecl_service.cpp runs the job as a workunit (as opposed to calling out to roxie) it clones the original workunit. Because of that submitWsEclWorkunit is a bit different from submitWsWorkunit here, BUT it has context available so this bit of code should be able to be commoned up. |
||
Owned<ISpan> clientSpan(activeSpan->createClientSpan("run_workunit")); | ||
Owned<IProperties> httpHeaders = ::getClientHeaders(clientSpan); | ||
recordTraceDebugOptions(wu, httpHeaders); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixing the headers not being recorded in the workunit revealed another problem - see thorsoapcall for details. |
||
|
||
if (resetWorkflow) | ||
wu->resetWorkflow(); | ||
if (!compile) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1179,7 +1179,8 @@ class RoxieWorkUnitWorker : public RoxieQueryWorker | |
Owned<StringContextLogger> logctx = new StringContextLogger(wuid.get()); | ||
|
||
Owned<IProperties> traceHeaders = extractTraceDebugOptions(wu); | ||
Owned<ISpan> requestSpan = queryTraceManager().createServerSpan(wu->queryWuid(), traceHeaders); | ||
Owned<ISpan> requestSpan = queryTraceManager().createServerSpan("run_workunit", traceHeaders); | ||
requestSpan->setSpanAttribute("hpcc.wuid", wuid); | ||
ContextSpanScope spanScope(*logctx, requestSpan); | ||
|
||
Owned<IQueryFactory> queryFactory; | ||
|
@@ -1438,7 +1439,7 @@ class RoxieProtocolMsgContext : implements IHpccProtocolMsgContext, public CInte | |
return *cascade; | ||
} | ||
|
||
virtual void startSpan(const char * id, const IProperties * headers) override | ||
virtual void startSpan(const char * id, const char * querySetName, const char * queryName, const IProperties * headers) override | ||
{ | ||
Linked<const IProperties> allHeaders = headers; | ||
SpanFlags flags = (ensureGlobalIdExists) ? SpanFlags::EnsureGlobalId : SpanFlags::None; | ||
|
@@ -1455,7 +1456,12 @@ class RoxieProtocolMsgContext : implements IHpccProtocolMsgContext, public CInte | |
|
||
ensureContextLogger(); | ||
|
||
Owned<ISpan> requestSpan = queryTraceManager().createServerSpan("request", allHeaders, flags); | ||
const char * spanQueryName = !isEmptyString(queryName) ? queryName : "run_query"; | ||
StringBuffer spanName(querySetName); | ||
if (spanName.length()) | ||
spanName.append('/'); | ||
spanName.append(spanQueryName); | ||
Owned<ISpan> requestSpan = queryTraceManager().createServerSpan(spanName, allHeaders, flags); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name of the deployed roxie query is used as the span name. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do wonder if prefixing with the queryset name (aka target, and generally in AKS this is just the roxie name) would help find queries from the same roxie.. but knowing that for sure will come down to experience. |
||
//The span has a lifetime the same length as the logctx, so no need to restore it at the end of the query | ||
logctx->setActiveSpan(requestSpan); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -707,7 +707,7 @@ class CServerSpan : public CSpan | |
{ | ||
bool success = CSpan::getSpanContext(ctxProps, otelFormatted); | ||
|
||
if (remoteParentSpanCtx.IsValid()) | ||
if (!otelFormatted && remoteParentSpanCtx.IsValid()) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @afishbeck please check. remoteParentSpanId shouldn't have been being included in the client headers. I opened HPCC-30927 to split this function into two, because it is confusing as it stands. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. For what it's worth, I thought I had 2 independent versions of this function in at least one of my many PR commits. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree with both of you, these should be 2 functions it's very confusing. So I guess if blocking remoteParentSpanID when otelFormatted is set prevents it from being sent as a header then this code is right? |
||
{ | ||
StringBuffer remoteParentSpanID; | ||
char remoteParentSpanId[16] = {0}; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
<Dataset name='GlobalId'> | ||
<Row><GlobalId>xPSDvT9akc1fGSTZWJKb</GlobalId></Row> | ||
<Row><GlobalId>true</GlobalId></Row> | ||
</Dataset> | ||
<Dataset name='CallerId'> | ||
<Row><CallerId>PkAntaCLkY4MknCnXA</CallerId></Row> | ||
<Row><CallerId>true</CallerId></Row> | ||
</Dataset> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1485,7 +1485,9 @@ CJobMaster::CJobMaster(IConstWorkUnit &_workunit, const char *graphName, ILoaded | |
init(); | ||
|
||
Owned<IProperties> traceHeaders = extractTraceDebugOptions(workunit); | ||
Owned<ISpan> requestSpan = queryTraceManager().createServerSpan(workunit->queryWuid(), traceHeaders); | ||
Owned<ISpan> requestSpan = queryTraceManager().createServerSpan("run_graph", traceHeaders); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @ghalliday I wonder if workunit spans should be created as PRODUCER spans and then when a component works on that workunit it should be CONSUMER spans. (consumer may sound weird for something like thor, but see the description below):
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should probably be a server span, but the code that adds the workunit to the execution queue should probably be a producer span. I have added a jira to introduce the new concept and we can revisit existing spans when that is available. |
||
requestSpan->setSpanAttribute("hpcc.wuid", workunit->queryWuid()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It feels like some of the most common attributes should be defined in JTrace There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is something I'm still not sure about. I can see the argument for having common strings - it catches typing mistakes, but I also find the code harder to read. I suspect I could be persuaded for strings like this that are likely to be scattered through the code. |
||
requestSpan->setSpanAttribute("hpcc.graph", graphName); | ||
ContextSpanScope spanScope(*logctx, requestSpan); | ||
|
||
resumed = WUActionResume == workunit->getAction(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a query set "hpcc-global-id" but the headers line set "global-id" or vice-versa you would get both values in the headers potentially causing confusion. Only likely to affect the regression tests - but it did once the ids were preserved where they were previously dropped.