Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-586 Add Otel Tracing support #714

Merged
merged 1 commit into from
Jun 7, 2024

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented May 31, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change is a breaking change (fix or feature that will cause existing behavior to change).

Checklist:

  • I have created a corresponding JIRA ticket for this submission
  • My code follows the code style of this project.
    • I have applied the Eclipse code-format template provided.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the HPCC Systems CONTRIBUTORS document (https://github.com/hpcc-systems/HPCC-Platform/wiki/Guide-for-contributors).
  • The change has been fully tested:
    • This change does not cause any existing JUnits to fail.
    • I have include JUnit coverage to test this change
    • I have performed system test and covered possible regressions and side effects.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Premature optimization
    • This change fixes the problem, not just the symptom

Testing:

@rpastrana rpastrana requested a review from jpmcmu May 31, 2024 03:13
@@ -164,6 +202,89 @@ private String getTargetHPCCBuildVersionString() throws Exception

}

public SpanBuilder getSpanBuilder(String spanName)
{
Tracer tracer = getTracer(INSTRUMENTED_LIBRARY_NAME);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use member tracer instead.

})).build();
*/
}
tracer = GlobalOpenTelemetry.getTracer(tracerName);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracer should be static

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's preferred to track opentelemetry singleton and fetch tracer from it

if (options != null)
{
if (currentSpan != null && currentSpan.getSpanContext().isValid())
W3CTraceContextPropagator.getInstance().inject(Context.current(), options, Options::setProperty);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we propagate the Context.current(), or the currentSpan.getContext() passed in?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use currentSpan.getContext() for consistency. We could also have an override of injectTraceParentHeader() that doesn't require any input and grabs the current span via Span.current()

{
((WsWorkunitsStub) stub).ping(request);
verifyStub();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass in span

Ping request = new Ping();

try
Span span = getSpanBuilder("WsWUClient_Ping").startSpan();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to disable if tracing not desirable?

Copy link
Contributor

@jpmcmu jpmcmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana look good, I don't see any major issues, but do have a few questions / comments

if (options != null)
{
if (currentSpan != null && currentSpan.getSpanContext().isValid())
W3CTraceContextPropagator.getInstance().inject(Context.current(), options, Options::setProperty);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use currentSpan.getContext() for consistency. We could also have an override of injectTraceParentHeader() that doesn't require any input and grabs the current span via Span.current()

{
Map<String, String> carrier = new HashMap<>();
TextMapSetter<Map<String, String>> setter = Map::put;
W3CTraceContextPropagator.getInstance().inject(Context.current(), carrier, setter);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be using currentSpawn.getContext() here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inject actually takes a Context rather than SpanContext

@@ -582,6 +726,10 @@ static public Stub setStubOptions(Stub thestub, Connection connection) throws Ax

opt.setProperty(HTTPConstants.CHUNKED, Boolean.FALSE);

//only do this if tracing enabled?
//injectCurrentSpanTraceParentHeader(opt);
//opt.setProperty("traceparent", getTraceParentHeader());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need these comments?

@@ -240,7 +279,16 @@ public boolean verify(String hostname,javax.net.ssl.SSLSession sslSession)
// Run the generate-datasets.ecl script if present in the project resources
try
{
executeECLScript("generate-datasets.ecl");
//Span eclscriptspan = tracer.spanBuilder("generate-datasets.ecl").startSpan();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be interesting to add tracing into our tests. This could be a good way to better understanding how the tracing works in practice, and it could help us track down issues. So, I think lets keep this trace code

@rpastrana rpastrana force-pushed the HPCC4J-586 branch 2 times, most recently from fb5e881 to ad53e15 Compare May 31, 2024 20:58
@rpastrana rpastrana requested a review from jpmcmu May 31, 2024 20:58
@rpastrana
Copy link
Member Author

@jpmcmu please review

Copy link
Contributor

@jpmcmu jpmcmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana looks good

pom.xml Outdated
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-otlp</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana Based on our discussion it probably makes sense to trim these dependencies. We can probably rely on the downstream client applications to provide the exporter jars they need / use.

// -D.otel.logs.exporter=none
// -Dotel.java.global-autoconfigure.enabled=true

globalOTel = GlobalOpenTelemetry.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we handle defaults here? Should we check if defaults have been set and if not default to no-op and / or logging exporter?

@rpastrana rpastrana force-pushed the HPCC4J-586 branch 2 times, most recently from 15cce48 to f763fda Compare June 4, 2024 22:31
@rpastrana rpastrana requested a review from jpmcmu June 4, 2024 23:28
@rpastrana
Copy link
Member Author

@jpmcmu see latest commit

.spanBuilder(method.toUpperCase() + " " + url.toExternalForm())
.setAttribute(ServerAttributes.SERVER_ADDRESS, getHost())
.setAttribute(ServerAttributes.SERVER_PORT, Long.getLong(getPort()))
.setAttribute(HttpAttributes.HTTP_REQUEST_METHOD, HttpAttributes.HttpRequestMethodValues.GET)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't always be an http get

Copy link
Contributor

@jpmcmu jpmcmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments and I would also be curious to see what the trace output looks like from the ESP logging perspective.

* WARNING: Due to the inherent complications around initialization order involving this classand its single global instance, we strongly recommend *not* using GlobalOpenTelemetry unless youhave a use-case that absolutely requires it. Please favor using instances of OpenTelemetrywherever possible.
* If you are using the OpenTelemetry javaagent, it is generally best to only callGlobalOpenTelemetry.get() once, and then pass the resulting reference where you need to use it.
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our conversation lets remove this documentation about default config. Maybe also add a comment specifying what happens if the client application hasn't setup Otel.

}
catch (Exception e)
{
System.out.println("OpenTelemetry autoconfiguration failed: " + e.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will globalOTel be null if we get an exception here?

int responseCode = httpURLConnection.getResponseCode(); //throws IOException
try (Scope scope = sendHTTPReqSpan.makeCurrent())
{
Span currentSpan = Span.current();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use sendHTTPReqSpan here since currentSpan should always be sendHTTPReqSpan. Otherwise it is unclear why we would be expecting a different span.

Map<String, String> carrier = new HashMap<>();
TextMapSetter<Map<String, String>> setter = Map::put;
W3CTraceContextPropagator.getInstance().inject(Context.current(), carrier, setter);
String traceparent = carrier.getOrDefault("traceparent", "00-" + currentSpan.getSpanContext().getTraceId() + "-" + currentSpan.getSpanContext().getSpanId() + "-00");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this make sense to move this functionality to create trace into a utility so that dfsclient can use this as well?

@rpastrana rpastrana requested a review from jpmcmu June 5, 2024 16:15
@rpastrana
Copy link
Member Author

Sample trace output via console/log trace exporter:

INFO: 'FetchHPCCVersion' : 40d980b918457f3e20f1af6246069339 91d81b3e8a171805 CLIENT [tracer: WsClient:] AttributesMap{data={server.address=localhost, http.request.method=GET}, capacity=128, totalAddedValues=2}

INFO: 'GET http://localhost:8010/WsSMC/Activity?rawxml_' : 40d980b918457f3e20f1af6246069339 4ab0bf9585b5eb78 CLIENT [tracer: WsClient:] AttributesMap{data={server.address=localhost, hasCredentials=false, http.response.status_code=200, http.request.method=GET}, capacity=128, totalAddedValues=4}

Followed by corresponding ESP log output including server trace output:

00001051 PRG 2024-06-04 18:10:09.886  1648  3899 "GET /WsSMC/Activity, from 127.0.0.1"
00001052 PRG 2024-06-04 18:10:09.887  1648  3899 "TxSummary[activeReqs=1;auth=NA;contLen=-1;rcv=0;handleHttp=1;[email protected];req=GET wssmc.ACTIVITY v1.27;total=1;]"
00001053 MON 2024-06-04 18:10:09.888  1648  3899 "{ "type": "span", "name": "wssmc/activity", "trace_id": "40d980b918457f3e20f1af6246069339", "span_id": "d3bcbdcdb87e9088", "start": 1717539009887314522, "duration": 357500, "parent_span_id": "4ab0bf9585b5eb78", "attributes": {"http.request.method": "GET","id.local": "JmbS4Eu3dnweegcRerhoLz","id.global": "JmbS4Eu6bgXmd7i3C8nFw8" } }"

@rpastrana
Copy link
Member Author

Sample JVM arguments used to autoconfigure Junit tests and/or platformtester

-Dotel.traces.exporter=console
-Dotel.metrics.exporter=none
-Dotel.logs.exporter=none
-Dotel.java.global-autoconfigure.enabled=true

Copy link
Contributor

@jpmcmu jpmcmu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana looks good

- Injects trace context in traceparent http header
- Integrates autoconfigure otel SDK
- Adds manual WsWUClient.ping span
- Adds manual WsWUClientTest.ping span
- Adds manual span around getHPCCver and getHPCCContainerizedMode
- Adds manual http span on Utils.Connection.sendHTTPRequest
- Injects traceparent http header on outgoing Utils.Connection.sendHTTPRequest
- Only attempts to autoconfig remotetest if otel.java.global-autoconfig enabled
- Adds root span around platform tester

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana
Copy link
Member Author

@drealeed please take a look and let's discuss how your projects are/will be using OTEL for tracing etc.
Also, a similar change will applied to all rowservice calls immediately after this change in merged.

@rpastrana rpastrana merged commit ce6160c into hpcc-systems:candidate-9.6.x Jun 7, 2024
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants