HPCC4J-636 DFSClient: Improve Opentelemetry tracing #743

Merged
rpastrana merged 1 commit into hpcc-systems:candidate-9.6.x on Sep 4, 2024

Conversation

jpmcmu
Contributor

@jpmcmu jpmcmu commented Aug 30, 2024

  • Improved span names
  • Transitioned read request events to read spans
  • Moved events to spans for connect, version and close
  • Added span batch support

Signed-off-by: James McMullan [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change is a breaking change (fix or feature that will cause existing behavior to change).

Checklist:

  • I have created a corresponding JIRA ticket for this submission
  • My code follows the code style of this project.
    • I have applied the Eclipse code-format template provided.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the HPCC Systems CONTRIBUTORS document (https://github.com/hpcc-systems/HPCC-Platform/wiki/Guide-for-contributors).
  • The change has been fully tested:
    • This change does not cause any existing JUnits to fail.
    • I have included JUnit coverage to test this change
    • I have performed system test and covered possible regressions and side effects.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Premature optimization
    • This change fixes the problem, not just the symptom

Testing:

Jira Issue: https://hpccsystems.atlassian.net/browse/HPCC4J-636

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@jpmcmu
Contributor Author

jpmcmu commented Aug 30, 2024

@rpastrana Example tracing output, bottom screenshot shows attribute arrays for individual requests in the batch. Still looking into request times being incorrect.

Screenshot 2024-08-30 at 2 16 18 PM
Screenshot 2024-08-30 at 2 18 32 PM
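
For readers without the screenshots, the "attribute arrays for individual requests in the batch" can be pictured roughly as below. This is a minimal sketch in plain Java, not the DFSClient code: `ReadBatchExample`, `RequestInfo`, and both of its fields are hypothetical names, and no OpenTelemetry calls are made. It only shows the idea of flattening one field from every request in a batch into a parallel array, which a batch span could then carry as a single array-valued attribute.

```java
import java.util.Arrays;
import java.util.List;

final class ReadBatchExample {
    // Hypothetical stand-in for the data recorded for one read request.
    static final class RequestInfo {
        final long requestTimeMs; // assumed: wall-clock request time in ms
        final long bytesRead;     // assumed: payload size for the request

        RequestInfo(long requestTimeMs, long bytesRead) {
            this.requestTimeMs = requestTimeMs;
            this.bytesRead = bytesRead;
        }
    }

    // Collects the request time of every request in the batch, in order.
    static long[] requestTimes(List<RequestInfo> batch) {
        long[] times = new long[batch.size()];
        for (int i = 0; i < times.length; i++) {
            times[i] = batch.get(i).requestTimeMs;
        }
        return times;
    }

    // Collects the byte count of every request in the batch, in order.
    static long[] bytesRead(List<RequestInfo> batch) {
        long[] sizes = new long[batch.size()];
        for (int i = 0; i < sizes.length; i++) {
            sizes[i] = batch.get(i).bytesRead;
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<RequestInfo> batch = Arrays.asList(
                new RequestInfo(1000L, 4096L),
                new RequestInfo(1005L, 8192L));

        // Index i in each array describes the i-th request of the batch.
        System.out.println(Arrays.toString(requestTimes(batch)));
        System.out.println(Arrays.toString(bytesRead(batch)));
    }
}
```

Keeping the arrays index-aligned means one span can describe many requests without emitting a span per request.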

@jpmcmu jpmcmu requested a review from rpastrana August 30, 2024 18:19
@jpmcmu
Contributor Author

jpmcmu commented Aug 30, 2024

Screenshot with missing request added in:
Screenshot 2024-08-30 at 2 52 33 PM

Member

@rpastrana rpastrana left a comment

@jpmcmu looks pretty good. Left a minor comment about reporting numeric values as strings, simply for presentation purposes

@jpmcmu
Contributor Author

jpmcmu commented Sep 4, 2024

Updated screenshot with timestamp fix:
Screenshot 2024-09-04 at 11 08 21 AM

@jpmcmu jpmcmu requested a review from rpastrana September 4, 2024 15:09
Member

@rpastrana rpastrana left a comment

@jpmcmu just a couple of comments

@@ -156,7 +156,7 @@ public HPCCRemoteFileWriter(FileWriteContext ctx, DataPartition dp, IRecordAcces

 Attributes attributes = Attributes.of( AttributeKey.stringKey("server.0.address"), primaryIP,
                                        AttributeKey.stringKey("server.1.address"), secondaryIP,
-                                       ServerAttributes.SERVER_PORT, Long.valueOf(dp.getPort()));
+                                       AttributeKey.stringKey("server.port"), "" + dp.getPort());
Member

unless this approach performs better, we should use explicit functionality like String.valueOf()
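
For reference, the two conversions produce the same string, so the suggestion is about readability and intent rather than behavior. A minimal sketch, assuming `dp.getPort()` returns an `int`; `PortAttributeExample` and `portAsString` are hypothetical names used only for illustration, and the port value is made up:

```java
final class PortAttributeExample {
    // Hypothetical helper: formats a port for a string-valued span attribute
    // using the explicit conversion suggested in the review.
    static String portAsString(int port) {
        return String.valueOf(port);
    }

    public static void main(String[] args) {
        int port = 7600; // made-up port value for the example

        String viaConcat = "" + port;           // implicit conversion
        String viaValueOf = portAsString(port); // explicit conversion

        // Both forms yield the same text for the attribute value.
        System.out.println(viaConcat.equals(viaValueOf));
    }
}
```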

@@ -865,7 +867,7 @@ private void startNewReadRequestSpan()
 readRequestCount++;

 currentReadRequestEvent = new ReadRequestEvent();
-currentReadRequestEvent.requestTime = System.nanoTime();
+currentReadRequestEvent.requestTime = System.currentTimeMillis();
Member

do we know why System.nanoTime() doesn't work?

Contributor Author

I believe it was getting truncated by something in the OTEL stack
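
Separately from any truncation, one documented difference may be relevant here: `System.nanoTime()` counts from an arbitrary origin and is only meaningful for measuring elapsed time, while `System.currentTimeMillis()` is relative to the Unix epoch. The sketch below, which uses only the JDK, shows one way to keep the two roles apart. `RequestTimingExample` and its helpers are hypothetical names, not part of DFSClient, and the thread does not confirm that this was the cause of the incorrect request times.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

final class RequestTimingExample {
    // Converts an epoch-millisecond reading to epoch nanoseconds.
    static long millisToEpochNanos(long epochMillis) {
        return TimeUnit.MILLISECONDS.toNanos(epochMillis);
    }

    // Wall-clock time of an Instant in nanoseconds since the Unix epoch.
    static long epochNanos(Instant t) {
        return TimeUnit.SECONDS.toNanos(t.getEpochSecond()) + t.getNano();
    }

    public static void main(String[] args) {
        // Epoch-based reading: suitable as a timestamp.
        long startEpochMs = System.currentTimeMillis();
        // Arbitrary-origin reading: suitable only for differences.
        long monotonicStart = System.nanoTime();

        // ... the read request would happen here ...

        long elapsedNanos = System.nanoTime() - monotonicStart;

        System.out.println("start (epoch ns): " + millisToEpochNanos(startEpochMs));
        System.out.println("elapsed (ns): " + elapsedNanos);
    }
}
```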

@jpmcmu
Contributor Author

jpmcmu commented Sep 4, 2024

@rpastrana Changed string conversion and squashed

@rpastrana rpastrana merged commit 61ece0f into hpcc-systems:candidate-9.6.x Sep 4, 2024
3 of 5 checks passed
github-actions bot commented Sep 4, 2024

Jirabot Action Result:
Added fix version: 9.6.46
Added fix version: 9.8.20
Workflow Transition: 'Resolve issue'
