Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-29334 Associate log entries with Active JTrace info #18556

Merged

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented Apr 18, 2024

  • Add support for JLog Trace and Span columns
  • Update logging README with new cols
  • Avoids uncaught exceptions
  • Adds trace/span fields as default log layout
  • Adds trace/span fields to unittests log output
  • Adds various active span unittests
  • Ends span before reverting active threaded span so jlog output includes trace/span columns

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@rpastrana rpastrana requested a review from ghalliday April 18, 2024 19:28
Copy link

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-29334

Jirabot Action Result:
Additional PR: #18556

@rpastrana
Copy link
Member Author

Sample log output:

15:07:56.295080 3788 UNK UNK Loading /home/ubuntu/GIT/HPCC-Platform/build/Debug/libs/libesphttp.so
15:07:56.300915 3788 UNK UNK Loading /home/ubuntu/GIT/HPCC-Platform/build/Debug/libs/libdatamasker.so
15:07:56.343912 3788 UNK UNK Including test JlibTraceTest
15:07:56.369276 3788 beca49ca8f3138a2842e5cf21402bfff bf525529dc1db08f { "type": "span", "name": "StringArrayPropegatedServerSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "bf525529dc1db08f", "start": 1713467276368255642, "duration": 452000, "parent_span_id": "4b960b3e4647da3f", "trace_state": "hpcc=4b960b3e4647da3f", "attributes": {"id.local": "Jeod4LbAL6oEjtPqgWRjJA","id.caller": "IncomingCID","id.global": "someGlobalID" } }
15:07:56.370542 3788 beca49ca8f3138a2842e5cf21402bfff 4d043032b8a5aca2 { "type": "span", "name": "propegatedServerSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "4d043032b8a5aca2", "start": 1713467276370243840, "duration": 180000, "parent_span_id": "4b960b3e4647da3f", "trace_state": "hpcc=4b960b3e4647da3f", "attributes": {"id.local": "Jeod4LfQABmZeMQovCsHF1","id.caller": "IncomingCID","id.global": "IncomingUGID" } }
15:07:56.371453 3788 382e010621714be208aab9237b8b4b57 ff4b24226c0b0f21 { "type": "span", "name": "propegatedServerSpan", "trace_id": "382e010621714be208aab9237b8b4b57", "span_id": "ff4b24226c0b0f21", "start": 1713467276371305539, "duration": 99500 }
15:07:56.372359 3788 beca49ca8f3138a2842e5cf21402bfff 980bfc94cd3a7d06 { "type": "span", "name": "propegatedServerSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "980bfc94cd3a7d06", "start": 1713467276372105038, "duration": 174200, "parent_span_id": "4b960b3e4647da3f", "trace_state": "hpcc=4b960b3e4647da3f", "attributes": {"id.local": "Jeod4LgmqBGJju45jH4TSC","id.caller": "IncomingCID","id.global": "IncomingUGID" } }
15:07:56.373071 3788 dfbbe5bd730411ce5d46eebc4d4c3084 2de1ce76c2c9a3ec testInvalidPropegatedServerSpan: traceparent: (null)
15:07:56.373305 3788 dfbbe5bd730411ce5d46eebc4d4c3084 2de1ce76c2c9a3ec { "type": "span", "name": "invalidPropegatedServerSpan", "trace_id": "dfbbe5bd730411ce5d46eebc4d4c3084", "span_id": "2de1ce76c2c9a3ec", "start": 1713467276372946537, "duration": 264200, "attributes": {"id.local": "Jeod4Lhm1iFFVcDWbTjDx6","id.caller": "IncomingCID","id.global": "IncomingUGID" } }
15:07:56.374001 3788 9e788617bdd2f3d84a974406657118cc b972e074823f9a8c { "type": "span", "name": "internalSpan", "trace_id": "9e788617bdd2f3d84a974406657118cc", "span_id": "b972e074823f9a8c", "start": 1713467276373867537, "duration": 83299, "parent_span_id": "b469b4e6bd1e066c" }
15:07:56.374244 3788 9e788617bdd2f3d84a974406657118cc b469b4e6bd1e066c { "type": "span", "name": "propegatedServerSpan", "trace_id": "9e788617bdd2f3d84a974406657118cc", "span_id": "b469b4e6bd1e066c", "start": 1713467276373716537, "duration": 470399 }
15:07:56.376008 3788 beca49ca8f3138a2842e5cf21402bfff 29ca52417a702569 { "type": "span", "name": "internalSpan2", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "29ca52417a702569", "start": 1713467276375382435, "duration": 551999, "parent_span_id": "9df5acd72a78a1ab", "trace_state": "hpcc=4b960b3e4647da3f" }
15:07:56.376320 3788 beca49ca8f3138a2842e5cf21402bfff 9df5acd72a78a1ab { "type": "span", "name": "internalSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "9df5acd72a78a1ab", "start": 1713467276375309435, "duration": 939899, "parent_span_id": "1e3a6757491310e3", "trace_state": "hpcc=4b960b3e4647da3f" }
15:07:56.376610 3788 beca49ca8f3138a2842e5cf21402bfff 1e3a6757491310e3 { "type": "span", "name": "clientSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "1e3a6757491310e3", "start": 1713467276375226335, "duration": 1314799, "parent_span_id": "53f4b20e911de72c", "trace_state": "hpcc=4b960b3e4647da3f" }
15:07:56.377072 3788 beca49ca8f3138a2842e5cf21402bfff 53f4b20e911de72c { "type": "span", "name": "propegatedServerSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "53f4b20e911de72c", "start": 1713467276375032735, "duration": 1878799, "parent_span_id": "4b960b3e4647da3f", "trace_state": "hpcc=4b960b3e4647da3f", "attributes": {"id.local": "Jeod4Ljab1fv6Mm9TKJ84g","id.caller": "IncomingCID","id.global": "IncomingUGID" } }
15:07:56.378222 3788 beca49ca8f3138a2842e5cf21402bfff 306b4311c1ff6995 { "type": "span", "name": "clientSpanWithGlobalID", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "306b4311c1ff6995", "start": 1713467276378069632, "duration": 97300, "parent_span_id": "9bc614a753d6efb3", "trace_state": "hpcc=4b960b3e4647da3f" }
15:07:56.378493 3788 beca49ca8f3138a2842e5cf21402bfff 9bc614a753d6efb3 { "type": "span", "name": "propegatedServerSpan", "trace_id": "beca49ca8f3138a2842e5cf21402bfff", "span_id": "9bc614a753d6efb3", "start": 1713467276377903733, "duration": 494499, "parent_span_id": "4b960b3e4647da3f", "trace_state": "hpcc=4b960b3e4647da3f", "attributes": {"id.local": "Jeod4LoAbzUAn4hTKKz5FS","id.caller": "IncomingCID","id.global": "IncomingUGID" } }
15:07:56.379018 3788 4bf3dca350947662d7da854a58f71738 cbc9b31dd65f4e17 { "type": "span", "name": "noRemoteParentEnsureTraceID", "trace_id": "4bf3dca350947662d7da854a58f71738", "span_id": "cbc9b31dd65f4e17", "start": 1713467276378910032, "duration": 59699 }
15:07:56.379690 3788 8d477a138f5253f198babd1b9e5e1b23 831e8a512a114ffd { "type": "span", "name": "spanWithEventsNoAtts", "trace_id": "8d477a138f5253f198babd1b9e5e1b23", "span_id": "831e8a512a114ffd", "start": 1713467276379392831, "duration": 243800 }
15:07:56.380535 3788 07f090cd266788f56e0bf033612d9bf3 05bdb03d3b0e4ea8 { "type": "span", "name": "spanWithEvent1Att", "trace_id": "07f090cd266788f56e0bf033612d9bf3", "span_id": "05bdb03d3b0e4ea8", "start": 1713467276380054930, "duration": 420300 }
15:07:56.381186 3788 146f0b7bce7634f8af8d07d2c23f7f8b df552a8ad1e67521 { "type": "span", "name": "spanWith2Events", "trace_id": "146f0b7bce7634f8af8d07d2c23f7f8b", "span_id": "df552a8ad1e67521", "start": 1713467276380882030, "duration": 246099 }
15:07:56.381876 3788 b6154d36eb72068f08c0db3ddcfb589e 87d42f4269770c3c { "type": "span", "name": "defaultErrorSpan", "trace_id": "b6154d36eb72068f08c0db3ddcfb589e", "span_id": "87d42f4269770c3c", "start": 1713467276381591029, "duration": 230200 }
15:07:56.382387 3788 2d374651765d32683fbf96b525cd2bb9 679702c623e2e79c { "type": "span", "name": "defaultErrorSpanStruct", "trace_id": "2d374651765d32683fbf96b525cd2bb9", "span_id": "679702c623e2e79c", "start": 1713467276382219428, "duration": 115500 }
15:07:56.382932 3788 e444da68c632db0fcac679acdaab72ca 4384ae7252fe211c { "type": "span", "name": "failedErrorSpanEscaped", "trace_id": "e444da68c632db0fcac679acdaab72ca", "span_id": "4384ae7252fe211c", "start": 1713467276382763428, "duration": 116800 }
15:07:56.383779 3788 8e311fc1dd712d22b2fbc3d5575d66a2 700459f3bafe44de { "type": "span", "name": "failedErrEscapedMsgErrCode", "trace_id": "8e311fc1dd712d22b2fbc3d5575d66a2", "span_id": "700459f3bafe44de", "start": 1713467276383272327, "duration": 447000 }
15:07:56.384299 3788 a09dc7fa0d7f22dfda9314725d7491c6 90eee15592457c4d { "type": "span", "name": "containsErrorAndMessageSpan", "trace_id": "a09dc7fa0d7f22dfda9314725d7491c6", "span_id": "90eee15592457c4d", "start": 1713467276384132526, "duration": 115300 }
15:07:56.384826 3788 be496ce3a12204b77cbebda946a3cfe7 e8530f90fdec6c4e { "type": "span", "name": "containsErrorAndMessageFailedNotEscapedSpan", "trace_id": "be496ce3a12204b77cbebda946a3cfe7", "span_id": "e8530f90fdec6c4e", "start": 1713467276384641626, "duration": 133000 }
15:07:56.385389 3788 742690dae03d2379aa2a3c74884de4ac 48a02a1ce13fbdfd { "type": "span", "name": "mockExceptionSpanNotFailedNotEscaped", "trace_id": "742690dae03d2379aa2a3c74884de4ac", "span_id": "48a02a1ce13fbdfd", "start": 1713467276385166825, "duration": 169500 }
15:07:56.386321 3788 335ddb9a34f44d9c7044b1a7ca03f582 cf2c507ad8cb1b05 { "type": "span", "name": "thrownExceptionSpan", "trace_id": "335ddb9a34f44d9c7044b1a7ca03f582", "span_id": "cf2c507ad8cb1b05", "start": 1713467276385733025, "duration": 524899 }
15:07:56.387132 3788 2d8156ae0a0b5baa6b25bd9ba7afbcbb 0d7be13833a83bd0 mySpan is alive
15:07:56.387529 3788 2d8156ae0a0b5baa6b25bd9ba7afbcbb 0d7be13833a83bd0 { "type": "span", "name": "mySpan", "trace_id": "2d8156ae0a0b5baa6b25bd9ba7afbcbb", "span_id": "0d7be13833a83bd0", "start": 1713467276387054523, "duration": 353100 }
15:07:56.387802 3788 UNK UNK mySpan is finished
15:07:56.490232 3788 720009f32a9db04f68f2b6545717ebe5 a7ef8749b5926acf This log entry should report traceID: '720009f32a9db04f68f2b6545717ebe5' and spanID: 'a7ef8749b5926acf'
15:07:56.490679 3788 720009f32a9db04f68f2b6545717ebe5 a7ef8749b5926acf { "type": "span", "name": "currentSpanScope", "trace_id": "720009f32a9db04f68f2b6545717ebe5", "span_id": "a7ef8749b5926acf", "start": 1713467276489953722, "duration": 559000, "attributes": {"id.local": "Jeod4NuuXU2i8a1yfHQvQG","id.caller": "IncomingCID","id.global": "IncomingUGID" } }
15:07:56.616624 3788 f1d0a588e763a986b02c3c5fbbf580ea bf74202de00fbfb8 { "type": "span", "name": "declaredSpanStartTime", "trace_id": "f1d0a588e763a986b02c3c5fbbf580ea", "span_id": "bf74202de00fbfb8", "start": 1713467276491061620, "duration": 125510776 }
15:07:56.667333 3788 80e911d331e74170faeedccc9d455e81 a6f0d89fee7e1378 { "type": "span", "name": "msTickOffsetStartTime", "trace_id": "80e911d331e74170faeedccc9d455e81", "span_id": "a6f0d89fee7e1378", "start": 1713467276616795296, "duration": 50486650 }
15:07:56.667501 3788 UNK UNK MsTickOffset span actual start-time timestamp: 1713467276666978246
15:07:56.743186 3788 d1842756455fb54d5700a6f8b2e87583 ccf6ae6ea1ebf579 { "type": "span", "name": "uninitializeddeclaredSpanStartTime", "trace_id": "d1842756455fb54d5700a6f8b2e87583", "span_id": "ccf6ae6ea1ebf579", "start": 1713467276667736145, "duration": 75293526 }

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One question about the implementation.

@rpastrana
Copy link
Member Author

Looks good. One question about the implementation.

Sounds good. Did the question get published?

@rpastrana rpastrana requested a review from ghalliday April 25, 2024 18:55
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where the comment went, I must have failed to add.

void setTraceID(const char * theTraceID) { traceID.set(theTraceID);};
void setSpanID(const char * theSpanID) { spanID.set(theSpanID);};
private:
StringAttr traceID = "UNK";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider storing an Owned in this class? It would avoid the string clones

MSGFIELD_all = 0xFFFFFF
} LogMsgField;

#ifdef _WIN32
#define MSGFIELD_STANDARD LogMsgField(MSGFIELD_timeDate | MSGFIELD_msgID | MSGFIELD_process | MSGFIELD_thread | MSGFIELD_code | MSGFIELD_quote | MSGFIELD_prefix | MSGFIELD_audience)
#define MSGFIELD_LEGACY LogMsgField(MSGFIELD_timeDate | MSGFIELD_milliTime | MSGFIELD_msgID | MSGFIELD_process | MSGFIELD_thread | MSGFIELD_code | MSGFIELD_quote | MSGFIELD_prefix)
#else

//should trace/span be included in the standard fields? As-is, stderr output will not include trace/span
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment no longer applies

@rpastrana rpastrana requested a review from ghalliday April 29, 2024 15:51
@rpastrana rpastrana force-pushed the HPCC-29334-60traceidlogs branch from 1855920 to f00f933 Compare April 29, 2024 15:53
@ghalliday
Copy link
Member

Appologies.... Looking back I see that my comment was truncated by github, so it wasn't clear. What I meant to say was
"Did you consider storing an Owned<ISpan> in this class?", but unfortunately the <....> is hidden.

What I was suggesting was something like this:

class jlib_decl LogMsgTraceInfo
{
public:
    LogMsgTraceInfo() = default;
    LogMsgTraceInfo(ISpan * _span) : span(_span)
    {
    }

    const char * queryTraceID() const
    {
        if (span)
        {
            const char * traceId = span->queryTraceId();
            if (traceId)
                return traceId;
        }
        return UNK_LOG_ENTRY;
    }

    const char * querySpanID() const
    {
        if (span)
        {
            const char * spanId = span->querySpanId();
            if (spanId)
                return spanId;
        }
        return UNK_LOG_ENTRY;
    }

private:
    Linked<ISpan> span;
};

, where queryTraceid() and querySpanId() were added as members of ISpan.

@rpastrana
Copy link
Member Author

Appologies.... Looking back I see that my comment was truncated by github, so it wasn't clear. What I meant to say was "Did you consider storing an Owned in this class?", but unfortunately the <....> is hidden.

What I was suggesting was something like this:

class jlib_decl LogMsgTraceInfo
{
public:
    LogMsgTraceInfo() = default;
    LogMsgTraceInfo(ISpan * _span) : span(_span)
    {
    }

    const char * queryTraceID() const
    {
        if (span)
        {
            const char * traceId = span->queryTraceId();
            if (traceId)
                return traceId;
        }
        return UNK_LOG_ENTRY;
    }

    const char * querySpanID() const
    {
        if (span)
        {
            const char * spanId = span->querySpanId();
            if (spanId)
                return spanId;
        }
        return UNK_LOG_ENTRY;
    }

private:
    Linked<ISpan> span;
};

, where queryTraceid() and querySpanId() were added as members of ISpan.

Yeah, I figured as much, but w/out the queryTrace/Span, it didn't make sense. I don't know if you recall, but I originally had those 2 methods in the ispan interface and was asked to remove in a code review.
So, are we in agreement this particular PR should include the changes you proposed?

@ghalliday
Copy link
Member

Yeah, I figured as much, but w/out the queryTrace/Span, it didn't make sense. I don't know if you recall, but I originally had those 2 methods in the ispan interface and was asked to remove in a code review. So, are we in agreement this particular PR should include the changes you proposed?

Yes please. You were obviously more forward looking than me!

@rpastrana rpastrana force-pushed the HPCC-29334-60traceidlogs branch from f00f933 to 6fe65db Compare May 3, 2024 02:36
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please squash and remove the logging left in.

@@ -791,6 +791,16 @@ class CSpan : public CInterfaceOf<ISpan>
recordError(SpanError(msg.str(), e->errorCode(), spanFailed, escapedScope));
};

virtual const char * queryTraceId() const override
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One things I noticed. The trace id could only be stored in the server span, and all other spans could call parent->querySpan(). Definitely NOT something to do in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -442,7 +463,7 @@ class JlibTraceTest : public CppUnit::TestFixture
{
Owned<IProperties> emptyMockHTTPHeaders = createProperties();
OwnedSpanScope serverSpan = queryTraceManager().createServerSpan("propegatedServerSpan", emptyMockHTTPHeaders);

ERRLOG("Random");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have these been left in by accident?

@rpastrana rpastrana force-pushed the HPCC-29334-60traceidlogs branch from 46b7604 to 192720f Compare May 9, 2024 16:18
@rpastrana rpastrana requested a review from ghalliday May 9, 2024 17:04
@rpastrana
Copy link
Member Author

Squashed.
This feature does cause a "problem" we might want to address first (at least for ALA):
https://hpccsystems.atlassian.net/browse/HPCC-31791

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved to merge, but I will hold off merging until I have recommendation about the linked issue.

@rpastrana rpastrana requested a review from ghalliday May 14, 2024 16:38
@rpastrana
Copy link
Member Author

@ghalliday chose to make the trace log columns optional, and provided sample configuration.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana looks good. Please squash.
Is the plan to add support elsewhere, and then change the defaults once that support is in place?

@rpastrana
Copy link
Member Author

@rpastrana looks good. Please squash. Is the plan to add support elsewhere, and then change the defaults once that support is in place?

Yes, it's the safest approach because we would lose ALA LogAccess if we forced this prematurely. I'd rather have the end users explicitly add the columns via configuration (for now)

- Add support for JLog Trace and Span columns
- Update logging README with new cols
- Avoids uncaught exceptions
- Adds  trace/span fields to unittests log output
- Adds various active span unittests
- Ends span before reverting active threaded span so jlog output includes trace/span columns
- Adds ISpan query Trace/Span ID methods
- Jlog tracks linked Ispan
- Makes trace/span Id log columns optional
- Updates README, provides sample

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-29334-60traceidlogs branch from cc2fa54 to 0d3a29c Compare May 15, 2024 13:28
@rpastrana
Copy link
Member Author

@ghalliday squashed

@ghalliday ghalliday merged commit 4a181eb into hpcc-systems:candidate-9.6.x May 17, 2024
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants