Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30374 Provide JTrace option to declare span start time #18130

Conversation

rpastrana
Copy link
Member

@rpastrana rpastrana commented Dec 8, 2023

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@rpastrana rpastrana force-pushed the HPCC-30374-SpanStartTimeOverride branch 6 times, most recently from aeedea5 to 0ac68b5 Compare December 8, 2023 21:13
@rpastrana rpastrana force-pushed the HPCC-30374-SpanStartTimeOverride branch from 0ac68b5 to 8a8f7ec Compare December 8, 2023 21:31
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana a few minor comments. A couple of other issues though:

  • the commit has the wrong jira reference
  • having thought about it some more this change is only partial. The start time will be set for the span, but the duration will not (because that uses start_steady_time). It really needs to have unit tests that are looking at the full span information that is output.

@@ -171,6 +171,7 @@ class CTraceManager : implements ITraceManager, public CInterface
CTraceManager();
IMPLEMENT_IINTERFACE

virtual ISpan * createServerSpan(const char * name, const IProperties * httpHeaders, SpanFlags flags, std::chrono::nanoseconds spanStartTime) const override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment: Another solution would be to add a default parameter to the standard constructor. E.g.

std::chrono::nanoseconds spanStartTime = 0

which would avoid a proliferation of constructors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually started this way but thought you wouldn't appreciate the special default value.

CServerSpan(const char * spanName, const IProperties * httpHeaders, SpanFlags flags, std::chrono::nanoseconds spanStartTime)
: CSpan(spanName)
{
opts.start_system_time = opentelemetry::common::SystemTimestamp(std::chrono::duration_cast<std::chrono::nanoseconds>(spanStartTime));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the duration cast - the argument is already in nanoseconds.

@@ -70,6 +72,7 @@ interface ITraceManager : extends IInterface
extern jlib_decl ISpan * getNullSpan();
extern jlib_decl void initTraceManager(const char * componentName, const IPropertyTree * componentConfig, const IPropertyTree * globalConfig);
extern jlib_decl ITraceManager & queryTraceManager();
extern jlib_decl std::chrono::nanoseconds nowTimesStampInNanos();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many ss nowTimeStamp


Owned<IProperties> retrievedSpanCtxAttributes = createProperties();
bool getSpanCtxSuccess = serverSpan->getSpanContext(retrievedSpanCtxAttributes, false);
CPPUNIT_ASSERT_EQUAL_MESSAGE("Unexpected getSpanContext failure detected", true, getSpanCtxSuccess);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this test is testing - it isn't using nownanos. I think you will need to add the unit tests that look at the trace output before you can test properly.

@rpastrana
Copy link
Member Author

@rpastrana a few minor comments. A couple of other issues though:

* the commit has the wrong jira reference

* having thought about it some more this change is only partial.  The start time will be set for the span, but the duration will not (because that uses start_steady_time).  It really needs to have unit tests that are looking at the full span information that is output.

I'm not sure what you mean by the "wrong jira reference", this is targeting HPCC-30374. Will update comment to ensure syntax is correct.
Yes, I had mentioned the 2nd parameter.

@rpastrana rpastrana requested a review from ghalliday December 13, 2023 02:19
@rpastrana rpastrana changed the title Hpcc 30374 span start time override HPCC-30474 Provide JTrace option to declare span start time Dec 13, 2023
@rpastrana
Copy link
Member Author

@rpastrana a few minor comments. A couple of other issues though:

* the commit has the wrong jira reference

* having thought about it some more this change is only partial.  The start time will be set for the span, but the duration will not (because that uses start_steady_time).  It really needs to have unit tests that are looking at the full span information that is output.

@ghalliday please take a look at the new unit test and let me know if you don't find any value in exercising the jtrace calls. At this point we don't have any exporter in place to begin considering capturing the trace output.
Would you care to use the inmemory exporter for cppunit type testing of recordable data?

https://github.com/open-telemetry/opentelemetry-cpp/blob/96e5078cfffbf35dca590326e321db607730d3f6/sdk/test/trace/tracer_test.cc#L205

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be designed to be easy for the caller to use, and the unit tests need to check that the values are having the desired effect. I suspect the latter cannot be done until the span output has been integrated into the unit testing.

@@ -819,9 +816,16 @@ class CServerSpan : public CSpan

//---------------------------------------------------------------------------------------------------------------------

std::chrono::nanoseconds nowTimesStampInNanos()
const std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds> nowSystemClock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design in the second commit hasn't fully considered how this feature will be used. From the user's point of view they want to be able to preserve the start time for the minimum effort. They don't really want to call two separate calls, they ideally want a single function()

SpanTimestamp getSpanTimestamp() {}
or
void getSpanTimestamp(SpanTimestamp &) {}

which returns an object that contains the two start times. There should then be an option SpanTimestamp * parameter to createServerSpan to pass in the times.

The type of the parameters/values is also too verbose. A user should not be using std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds> as a typename in their code.

Either a typedef within the header. (If it is defined to be a time in nanoseconds then using an __uint64 would be sufficient and simpler.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday the design considered usability amongst many other aspects, however I cannot read your mind. Just so you know, several of my iterations cloned the sysClockStart and steadyClockStart, but backtracked to this version because I simply didn't imagine you could find fault w/ this version...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpastrana @ghalliday from a users perspective I was picturing something very simple and that could be used in the code as is. There may be more powerful things you could do, and improvements could be made for example to support nanoseconds, but if we supported the following it would allow our existing code to work very easily:

virtual ISpan * createServerSpan(const char * name, const IProperties * httpHeaders, const, unsigned msStartTIme, SpanFlags flags = SpanFlags::None) const = 0;

Roxie would just pass in the existing variable "qstart" and it would be done.

Internally you would do something like "unsigned elapsed = msTick() - msStartTime;" and then use elapsed to adjust everything else. I didn't look closely enough to see if something is wrong with that, but that's the idea.

ESP already has a similar value as well.

This prevents rearchitecting the servers now. We can make further enhancements in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afishbeck I did notice the ESP keeps a creation timestamp (in ms granularity) in the request context and the current design would likely force a change to track that timestamp in the granularity (nanos) used by OTel internals.
Your suggestion seems ok w/ me if you're ok w/ the potential loss of precision. We'd have to confirm there are no other issues performing the duration casts between millis and a nanos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afishbeck I think I jumped the gun on your suggestion. I assumed the context tracked an actual start timestamp, but it looks like it's more of a tick mark used to calculated elapsed time at completion time?

We're relying on OTel to calculate the span duration time, and only overriding the start time (in std::chrono::nanos) on-demand.
I think your suggestion is to offset that time stamp by a count of millis (elapsed) correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. My suggestion was a way to make it easy to work with existing code. The idea was to use the existing value to figure out the already elapsed time and use that as a delta to correct what Otel would currently generate. But if we want to go ahead and refactor a bit to "do it right" that would be fine. I guess if we don't do it now it may never happen. There are places in both roxie and ESP now that get the current tick and later use that for reporting and other things. We can change that how ever makes sense as long as current uses aren't broken and otels needs are met.

But I think we should decide what that roxie and ESP code will look like when deciding on the design for passing in the start time here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday btw this is what I meant the other day when I said that changing surrounding code can help get the design right, but can also make the current PR bigger :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to do a little more research, but something like this seems to produce reasonable values (now - offsetMS)
systemClockTime = std::chrono::duration_caststd::chrono::nanoseconds(std::chrono::system_clock::now().time_since_epoch() - std::chrono::milliseconds(msTick() - msTicksOffset));

Anyway, have we considered migrating off of mstick in favor of a std::chrono based duration?
std::chrono::steady_clock::now() ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't on my radar, but definitely worth considering.

}

systemClockTimePoint = nowSystemClock();
steadyClockTimePoint = nowSteadyClock() + std::chrono::hours(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really testing anything. My previous comment stands - you need to be able to read the values that are generated for duration etc from the logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday what I think you mean is that this test isn't programmatically verifying the results, however you surely see the value in providing these tests for manual debugging, API verification, developer level documentation, etc. or am I mistaken about that? Are you asserting this block of code has no value and should be removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good point.

@rpastrana rpastrana force-pushed the HPCC-30374-SpanStartTimeOverride branch from 3ec0612 to 1dc3979 Compare December 14, 2023 17:24
@rpastrana rpastrana requested a review from ghalliday December 14, 2023 17:25
@rpastrana
Copy link
Member Author

rpastrana commented Dec 14, 2023

@ghalliday this latest commit should include all the changed we discussed offline
Also, corrected the commit message and amended

@rpastrana
Copy link
Member Author

@afishbeck you should be able to adjust span starttime based on mstick marks via the
new SpanTimeStamp struct, use setMSTickTime(const unsigned int msTickTime) as illustrated in "manualTestsDeclaredSpanStartTime" cppunit test.

@rpastrana rpastrana force-pushed the HPCC-30374-SpanStartTimeOverride branch from 1dc3979 to c62932e Compare December 14, 2023 17:31
@rpastrana rpastrana changed the title HPCC-30474 Provide JTrace option to declare span start time HPCC-30374 Provide JTrace option to declare span start time Dec 14, 2023
@rpastrana
Copy link
Member Author

FWIW build failure doesn't seem related.

steadyClockTime = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::steady_clock::now().time_since_epoch());
}

void setMSTickTime(const unsigned int msTickTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this gives us two possible ways of instrumenting ESP and Roxie. Either add add SpanTimeStamp (and maybe get rid of the msStartTime that's there now, or just use the existing msStartTime with this function.

@@ -1339,6 +1346,10 @@ ISpan * CTraceManager::createServerSpan(const char * name, const IProperties * h
return new CServerSpan(name, httpHeaders, flags);
}

ISpan * CTraceManager::createServerSpan(const char * name, const IProperties * httpHeaders, const SpanTimeStamp * spanStartTimeStamp, SpanFlags flags = SpanFlags::None) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit inconsistent with the other methods, and I think unnecessary, to add the default value for flags SpanFlags::None here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, other options were considered:

  • different named method (name, httpheaders, spanStartTimeStamp, SpanFlags flags = SpanFlags::None) // Allows declared starttime, and options span flags
  • createServerSpan(name, httpHeaders, SpanFlags flags, const SpanTimeStamp * spanStartTimeStamp) //Forces users to provide spanFlags but is more consistent

Ultimately Gavin and I landed on the somewhat inconsistent , but flexible version w/ optional spanflags param. How do strong is your stance on the consistency aspect, and are you ok proceeding as-is?

Copy link
Member

@afishbeck afishbeck Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I thought this might be unnecessary was because the "SpanFlags flags = SpanFlags::None" is in the .cpp file.. and doesn't look like it's used?

I'm ok with it if there's a reason for it here. And I'm ok with the ones in the header file although I usually try to keep them to a minimum.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afishbeck I think I misinterpreted your comment, the default parameter here was likely a copy/paste stowaway. Thanks for catching that.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks really good. Please squash (and fix typo in the function name.)

}

systemClockTimePoint = nowSystemClock();
steadyClockTimePoint = nowSteadyClock() + std::chrono::hours(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a good point.

: CSpan(spanName)
{
if (spanStartTimeStamp && spanStartTimeStamp->isInitilized())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: typo missing a in isInitialized()

Owned<IProperties> emptyMockHTTPHeaders = createProperties();
SpanTimeStamp declaredSpanStartTime;
declaredSpanStartTime.now(); // must be initialized via now(), or setMSTickTime
std::this_thread::sleep_for(std::chrono::milliseconds(50));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info: Most code uses MilliSleep(50)

@rpastrana rpastrana force-pushed the HPCC-30374-SpanStartTimeOverride branch from 496aab3 to c55f19f Compare December 15, 2023 20:42
MilliSleep(125);

{
//duration should be at least 125 milliseconds
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I squashed the commits, I wanted to point out these new comments and varying sleep times and also actual CPPUNIT_ASSERTs on uninitialized states.

- Exposes helper method to obtain system and steady clock time_points
- Exposes createServerSpan function w/ optional starttime params
- Adds simple cppunit tests excersizing both newly exposed functions

Signed-off-by: Rodrigo Pastrana <[email protected]>
@rpastrana rpastrana force-pushed the HPCC-30374-SpanStartTimeOverride branch from c55f19f to b7bedba Compare December 18, 2023 18:55
@rpastrana
Copy link
Member Author

@ghalliday I think we're good to merge

@ghalliday
Copy link
Member

This is good to merge, except it is triggering a problem in wasmembed.

@GordonSmith it could possibly be a missing
#include

You could look at the symbols exported from abi.cpp and see how they differ from the symbols the other file is expecting.

@GordonSmith GordonSmith force-pushed the HPCC-30374-SpanStartTimeOverride branch from f079777 to b7bedba Compare December 19, 2023 15:09
@ghalliday ghalliday merged commit bac2dae into hpcc-systems:candidate-9.4.x Dec 19, 2023
88 of 90 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants