-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-30374 Provide JTrace option to declare span start time #18130
HPCC-30374 Provide JTrace option to declare span start time #18130
Conversation
aeedea5
to
0ac68b5
Compare
0ac68b5
to
8a8f7ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rpastrana a few minor comments. A couple of other issues though:
- the commit has the wrong jira reference
- having thought about it some more this change is only partial. The start time will be set for the span, but the duration will not (because that uses start_steady_time). It really needs to have unit tests that are looking at the full span information that is output.
system/jlib/jtrace.cpp
Outdated
@@ -171,6 +171,7 @@ class CTraceManager : implements ITraceManager, public CInterface | |||
CTraceManager(); | |||
IMPLEMENT_IINTERFACE | |||
|
|||
virtual ISpan * createServerSpan(const char * name, const IProperties * httpHeaders, SpanFlags flags, std::chrono::nanoseconds spanStartTime) const override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment: Another solution would be to add a default parameter to the standard constructor. E.g.
std::chrono::nanoseconds spanStartTime = 0
which would avoid a proliferation of constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually started this way but thought you wouldn't appreciate the special default value.
system/jlib/jtrace.cpp
Outdated
CServerSpan(const char * spanName, const IProperties * httpHeaders, SpanFlags flags, std::chrono::nanoseconds spanStartTime) | ||
: CSpan(spanName) | ||
{ | ||
opts.start_system_time = opentelemetry::common::SystemTimestamp(std::chrono::duration_cast<std::chrono::nanoseconds>(spanStartTime)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the duration cast - the argument is already in nanoseconds.
system/jlib/jtrace.hpp
Outdated
@@ -70,6 +72,7 @@ interface ITraceManager : extends IInterface | |||
extern jlib_decl ISpan * getNullSpan(); | |||
extern jlib_decl void initTraceManager(const char * componentName, const IPropertyTree * componentConfig, const IPropertyTree * globalConfig); | |||
extern jlib_decl ITraceManager & queryTraceManager(); | |||
extern jlib_decl std::chrono::nanoseconds nowTimesStampInNanos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many ss nowTimeStamp
testing/unittests/jlibtests.cpp
Outdated
|
||
Owned<IProperties> retrievedSpanCtxAttributes = createProperties(); | ||
bool getSpanCtxSuccess = serverSpan->getSpanContext(retrievedSpanCtxAttributes, false); | ||
CPPUNIT_ASSERT_EQUAL_MESSAGE("Unexpected getSpanContext failure detected", true, getSpanCtxSuccess); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this test is testing - it isn't using nownanos. I think you will need to add the unit tests that look at the trace output before you can test properly.
I'm not sure what you mean by the "wrong jira reference", this is targeting HPCC-30374. Will update comment to ensure syntax is correct. |
@ghalliday please take a look at the new unit test and let me know if you don't find any value in exercising the jtrace calls. At this point we don't have any exporter in place to begin considering capturing the trace output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be designed to be easy for the caller to use, and the unit tests need to check that the values are having the desired effect. I suspect the latter cannot be done until the span output has been integrated into the unit testing.
system/jlib/jtrace.cpp
Outdated
@@ -819,9 +816,16 @@ class CServerSpan : public CSpan | |||
|
|||
//--------------------------------------------------------------------------------------------------------------------- | |||
|
|||
std::chrono::nanoseconds nowTimesStampInNanos() | |||
const std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds> nowSystemClock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The design in the second commit hasn't fully considered how this feature will be used. From the user's point of view they want to be able to preserve the start time for the minimum effort. They don't really want to call two separate calls, they ideally want a single function()
SpanTimestamp getSpanTimestamp() {}
or
void getSpanTimestamp(SpanTimestamp &) {}
which returns an object that contains the two start times. There should then be an option SpanTimestamp * parameter to createServerSpan to pass in the times.
The type of the parameters/values is also too verbose. A user should not be using std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>
as a typename in their code.
Either a typedef within the header. (If it is defined to be a time in nanoseconds then using an __uint64 would be sufficient and simpler.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghalliday the design considered usability amongst many other aspects, however I cannot read your mind. Just so you know, several of my iterations cloned the sysClockStart and steadyClockStart, but backtracked to this version because I simply didn't imagine you could find fault w/ this version...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rpastrana @ghalliday from a users perspective I was picturing something very simple and that could be used in the code as is. There may be more powerful things you could do, and improvements could be made for example to support nanoseconds, but if we supported the following it would allow our existing code to work very easily:
virtual ISpan * createServerSpan(const char * name, const IProperties * httpHeaders, const, unsigned msStartTIme, SpanFlags flags = SpanFlags::None) const = 0;
Roxie would just pass in the existing variable "qstart" and it would be done.
Internally you would do something like "unsigned elapsed = msTick() - msStartTime;" and then use elapsed to adjust everything else. I didn't look closely enough to see if something is wrong with that, but that's the idea.
ESP already has a similar value as well.
This prevents rearchitecting the servers now. We can make further enhancements in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afishbeck I did notice the ESP keeps a creation timestamp (in ms granularity) in the request context and the current design would likely force a change to track that timestamp in the granularity (nanos) used by OTel internals.
Your suggestion seems ok w/ me if you're ok w/ the potential loss of precision. We'd have to confirm there are no other issues performing the duration casts between millis and a nanos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afishbeck I think I jumped the gun on your suggestion. I assumed the context tracked an actual start timestamp, but it looks like it's more of a tick mark used to calculated elapsed time at completion time?
We're relying on OTel to calculate the span duration time, and only overriding the start time (in std::chrono::nanos) on-demand.
I think your suggestion is to offset that time stamp by a count of millis (elapsed) correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. My suggestion was a way to make it easy to work with existing code. The idea was to use the existing value to figure out the already elapsed time and use that as a delta to correct what Otel would currently generate. But if we want to go ahead and refactor a bit to "do it right" that would be fine. I guess if we don't do it now it may never happen. There are places in both roxie and ESP now that get the current tick and later use that for reporting and other things. We can change that how ever makes sense as long as current uses aren't broken and otels needs are met.
But I think we should decide what that roxie and ESP code will look like when deciding on the design for passing in the start time here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghalliday btw this is what I meant the other day when I said that changing surrounding code can help get the design right, but can also make the current PR bigger :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have to do a little more research, but something like this seems to produce reasonable values (now - offsetMS)
systemClockTime = std::chrono::duration_caststd::chrono::nanoseconds(std::chrono::system_clock::now().time_since_epoch() - std::chrono::milliseconds(msTick() - msTicksOffset));
Anyway, have we considered migrating off of mstick in favor of a std::chrono based duration?
std::chrono::steady_clock::now() ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't on my radar, but definitely worth considering.
testing/unittests/jlibtests.cpp
Outdated
} | ||
|
||
systemClockTimePoint = nowSystemClock(); | ||
steadyClockTimePoint = nowSteadyClock() + std::chrono::hours(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really testing anything. My previous comment stands - you need to be able to read the values that are generated for duration etc from the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghalliday what I think you mean is that this test isn't programmatically verifying the results, however you surely see the value in providing these tests for manual debugging, API verification, developer level documentation, etc. or am I mistaken about that? Are you asserting this block of code has no value and should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a good point.
3ec0612
to
1dc3979
Compare
@ghalliday this latest commit should include all the changed we discussed offline |
@afishbeck you should be able to adjust span starttime based on mstick marks via the |
1dc3979
to
c62932e
Compare
FWIW build failure doesn't seem related. |
steadyClockTime = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::steady_clock::now().time_since_epoch()); | ||
} | ||
|
||
void setMSTickTime(const unsigned int msTickTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this gives us two possible ways of instrumenting ESP and Roxie. Either add add SpanTimeStamp (and maybe get rid of the msStartTime that's there now, or just use the existing msStartTime with this function.
system/jlib/jtrace.cpp
Outdated
@@ -1339,6 +1346,10 @@ ISpan * CTraceManager::createServerSpan(const char * name, const IProperties * h | |||
return new CServerSpan(name, httpHeaders, flags); | |||
} | |||
|
|||
ISpan * CTraceManager::createServerSpan(const char * name, const IProperties * httpHeaders, const SpanTimeStamp * spanStartTimeStamp, SpanFlags flags = SpanFlags::None) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit inconsistent with the other methods, and I think unnecessary, to add the default value for flags SpanFlags::None here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, other options were considered:
- different named method (name, httpheaders, spanStartTimeStamp, SpanFlags flags = SpanFlags::None) // Allows declared starttime, and options span flags
- createServerSpan(name, httpHeaders, SpanFlags flags, const SpanTimeStamp * spanStartTimeStamp) //Forces users to provide spanFlags but is more consistent
Ultimately Gavin and I landed on the somewhat inconsistent , but flexible version w/ optional spanflags param. How do strong is your stance on the consistency aspect, and are you ok proceeding as-is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I thought this might be unnecessary was because the "SpanFlags flags = SpanFlags::None" is in the .cpp file.. and doesn't look like it's used?
I'm ok with it if there's a reason for it here. And I'm ok with the ones in the header file although I usually try to keep them to a minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@afishbeck I think I misinterpreted your comment, the default parameter here was likely a copy/paste stowaway. Thanks for catching that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks really good. Please squash (and fix typo in the function name.)
testing/unittests/jlibtests.cpp
Outdated
} | ||
|
||
systemClockTimePoint = nowSystemClock(); | ||
steadyClockTimePoint = nowSteadyClock() + std::chrono::hours(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a good point.
system/jlib/jtrace.cpp
Outdated
: CSpan(spanName) | ||
{ | ||
if (spanStartTimeStamp && spanStartTimeStamp->isInitilized()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: typo missing a in isInitialized()
testing/unittests/jlibtests.cpp
Outdated
Owned<IProperties> emptyMockHTTPHeaders = createProperties(); | ||
SpanTimeStamp declaredSpanStartTime; | ||
declaredSpanStartTime.now(); // must be initialized via now(), or setMSTickTime | ||
std::this_thread::sleep_for(std::chrono::milliseconds(50)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info: Most code uses MilliSleep(50)
496aab3
to
c55f19f
Compare
MilliSleep(125); | ||
|
||
{ | ||
//duration should be at least 125 milliseconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I squashed the commits, I wanted to point out these new comments and varying sleep times and also actual CPPUNIT_ASSERTs on uninitialized states.
- Exposes helper method to obtain system and steady clock time_points - Exposes createServerSpan function w/ optional starttime params - Adds simple cppunit tests excersizing both newly exposed functions Signed-off-by: Rodrigo Pastrana <[email protected]>
c55f19f
to
b7bedba
Compare
@ghalliday I think we're good to merge |
This is good to merge, except it is triggering a problem in wasmembed. @GordonSmith it could possibly be a missing You could look at the symbols exported from abi.cpp and see how they differ from the symbols the other file is expecting. |
f079777
to
b7bedba
Compare
bac2dae
into
hpcc-systems:candidate-9.4.x
Type of change:
Checklist:
Smoketest:
Testing: