Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30599 Fix file access costs for keyed join #18023

Merged
merged 1 commit into from
Dec 19, 2023

Conversation

shamser
Copy link
Contributor

@shamser shamser commented Nov 13, 2023

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

system/jlib/jstats.cpp Outdated Show resolved Hide resolved
@shamser shamser force-pushed the issue30599 branch 2 times, most recently from bf1859f to 6fd995b Compare November 13, 2023 16:59
@shamser shamser marked this pull request as ready for review November 13, 2023 17:00
@shamser shamser requested a review from jakesmith November 13, 2023 17:00
@shamser shamser marked this pull request as draft November 14, 2023 13:05
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - please see comments.

extern da_decl cost_type calcFileAccessCost(IDistributedFile *f, __int64 numDiskWrites, __int64 numDiskReads)
{
StringBuffer clusterName;
f->getClusterName(0, clusterName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may not be supported in containerized, but logical files can be on >1 cluster,
although only 1 will have been accessed..

Maybe we can gloss over this for now, but should add a comment here at least.

system/jhtree/jhtree.cpp Outdated Show resolved Hide resolved
@@ -350,17 +350,21 @@ class CIndexReadSlaveBase : public CSlaveActivity
}
void mergeFileStats(IPartDescriptor *partDesc, IFileIO *partIO)
{
if (!currentManager)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mergeFileStats is only called when 'lazyIFileIO' is valid (from updateStats())
Which is called when configureNextInput is called (and in done()).
'currentManager' will always be null afaics (except in done()) at the moment.
It should be the outgoing IKeyManager for the the part it's just finished with but configureNextInput has set it to null before updateStats() is called.

So I don't think the updates can be working atm, but perhaps not noticed as long as the query finishes.

updateStats() probably need to change to keep currentManager until after the updateStats() call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also means (not new), that there are no fileStats updates serialized whilst it is reading a single part.
So for most index reads that means it won't send anything for fileStats until the end of the activity.
Needs revisiting in a separate JIRA.

thorlcr/graph/thgraphmaster.cpp Outdated Show resolved Hide resolved
thorlcr/graph/thgraphmaster.cpp Outdated Show resolved Hide resolved
thorlcr/activities/keyedjoin/thkeyedjoinslave.cpp Outdated Show resolved Hide resolved
thorlcr/activities/keyedjoin/thkeyedjoinslave.cpp Outdated Show resolved Hide resolved
@shamser shamser force-pushed the issue30599 branch 7 times, most recently from a0916cb to c3b17fc Compare November 22, 2023 14:50
@shamser shamser marked this pull request as ready for review November 22, 2023 14:50
@shamser shamser requested a review from jakesmith November 22, 2023 14:50
ecl/hthor/hthor.cpp Outdated Show resolved Hide resolved
ecl/hthor/hthor.cpp Outdated Show resolved Hide resolved
system/jlib/jstats.cpp Outdated Show resolved Hide resolved
thorlcr/activities/indexread/thindexreadslave.cpp Outdated Show resolved Hide resolved
@@ -91,6 +91,7 @@ const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSiz
const StatisticsMapping diskReadPartStatistics({StNumDiskRowsRead}, diskReadRemoteStatistics);
const StatisticsMapping indexDistribActivityStatistics({}, basicActivityStatistics, jhtreeCacheStatistics);
const StatisticsMapping soapcallActivityStatistics({}, basicActivityStatistics, soapcallStatistics);
const StatisticsMapping indexReadFileStatistics({StNumRowsProcessed}, diskReadRemoteStatistics, jhtreeCacheStatistics);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth reusing this, since it's a subset of indexReadActivityStatistics, i.e. move above it and have:

const StatisticsMapping indexReadActivityStatistics({}, basicActivityStatistics, indexReadFileStatistics);

@@ -1535,16 +1527,28 @@ class CKeyedJoinSlave : public CSlaveActivity, implements IJoinProcessor, implem
{
handles.resize(parts.size());
PARENT::init();
contextLoggers.clear();
for (unsigned i=0; i<parts.size(); i++)
contextLoggers.push_back(new CStatsContextLogger(jhtreeCacheStatistics, thorJob));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this mapping need to be indexReadFileStatistics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::atomic<IKeyManager *> keyManager{nullptr};
public:
CKeyLookupMergeHandler(CKeyedJoinSlave &_activity) : CKeyLookupLocalBase(_activity)
CKeyLookupMergeHandler(CKeyedJoinSlave &_activity) : CKeyLookupLocalBase(_activity), contextLogger(jhtreeCacheStatistics, thorJob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mapping need to be a indexReadFileStatistics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1367,6 +1368,9 @@ class CKeyedJoinSlave : public CSlaveActivity, implements IJoinProcessor, implem
keyManagers.reset(new std::vector<std::atomic<IKeyManager *>>(parts.size()));
for (auto & k: *keyManagers)
k = nullptr;
contextLoggers.clear();
for (unsigned i=0; i<parts.size(); i++)
contextLoggers.push_back(new CStatsContextLogger(jhtreeCacheStatistics, thorJob));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this mapping need to be indexReadFileStatistics (same as fileStats) ?
(because they are used in the keymanager which is going to [try] to populate with io stats as well as jhtree stats)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, contextLoggers will only ever tracking jhtree stats - they do not track io stats. We only need indexReadFileStatistics for fileStats. Note,

  • jhtree cache stats are returned CKeyLookupLocalHandler::getFileStats() and CRemoteLookupHandler::getFileStats, or
  • io stats are returned for CFetchLocalLookupHandler::getFileStats

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recent change:

  • jhtree and io stats are returned from CKeyLookupLocalHandler::getFileStats(). It uses keyManager->mergeStats()
  • some jhtree and io stats are returned CRemoteLookupHandler::getFileStats().
  • io stats are returned from CFetchLocalLookupHandler::getFileStats

The original point is still valid: contextLoggers only track jhtree stats, so they don't need indexReadFileStatistics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to the offline conversation we had, where the conclusion was that we don't currently collect and publish the io stats, either from the local handler, or from the remote - so we don't want to fix it 1 side now ?
But should fix in a subsequent PR so io stats are collected, and published whether using a local or remote handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still true.
i.e. because of this jhtree only mapping, we are no longer tracking disk io stats, although they IKeyManager::mergeStats is capable of returning them.

I think for this PR, that's okay, and probably preferable, becuase the remote handler never did serialize those stats back, so now it is consistent.

But a JIRA should be opened to revisit this so both handlers return io stats too.
Can you open one and link it to this JIRA?

Copy link
Contributor Author

@shamser shamser Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected:
rank_t lookupSlave = RANK_NULL;
mptag_t replyTag = TAG_NULL;
ICommunicator *comm = nullptr;
std::vector<unsigned> handles;
// One context logger per part. (Extract stats for logical file by mapping part to logical file/subfile)
std::vector<Owned<CStatsContextLogger>> contextLoggers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be moved out of CRemoteLookupHandler, into CKeyLookupRemoteHandler.

CRemoteLookupHandler is also the base of the remote fetch handler (CFetchRemoteLookupHandler).
Neiher the local or the remote fetch handlers use context loggers.

@@ -3368,7 +3377,6 @@ class CKeyedJoinSlave : public CSlaveActivity, implements IJoinProcessor, implem
}
virtual void stop() override
{
CStatsScopedDeltaUpdater scoped(statsUpdater);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final stats will be collected and serialized via the serializeStats->gatherActiveStats when the subgraph is stopped, i.e. via standard mechanism.
fileStats at that point will get their final update (as will activeStats), and then be serialized.

I think it's all correct, just checking the logic.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good in general.

I few minor issues only within KJ I think.

However, I think there's at least 1 change, that should be extracted out as an initial PR before this PR (see comment : #18023 (comment)).
That will help reduce the scope/size of the actual KJ change.

Also, it looks like there may be some double counting issues in index read, since there's only one contextLogger shared amongst parts (you may not notice unless the index read was reading >1 part, but it also looked like it was double counting IO stats.
I've opened a new JIRA re. creating some regression tests that grab stats and output them as results, to get some coverage of stat. changes:

Could you also update the PR and commit message, to say what changes have been made and why, e.g. the removal of the deltaUpdater, the introduction of a CStatsContextLogger per part, contained in the handlers etc, and the reasoning.
We may know the reasoning, but it would help others if description explained.

@jakesmith
Copy link
Member

@shamser - hadn't noticed until now, but I see there are some regression tests failing that look relevant.

@shamser shamser force-pushed the issue30599 branch 7 times, most recently from 471e477 to 6df6af9 Compare November 28, 2023 16:25
@shamser shamser requested a review from jakesmith November 28, 2023 16:28
@shamser
Copy link
Contributor Author

shamser commented Nov 28, 2023

@jakesmith I've fixed the remote key lookup stats. jobstats shows consistent jhtreecache stats between local and remote.

@shamser
Copy link
Contributor Author

shamser commented Nov 28, 2023

@jakesmith this PR is based on https://track.hpccsystems.com/browse/HPCC-30911. The first 2 commits shown in github, is related to https://track.hpccsystems.com/browse/HPCC-30911.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith this PR is based on https://track.hpccsystems.com/browse/HPCC-30911. The first 2 commits shown in github, is related to https://track.hpccsystems.com/browse/HPCC-30911.

I'll hold off reviewing this further until the PR for 30911 is merged.
Please rebase this PR and tag me, when that is merged.

@shamser shamser force-pushed the issue30599 branch 2 times, most recently from ed64962 to 41fafa0 Compare December 7, 2023 17:39
@shamser shamser requested a review from jakesmith December 7, 2023 17:40
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - I haven't re-reviewed the keyedjoin changes yet, I had some comments on indexread changes,
and I think it would be best, unless there's a blocking reason not to, to split those changes into a separate PR.
Will discuss offline.

@@ -98,7 +98,7 @@ class CIndexReadSlaveBase : public CSlaveActivity
if (!keyManager)
throw MakeActivityException(&activity, 0, "Callback attempting to read blob with no key manager - index being read remotely?");
needsBlobCleaning = true;
return (byte *) keyManager->loadBlob(id, dummy, &activity.contextLogger);
return (byte *) keyManager->loadBlob(id, dummy, nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this mean blob stats won't be recorded at all?
The loadBlob implementation is not using stats.ctx.

Either this needs to pass in use the associated contextLogger[], or IKeyManager::loadBlob should use the one it has in its KeyStatsCollector.

@@ -169,6 +169,7 @@ class CIndexReadSlaveBase : public CSlaveActivity
unsigned p = partNum;
while (p<partDescs.ordinality()) // will process all parts if localMerge
{
CStatsContextLogger * contextLogger = contextLoggers[p];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: only used on line 308, would be clearer if declared near its usage, or contextLogger[p] used directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexreadslave changes will be moved to separate PR.

@@ -331,7 +332,8 @@ class CIndexReadSlaveBase : public CSlaveActivity
return createIndexLookup(keyManager);
}
}
keyMergerManager.setown(createKeyMerger(helper->queryDiskRecordSize()->queryRecordAccessor(true), keyIndexSet, seekGEOffset, &contextLogger, helper->hasNewSegmentMonitors(), false));
//Not tracking jhtree cache stats in KeyMerger at the moment. Future: something to consider
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you create a linked JIRA to revisit this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexreadslave changes will be moved to separate PR.

ISuperFileDescriptor *superFDesc = partDescs.item(0).queryOwner().querySuperFileDescriptor();
for (unsigned partNum=0; partNum<partDescs.ordinality(); partNum++)
{
if (!keyManagers.isItem(partNum))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is thread safe.
getNextInput could be in the process of expanding the keyManagers array, isItem could be true, but entry still uninitialized.

An alternative may be to presize the keyManagers array (and initialize with nulls), and test for null rather than isItem

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments change of approach, re. updateStats()

It used to be thread-safe, by locking a muxex (ioStatsCS), and dealing with just the current part.
I think it can still do the same, dealing with the currentManager only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexreadslave changes will be moved to separate PR.

@@ -762,6 +762,10 @@ class CStatsContextLogger : public CSimpleInterfaceOf<IContextLogger>
previous.updateDelta(to, stats);
}
virtual const LogMsgJobInfo & queryJob() const override { return job; }
void reset()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: we normally group declare the non-interface/non-virtuals. i.e. can this be moved after the ctor in this class.

if (lazyIFileIO)
{
mergeStats(inactiveStats, lazyIFileIO);
if (currentPart<partDescs.ordinality())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the approach here, should have been changed.
i.e. it used to roll the outgoing part into inactiveStats.
It used to just merge the current io, but can equally be applied to currentManager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexreadslave changes will be moved to separate PR.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good, one question re. blob stats, and some minors + comments re. opening 'future' JIRAs

@@ -1309,7 +1308,7 @@ class CKeyedJoinSlave : public CSlaveActivity, implements IJoinProcessor, implem
joinGroup->setAtMostLimitHit(); // also clears existing rows
break;
}
KLBlobProviderAdapter adapter(keyManager, &activity.contextLogger);
KLBlobProviderAdapter adapter(keyManager, nullptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are blob stats. now being tracked ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the remote side (slavmain) would return track and serialize blob stats.
Please test an example to verify same whether using local or remote. Regression test 'indexblobs.ecl' may work as a test query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blob stats are not being tracked for keyed join. However, the blob stats for keyed join was never correct because slavmain was not tracking blob stats. Also, hthor was not tracking blob stats. It would be better to address this in a separate jira as the changes affect not only thor but roxie and hthor too. TODO: create ticket once jira is working again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1367,6 +1368,9 @@ class CKeyedJoinSlave : public CSlaveActivity, implements IJoinProcessor, implem
keyManagers.reset(new std::vector<std::atomic<IKeyManager *>>(parts.size()));
for (auto & k: *keyManagers)
k = nullptr;
contextLoggers.clear();
for (unsigned i=0; i<parts.size(); i++)
contextLoggers.push_back(new CStatsContextLogger(jhtreeCacheStatistics, thorJob));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still true.
i.e. because of this jhtree only mapping, we are no longer tracking disk io stats, although they IKeyManager::mergeStats is capable of returning them.

I think for this PR, that's okay, and probably preferable, becuase the remote handler never did serialize those stats back, so now it is consistent.

But a JIRA should be opened to revisit this so both handlers return io stats too.
Can you open one and link it to this JIRA?

@@ -2077,6 +2106,8 @@ class CKeyedJoinSlave : public CSlaveActivity, implements IJoinProcessor, implem
PARENT::end();
doClose(kjs_fetchclose);
}
// Not tracking remote fetch file stats. Future: may be something to consider.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you open a JIRA and link it to this JIRA ?

Copy link
Contributor Author

@shamser shamser Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -649,25 +649,48 @@ void CMasterActivity::done()

void CMasterActivity::updateFileReadCostStats()
{
// Updates numDiskReads & readCost in the file attributes and returns the readCost
auto updateReadCosts = [](IDistributedFile *file, CThorStatsCollection &stats)
// Returns updates numDiskReads & readCost in the file attributes and returns the readCost
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this comment accidentally reverted? (the one on deleted line 652 seems correct)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed now.

thorlcr/graph/thgraphmaster.cpp Outdated Show resolved Hide resolved
}
// CKeyLookupMergeHandler is not tracking the stats at the part level and that means it is not possible to associate the stats with a logical file level.
// Superfile: all stats will be associated with the first sub index file. For non-super, it makes no difference (there is only one index file).
// (Future: make necessary changes to createKeyMerger to track stats at the part level/logical level.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you open a JIRA and link it to this one?

Copy link
Contributor Author

@shamser shamser Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser shamser force-pushed the issue30599 branch 2 times, most recently from 05935cf to 8681024 Compare December 11, 2023 11:19
@shamser shamser requested a review from jakesmith December 11, 2023 11:24
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good, 1 question re. reset() of contextLogger's

{
PARENT::reset();
for (auto contextLogger: contextLoggers)
contextLogger->reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this correct? has it been tested?
i.e. what happens when this is called (because KJ is in a child query or loop), are the total stats correct?

@shamser shamser requested a review from jakesmith December 14, 2023 12:35
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good. Please squash.

Do the remote vs local handler stats. (for index seek/scans) match for the CQ test query ?

thorlcr/slave/slavmain.cpp Outdated Show resolved Hide resolved
@shamser
Copy link
Contributor Author

shamser commented Dec 15, 2023

@shamser - looks good. Please squash.

Do the remote vs local handler stats. (for index seek/scans) match for the CQ test query ?

seeks and scans match.

CacheHits is greater (202,062 vs 159499) on non-remote lookup. NumDiskReads is greater on non-remote (16 vs 12). Would you expect these to match?

@jakesmith
Copy link
Member

CacheHits is greater (202,062 vs 159499) on non-remote lookup. NumDiskReads is greater on non-remote (16 vs 12). Would you expect these to match?

not sure either (both local and remote) are deterministic, because there could be multiple handlers or multiple parts being dealt with by the same process (with shared jhtree cache). However, if N part key, on a N-way Thor - then it may be deterministic..

I'm unsure, I think we can investigate these differences post this PR though.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - there are some build problems reported in the tests - I think needs rebasing.

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser looks good. A couple of questions - please could you answer them. Even if they require changes it is probably best to merge as-is and then revisit as separate PRs.

if (keyCursor)
keyCursor->mergeStats(stats);
keyCursor->mergeStats(targetStats); // merge IO stats
if (activeCtx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't feel like it is in the correct place. I would expect this call to be in the code that calls mergeStats():
as well as calling mergeStats on each keyManager it would call merge stats on the context's stats.

As it is this forces you to have a context for each key, which seems unnecessary, and wrong - e.g., in roxie it would only have a shared context, and prior to this change so did thor.

Probably best to merge and then revisit to avoid this being blocked.

Also, this should be testing ctx rather than activectx (otherwise if the last lookup was on a TLK that matched nothing it could avoid the merge.

Copy link
Contributor Author

@shamser shamser Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed to use ctx.

I agree that it does make sense to move the ctx mergeStats out to the caller. However, I'd prefer doing this separately so I can take the time to make the change and re-test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


CRuntimeStatisticCollection deltaStats(startStats.queryMapping());
contextLogger.updateStatsDeltaTo(deltaStats, startStats);
replyMb.append(deltaStats.getStatisticValue(StNumIndexSeeks));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Would it be better to use deltaStats.serialize(replyMb)? Otherwise you are missing some of the stats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be. I was thinking about doing this separately later as it was not related to kj costs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stat_type curDiskReads = stats.getStatisticSum(StNumDiskReads);
if(useJhtreeCacheStats)
{
stat_type numActualReads = stats.getStatisticSum(StNumNodeDiskFetches)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this code be setting curDiskReads? It will affect the number of reads recorded against the file in line 675. (It would also allow the cost calculation to be shared.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall @jakesmith mentioned that it would be useful to have both: the original NumDiskReads and the jhtree stats.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine as it stands.
As it is, we have 2 pieces of info., we have the number of attempts to read from disk (StNumDiskReads) and the number of actual disk reads, at least in the index case (StNumNodeDiskFetches+StNumLeafDiskFetches+StNumBlobDiskFetches).

StNumDiskReads is a tad misleading (and always was), and may not closely correspond to actual disk reads in flat files in some cases either.
The explanation for what it is (and other stats) is something that should be clarified somewhere, but not directly related to this PR I would say.

@shamser shamser requested a review from ghalliday December 19, 2023 15:26
@ghalliday ghalliday merged commit 5d78149 into hpcc-systems:candidate-9.4.x Dec 19, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants