-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-32241 Temp size for sort to use disk size and track actual graph temp disk usage #18883
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32241 Jirabot Action Result: |
@shamser - could you update the commit message/PR message, to describe the what the problem was caused by/how it was fixed? It will aide the review process and future change history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - could you update the commit message/PR message, to describe the what the problem was caused by/how it was fixed? It will aide the review process and future change history.
I think this has essentially change from recalculating the peak in serializeStats, to introducing a size tracker in the graph and pushing up the grow/shrink stat sto the subgraph.
That makes sense afaics (see comment re. child graphs).
Please update the commit message/PR message to contain a description of the bug, and how this addresses it.
thorlcr/graph/thgraph.hpp
Outdated
@@ -1175,7 +1192,9 @@ class graph_decl CActivityBase : implements CInterfaceOf<IThorRowInterfaces>, im | |||
CFileSizeTracker * queryTempFileSizeTracker() | |||
{ | |||
if (!tempFileSizeTracker) | |||
tempFileSizeTracker.setown(new CFileSizeTracker); | |||
{ | |||
tempFileSizeTracker.setown(new CFileSizeTracker(queryGraph().queryTempFileSizeTracker())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to use queryGraph().queryParent(), because in a child query, queryGraph() will be the child graph, not the subgraph that's tracking the usage. queryGraph().queryParent(), should always point to the subgraph, no matter what the depth of the nested child graphs.
NB: note to self - CGraphBase::queryParent() and CGraphBase::queryOwner() are counter-intuitive, it would have made more sense if queryOwner() was the top-level graph (the subgraph) and queryOwner() was the activity containing graph (whether that be the subgraph, or a nested child graph).
Do you have a spilling child graph test, you can test this with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've attached loopsort2.ecl to the parent JIRA in case useful. It should spill in each iteration of the loop.
I've had a very brief look at the SizePeakTempDisk numbers:
sg1 (subgraph):
<attr kind='SizePeakTempDisk' value='166813325' formatted='159.084MB' unit='sz' ctype='thor' creator='[email protected]' ts='2024-07-17T08:40:07.780Z'/>
sg5 (child loop graph):
<attr kind='SizePeakTempDisk' value='132142234' formatted='126.020MB' unit='sz' ctype='thor' creator='[email protected]' ts='2024-07-17T08:40:07.780Z'/>
sg5:a7 (sort act in child graph)
<attr kind='SizePeakTempDisk' value='29424207' formatted='28.060MB' unit='sz' ctype='thor' creator='[email protected]' ts='2024-07-17T08:40:07.780Z'/>
I think the numbers should be broadly the same, i.e. it's not clear why sg5's number is significantly higher.
NB: the loop result may also spill, but I don't think it's currently being tracked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good. Please squash, keeping the new commit message in the squashed commit and add it to the PR message.
temp disk usage SizePeakTempDisk for graphs was previous calculated by summing the each activity's active temp file size. However, as this was done at internals, spikes in temp file sizes between intervals were not recorded in the SizePeakTempDisk stats. This change uses a TempFileSize tracker at the graph level to track the actual peak/active temp file size for the graph. Sort activities was using the size of the uncompressed data as the size of the temp file. This change uses the actual size of temp file for sort activity's peak/active temp file size Signed-off-by: Shamser Ahmed <[email protected]>
SizePeakTempDisk for graphs was previous calculated by summing the each Sort activities was using the size of the uncompressed data as the size |
@shamser - can you also copy the commit message in to this PR description, i.e. above the checkbox test (which is done automatically when pushing the 1st commit) ? Done now. @jakesmith |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - tagging you back re. #18883 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good
SizePeakTempDisk for graphs was previous calculated by summing the each
activity's active temp file size. However, as this was done at internals,
spikes in temp file sizes between intervals were not recorded in the
SizePeakTempDisk stats. This change uses a TempFileSize tracker at the
graph level to track the actual peak/active temp file size for the graph.
Sort activities was using the size of the uncompressed data as the size
of the temp file. This change uses the actual size of temp file for sort
activity's peak/active temp file size
Type of change:
Checklist:
Smoketest:
Testing: