HPCC-28757 New StSizePeakEphemeralDisk and StSizePeakTempDisk for graphs #18482
@@ -1280,8 +1280,28 @@ bool CSlaveGraph::serializeStats(MemoryBuffer &mb)
    jobS->querySharedAllocator()->queryRowManager()->reportSummaryStatistics(stats);

    IGraphTempHandler *tempHandler = owner ? queryTempHandler(false) : queryJob().queryTempHandler();
    offset_t sizeGraphSpill = tempHandler ? tempHandler->getActiveUsageSize() : 0;
    if (tempHandler)
        stats.mergeStatistic(StSizeGraphSpill, tempHandler->getActiveUsageSize());
    stats.mergeStatistic(StSizeGraphSpill, sizeGraphSpill);

    // calculate peak spill size
    if (started&&initialized)
    {
        offset_t activeTempSize = 0;
        Owned<IThorActivityIterator> iter = getConnectedIterator();
        ForEach (*iter)
        {
            CGraphElementBase &element = iter->query();
            CSlaveActivity &activity = (CSlaveActivity &)*element.queryActivity();
            activeTempSize += activity.queryActiveTempSize();
Review thread on this line:

Should this be calling activity.queryPeakTempSize()? I can see problems either way.

Not sure. IMO it is better as it is: sampling the active sum to see if it is greater than the current peak. An alternative (still sampling) would be to recalculate more frequently, at the time CFileSizeTracker::growSize is called, e.g. call back into the subgraph and have it collate the peak (if it hasn't in the last, say, 1s).

I think the question is why we are gathering this information. I think it is so that we know an upper bound on the amount of temp space that a workunit will use - so we can correctly size a disk.

@shamser - I discussed this with Gavin (I meant to update this PR before) and the conclusion was that tracking the active peak is the right approach, but calculating that peak at a low frequency is insufficient, i.e. calculating the subgraph peak every ~15 seconds when progress is collected is too infrequent and will often lead to inaccurate (low) estimates. As I began to outline in my comment above, we think a good approach is for all the trackers to call back into the subgraph each time they grow in size, and for the subgraph to recalculate the peak. Those calls will be frequent, but it (the subgraph impl.) can decide when to recalculate, e.g. if it has recalculated <1s ago, do nothing. @ghalliday - have I forgotten anything? And I vote for this not to hold up this PR being merged; a new PR to implement this strategy can be opened. I'll re-approve this PR.

(A rough sketch of this callback idea follows the diff below.)
        }
        if (activeTempSize > peakTempSize)
            peakTempSize = activeTempSize;
    }
    if (peakTempSize)
        stats.mergeStatistic(StSizePeakTempDisk, peakTempSize);
    if (peakTempSize + sizeGraphSpill)
        stats.mergeStatistic(StSizePeakEphemeralDisk, peakTempSize + sizeGraphSpill);
    stats.serialize(mb);

    unsigned cPos = mb.length();
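As an illustration of the throttled-callback idea raised in the review thread above (trackers notify the subgraph whenever they grow, and the subgraph recalculates the peak at most about once per second), a minimal standalone sketch follows. The class and method names, and the 1-second interval, are assumptions for illustration only, not the actual Thor interfaces.

// Hypothetical sketch only: notifyTempSizeGrowth() would be called by each
// file size tracker when it grows; recalculation is throttled to ~1 per second.
#include <atomic>
#include <chrono>
#include <cstdint>

class CSubGraphPeakTracker
{
    std::atomic<uint64_t> peakTempSize{0};
    std::atomic<int64_t> lastRecalcMs{0};
    static constexpr int64_t minIntervalMs = 1000; // don't recalculate more than ~once per second

    static int64_t nowMs()
    {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
    }

    // Stand-in for walking the subgraph's activities and summing their current
    // temp usage (analogous to the loop in serializeStats above).
    uint64_t sumActiveTempSizes() const { return 0; }

public:
    // Called by a tracker each time its tracked size grows.
    void notifyTempSizeGrowth()
    {
        int64_t now = nowMs();
        int64_t last = lastRecalcMs.load(std::memory_order_relaxed);
        if (now - last < minIntervalMs)
            return; // recalculated recently enough - do nothing
        if (!lastRecalcMs.compare_exchange_strong(last, now))
            return; // another thread is recalculating
        uint64_t active = sumActiveTempSizes();
        uint64_t prevPeak = peakTempSize.load(std::memory_order_relaxed);
        while (active > prevPeak && !peakTempSize.compare_exchange_weak(prevPeak, active))
            ; // retry if another thread raised the peak concurrently
    }

    uint64_t queryPeakTempSize() const { return peakTempSize.load(std::memory_order_relaxed); }
};

The key property is that the recalculation cost stays bounded no matter how frequently the trackers grow.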
I think this is only updated when the file is closed. @jakesmith I assume that is going to work for hash dedup, but will it scale to other situations, e.g. lookahead, where the file is not closed until the end?
Also, are you planning to publish the active and peak spill file size for each activity? What should we call them, rather than spill files? Do we need to rename the existing internal spills to match this convention?
We are restricting 'spill' to mean inter-subgraph spills, and 'temp' to mean any internal activity temp file usage (i.e. including lookahead etc.). NB: we already distinguish that way via the "spill" and "temp" plane categories.
The tracker could be updated at any time, but yes, in hash dedup it only updates at the end, when a bucket spills. That seemed reasonable since the write is expected to be relatively quick.
For lookahead, we should update periodically, not at the end.
The actual collection is periodic (in this PR) and will pick up whatever updates there are.
Lookahead will be done differently, so that the size is updated intermittently.
I'm not attempting to record the active temp file size, as the active temp size always ends up 0 because the temp files are closed at the end of each activity.
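To make "updated intermittently" concrete, here is a rough sketch of a lookahead-style temp writer that reports growth to its tracker as it writes, rather than only at close. The ITempSizeTracker interface and the 1MB threshold are illustrative assumptions; only the idea of intermittent growSize-style updates comes from the discussion above.

// Hypothetical sketch: a lookahead-style temp writer that reports size growth
// intermittently rather than only when the file is closed.
#include <cstddef>
#include <cstdint>

struct ITempSizeTracker
{
    virtual void growSize(uint64_t delta) = 0;   // report additional bytes written
    virtual void shrinkSize(uint64_t delta) = 0; // report bytes released when the file is deleted
    virtual ~ITempSizeTracker() = default;
};

class LookaheadTempWriter
{
    ITempSizeTracker &tracker;
    uint64_t unreported = 0;
    static constexpr uint64_t reportThreshold = 0x100000; // report roughly every 1MB, not at close

public:
    explicit LookaheadTempWriter(ITempSizeTracker &_tracker) : tracker(_tracker) {}

    void noteRowWritten(size_t rowSize)
    {
        unreported += rowSize;
        if (unreported >= reportThreshold)
        {
            tracker.growSize(unreported); // intermittent update, so peak sampling sees growth mid-run
            unreported = 0;
        }
    }

    void close()
    {
        if (unreported)
        {
            tracker.growSize(unreported); // flush any remainder when the file closes
            unreported = 0;
        }
    }
};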
Active could be useful, e.g. a long-running subgraph that hit its peak early on but is now slow because active is growing slowly.
We should also capture the total temp written, e.g. the peak might be a few GB, but it may have written close to that hundreds of times.
Can revisit in another PR though.
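A minimal sketch of what tracking total temp written alongside the active and peak figures might look like; the class and method names here are assumptions, not existing Thor code.

// Hypothetical counters: total written is monotonic, so it can greatly exceed
// the peak when the same space is reused many times (as noted above).
#include <atomic>
#include <cstdint>

class TempUsageCounters
{
    std::atomic<uint64_t> active{0};
    std::atomic<uint64_t> peak{0};
    std::atomic<uint64_t> totalWritten{0}; // never decremented

public:
    void grow(uint64_t delta)
    {
        totalWritten.fetch_add(delta, std::memory_order_relaxed);
        uint64_t newActive = active.fetch_add(delta, std::memory_order_relaxed) + delta;
        uint64_t prevPeak = peak.load(std::memory_order_relaxed);
        while (newActive > prevPeak && !peak.compare_exchange_weak(prevPeak, newActive))
            ; // retry until the peak reflects this growth
    }

    void shrink(uint64_t delta) { active.fetch_sub(delta, std::memory_order_relaxed); }

    uint64_t queryActive() const { return active.load(std::memory_order_relaxed); }
    uint64_t queryPeak() const { return peak.load(std::memory_order_relaxed); }
    uint64_t queryTotalWritten() const { return totalWritten.load(std::memory_order_relaxed); }
};

With counters like these, a subgraph could report peak (for disk sizing) and total written (for I/O cost) as separate statistics.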