-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-28757 New StSizePeakEphemeralDisk and StSizePeakTempDisk for graphs #18482
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-28757 Jirabot Action Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - I've given this a 1st review, I think the logic is good, mostly minor comments, but worth looking at the naming and the separate publishing of "Temp" vs "Spill" stats.
thorlcr/thorutil/thormisc.hpp
Outdated
@@ -359,6 +359,54 @@ class graph_decl CStreamFileOwner : public CSimpleInterfaceOf<IExtRowStream> | |||
} | |||
}; | |||
|
|||
// Tracks the current and peak storage used for some files | |||
class FilesSizeTracker: public CInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: class names are normally prefixed with 'C',
and normally have a space before ':'
thorlcr/thorutil/thormisc.hpp
Outdated
__int64 fileSize = 0; | ||
public: | ||
CFileOwnerSizeUpdater(IFile *_iFile, FilesSizeTracker * _filesSizeTracker): CFileOwner(_iFile), filesSizeTracker(_filesSizeTracker) | ||
{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: for formatting consistency, '}' on newline.
thorlcr/thorutil/thormisc.hpp
Outdated
Linked<FilesSizeTracker> filesSizeTracker; | ||
__int64 fileSize = 0; | ||
public: | ||
CFileOwnerSizeUpdater(IFile *_iFile, FilesSizeTracker * _filesSizeTracker): CFileOwner(_iFile), filesSizeTracker(_filesSizeTracker) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: space before ':' and there's a double space after.
system/jlib/jstats.cpp
Outdated
@@ -974,6 +974,7 @@ static const constexpr StatisticMeta statsMetaData[StMax] = { | |||
{ SIZESTAT(ContinuationData), "The total size of continuation data sent from agent to the server\nA large number may indicate a poor filter, or merging from many different index locations" }, | |||
{ NUMSTAT(ContinuationRequests), "The number of times the agent indicated there was more data to be returned" }, | |||
{ NUMSTAT(Failures), "The number of times a query has failed" }, | |||
{ PEAKSIZESTAT(PeakSpill), "The high water mark for spill files"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed the description of pre-existing stat is misleading:
{ PEAKSIZESTAT(GraphSpill), "Peak size of spill memory usage" },
Not memory, but inter subgraph spill files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should track the non-subgraph spill file in a separate stat. too if possible and I think it is.
and we clearer if we distinguied it everywhere (as do in the helm charts/planes) by referring to it as "Temp" vs "Spill".
So we need 3 stats.
- "GraphSpill' tracks the peak inflight inter-subgraph spill file size
- "SubgraphTemp" tracks the peak internal activity temp usage of a subgraph
- "EphemeralStorage" tracks the active total/peak of the above 2 stats.
(not sure they're the best names!)
NB: we already support a tempPlane vs a spillPlane to differentiate (where the "temp" plane category will default to the spill plane.
NB: [3] will be less useful if [1] and [2] are on different planes, but not useless.
thorlcr/graph/thgraphslave.cpp
Outdated
// calculate peak spill size | ||
if (started&&initialized) | ||
{ | ||
unsigned __int64 activeSpillSize = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to prev. comment, let's rename to activeTempSize.
OR, could do in next PR, because confusingly queryTempHandler is actually solely for spills, so it would be best to rename to querySpillHandler for clarify.
thorlcr/thorutil/thormisc.cpp
Outdated
@@ -88,11 +88,11 @@ const StatisticsMapping joinActivityStatistics({StNumLeftRows, StNumRightRows}, | |||
const StatisticsMapping diskReadActivityStatistics({StNumDiskRowsRead, }, basicActivityStatistics, diskReadRemoteStatistics); | |||
const StatisticsMapping diskWriteActivityStatistics({StPerReplicated}, basicActivityStatistics, diskWriteRemoteStatistics); | |||
const StatisticsMapping sortActivityStatistics({}, basicActivityStatistics, spillStatistics); | |||
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics); | |||
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StSizePeakSpill, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inconsistent naming between StSizeGraphSpill and StSizePeakSpill, in the sense that both are PEAKSIZESTAT stats, maybe make them both contain 'Peak' for clarity.
thorlcr/thorutil/thormisc.hpp
Outdated
} | ||
}; | ||
|
||
class CFileOwnerSizeUpdater : public CFileOwner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not now, but I suspect every CFileOwner should have this functionality?
suggesting that code should be rolled into CFileOwner
afa65da
to
6c64228
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - look good, 1 minor comment.
Please consider and squash.
thorlcr/thorutil/thormisc.hpp
Outdated
// simple class which takes ownership of the underlying file and deletes it on destruction | ||
class graph_decl CFileOwner : public CSimpleInterface, implements IInterface | ||
{ | ||
OwnedIFile iFile; | ||
Linked<CFileSizeTracker> fileSizeTracker; | ||
__int64 fileSize = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: this and noteSpill/growSize/shrinkSize should be dealing in offset_t's really (offset_t is the underlying type of file size, __int64 is the underlying type of stats)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: noteSpill is now an offset_t, but really everything in CFileSizeTracker should be using offset_t too.
thorlcr/thorutil/thormisc.hpp
Outdated
iFile->remove(); | ||
} | ||
void noteSpill(__int64 size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: should be offset_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks fine, but see comment re. offset_t's in CFileSizeTracker
Sorry, forgot to push the last batch of changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser broadly looks good. A few questions/comments. (If none make sense then lets discuss them.)
system/jlib/jstats.cpp
Outdated
@@ -944,7 +944,7 @@ static const constexpr StatisticMeta statsMetaData[StMax] = { | |||
{ CYCLESTAT(LeafFetch) }, | |||
{ TIMESTAT(BlobFetch), "Time spent reading blobs from disk (EXCLUDING the linux page cache)" }, | |||
{ CYCLESTAT(BlobFetch) }, | |||
{ PEAKSIZESTAT(GraphSpill), "Peak size of spill memory usage" }, | |||
{ PEAKSIZESTAT(GraphSpill), "Peak size of graph spills" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of inter-subgraph spills?
system/jlib/jstatcodes.h
Outdated
@@ -302,6 +302,8 @@ enum StatisticKind | |||
StSizeContinuationData, | |||
StNumContinuationRequests, | |||
StNumFailures, | |||
StSizePeakSubgraphTemp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this name is quite right - but hard to think of the correct name
StSizePeakSubgraphStorage
StSizePeakSubgraphTempStorage
StSizePeakInternalSpillFile
StSizePeakInternalStorage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghalliday Are we proceeding with naming convention suggested by @jakesmith
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do use "Disk" in all stats (relating to disk), and I think this should have "Temp" in it (vs "Spill").
I'd go for:
StSizePeakInternalTempDisk
Which aligns better with StSizePeakEphemeralDisk too - description for that should make it clear it's for {internal+inter-subgraph spills}
{ PEAKSIZESTAT(GraphSpill), "Peak inter-subgraph spill size" },
Given others are named "Peak" this one should be too to be consistent:
StSizePeakInterSubgraphDisk
system/jlib/jstatcodes.h
Outdated
@@ -302,6 +302,8 @@ enum StatisticKind | |||
StSizeContinuationData, | |||
StNumContinuationRequests, | |||
StNumFailures, | |||
StSizePeakSubgraphTemp, | |||
StSizePeakEphemeralStorage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use disk rather than Storage for consistency?
} | ||
offset_t queryActiveTempSize() const | ||
{ | ||
return tempFileSizeTracker ? tempFileSizeTracker->queryActiveSize() : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only updated when the file is closed. @jakesmith I assume that is going to work for hash dedup, but will it scale to other situations - e.g. lookahead - where the file is not closed until the end.
Also are you planning to publish the active and peak spill file size for each activity? What should we call them rather than spill files? Do we need to rename the existing internal spills to match this convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we call them rather than spill files?
we are restricting 'spill' to mean inter-subgraph spills. And 'temp' to mean any internal activity temp files usage (i.e. inc. lookahead etc.). NB: we already do distinguish that way via "spill" and "temp" plane categories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only updated when the file is closed.
the tracker could be updated at any time, but yes in hashdedup, it only updates it at the end, when a bucket spills. That seemed reasonable since the write is expected to be relatively quick.
For lookahead, we should update periodically not at end.
The actual collection is periodic (in this PR) and will collect whatever updates there are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lookahead will be done differently, so that the size is updated intermittently.
I'm not attempting to record active temp file size as at the active temp size always ends up 0 because at the end of each activity the temp files are closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Active could be useful, e.g. a long running subgraph that hit peak early on, but is now slow because active is growing slowly.
We should also capture the total temp written too. e.g. peak might be a few GB, but it may have written near to that 100's of times.
Can revisit in another PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. naming, I'd go renaming the 3 stats to:
StSizePeakInterSubgraphDisk
StSizePeakInternalTempDisk
StSizePeakEphemeralDisk
You want them to be consistent with the other names. StSizeInterSubgraphSpill[Write?] It is picky, but I think it will help us to have some clear terminology for the different aspects. |
@ghalliday If I have understood correctly, shouldn't StSizeInternalTemp be zero at the end of each activity? |
Yes. I'm not sure it is useful - but as Jake says, it might be useful as the job is running. I suspect as long as you have the peak the active doesn't provide much value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks like there are some build issues? (github actions failing)
@shamser - please squash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser looks good. A few more questions to finish up.
system/jlib/jstatcodes.h
Outdated
@@ -302,6 +302,8 @@ enum StatisticKind | |||
StSizeContinuationData, | |||
StNumContinuationRequests, | |||
StNumFailures, | |||
StSizePeakSubgraphTemp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the conclusion was to use StSizePeakTempDisk, did I misremember?
{ | ||
CGraphElementBase &element = iter->query(); | ||
CSlaveActivity &activity = (CSlaveActivity &)*element.queryActivity(); | ||
activeTempSize += activity.queryActiveTempSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be calling activity.quertPeakTempSize()? I can see problems either way.
- if calling active size then you will only record the peak when the stats were gathered. So if a temp file was created and then deleted it would not show up.
- If calling peak size, then the number may end up too high. E.g. if you have sequential operations where one temp file is always deleted before another is created. I suspect that it fairly unusual, and we are probably more interested in the worse case, rather than the general case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure.
Sum peaks (if >1 act), will I think usually result in an peak that is significantly higher than the true peak, because it is normal for downstream acts to consume some of the spill files. I think this over-estimate will be problematic.
IMO is it is better as it is, sampling active sum to see if > current peak.
My only concern is that the sample period is too long.
An alterntive perhaps (still sampling), is to recalculate more frequently at the time CFileSizeTracker::growSize is called. e.g. call back ito subgraph and have it collate peak (if hasn't in last in last e.g. 1s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question is why are we gathering this information. I think it is so that we know an upper bound on the amount of temp space that a workunit will use - so we can correctly size a disk.
If that is the case it is better to over-estimate than under-estimate - which is why I would go for peaksize.
Should we call to discuss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - I discussed this with Gavin (I meant to update this PR before) and the conclusion was that we tracking the active peak is the right approach, but calculating that peak at low-intervals is insufficient. i.e. calculating the subgraph peak every ~15seconds when the progress is collected is too infrequent and will often lead to inaccurate (low) estimates.
As I began to outline in my comment above, we think a good approach is for all the trackers to callback into the subgraph each time they grow in size, and for the subgraph to recalculate the peak. Those calls will be frequent, but it (the subgraph impl.) can decide when to recalculate, e.g. it has recalculated <1s ago, so do nothing.
Then that more frequently calculated peak can be serialized back with the graph progress when needed.
@ghalliday - have I forgotten anything? And, I vote for this not to hold up this PR being merged. A new PR to implement this strategy can be opened. I'll re-approve this PR.
thorlcr/thorutil/thormisc.cpp
Outdated
@@ -88,7 +88,7 @@ const StatisticsMapping joinActivityStatistics({StNumLeftRows, StNumRightRows}, | |||
const StatisticsMapping diskReadActivityStatistics({StNumDiskRowsRead, }, basicActivityStatistics, diskReadRemoteStatistics); | |||
const StatisticsMapping diskWriteActivityStatistics({StPerReplicated}, basicActivityStatistics, diskWriteRemoteStatistics); | |||
const StatisticsMapping sortActivityStatistics({}, basicActivityStatistics, spillStatistics); | |||
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics); | |||
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StSizePeakSubgraphTemp, StSizePeakEphemeralDisk, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith should SizePeakTempDisk also be published for each activity that can create a temporary file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, less interesting perhaps as we already have total written per act.
The per activity CFileSizeTracker is already tracknig peak, so should be easy to add this stat. per activity as well I think.
StSizePeakTempDisk tracks the high water mark for intra-graph temp files. StSizePeakEphemeralDisk tracks the overall high water mark for temp files (i.e. both intra-graph and inter-graph temp files) Note at present, only hash dedup produces the information necessary for generating these 2 stats and so the stats will not be accurate until further work is completed. Other activities that make use of temp files(such as join and sort) will need to be amended to the info necessary to produce these stats. Signed-off-by: Shamser Ahmed <[email protected]> HPCC-28757 Change following review Signed-off-by: Shamser Ahmed <[email protected]> HPCC-28757 Changes following review Signed-off-by: Shamser Ahmed <[email protected]> HPCC-28757 Rename StSizePeakSubgraphTemp to StSizePeakTempDisk and publish StSizePeakTempDisk for activities Signed-off-by: Shamser Ahmed <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - reapproving
@ghalliday - please see comments : #18482 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser I will merge. Hopefully recalculating more frequently will come relatively soon.
Type of change:
Checklist:
Smoketest:
Testing: