Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-28757 New StSizePeakEphemeralDisk and StSizePeakTempDisk for graphs #18482

Merged
merged 1 commit into from
May 23, 2024

Conversation

shamser
Copy link
Contributor

@shamser shamser commented Apr 3, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Copy link

github-actions bot commented Apr 3, 2024

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-28757

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@shamser shamser requested a review from jakesmith April 3, 2024 10:43
@shamser shamser marked this pull request as draft April 3, 2024 12:40
@shamser shamser removed the request for review from jakesmith April 3, 2024 12:40
@shamser shamser changed the title HPCC-28757 Report the peak activity and graph spill HPCC-28757 New StSizePeakSpill for activity & graph: hash dedep Apr 8, 2024
@shamser shamser changed the title HPCC-28757 New StSizePeakSpill for activity & graph: hash dedep HPCC-28757 New StSizePeakSpill stats for hash dedup Apr 8, 2024
@shamser shamser marked this pull request as ready for review April 8, 2024 12:48
@shamser shamser requested a review from jakesmith April 8, 2024 12:48
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - I've given this a 1st review, I think the logic is good, mostly minor comments, but worth looking at the naming and the separate publishing of "Temp" vs "Spill" stats.

@@ -359,6 +359,54 @@ class graph_decl CStreamFileOwner : public CSimpleInterfaceOf<IExtRowStream>
}
};

// Tracks the current and peak storage used for some files
class FilesSizeTracker: public CInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: class names are normally prefixed with 'C',
and normally have a space before ':'

__int64 fileSize = 0;
public:
CFileOwnerSizeUpdater(IFile *_iFile, FilesSizeTracker * _filesSizeTracker): CFileOwner(_iFile), filesSizeTracker(_filesSizeTracker)
{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: for formatting consistency, '}' on newline.

Linked<FilesSizeTracker> filesSizeTracker;
__int64 fileSize = 0;
public:
CFileOwnerSizeUpdater(IFile *_iFile, FilesSizeTracker * _filesSizeTracker): CFileOwner(_iFile), filesSizeTracker(_filesSizeTracker)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: space before ':' and there's a double space after.

@@ -974,6 +974,7 @@ static const constexpr StatisticMeta statsMetaData[StMax] = {
{ SIZESTAT(ContinuationData), "The total size of continuation data sent from agent to the server\nA large number may indicate a poor filter, or merging from many different index locations" },
{ NUMSTAT(ContinuationRequests), "The number of times the agent indicated there was more data to be returned" },
{ NUMSTAT(Failures), "The number of times a query has failed" },
{ PEAKSIZESTAT(PeakSpill), "The high water mark for spill files"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed the description of pre-existing stat is misleading:

{ PEAKSIZESTAT(GraphSpill), "Peak size of spill memory usage" },

Not memory, but inter subgraph spill files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should track the non-subgraph spill file in a separate stat. too if possible and I think it is.
and we clearer if we distinguied it everywhere (as do in the helm charts/planes) by referring to it as "Temp" vs "Spill".
So we need 3 stats.

  1. "GraphSpill' tracks the peak inflight inter-subgraph spill file size
  2. "SubgraphTemp" tracks the peak internal activity temp usage of a subgraph
  3. "EphemeralStorage" tracks the active total/peak of the above 2 stats.

(not sure they're the best names!)

NB: we already support a tempPlane vs a spillPlane to differentiate (where the "temp" plane category will default to the spill plane.
NB: [3] will be less useful if [1] and [2] are on different planes, but not useless.

// calculate peak spill size
if (started&&initialized)
{
unsigned __int64 activeSpillSize = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to prev. comment, let's rename to activeTempSize.

OR, could do in next PR, because confusingly queryTempHandler is actually solely for spills, so it would be best to rename to querySpillHandler for clarify.

@@ -88,11 +88,11 @@ const StatisticsMapping joinActivityStatistics({StNumLeftRows, StNumRightRows},
const StatisticsMapping diskReadActivityStatistics({StNumDiskRowsRead, }, basicActivityStatistics, diskReadRemoteStatistics);
const StatisticsMapping diskWriteActivityStatistics({StPerReplicated}, basicActivityStatistics, diskWriteRemoteStatistics);
const StatisticsMapping sortActivityStatistics({}, basicActivityStatistics, spillStatistics);
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics);
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StSizePeakSpill, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inconsistent naming between StSizeGraphSpill and StSizePeakSpill, in the sense that both are PEAKSIZESTAT stats, maybe make them both contain 'Peak' for clarity.

}
};

class CFileOwnerSizeUpdater : public CFileOwner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not now, but I suspect every CFileOwner should have this functionality?
suggesting that code should be rolled into CFileOwner

@shamser shamser force-pushed the issue28757 branch 2 times, most recently from afa65da to 6c64228 Compare April 18, 2024 13:53
@shamser shamser changed the title HPCC-28757 New StSizePeakSpill stats for hash dedup HPCC-28757 New StSizePeakEphemeralStorage and StSizePeakSubgraphTemp for graphs Apr 19, 2024
@shamser shamser changed the title HPCC-28757 New StSizePeakEphemeralStorage and StSizePeakSubgraphTemp for graphs HPCC-28757 New StSizePeakEphemeralStorage and StSizePeakSubgraphTemp for hash dedup Apr 19, 2024
@shamser shamser requested a review from jakesmith April 19, 2024 09:54
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - look good, 1 minor comment.
Please consider and squash.

// simple class which takes ownership of the underlying file and deletes it on destruction
class graph_decl CFileOwner : public CSimpleInterface, implements IInterface
{
OwnedIFile iFile;
Linked<CFileSizeTracker> fileSizeTracker;
__int64 fileSize = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: this and noteSpill/growSize/shrinkSize should be dealing in offset_t's really (offset_t is the underlying type of file size, __int64 is the underlying type of stats)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: noteSpill is now an offset_t, but really everything in CFileSizeTracker should be using offset_t too.

iFile->remove();
}
void noteSpill(__int64 size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: should be offset_t

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks fine, but see comment re. offset_t's in CFileSizeTracker

@shamser
Copy link
Contributor Author

shamser commented Apr 25, 2024

@shamser - looks fine, but see comment re. offset_t's in CFileSizeTracker

Sorry, forgot to push the last batch of changes.

@shamser shamser requested a review from jakesmith April 25, 2024 15:30
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks good.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser broadly looks good. A few questions/comments. (If none make sense then lets discuss them.)

@@ -944,7 +944,7 @@ static const constexpr StatisticMeta statsMetaData[StMax] = {
{ CYCLESTAT(LeafFetch) },
{ TIMESTAT(BlobFetch), "Time spent reading blobs from disk (EXCLUDING the linux page cache)" },
{ CYCLESTAT(BlobFetch) },
{ PEAKSIZESTAT(GraphSpill), "Peak size of spill memory usage" },
{ PEAKSIZESTAT(GraphSpill), "Peak size of graph spills" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of inter-subgraph spills?

@@ -302,6 +302,8 @@ enum StatisticKind
StSizeContinuationData,
StNumContinuationRequests,
StNumFailures,
StSizePeakSubgraphTemp,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this name is quite right - but hard to think of the correct name
StSizePeakSubgraphStorage
StSizePeakSubgraphTempStorage
StSizePeakInternalSpillFile
StSizePeakInternalStorage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday Are we proceeding with naming convention suggested by @jakesmith

Copy link
Member

@jakesmith jakesmith Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do use "Disk" in all stats (relating to disk), and I think this should have "Temp" in it (vs "Spill").
I'd go for:
StSizePeakInternalTempDisk

Which aligns better with StSizePeakEphemeralDisk too - description for that should make it clear it's for {internal+inter-subgraph spills}

{ PEAKSIZESTAT(GraphSpill), "Peak inter-subgraph spill size" },

Given others are named "Peak" this one should be too to be consistent:
StSizePeakInterSubgraphDisk

@@ -302,6 +302,8 @@ enum StatisticKind
StSizeContinuationData,
StNumContinuationRequests,
StNumFailures,
StSizePeakSubgraphTemp,
StSizePeakEphemeralStorage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use disk rather than Storage for consistency?

}
offset_t queryActiveTempSize() const
{
return tempFileSizeTracker ? tempFileSizeTracker->queryActiveSize() : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is only updated when the file is closed. @jakesmith I assume that is going to work for hash dedup, but will it scale to other situations - e.g. lookahead - where the file is not closed until the end.

Also are you planning to publish the active and peak spill file size for each activity? What should we call them rather than spill files? Do we need to rename the existing internal spills to match this convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we call them rather than spill files?

we are restricting 'spill' to mean inter-subgraph spills. And 'temp' to mean any internal activity temp files usage (i.e. inc. lookahead etc.). NB: we already do distinguish that way via "spill" and "temp" plane categories.

Copy link
Member

@jakesmith jakesmith Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is only updated when the file is closed.

the tracker could be updated at any time, but yes in hashdedup, it only updates it at the end, when a bucket spills. That seemed reasonable since the write is expected to be relatively quick.

For lookahead, we should update periodically not at end.

The actual collection is periodic (in this PR) and will collect whatever updates there are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookahead will be done differently, so that the size is updated intermittently.

I'm not attempting to record active temp file size as at the active temp size always ends up 0 because at the end of each activity the temp files are closed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Active could be useful, e.g. a long running subgraph that hit peak early on, but is now slow because active is growing slowly.
We should also capture the total temp written too. e.g. peak might be a few GB, but it may have written near to that 100's of times.
Can revisit in another PR though.

@shamser shamser requested review from ghalliday and jakesmith April 29, 2024 09:12
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re. naming, I'd go renaming the 3 stats to:

StSizePeakInterSubgraphDisk
StSizePeakInternalTempDisk
StSizePeakEphemeralDisk

@ghalliday ?

@ghalliday
Copy link
Member

Re. naming, I'd go renaming the 3 stats to:

StSizePeakInterSubgraphDisk StSizePeakInternalTempDisk StSizePeakEphemeralDisk

@ghalliday ?
And what do StSizeSpillFile and StSizeGraphSpill become? (I'm not quite sure what they currently represent).

You want them to be consistent with the other names.
Some more suggestions

StSizeInterSubgraphSpill[Write?]
StSizePeakInterSubgraphSpill
StSizePeakInternalTemp - peak size
StSizeInternalTemp - current size
StSizeInternalTempWrite - total size
StSizePeakEphemeralDiskUsage

It is picky, but I think it will help us to have some clear terminology for the different aspects.

@shamser
Copy link
Contributor Author

shamser commented May 1, 2024

StSizeInternalTemp

@ghalliday If I have understood correctly, shouldn't StSizeInternalTemp be zero at the end of each activity?

@ghalliday
Copy link
Member

StSizeInternalTemp

@ghalliday If I have understood correctly, shouldn't StSizeInternalTemp be zero at the end of each activity?

Yes. I'm not sure it is useful - but as Jake says, it might be useful as the job is running. I suspect as long as you have the peak the active doesn't provide much value.

@shamser shamser changed the title HPCC-28757 New StSizePeakEphemeralStorage and StSizePeakSubgraphTemp for hash dedup HPCC-28757 New StSizePeakEphemeral and StSizePeakSubgraphTemp for hash dedup May 2, 2024
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - looks like there are some build issues? (github actions failing)

@jakesmith
Copy link
Member

@shamser - please squash.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser looks good. A few more questions to finish up.

@@ -302,6 +302,8 @@ enum StatisticKind
StSizeContinuationData,
StNumContinuationRequests,
StNumFailures,
StSizePeakSubgraphTemp,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the conclusion was to use StSizePeakTempDisk, did I misremember?

{
CGraphElementBase &element = iter->query();
CSlaveActivity &activity = (CSlaveActivity &)*element.queryActivity();
activeTempSize += activity.queryActiveTempSize();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be calling activity.quertPeakTempSize()? I can see problems either way.

  • if calling active size then you will only record the peak when the stats were gathered. So if a temp file was created and then deleted it would not show up.
  • If calling peak size, then the number may end up too high. E.g. if you have sequential operations where one temp file is always deleted before another is created. I suspect that it fairly unusual, and we are probably more interested in the worse case, rather than the general case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure.
Sum peaks (if >1 act), will I think usually result in an peak that is significantly higher than the true peak, because it is normal for downstream acts to consume some of the spill files. I think this over-estimate will be problematic.

IMO is it is better as it is, sampling active sum to see if > current peak.
My only concern is that the sample period is too long.

An alterntive perhaps (still sampling), is to recalculate more frequently at the time CFileSizeTracker::growSize is called. e.g. call back ito subgraph and have it collate peak (if hasn't in last in last e.g. 1s).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question is why are we gathering this information. I think it is so that we know an upper bound on the amount of temp space that a workunit will use - so we can correctly size a disk.
If that is the case it is better to over-estimate than under-estimate - which is why I would go for peaksize.
Should we call to discuss?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - I discussed this with Gavin (I meant to update this PR before) and the conclusion was that we tracking the active peak is the right approach, but calculating that peak at low-intervals is insufficient. i.e. calculating the subgraph peak every ~15seconds when the progress is collected is too infrequent and will often lead to inaccurate (low) estimates.

As I began to outline in my comment above, we think a good approach is for all the trackers to callback into the subgraph each time they grow in size, and for the subgraph to recalculate the peak. Those calls will be frequent, but it (the subgraph impl.) can decide when to recalculate, e.g. it has recalculated <1s ago, so do nothing.
Then that more frequently calculated peak can be serialized back with the graph progress when needed.

@ghalliday - have I forgotten anything? And, I vote for this not to hold up this PR being merged. A new PR to implement this strategy can be opened. I'll re-approve this PR.

@@ -88,7 +88,7 @@ const StatisticsMapping joinActivityStatistics({StNumLeftRows, StNumRightRows},
const StatisticsMapping diskReadActivityStatistics({StNumDiskRowsRead, }, basicActivityStatistics, diskReadRemoteStatistics);
const StatisticsMapping diskWriteActivityStatistics({StPerReplicated}, basicActivityStatistics, diskWriteRemoteStatistics);
const StatisticsMapping sortActivityStatistics({}, basicActivityStatistics, spillStatistics);
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics);
const StatisticsMapping graphStatistics({StNumExecutions, StSizeSpillFile, StSizeGraphSpill, StSizePeakSubgraphTemp, StSizePeakEphemeralDisk, StTimeUser, StTimeSystem, StNumContextSwitches, StSizeMemory, StSizePeakMemory, StSizeRowMemory, StSizePeakRowMemory}, basicActivityStatistics);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith should SizePeakTempDisk also be published for each activity that can create a temporary file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, less interesting perhaps as we already have total written per act.
The per activity CFileSizeTracker is already tracknig peak, so should be easy to add this stat. per activity as well I think.

StSizePeakTempDisk tracks the high water mark for intra-graph temp
files.  StSizePeakEphemeralDisk tracks the overall high water mark
for temp files (i.e. both intra-graph and inter-graph temp files)

Note at present, only hash dedup produces the information necessary for
generating these 2 stats and so the stats will not be accurate until
further work is completed.  Other activities that make use of temp
files(such as join and sort) will need to be amended to the info
necessary to produce these stats.

Signed-off-by: Shamser Ahmed <[email protected]>

HPCC-28757 Change following review

Signed-off-by: Shamser Ahmed <[email protected]>

HPCC-28757 Changes following review

Signed-off-by: Shamser Ahmed <[email protected]>

HPCC-28757 Rename StSizePeakSubgraphTemp to StSizePeakTempDisk and publish StSizePeakTempDisk for activities

Signed-off-by: Shamser Ahmed <[email protected]>
@shamser shamser changed the title HPCC-28757 New StSizePeakEphemeral and StSizePeakSubgraphTemp for hash dedup HPCC-28757 New StSizePeakEphemeralDisk and StSizePeakTempDisk for graphs May 15, 2024
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser - reapproving
@ghalliday - please see comments : #18482 (comment)

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamser I will merge. Hopefully recalculating more frequently will come relatively soon.

@ghalliday ghalliday merged commit e5240c5 into hpcc-systems:candidate-9.6.x May 23, 2024
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants