-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-31649 New StSizePeakEphemeralDisk and StSizePeakTempDisk look ahead and hash distribute spilling #18585
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31649 Jirabot Action Result: |
7f3bc57
to
5a3bd59
Compare
56580e5
to
da22007
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good, but worth changing the linking semantics of CFileOwner (and e.g. leaving createSmartBuffer as it was).
And side-comments re. separate JIRA to track std. temp stats.
thorlcr/thorutil/thbuf.cpp
Outdated
@@ -609,8 +607,7 @@ class CSmartRowInMemoryBuffer: public CSimpleInterface, implements ISmartRowBuff | |||
|
|||
ISmartRowBuffer * createSmartBuffer(CActivityBase *activity, const char * tempname, size32_t buffsize, IThorRowInterfaces *rowif) | |||
{ | |||
Owned<IFile> file = createIFile(tempname); | |||
return new CSmartRowBuffer(activity,file,buffsize,rowif); | |||
return new CSmartRowBuffer(activity,createIFile(tempname),buffsize,rowif); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is slightly at odds with the linking semantics of other utility classes, I think it better left as it was, where CSmartRowBuffer takes a link to the passed in IFile.
And, CFileOwner's iFile member is changed to a Linked, so it's consistent, and consitent with it's other Linked member). And then change the places that currently pass a linked file into CFileOwner to just pass the file and let it link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to CFileOwner: #18726
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's merge that one 1st, then change this back.
@@ -182,7 +182,8 @@ class CSmartRowBuffer: public CSimpleInterface, implements ISmartRowBuffer, impl | |||
size32_t left = nb*blocksize-mb.length(); | |||
memset(mb.reserve(left),0,left); | |||
} | |||
fileio->write(blk*(offset_t)blocksize,mb.length(),mb.bufferBase()); | |||
tempFileIO->write(blk*(offset_t)blocksize,mb.length(),mb.bufferBase()); | |||
tmpFileOwner.noteSize(numblocks*blocksize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated to this PR, but we're not capturing standard spill (temp) stats from this/these helper functions, i.e. StCycleSpillElapsedCycles, StTimeSpillElapsed, StSizeSpillFile ( NB: we do already in CThorRowCollectorBase )
Is there an existing JIRA? If not can you open one and link it to this/other JIRA's so it doesn't get lost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jira for this: https://hpccsystems.atlassian.net/browse/HPCC-31984
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also link it to this JIRA and/or others (makes it easier to find in future)
@@ -1484,7 +1481,8 @@ class CSharedWriteAheadDisk : public CSharedWriteAheadBase | |||
mb.append((byte)0); | |||
size32_t len = mb.length(); | |||
chunk.setown(getOutOffset(len)); // will find space for 'len', might be bigger if from free list | |||
spillFileIO->write(chunk->offset, len, mb.toByteArray()); | |||
tempFileIO->write(chunk->offset, len, mb.toByteArray()); | |||
tempFileOwner->noteSize(highOffset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per comment above. If there isn't one, we need a JIRA to add the basic spill(temp) stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1536,7 +1533,8 @@ class CSharedWriteAheadDisk : public CSharedWriteAheadBase | |||
freeChunks.kill(); | |||
freeChunksSized.kill(); | |||
highOffset = 0; | |||
spillFileIO->setSize(0); | |||
tempFileIO->setSize(0); | |||
tempFileOwner->noteSize(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why this (the setSize(0)) was implemented this way.
I think it should just delete the file, and let it be recreated if it was needed.
(not a change for now in this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jira to look into this later: https://hpccsystems.atlassian.net/browse/HPCC-31985
@shamser - infact this also covers a hashdistribute spilling, which is also used as part of hash dedup (so https://hpccsystems.atlassian.net/browse/HPCC-28757 alone won't give a full picture of a hash dedup;s spilling). Can you update this JIRA and it's PR title to mention that it covers hash distribute spilling as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thorlcr/thorutil/thbuf.cpp
Outdated
@@ -609,8 +607,7 @@ class CSmartRowInMemoryBuffer: public CSimpleInterface, implements ISmartRowBuff | |||
|
|||
ISmartRowBuffer * createSmartBuffer(CActivityBase *activity, const char * tempname, size32_t buffsize, IThorRowInterfaces *rowif) | |||
{ | |||
Owned<IFile> file = createIFile(tempname); | |||
return new CSmartRowBuffer(activity,file,buffsize,rowif); | |||
return new CSmartRowBuffer(activity,createIFile(tempname),buffsize,rowif); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's merge that one 1st, then change this back.
de5a065
to
84c3c04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - please see comments.
thorlcr/thorutil/thbuf.cpp
Outdated
Owned<IFile> iFile = createIFile(spillName); | ||
iFile->setShareMode(IFSHnone); | ||
tempFileIO.setown(iFile->open(IFOcreaterw)); | ||
tempFileOwner.setown(new CFileOwner(iFile.getClear(), activity->queryTempFileSizeTracker())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this now needs to be changed to avoid leaking iFile, e.g. to:
tempFileOwner.setown(new CFileOwner(iFile, activity->queryTempFileSizeTracker()));
thorlcr/thorutil/thbuf.cpp
Outdated
fileio.clear(); | ||
file->remove(); | ||
} | ||
if (tempFileIO) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial: no need to test
thorlcr/thorutil/thormisc.hpp
Outdated
fileSizeTracker->growSize(size-fileSize); | ||
else | ||
fileSizeTracker->shrinkSize(fileSize-size); | ||
fileSize = size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If fileSizeTracker is not set, fileSize won't be updated. I think should be :
if (fileSize!=size)
{
if (fileSizeTracker)
{
if (size > fileSize)
fileSizeTracker->growSize(size-fileSize);
else
fileSizeTracker->shrinkSize(fileSize-size);
}
fileSize = size;
}
( Gavin made a similar comment in this PR : #18573 (comment) )
Some changes made in #18573 will mean that this PR will need to be updated. I'll update this PR once 18573 is merged. @jakesmith |
4d40d35
to
2353100
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamser - looks good.
@shamser - there's a clash now, can you rebase |
…k ahead and hash distribute spilling Signed-off-by: Shamser Ahmed <[email protected]>
786aa0d
into
hpcc-systems:candidate-9.6.x
Type of change:
Checklist:
Smoketest:
Testing: