HPCC-31984 Capture additional stats from CSmartRowBuffer temp files #18831

Merged
merged 1 commit into from
Oct 18, 2024

@shamser shamser commented Jun 28, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31984

Jirabot Action Result:
Workflow Transition: Merge Pending
Updated PR

@shamser shamser changed the base branch from candidate-9.6.x to candidate-9.8.x July 2, 2024 13:29
@shamser shamser changed the base branch from candidate-9.8.x to candidate-9.6.x August 12, 2024 11:36
@shamser shamser force-pushed the issue31984 branch 2 times, most recently from 1a2cce2 to 713bff9 Compare August 12, 2024 12:00
@shamser shamser changed the title HPCC-31984 Capture StCycleSpillElapsedCycles, StTimeSpillElapsed, StSizeSpillFile from CSmartRowBuffer HPCC-31984 Capture additional stats from CSmartRowBuffer temp files Aug 12, 2024
@shamser shamser marked this pull request as ready for review August 12, 2024 15:32
@shamser shamser requested a review from jakesmith August 13, 2024 09:16

@jakesmith jakesmith left a comment

@shamser - looks good, but I think there are some thread safety issues.

@@ -424,6 +424,13 @@ class CSmartRowBuffer: public CSimpleInterface, implements ISmartRowBuffer, impl
    {
        return this;
    }
    virtual unsigned __int64 getStatistic(StatisticKind kind) const override
    {
        if (tempFileIO)

I think this is theoretically thread unsafe: the other thread could be assigning to the pointer while this is testing it. I think it should really be protected with an atomic. In practice on Intel it may be ok.

@shamser - I don't think this one has been addressed, still thread unsafe?

    virtual unsigned __int64 getStatistic(StatisticKind kind) const
    {
        unsigned __int64 v = inactiveStats.queryStatistic(kind).get();
        if (currentOutputIFileIO)

This doesn't look thread safe.
An existing currentOutputIFileIO could be destroyed (in createNextOutputStream()) after this test has passed.

@shamser shamser requested a review from jakesmith September 13, 2024 15:24

@jakesmith jakesmith left a comment

@@ -1433,6 +1433,8 @@ class CDistributorBase : implements IHashDistributor, implements IExceptionHandl

    virtual void mergeStats(CRuntimeStatisticCollection &stats) const
    {
        if (piperd)

I may have missed this one before, but is this thread safe?
Thread 1: stop()->disconnect()->piperd.clear()
Thread 2: [passed reading pointer from Owned], piperd could be cleared by T1.

Contributor Author

I've put a CriticalBlock around this as well.

        if (currentOutputIFileIO)
        {
            CriticalBlock b(critCurrentOutputIFileIO);
            if (currentOutputIFileIO)

I think strictly speaking, the 1st check needs to be atomic for it to be thread safe, otherwise it could be reordered.
But it's not worth it. Better to just enter mutex and test once - this is not highly contended, i.e. make it:

    virtual unsigned __int64 getStatistic(StatisticKind kind) const
    {
        unsigned __int64 v = inactiveStats.queryStatistic(kind).get();
        CriticalBlock b(critCurrentOutputIFileIO);
        if (currentOutputIFileIO)
            v += currentOutputIFileIO->getStatistic(kind);
        return v;
    }
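
For readers without the HPCC headers to hand, the "enter the mutex and test once" shape suggested above can be sketched in standard C++. This is only an illustration under stated assumptions: `FileIO`, `SpillWriter`, `startNextFile` and `noteWrite` are hypothetical stand-ins, `std::mutex` stands in for `CriticalSection`, and the retired total is a plain integer read under the lock rather than the statistics collection used in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a file IO object that tracks one statistic.
struct FileIO
{
    uint64_t bytesWritten = 0;
    uint64_t getStatistic() const { return bytesWritten; }
};

// Hypothetical writer that replaces its output file from time to time.
class SpillWriter
{
    mutable std::mutex crit;            // guards currentIO and inactiveTotal
    std::unique_ptr<FileIO> currentIO;  // replaced whenever a new file starts
    uint64_t inactiveTotal = 0;         // stats folded in from retired files

public:
    // Writer thread: retire the current file (if any) and start a new one.
    void startNextFile()
    {
        std::lock_guard<std::mutex> b(crit);
        if (currentIO)
            inactiveTotal += currentIO->getStatistic();
        currentIO.reset(new FileIO());
    }

    // Writer thread: record bytes written to the current file.
    void noteWrite(uint64_t n)
    {
        std::lock_guard<std::mutex> b(crit);
        if (currentIO)
            currentIO->bytesWritten += n;
    }

    // Stats thread: take the lock first, then test the pointer exactly once.
    // There is no unlocked pre-check that could race with startNextFile().
    uint64_t getStatistic() const
    {
        std::lock_guard<std::mutex> b(crit);
        uint64_t v = inactiveTotal;
        if (currentIO)
            v += currentIO->getStatistic();
        return v;
    }
};
```

Because the pointer is only ever tested while the lock is held, the object cannot be destroyed between the test and the call that uses it.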

@shamser shamser force-pushed the issue31984 branch 2 times, most recently from 426179c to 28bf41f Compare September 16, 2024 11:19
@shamser shamser requested a review from jakesmith September 17, 2024 09:12

@jakesmith jakesmith left a comment

@shamser - looks good, but needs rebasing to resolve clash

@@ -146,6 +147,7 @@ class CSmartRowBuffer: public CSimpleInterface, implements ISmartRowBuffer, impl
        }
        if (!tempFileIO) {
            SpinUnblock unblock(lock);
            CriticalBlock block(critTmpFileIO);

using AtomicShared for tempFileIO would be an alternative here.
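
The API of the `AtomicShared` class mentioned here is not shown in this conversation, so the following is only a standard-library analogue of the idea, not the HPCC class itself. It assumes a hypothetical `TempFileIO` and `RowBuffer`, and uses the `std::atomic_load` / `std::atomic_store` overloads for `std::shared_ptr` from `<memory>` (C++20 offers `std::atomic<std::shared_ptr<T>>` as the newer spelling).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical temp file object; the counter is atomic so a writer could
// update it while a stats thread reads it.
struct TempFileIO
{
    std::atomic<uint64_t> bytesWritten{0};
};

// Hypothetical row buffer that publishes its temp file through an
// atomically accessed shared pointer instead of a mutex.
class RowBuffer
{
    // Only ever touched through std::atomic_load / std::atomic_store.
    std::shared_ptr<TempFileIO> tempFileIO;

public:
    // Writer thread: create the temp file and publish it.
    void openTempFile(uint64_t initialBytes)
    {
        auto io = std::make_shared<TempFileIO>();
        io->bytesWritten.store(initialBytes);
        std::atomic_store(&tempFileIO, io);
    }

    // Writer thread: drop the temp file.
    void closeTempFile()
    {
        std::atomic_store(&tempFileIO, std::shared_ptr<TempFileIO>());
    }

    // Stats thread: take a local reference atomically. The object stays
    // alive for as long as the local copy exists, even if the writer
    // closes the file in the meantime.
    uint64_t getStatistic() const
    {
        std::shared_ptr<TempFileIO> io = std::atomic_load(&tempFileIO);
        return io ? io->bytesWritten.load() : 0;
    }
};
```

The trade-off against a critical section is that readers never block the writer, at the cost of an atomic reference-count update on every read.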


@jakesmith jakesmith left a comment

@shamser - looks good, please squash.

But we should probably not be targeting 9.6 for this kind of change.
Please target against 9.8.


@ghalliday ghalliday left a comment

@shamser looks good. Please squash.
One comment on naming.

Also, recent experience has taught us to be very careful about calling set/setown inside a critical section because releasing the old object can be expensive. A better alternative is to use swap. In this case I think the only thread that would be blocked would be the stats reporting - so it is unlikely to be an issue.
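
A minimal sketch of the swap approach described above, in standard C++ rather than the jlib `Owned` / `CriticalBlock` types. `Resource` and `Holder` are hypothetical names, and the destructor merely counts calls where a real one might close and delete a large temp file.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical resource whose destructor is assumed to be expensive.
struct Resource
{
    int *destroyed; // incremented on destruction so the effect is observable
    explicit Resource(int *counter) : destroyed(counter) {}
    ~Resource() { ++*destroyed; }
};

// Hypothetical owner of a replaceable resource.
class Holder
{
    std::mutex crit;                    // guards 'current'
    std::unique_ptr<Resource> current;

public:
    void replace(std::unique_ptr<Resource> next)
    {
        {
            std::lock_guard<std::mutex> b(crit);
            current.swap(next);         // cheap pointer exchange under the lock
        }
        // 'next' now owns the old object. It is released here, after the
        // lock has been dropped, so other threads are not held up by it.
        next.reset();
    }

    bool hasCurrent()
    {
        std::lock_guard<std::mutex> b(crit);
        return current != nullptr;
    }
};
```

Assigning directly inside the lock would instead run the old object's destructor while every other thread waiting on the lock is blocked.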

@jakesmith are we completely happy with this going in 9.6.x?

@@ -95,6 +95,7 @@ class CDistributorBase : implements IHashDistributor, implements IExceptionHandl
    size32_t fixedEstSize;
    Owned<IRowWriter> pipewr;
    Owned<ISmartRowBuffer> piperd;
    mutable CriticalSection critPiperd;

Random general comment on naming:
piperdCrit is a more natural name than critPiperd (and more consistent with other variables, e.g. fixedEstSize), because piperd is an adjective/qualifier for the critical section, and in English qualifiers come before the noun. Not worth changing in this PR.

@jakesmith

@jakesmith are we completely happy with this going in 9.6.x?

I suggested previously (#18831 (review)) that it should be retargeted to 9.8 (arguably shouldn't go into 9.8 either, but separate discussion).
@shamser - please retarget

* Capture stats such as StCycleSpillElapsedCycles and StTimeSpillElapsed
from temp file
* Have CSmartRowBuffer use StSizeDiskWrite from tempFileIO for noteSize (should mean
actual disk size used for size tracking)

Signed-off-by: Shamser Ahmed <[email protected]>
@shamser shamser changed the base branch from candidate-9.6.x to candidate-9.8.x October 4, 2024 14:42

shamser commented Oct 4, 2024

@jakesmith @ghalliday Squashed & retargeted.

@ghalliday ghalliday changed the base branch from candidate-9.8.x to master October 18, 2024 08:01
@ghalliday ghalliday merged commit 33db4c0 into hpcc-systems:master Oct 18, 2024

Jirabot Action Result:
Added fix version: 9.10.0
Workflow Transition: 'Resolve issue'

3 participants