HPCC-32480 Capture "look ahead" timings for unordered concat (parallel funnel) #19164

Merged
merged 1 commit into from
Nov 11, 2024


shamser
Contributor

@shamser shamser commented Sep 27, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:


Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32480

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@shamser shamser changed the base branch from candidate-9.8.x to master October 16, 2024 12:21
@shamser shamser marked this pull request as ready for review October 16, 2024 12:22
@shamser shamser requested a review from jakesmith October 16, 2024 12:23
Member

@jakesmith jakesmith left a comment

@shamser - I am not sure that what is being counted as lookahead or blocked time is correct at the moment - please see comments.

@@ -315,7 +315,7 @@ class SimpleActivityTimer
 cycle_t startCycles;
 cycle_t &accumulator;
 protected:
-const bool enabled;
+bool enabled;
Member

might be better to make mutable.
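
The thread does not spell out what the reviewer had in mind here. One possible reading, as a minimal sketch: keep the flag out of the object's logical constness by marking it `mutable`, so that an early-stop method can clear it even on a const timer. `ScopedTimer`, `fakeClock`, `nowCycles` and `demoLeaveEarly` are hypothetical stand-ins, not the project's `SimpleActivityTimer`.

```cpp
#include <cassert>
#include <cstdint>

using cycle_t = std::uint64_t;

// Hypothetical fake clock so the sketch is deterministic.
static cycle_t fakeClock = 0;
static cycle_t nowCycles() { return fakeClock; }

class ScopedTimer
{
    cycle_t startCycles;
    cycle_t &accumulator;
    mutable bool enabled;   // may be cleared even through a const object
public:
    ScopedTimer(cycle_t &_accumulator, bool _enabled)
        : startCycles(_enabled ? nowCycles() : 0), accumulator(_accumulator), enabled(_enabled) {}
    ~ScopedTimer() { leave(); }

    // Stop timing early. Declared const: only the mutable flag and the
    // referenced accumulator change.
    void leave() const
    {
        if (enabled)
        {
            assert(nowCycles() >= startCycles);
            accumulator += nowCycles() - startCycles;
            enabled = false;    // permitted because 'enabled' is mutable
        }
    }
};

// Times 5 cycles of work, then leaves early so the next 7 are not counted.
cycle_t demoLeaveEarly()
{
    fakeClock = 0;
    cycle_t total = 0;
    {
        const ScopedTimer timer(total, true);
        fakeClock += 5;     // timed work
        timer.leave();      // stop timing before the scope ends
        fakeClock += 7;     // untimed work
    }                       // destructor calls leave() again, which is a no-op
    return total;
}
```

Whether this is preferable to simply dropping `const` depends on how the real timers are used, which the diff excerpt does not show.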

@@ -85,6 +85,7 @@ class CParallelFunnel : implements IRowStream, public CSimpleInterface
 inputStream = funnel.activity.queryInputStream(inputIndex);
 while (!stopping)
 {
+LookAheadTimer timer(funnel.activity.getActivityTimerAccumulator(), funnel.activity.queryTimeActivities());
Member

@jakesmith jakesmith Oct 18, 2024

not sure about what this is collecting, but think:

  • it should include the time it spent in startInput
  • should not include the time it may block in pushMulti

basically, lookahead time should only be the time it spends reading the input ahead (which in a single-threaded regular act. would be done inline and be part of total cycles).
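
The two bullet points above could be sketched roughly as follows. This is only an illustration under assumptions: `ScopedCycles`, `fakeClock`, `readRow`, `pushChunk` and `runInputHandler` are hypothetical stand-ins with fixed fake costs, not the real `LookAheadTimer`, input stream or `pushMulti`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using cycle_t = std::uint64_t;

// Hypothetical fake clock: each stand-in below advances it by a fixed cost.
static cycle_t fakeClock = 0;

// Minimal scoped timer: adds the elapsed fake cycles to 'acc' on scope exit.
struct ScopedCycles
{
    cycle_t &acc;
    cycle_t start;
    explicit ScopedCycles(cycle_t &_acc) : acc(_acc), start(fakeClock) {}
    ~ScopedCycles() { acc += fakeClock - start; }
};

static void startInput() { fakeClock += 10; }                        // starting the input
static int readRow() { fakeClock += 2; return 0; }                   // reading one row ahead
static void pushChunk(const std::vector<int> &) { fakeClock += 50; } // may block when the funnel is full

// Body of one input handler: lookahead covers startInput and reading,
// but not the (possibly blocking) push into the funnel.
cycle_t runInputHandler(unsigned numChunks, unsigned chunkSize)
{
    fakeClock = 0;
    cycle_t lookAheadCycles = 0;
    {
        ScopedCycles timer(lookAheadCycles);
        startInput();
    }
    for (unsigned c = 0; c < numChunks; c++)
    {
        std::vector<int> chunk;
        {
            ScopedCycles timer(lookAheadCycles);
            for (unsigned r = 0; r < chunkSize; r++)
                chunk.push_back(readRow());
        }
        assert(chunk.size() == chunkSize);
        pushChunk(chunk);   // outside any timer scope: not lookahead time
    }
    return lookAheadCycles;
}
```

With these fake costs, `runInputHandler(2, 3)` accumulates 10 cycles for the start plus 2 cycles for each of the 6 rows, and none of the 100 cycles spent pushing.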

size32_t rowSize = thorRowMemoryFootprint(serializer, row);

bool waitForSpace = false;
// only allow a single writer at a time, so only a single thread is waiting on the semaphore - otherwise signal() takes a very long time
{

BlockedActivityTimer timer(activity.getActivityTimerAccumulator(), activity.queryTimeActivities());
Member

why measure this blocking time (on crit), but not on waitSem blockage?

Member

I am not sure this time should be considered blocked at all.

As it stands, if there are 3 inputs to the funnel, the 1st will gain this crit instantly, and then (if full) hold the crit and wait on the fullSem semaphore (but that wait will not count toward blocked time).
The other 2 input handlers will both block here and count toward blocked time.
If there are 100 inputs, 99 will add blocked time.

But I'm not sure any should consider this blocked time, they are not "blocking" the downstream act except for the very small amount of time they spend in the crit adding to 'rows'.
If they (the input handlers) are blocked on fullSem it's because the downstream act isn't pulling or pulling fast enough - this act. is not blocking it.

@shamser shamser requested a review from jakesmith October 22, 2024 09:00
Member

@jakesmith jakesmith left a comment

@shamser - please see comments.

In these less-than-obvious cases, it would be helpful to have a comment explaining why the time is treated as lookahead or blocked, and how it will factor into the local cycles calculation.

started = true;
inputStream = funnel.activity.queryInputStream(inputIndex);
while (!stopping)
{
numRows = 0;
for (;numRows < chunkSize; numRows++)
{
LookAheadTimer timer(funnel.activity.getActivityTimerAccumulator(), funnel.activity.queryTimeActivities());
Member

I think for efficiency this should be moved outside of the for loop.
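
A rough sketch of the difference, using hypothetical stand-ins (`CountingTimer`, `fakeClock`, `readRow`) rather than the real `LookAheadTimer`: both placements accumulate the same cycles, but the per-row version constructs and destroys one timer per row.

```cpp
#include <cassert>
#include <cstdint>

using cycle_t = std::uint64_t;

// Hypothetical fake clock and a counter of how many timers were created.
static cycle_t fakeClock = 0;
static unsigned timersConstructed = 0;

struct CountingTimer
{
    cycle_t &acc;
    cycle_t start;
    explicit CountingTimer(cycle_t &_acc) : acc(_acc), start(fakeClock) { ++timersConstructed; }
    ~CountingTimer() { assert(fakeClock >= start); acc += fakeClock - start; }
};

static void readRow() { fakeClock += 2; }   // pretend each row costs 2 cycles

struct Result { cycle_t cycles; unsigned timers; };

// Timer inside the loop: one construction/destruction per row.
Result timePerRow(unsigned chunkSize)
{
    fakeClock = 0;
    timersConstructed = 0;
    cycle_t cycles = 0;
    for (unsigned numRows = 0; numRows < chunkSize; numRows++)
    {
        CountingTimer timer(cycles);
        readRow();
    }
    return { cycles, timersConstructed };
}

// Timer hoisted outside the loop: one construction/destruction per chunk.
Result timePerChunk(unsigned chunkSize)
{
    fakeClock = 0;
    timersConstructed = 0;
    cycle_t cycles = 0;
    {
        CountingTimer timer(cycles);
        for (unsigned numRows = 0; numRows < chunkSize; numRows++)
            readRow();
    }   // timer destroyed here, before 'cycles' is read
    return { cycles, timersConstructed };
}
```

In real code each timer construction and destruction would typically read the cycle counter, so hoisting trades a read per row for a read per chunk.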

@@ -198,6 +201,7 @@ class CParallelFunnel : implements IRowStream, public CSimpleInterface
 if (waitForSpace)
 {
 CriticalBlock b(writerCrit);
+BlockedActivityTimer timer(activity.getActivityTimerAccumulator(), activity.queryTimeActivities());
Member

I am still not sure about this. My previous comment [https://github.com//pull/19164#discussion_r1806771480]:

But I'm not sure any should consider this blocked time, they are not "blocking" the downstream act except for the very small amount of time they spend in the crit adding to 'rows'.
If they (the input handlers) are blocked on fullSem it's because the downstream act isn't pulling or pulling fast enough - this act. is not blocking it.

It is not blocking the downstream act. at this point, it is blocked because it is full, because the downstream act. hasn't pulled enough out.

Contributor Author

Ok. That makes sense.

OwnedConstThorRow row = rows.dequeue();
timer.leave();
Member

this blocked time will be the time blocked because the input-handlers haven't started or kept up and pushed rows in time.
I think this is correct, with lookahead time tracking the time it spent for input, that time (that will be added to totaltime)

  • blocked time is any time it spent dawdling waiting for the input handlers to catch up and push more.

I don't think the leave() mechanism is useful here though (it would be better to remove it until/if it really is needed).
It would be better to scope the BlockedActivityTimer as normal.
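
Scoping the timer "as normal" might look something like the sketch below, where a nested block bounds the timed region instead of an explicit leave() call. `ScopedCycles`, `fakeClock`, `dequeueRow`, `nextRow` and `demoBlocked` are hypothetical stand-ins, not the real `BlockedActivityTimer` or `rows.dequeue()`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using cycle_t = std::uint64_t;

// Hypothetical fake clock so the sketch is deterministic.
static cycle_t fakeClock = 0;

// Minimal scoped timer: adds the elapsed fake cycles to 'acc' on scope exit.
struct ScopedCycles
{
    cycle_t &acc;
    cycle_t start;
    explicit ScopedCycles(cycle_t &_acc) : acc(_acc), start(fakeClock) {}
    ~ScopedCycles() { acc += fakeClock - start; }
};

// Stand-in for a dequeue that waits on the input handlers: pretends to wait 4 cycles.
static int dequeueRow(std::deque<int> &rows)
{
    assert(!rows.empty());
    fakeClock += 4;
    int row = rows.front();
    rows.pop_front();
    return row;
}

int nextRow(std::deque<int> &rows, cycle_t &blockedCycles)
{
    int row;
    {
        ScopedCycles timer(blockedCycles);  // covers only the wait for a row
        row = dequeueRow(rows);
    }                                       // timer stops here, no leave() needed
    fakeClock += 3;                         // later work is not counted as blocked
    return row;
}

// Pulls two rows and returns the total blocked cycles.
cycle_t demoBlocked()
{
    fakeClock = 0;
    std::deque<int> rows{7, 8};
    cycle_t blocked = 0;
    nextRow(rows, blocked);
    nextRow(rows, blocked);
    return blocked;
}
```

The nested braces make the timed region visible at the call site, which is the main argument for preferring them over an early-exit method on the timer.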

try { startInput(i); }
try
{
LookAheadTimer timer(slaveTimerStats, timeActivities);
Member

should not include this for i == 0, because it is not called async/on a separate thread, see deferred comment below

@shamser shamser requested a review from jakesmith October 29, 2024 10:19
Member

@jakesmith jakesmith left a comment

@shamser - 1 minor comment, please look at and squash.

@@ -384,7 +383,8 @@ class FunnelSlaveActivity : public CSlaveActivity
 {
 try
 {
-LookAheadTimer timer(slaveTimerStats, timeActivities);
+// n.b. i>0 is started asynchronously, so track look ahead time
+LookAheadTimer timer(slaveTimerStats, (i==0) ? false : timeActivities);
Member

minor: better to avoid constructing LookAheadTimer and coding as something like:

if (i == 0) // 1st input is started synchronously, so time already included in start() timing.
    startInput(i);
else
{
    LookAheadTimer timer(slaveTimerStats, timeActivities);
    startInput(i);
}

Member

@jakesmith jakesmith left a comment

@shamser - looks good.

@ghalliday ghalliday merged commit 5c6a5ef into hpcc-systems:master Nov 11, 2024
53 checks passed

Jirabot Action Result:
Added fix version: 9.10.0
Workflow Transition: 'Resolve issue'

3 participants