Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track and propagate applied where filter specs to outer plan nodes #1302

Merged
merged 3 commits into from
Jun 26, 2024

Conversation

tlento
Copy link
Contributor

@tlento tlento commented Jun 25, 2024

The PredicatePushdownOptimizer currently pushes predicates down
along the DataflowPlan DAG from the outermost WhereConstraintNode to
as close to the source node for that branch as possible. This results
in duplicate where filter application, because the WhereConstraintNode
does not have any way of evaluating whether or not a given set of
where filter specs could be applied downstream.

This change adds the tracking mechanism for propagating the filters
applied back up along the branch. As of now this is a tracking-change
only - the selective application of these filters will follow shortly.

In addition to the added test cases for the propagation mechanism, the
propagation mechanics were observed via testing several pushdown-enabled
rendering tests with the log-cli-level=DEBUG flag set in pytest.

Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link
Contributor Author

tlento commented Jun 25, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @tlento and the rest of your teammates on Graphite Graphite

@tlento tlento force-pushed the simplify-pushdown-state-tracking branch from dc7ee4f to eaf6079 Compare June 25, 2024 05:21
@tlento tlento force-pushed the propagate-applied-filter-specs-upwards branch from 5542165 to 63efcda Compare June 25, 2024 05:21
time_range_constraint=TimeRangeConstraint.all_time(),
pushdown_enabled_types=frozenset(),
where_filter_specs=tuple(),
)
Copy link
Contributor Author

@tlento tlento Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the branch_state_tracker method, everything above this line existed in the previous file.


This is necessary only for cases where we wish to back-propagate some updated state attribute
for handling in the exit condition of the preceding node in the DAG. Since it is something of an
extraordinary circumstance we designate it as a special method rather than making it a property setter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate / describe examples of the extraordinary circumstance? I see use cases here, but not sure about the context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the use case is in this PR. I can update the comment here to be a bit more explicit, but it pretty much boils down to this:

You generally don't want to mess with branch state tracking in these recursive graph walks, and if you do it by accident it'll be really hard to reason about, so I make it an explicit method call instead of an assignment operator. Something like tracker.last_pushdown_state = current_pushdown_state is easier to screw up - or at least leave unexamined - than tracker.override_last_pushdown_state(current_pushdown_state).

Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining so thoroughly 🙏

if len(self._current_branch_state) > 0:
return self._current_branch_state[-1]
return self._initial_state
return self._current_branch_state[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any concern about KeyError handling here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because there's always a value set since we initialize it non-empty, but I should put in an assertion guard so we get a useful error message if anybody changes that.

Copy link
Contributor Author

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'll add some assertion guards to the pushdown state tracker accessors before merge.

I do think there's got to be a better way to explain what's happening here. I'll see if I can figure that out later.

if len(self._current_branch_state) > 0:
return self._current_branch_state[-1]
return self._initial_state
return self._current_branch_state[-1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because there's always a value set since we initialize it non-empty, but I should put in an assertion guard so we get a useful error message if anybody changes that.

@tlento tlento force-pushed the simplify-pushdown-state-tracking branch from eaf6079 to 11f049a Compare June 26, 2024 00:14
@tlento tlento force-pushed the propagate-applied-filter-specs-upwards branch from 3d9b148 to 2f94016 Compare June 26, 2024 00:14
Copy link
Contributor Author

tlento commented Jun 26, 2024

Merge activity

  • Jun 25, 5:24 PM PDT: @tlento started a stack merge that includes this pull request via Graphite.
  • Jun 25, 5:30 PM PDT: Graphite rebased this pull request as part of a merge.
  • Jun 25, 5:34 PM PDT: @tlento merged this pull request with Graphite.

@tlento tlento force-pushed the simplify-pushdown-state-tracking branch from 11f049a to 677f26e Compare June 26, 2024 00:24
Base automatically changed from simplify-pushdown-state-tracking to main June 26, 2024 00:29
tlento added 3 commits June 26, 2024 00:30
The PredicatePushdownOptimizer currently pushes predicates down
along the DataflowPlan DAG from the outermost WhereConstraintNode to
as close to the source node for that branch as possible. This results
in duplicate where filter application, because the WhereConstraintNode
does not have any way of evaluating whether or not a given set of
where filter specs could be applied downstream.

This change adds the tracking mechanism for propagating the filters
applied back up along the branch. As of now this is a tracking-change
only - the selective application of these filters will follow shortly.

In addition to the added test cases for the propagation mechanism, the
propagation mechanics were observed via testing several pushdown-enabled
rendering tests with the `log-cli-level=DEBUG` flag set in pytest.
Added some assertions to make it more obvious what the current
branch state expectations are, and greatly expanded the documentation
of current behavior.

There is an update to the order of operations in the join handling nodes
as well, which aligns the processing with the expanded documentation in
the state tracking object.
@tlento tlento force-pushed the propagate-applied-filter-specs-upwards branch from 2f94016 to cac8451 Compare June 26, 2024 00:30
@tlento tlento merged commit d61835b into main Jun 26, 2024
15 checks passed
@tlento tlento deleted the propagate-applied-filter-specs-upwards branch June 26, 2024 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants