-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track and propagate applied where filter specs to outer plan nodes #1302
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
dc7ee4f
to
eaf6079
Compare
5542165
to
63efcda
Compare
time_range_constraint=TimeRangeConstraint.all_time(), | ||
pushdown_enabled_types=frozenset(), | ||
where_filter_specs=tuple(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the branch_state_tracker method, everything above this line existed in the previous file.
|
||
This is necessary only for cases where we wish to back-propagate some updated state attribute | ||
for handling in the exit condition of the preceding node in the DAG. Since it is something of an | ||
extraordinary circumstance we designate it as a special method rather than making it a property setter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate / describe examples of the extraordinary circumstance
? I see use cases here, but not sure about the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the use case is in this PR. I can update the comment here to be a bit more explicit, but it pretty much boils down to this:
You generally don't want to mess with branch state tracking in these recursive graph walks, and if you do it by accident it'll be really hard to reason about, so I make it an explicit method call instead of an assignment operator. Something like tracker.last_pushdown_state = current_pushdown_state
is easier to screw up - or at least leave unexamined - than tracker.override_last_pushdown_state(current_pushdown_state)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining so thoroughly 🙏
if len(self._current_branch_state) > 0: | ||
return self._current_branch_state[-1] | ||
return self._initial_state | ||
return self._current_branch_state[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any concern about KeyError
handling here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because there's always a value set since we initialize it non-empty, but I should put in an assertion guard so we get a useful error message if anybody changes that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll add some assertion guards to the pushdown state tracker accessors before merge.
I do think there's got to be a better way to explain what's happening here. I'll see if I can figure that out later.
if len(self._current_branch_state) > 0: | ||
return self._current_branch_state[-1] | ||
return self._initial_state | ||
return self._current_branch_state[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because there's always a value set since we initialize it non-empty, but I should put in an assertion guard so we get a useful error message if anybody changes that.
eaf6079
to
11f049a
Compare
3d9b148
to
2f94016
Compare
11f049a
to
677f26e
Compare
The PredicatePushdownOptimizer currently pushes predicates down along the DataflowPlan DAG from the outermost WhereConstraintNode to as close to the source node for that branch as possible. This results in duplicate where filter application, because the WhereConstraintNode does not have any way of evaluating whether or not a given set of where filter specs could be applied downstream. This change adds the tracking mechanism for propagating the filters applied back up along the branch. As of now this is a tracking-change only - the selective application of these filters will follow shortly. In addition to the added test cases for the propagation mechanism, the propagation mechanics were observed via testing several pushdown-enabled rendering tests with the `log-cli-level=DEBUG` flag set in pytest.
Added some assertions to make it more obvious what the current branch state expectations are, and greatly expanded the documentation of current behavior. There is an update to the order of operations in the join handling nodes as well, which aligns the processing with the expanded documentation in the state tracking object.
2f94016
to
cac8451
Compare
The PredicatePushdownOptimizer currently pushes predicates down
along the DataflowPlan DAG from the outermost WhereConstraintNode to
as close to the source node for that branch as possible. This results
in duplicate where filter application, because the WhereConstraintNode
does not have any way of evaluating whether or not a given set of
where filter specs could be applied downstream.
This change adds the tracking mechanism for propagating the filters
applied back up along the branch. As of now this is a tracking-change
only - the selective application of these filters will follow shortly.
In addition to the added test cases for the propagation mechanism, the
propagation mechanics were observed via testing several pushdown-enabled
rendering tests with the
log-cli-level=DEBUG
flag set in pytest.