Track and propagate applied where filter specs to outer plan nodes #1302

tlento · 2024-06-25T03:13:10Z

The PredicatePushdownOptimizer currently pushes predicates down
along the DataflowPlan DAG from the outermost WhereConstraintNode to
as close to the source node for that branch as possible. This results
in duplicate where filter application, because the WhereConstraintNode
has no way to evaluate whether a given set of where filter specs
could be applied downstream.

This change adds the tracking mechanism for propagating the applied
filters back up along the branch. For now this is a tracking change
only; the selective application of these filters will follow shortly.

In addition to the new test cases for the propagation mechanism, the
propagation mechanics were observed by running several pushdown-enabled
rendering tests with the log-cli-level=DEBUG flag set in pytest.
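A minimal sketch of the propagation idea, assuming a toy three-node branch (outer where constraint, an intermediate node, and the source-adjacent node) with plain strings as filter specs. `ToyNode`, `walk`, `applied_here`, and `applied_below` are hypothetical simplifications, not MetricFlow classes or APIs. Specs are pushed down to the node closest to the source, that node records what it applied, and the record is returned back up the branch so the outer node can see which of its specs were applied downstream. As in this change, the outer node only tracks that information and does not yet act on it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass
class ToyNode:
    """Hypothetical plan node; a simplified stand-in, not a MetricFlow class."""

    name: str
    parent: Optional[ToyNode] = None  # Next node toward the source on this branch.
    can_apply_filters: bool = False  # True for the node closest to the source.
    where_filter_specs: Tuple[str, ...] = ()  # Set only on the outer where constraint node.
    applied_here: Tuple[str, ...] = ()  # Specs this node applied itself.
    applied_below: FrozenSet[str] = frozenset()  # Specs reported back up from its parents.


def walk(node: ToyNode, pending: Tuple[str, ...] = ()) -> FrozenSet[str]:
    """Push filter specs toward the source; return every spec applied at or below `node`."""
    # Specs from an outer where constraint node join the set being pushed down.
    pending = pending + node.where_filter_specs
    if node.parent is not None:
        # Recurse toward the source first; the return value is the back-propagated record.
        node.applied_below = walk(node.parent, pending)
    if node.can_apply_filters:
        # Apply only specs that nothing further down has applied yet, and record them.
        node.applied_here = tuple(s for s in pending if s not in node.applied_below)
    return node.applied_below | frozenset(node.applied_here)


source = ToyNode("source", can_apply_filters=True)
intermediate = ToyNode("intermediate", parent=source)
where = ToyNode(
    "where_constraint",
    parent=intermediate,
    where_filter_specs=("is_active", "is_new_customer"),
)

applied = walk(where)
# Tracking only: the outer node can now inspect what was applied downstream.
print(sorted(where.applied_below))
```

Returning the applied set from the recursive call keeps the back-propagation explicit: each node learns what its branch applied without reaching into shared mutable state.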

github-actions · 2024-06-25T03:13:28Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

tlento · 2024-06-25T03:13:35Z

This stack of pull requests is managed by Graphite.

tlento · 2024-06-25T19:22:45Z

tests_metricflow/dataflow/optimizer/test_predicate_pushdown_optimizer.py

+            time_range_constraint=TimeRangeConstraint.all_time(),
+            pushdown_enabled_types=frozenset(),
+            where_filter_specs=tuple(),
+        )


Apart from the branch_state_tracker method, everything above this line existed in the previous file.
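The test builds a pushdown state from `time_range_constraint`, `pushdown_enabled_types`, and `where_filter_specs`. As a rough sketch of how an immutable state object could also carry the specs applied further down a branch, assuming a frozen dataclass with plain strings standing in for `TimeRangeConstraint` and the spec types (the class `ToyPushdownState`, the field `applied_where_filter_specs`, and the method `with_applied_where_filter_specs` are all hypothetical):

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ToyPushdownState:
    """Hypothetical stand-in for a predicate pushdown state object; not MetricFlow's class."""

    time_range_constraint: Optional[str]  # Plain string standing in for TimeRangeConstraint.
    pushdown_enabled_types: FrozenSet[str]
    where_filter_specs: Tuple[str, ...]
    # Hypothetical field: specs already applied by a node further down the branch.
    applied_where_filter_specs: FrozenSet[str] = frozenset()

    def with_applied_where_filter_specs(self, specs: Iterable[str]) -> "ToyPushdownState":
        # Frozen instances cannot be mutated, so "updating" means building a copy;
        # states already recorded for other nodes on the branch stay untouched.
        return replace(
            self,
            applied_where_filter_specs=self.applied_where_filter_specs | frozenset(specs),
        )


# Mirrors the keyword arguments used in the test, with stand-in values.
state = ToyPushdownState(
    time_range_constraint="all_time",
    pushdown_enabled_types=frozenset(),
    where_filter_specs=tuple(),
)
updated = state.with_applied_where_filter_specs(["is_active"])

try:
    state.where_filter_specs = ("is_active",)  # type: ignore[misc]
except FrozenInstanceError:
    pass  # Direct mutation is rejected; use the copy-returning method instead.
```

Keeping each state immutable means a node further down the branch cannot silently alter the state an outer node is still holding.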

plypaul · 2024-06-25T22:31:07Z

metricflow/dataflow/optimizer/predicate_pushdown_optimizer.py

+
+        This is necessary only for cases where we wish to back-propagate some updated state attribute
+        for handling in the exit condition of the preceding node in the DAG. Since it is something of an
+        extraordinary circumstance we designate it as a special method rather than making it a property setter.


Can you elaborate / describe examples of the extraordinary circumstance? I see use cases here, but not sure about the context.

tlento

So the use case is in this PR. I can update the comment here to be a bit more explicit, but it pretty much boils down to this:

You generally don't want to mess with branch state tracking in these recursive graph walks, and if you do it by accident it'll be really hard to reason about, so I make it an explicit method call instead of an assignment operator. Something like `tracker.last_pushdown_state = current_pushdown_state` is easier to screw up - or at least leave unexamined - than `tracker.override_last_pushdown_state(current_pushdown_state)`.
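The design choice in this thread can be sketched as a stack-backed tracker whose `last_pushdown_state` is a read-only property, so the only way to rewrite it is the explicit `override_last_pushdown_state` call. This is a simplified illustration with plain strings as states; `ToyBranchStateTracker`, `track_pushdown_state`, and `visit_source_node` are hypothetical names, and the flow (a node exits its own scope, then writes an update into its caller's entry) is an assumption based on the docstring's description of back-propagating state to the preceding node's exit handling.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToyBranchStateTracker:
    """Hypothetical, simplified branch state tracker; states are plain strings here."""

    def __init__(self, initial_state: str) -> None:
        # Initialized non-empty, so there is always a "last" state to read.
        self._current_branch_state: List[str] = [initial_state]

    @contextmanager
    def track_pushdown_state(self, state: str) -> Iterator[None]:
        # Push a state for the duration of one node's processing, then pop it on exit.
        self._current_branch_state.append(state)
        try:
            yield
        finally:
            self._current_branch_state.pop()

    @property
    def last_pushdown_state(self) -> str:
        # Read-only: with no setter, `tracker.last_pushdown_state = x` raises AttributeError.
        return self._current_branch_state[-1]

    def override_last_pushdown_state(self, state: str) -> None:
        """Replace the most recent state; deliberately an explicit, easy-to-grep call."""
        self._current_branch_state[-1] = state


def visit_source_node(tracker: ToyBranchStateTracker) -> None:
    """Hypothetical node visit that applies a filter and reports it to the calling node."""
    with tracker.track_pushdown_state("source"):
        pass  # A real visitor would recurse into parent nodes here.
    # This node's own entry has been popped, so the top of the stack is the caller's entry.
    tracker.override_last_pushdown_state(tracker.last_pushdown_state + "; applied=is_active")


tracker = ToyBranchStateTracker("initial")
with tracker.track_pushdown_state("outer"):
    visit_source_node(tracker)
    print(tracker.last_pushdown_state)  # outer; applied=is_active
```

An accidental `tracker.last_pushdown_state = ...` fails immediately instead of quietly corrupting the branch state.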

courtneyholcomb

Thanks for explaining so thoroughly 🙏

courtneyholcomb · 2024-06-25T22:10:50Z

metricflow/dataflow/optimizer/predicate_pushdown_optimizer.py

-        if len(self._current_branch_state) > 0:
-            return self._current_branch_state[-1]
-        return self._initial_state
+        return self._current_branch_state[-1]


any concern about KeyError handling here?

tlento · 2024-06-25T23:47:31Z

No, because there's always a value set since we initialize it non-empty, but I should put in an assertion guard so we get a useful error message if anybody changes that.

tlento

Thank you! I'll add some assertion guards to the pushdown state tracker accessors before merge.

I do think there's got to be a better way to explain what's happening here. I'll see if I can figure that out later.
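Assuming `_current_branch_state` is a list, indexing it with `[-1]` when empty raises `IndexError` ("list index out of range") rather than `KeyError`, and that message says nothing about which invariant broke. A minimal sketch of the kind of assertion guard discussed here; the helper name `last_branch_state` is hypothetical:

```python
from typing import Sequence, TypeVar

StateT = TypeVar("StateT")


def last_branch_state(branch_states: Sequence[StateT]) -> StateT:
    """Hypothetical accessor: return the innermost state, failing loudly if the stack is empty."""
    # The stack is expected to be initialized non-empty, so an empty stack means that
    # invariant was broken. Without this guard, indexing an empty list with [-1]
    # raises a bare IndexError that does not explain what went wrong.
    assert len(branch_states) > 0, (
        "Branch state stack is empty; it should always contain at least the initial state."
    )
    return branch_states[-1]


print(last_branch_state(["initial", "outer"]))  # outer
```

Note that Python strips `assert` statements when run with `-O`, so a guard like this documents and checks the invariant during development and testing rather than acting as a guaranteed runtime check.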

tlento · 2024-06-26T00:24:16Z

Merge activity

Jun 25, 5:24 PM PDT: @tlento started a stack merge that includes this pull request via Graphite.
Jun 25, 5:30 PM PDT: Graphite rebased this pull request as part of a merge.
Jun 25, 5:34 PM PDT: @tlento merged this pull request with Graphite.

Added some assertions to make the current branch state expectations more obvious, and greatly expanded the documentation of current behavior. This change also updates the order of operations in the join handling nodes, aligning the processing with the expanded documentation in the state tracking object.

cla-bot added the cla:yes label Jun 25, 2024

tlento mentioned this pull request Jun 25, 2024

Simplify predicate pushdown state tracking #1301

Merged

tlento requested review from courtneyholcomb and plypaul June 25, 2024 03:23

tlento added the Skip Changelog label Jun 25, 2024

tlento force-pushed the simplify-pushdown-state-tracking branch from dc7ee4f to eaf6079 June 25, 2024 05:21

tlento force-pushed the propagate-applied-filter-specs-upwards branch from 5542165 to 63efcda June 25, 2024 05:21

tlento mentioned this pull request Jun 25, 2024

Add direct tests for predicate pushdown optimizer #1303

Merged

tlento commented Jun 25, 2024

tlento mentioned this pull request Jun 25, 2024

Enable PredicatePushdownOptimization for all MetricFlowEngine queries #1308

Merged

plypaul reviewed Jun 25, 2024

courtneyholcomb approved these changes Jun 25, 2024

tlento commented Jun 25, 2024

tlento force-pushed the simplify-pushdown-state-tracking branch from eaf6079 to 11f049a June 26, 2024 00:14

tlento force-pushed the propagate-applied-filter-specs-upwards branch from 3d9b148 to 2f94016 June 26, 2024 00:14

tlento force-pushed the simplify-pushdown-state-tracking branch from 11f049a to 677f26e June 26, 2024 00:24

Base automatically changed from simplify-pushdown-state-tracking to main June 26, 2024 00:29

tlento added 3 commits June 26, 2024 00:30

Clarify docstring on override_last_pushdown_state method

036cdb2

tlento force-pushed the propagate-applied-filter-specs-upwards branch from 2f94016 to cac8451 June 26, 2024 00:30

tlento merged commit d61835b into main Jun 26, 2024
15 checks passed

tlento deleted the propagate-applied-filter-specs-upwards branch June 26, 2024 00:34
