Remove duplicated WhereConstraintNodes in predicate pushdown #1304
Conversation
@@ -48,71 +46,55 @@ FROM (
  ON
    subq_36.listing = listings_latest_src_28000.listing_id
  ) subq_41
- WHERE (listing__is_lux_latest) AND (booking__is_instant)
+ WHERE listing__is_lux_latest
Note - this is an example of removing part of a filter expression - the booking__is_instant filter was applied via pushdown (in the collapsed code section above) on the left side of the join, and we no longer repeat it here.
Sweet 😎
LGTM!
The original implementation of predicate pushdown generated duplicated where filter constraints, applying each filter once as a result of pushdown and again in the original where constraint node. This change removes the duplication, and does so at the level of the individual filter spec. Logically, this means the following:

1. If all filter specs for a where constraint node can be pushed down, the node itself will be moved past the join.
2. If some filter specs for a where constraint node can be pushed down, the node itself will remain in place, but it will only apply the filters that cannot be pushed down.
3. Any WhereConstraintNode added to the DataflowPlan for the purpose of cleaning up spurious rows added as the result of an outer join, such as we have in one particular JoinToTimeSpineNode scenario, must be annotated to indicate that we should apply that constraint filter no matter what.

This change updates all of our logic to ensure that we effectively apply each where filter exactly once within a branch of the DAG, and only re-apply filters where explicitly requested at build time.

Note that this change is based on certain assumptions about how the DataflowPlanBuilder constructs the plan. Specifically, we assume that we will never need to re-apply filters upstream of a node unless otherwise indicated. If we start to do fancy things with nesting where constraints or swapping join sides - particularly if we push groupable metric filters to an inner join or if we begin adding query time constraints outside of an aggregated output join - this could get dodgy. Conversion metrics are of particular concern here.
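The three cases above can be sketched as a single partition over a node's filter specs. This is an illustrative sketch only, not MetricFlow's actual API: the names WhereFilterSpec, is_pushdownable, split_filter_specs, and the always_apply flag as used here are hypothetical stand-ins for the behavior the description outlines.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WhereFilterSpec:
    """Hypothetical stand-in for a single where-filter spec."""

    expression: str
    is_pushdownable: bool


def split_filter_specs(
    specs: List[WhereFilterSpec], always_apply: bool
) -> Tuple[List[WhereFilterSpec], List[WhereFilterSpec]]:
    """Partition specs into (pushed_down, remaining_at_node).

    - All specs pushdownable  -> remaining is empty, so the node can move
      past the join entirely (case 1).
    - Some specs pushdownable -> the node stays but applies only the
      non-pushdownable filters (case 2).
    - always_apply set (e.g. a constraint cleaning up outer-join rows) ->
      every spec is re-applied at the original node regardless (case 3).
    """
    pushed = [spec for spec in specs if spec.is_pushdownable]
    if always_apply:
        remaining = list(specs)
    else:
        remaining = [spec for spec in specs if not spec.is_pushdownable]
    return pushed, remaining


specs = [
    WhereFilterSpec("booking__is_instant", is_pushdownable=True),
    WhereFilterSpec("listing__is_lux_latest", is_pushdownable=False),
]
pushed, remaining = split_filter_specs(specs, always_apply=False)
# booking__is_instant is applied before the join; listing__is_lux_latest
# stays at the original node, so each filter runs exactly once per branch.
```

This mirrors the diff shown earlier in the thread, where booking__is_instant disappears from the outer WHERE clause because it was already applied via pushdown on the left side of the join.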
Due to an oversight, the functionally_identical and with_new_parents logic on the WhereConstraintNode was not updated to account for the always_apply property. In isolation this doesn't affect the predicate pushdown optimizer, but in conjunction with the source scan optimizer it causes problems. Rather than merge something broken, I'm amending this PR.
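The bug described above is easy to illustrate: if node-equivalence checks ignore always_apply, an optimizer that merges "functionally identical" branches could collapse a node that must re-apply its filters with one that must not. The sketch below is hypothetical (the class shape and method names only loosely follow the PR's descriptions), but it shows why the flag must participate in the comparison.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WhereConstraintNode:
    """Simplified stand-in for a dataflow-plan where-constraint node."""

    filter_expressions: Tuple[str, ...]
    always_apply: bool = False

    def functionally_identical(self, other: "WhereConstraintNode") -> bool:
        # Both the filters AND the always_apply flag must match; comparing
        # only filter_expressions is the oversight this amendment fixes.
        return (
            self.filter_expressions == other.filter_expressions
            and self.always_apply == other.always_apply
        )


plain = WhereConstraintNode(("listing__is_lux_latest",))
forced = WhereConstraintNode(("listing__is_lux_latest",), always_apply=True)
# Same filters, different re-application semantics: a source scan
# optimizer must NOT treat these two nodes as interchangeable.
```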