Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicated WhereConstraintNodes in predicate pushdown #1304

Merged
merged 3 commits into from
Jun 26, 2024

Conversation

tlento
Copy link
Contributor

@tlento tlento commented Jun 25, 2024

The original implementation of predicate pushdown generated duplicated
where filter constraints, applying it once as a result of pushdown and
again in the original where constraint node.

This change removes the duplication, and does so at the level of the
individual filter spec. Logically, this means the following:

  1. If all filter specs for a where constraint node can be pushed down,
    the node itself will be moved past the join.
  2. If some filter specs for a where constraint node can be pushed down,
    the node itself will remain in place, but it will only apply the filters
    that cannot be pushed down.
  3. Any WhereConstraintNode added to the DataflowPlan for the purposes of
    cleaning up spurious rows added as the result of an outer join, such as
    we have in one particular JoinToTimeSpineNode scenario, must be annotated
    to indicate that we should apply that constraint filter no matter what.

This change updates all of our logic to ensure that we effectively apply
each where filter exactly once within a branch in the DAG, and only re-apply
filters where explicitly requested at build time.

Note this change is all based on certain assumptions about how the
DataflowPlanBuilder constructs the plan. Specifically, we assume that we will
never need to re-apply filters upstream of a node unless otherwise
indicated. If we start to do fancy things with nesting where constraints
or swapping join sides - particularly if we push groupable metric filters
to an inner join or if we begin adding query time constraints outside of
an aggregated output join - this could get dodgy. Conversion metrics are
of particular concern here.

@cla-bot cla-bot bot added the cla:yes label Jun 25, 2024
@tlento tlento changed the title Remove duplicated WhereConstraintNodes in predicate pushdow Remove duplicated WhereConstraintNodes in predicate pushdown Jun 25, 2024
@@ -48,71 +46,55 @@ FROM (
ON
subq_36.listing = listings_latest_src_28000.listing_id
) subq_41
WHERE (listing__is_lux_latest) AND (booking__is_instant)
WHERE listing__is_lux_latest
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note - this is an example of removing part of a filter expression - the booking__is_instant filter was applied via pushdown (in the collapsed code section above) on the left side of the join, and we no longer repeat it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet 😎

@tlento tlento added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jun 25, 2024
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 19:36 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 19:36 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 19:36 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 19:36 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jun 25, 2024
Copy link
Contributor Author

tlento commented Jun 25, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @tlento and the rest of your teammates on Graphite Graphite

@tlento tlento force-pushed the remove-pushdown-duplicated-where-constraints branch 2 times, most recently from 3aaca4b to 3c87606 Compare June 25, 2024 21:28
@tlento tlento added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jun 25, 2024
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 22:28 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 22:28 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 22:28 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS June 25, 2024 22:28 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jun 25, 2024
Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -48,71 +46,55 @@ FROM (
ON
subq_36.listing = listings_latest_src_28000.listing_id
) subq_41
WHERE (listing__is_lux_latest) AND (booking__is_instant)
WHERE listing__is_lux_latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet 😎

@tlento tlento force-pushed the 06-24-add_direct_tests_for_predicate_pushdown_optimizer branch from cb955d2 to 53767dd Compare June 26, 2024 00:14
@tlento tlento force-pushed the remove-pushdown-duplicated-where-constraints branch from 3c87606 to 981595e Compare June 26, 2024 00:14
Copy link
Contributor Author

tlento commented Jun 26, 2024

Merge activity

  • Jun 25, 5:24 PM PDT: @tlento started a stack merge that includes this pull request via Graphite.
  • Jun 25, 5:40 PM PDT: Graphite rebased this pull request as part of a merge.
  • Jun 25, 5:44 PM PDT: @tlento merged this pull request with Graphite.

@tlento tlento force-pushed the 06-24-add_direct_tests_for_predicate_pushdown_optimizer branch from 53767dd to eeb1068 Compare June 26, 2024 00:34
Base automatically changed from 06-24-add_direct_tests_for_predicate_pushdown_optimizer to main June 26, 2024 00:39
tlento added 3 commits June 26, 2024 00:39
The original implementation of predicate pushdown generated duplicated
where filter constraints, applying it once as a result of pushdown and
again in the original where constraint node.

This change removes the duplication, and does so at the level of the
individual filter spec. Logically, this means the following:

1. If all filter specs for a where constraint node can be pushed down,
the node itself will be moved past the join.
2. If some filter specs for a where constraint node can be pushed down,
the node itself will remain in place, but it will only apply the filters
that cannot be pushed down.
3. Any WhereConstraintNode added to the DataflowPlan for the purposes of
cleaning up spurious rows added as the result of an outer join, such as
we have in one particular JoinToTimeSpineNode scenario, must be annotated
to indicate that we should apply that constraint filter no matter what.

This change updates all of our logic to ensure that we effectively apply
each where filter exactly once within a branch in the DAG, and only re-apply
filters where explicitly requested at build time.

Note this change is all based on certain assumptions about how the
DataflowPlanBuilder constructs the plan. Specifically, we assume that we will
never need to re-apply filters upstream of a node unless otherwise
indicated. If we start to do fancy things with nesting where constraints
or swapping join sides - particularly if we push groupable metric filters
to an inner join or if we begin adding query time constraints outside of
an aggregated output join - this could get dodgy. Conversion metrics are
of particular concern here.
Due to an oversight the functionally_identical and with_new_parents
logic was not updated on the WhereConstraintNode to account for the
always_apply property.

In isolation this doesn't affect the predicate pushdown optimizer, but
in conjunction with the source scan optimizer this causes problems.

Rather than merge something broken I'm amending this PR.
@tlento tlento force-pushed the remove-pushdown-duplicated-where-constraints branch from 981595e to bb4eda4 Compare June 26, 2024 00:39
@tlento tlento merged commit 981b396 into main Jun 26, 2024
15 checks passed
@tlento tlento deleted the remove-pushdown-duplicated-where-constraints branch June 26, 2024 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants