Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: use FULL OUTER JOIN for dimension-only queries #863

Merged
merged 7 commits into from
Nov 14, 2023

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Nov 10, 2023

Description

Realized that we were using LEFT OUTER JOIN for all queries using JoinToBaseOutputNode. This made sense for metric queries, but not for dimension-only queries. When querying dimensions, we should use a FULL OUTER JOIN since we want to see all possible combinations of dimension values (including nulls). With the LEFT OUTER JOIN we were arbitrarily choosing one dimension's node as the source node and potentially eliminating values from all other dimensions.

@cla-bot cla-bot bot added the cla:yes label Nov 10, 2023
Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@courtneyholcomb courtneyholcomb changed the title Add join_type to JoinDescription Bug fix: use FULL OUTER JOIN for dimension-only queries Nov 10, 2023
@@ -80,49 +80,7 @@ class DataflowRecipe:
@property
def join_targets(self) -> List[JoinDescription]:
"""Joins to be made to source node."""
join_targets = []
for join_recipe in self.join_linkable_instances_recipes:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this logic to JoinLinkableInstancesRecipe, which seemed cleaner & more appropriate.

@@ -71,12 +72,36 @@ class JoinLinkableInstancesRecipe:
@property
def join_description(self) -> JoinDescription:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this PR, this attribute was not used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lolwut

@courtneyholcomb courtneyholcomb changed the title Bug fix: use FULL OUTER JOIN for dimension-only queries Bug fix: use FULL OUTER JOIN for dimension-only queries Nov 11, 2023
@courtneyholcomb courtneyholcomb marked this pull request as ready for review November 11, 2023 00:20
@courtneyholcomb courtneyholcomb requested review from tlento, plypaul and WilliamDee and removed request for tlento and plypaul November 11, 2023 00:20
@courtneyholcomb courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 11, 2023
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 11, 2023
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thank you!

@@ -71,12 +72,36 @@ class JoinLinkableInstancesRecipe:
@property
def join_description(self) -> JoinDescription:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lolwut

@@ -306,6 +336,7 @@ def evaluate_node(
self,
start_node: BaseOutput,
required_linkable_specs: Sequence[LinkableInstanceSpec],
default_join_type: SqlJoinType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this default_join_type? Does it get overridden somehow? Or should it be called join_type here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in preparation for my next PR - we'll need to use CROSS JOIN when querying metric_time with other dimensions (but no metrics).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... which will override the default join type for dimension-only queries.

right_source_alias=right_data_set.alias,
column_equality_descriptions=[],
join_type=join_description.join_type,
additional_on_conditions=validity_conditions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want validity_conditions here? Or do we just want to do the full CROSS_JOIN operation and not worry about validity windows?

I guess maybe we want validity windows, but that implicitly requires metric_time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll confess I don't know anything about the validity conditions, but that behavior was not changed in this PR!

metricflow/plan_conversion/dataflow_to_sql.py Show resolved Hide resolved
@tlento
Copy link
Contributor

tlento commented Nov 14, 2023

Also if you remember please squash and merge this one - I read commit-wise at first but the final PR should be cleaner in the local git history. Thank you!

@tlento
Copy link
Contributor

tlento commented Nov 14, 2023

Makes sense, ship it!

@courtneyholcomb courtneyholcomb merged commit 6dd0fd7 into main Nov 14, 2023
21 checks passed
@courtneyholcomb courtneyholcomb deleted the court/full-join-dims branch November 14, 2023 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants