-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix: use FULL OUTER JOIN
for dimension-only queries
#863
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
@@ -80,49 +80,7 @@ class DataflowRecipe: | |||
@property | |||
def join_targets(self) -> List[JoinDescription]: | |||
"""Joins to be made to source node.""" | |||
join_targets = [] | |||
for join_recipe in self.join_linkable_instances_recipes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this logic to JoinLinkableInstancesRecipe
, which seemed cleaner & more appropriate.
@@ -71,12 +72,36 @@ class JoinLinkableInstancesRecipe: | |||
@property | |||
def join_description(self) -> JoinDescription: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to this PR, this attribute was not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lolwut
FULL OUTER JOIN
for dimension-only queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thank you!
@@ -71,12 +72,36 @@ class JoinLinkableInstancesRecipe: | |||
@property | |||
def join_description(self) -> JoinDescription: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lolwut
@@ -306,6 +336,7 @@ def evaluate_node( | |||
self, | |||
start_node: BaseOutput, | |||
required_linkable_specs: Sequence[LinkableInstanceSpec], | |||
default_join_type: SqlJoinType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this default_join_type
? Does it get overridden somehow? Or should it be called join_type
here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in preparation for my next PR - we'll need to use CROSS JOIN
when querying metric_time
with other dimensions (but no metrics).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... which will override the default join type for dimension-only queries.
right_source_alias=right_data_set.alias, | ||
column_equality_descriptions=[], | ||
join_type=join_description.join_type, | ||
additional_on_conditions=validity_conditions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want validity_conditions here? Or do we just want to do the full CROSS_JOIN operation and not worry about validity windows?
I guess maybe we want validity windows, but that implicitly requires metric_time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll confess I don't know anything about the validity conditions, but that behavior was not changed in this PR!
Also if you remember please squash and merge this one - I read commit-wise at first but the final PR should be cleaner in the local git history. Thank you! |
Makes sense, ship it! |
Description
Realized that we were using
LEFT OUTER JOIN
for all queries usingJoinToBaseOutputNode
. This made sense for metric queries, but not for dimension-only queries. When querying dimensions, we should use aFULL OUTER JOIN
since we want to see all possible combinations of dimension values (including nulls). With theLEFT OUTER JOIN
we were arbitrarily choosing one dimension's node as the source node and potentially eliminating values from all other dimensions.