-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dedupe qualified name for GroupByMetricSpecs
#1171
Conversation
e645799
to
49e55b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have three concerns:
- This naming expansion only works if the link paths are different, but if they're the same it won't help. I don't know how often they'll be different but it seems like a bunch of cases will slip through here.
- The optimized SQL isn't applying the updated aliases. Does this indicate that we'll still have problems if there are duplicate metric names?
- This is outside the scope of this PR specifically, but it seems to me that we need a robust aliasing approach for these group by metrics. I don't have great ideas here - the group by item resolver is probably too fine-grained to the metric element in the filter/group by list (so if it's the same metric in group by and filter, or used twice in the filter, we may end up with multiple aliases when those should be collapsed).
🤔 I don't think I'm understanding this concern. In what scenario would we have duplicates?
🤦♀️🤦♀️🤦♀️ Ok, so the snapshots that looked weird in #1166 were just things that got added to the wrong PR. I think I must have made a commit at some point while in the middle of generating snapshots, so a few of the snapshots got updated in the wrong commit. So you don't see those updates here because they're lower in the stack. I need to rebase this one now that I've removed those snapshots, and then update the snapshots again here. |
1e60f2e
to
78beac2
Compare
49e55b9
to
4ba6968
Compare
It's entirely possible I'm just completely confused here, but if I have, say, a ratio metric that uses the same metric filter reference we'd end up with duplicates in the query output because they are actually the same, and expanding the names like this doesn't help that unless the join paths are different between the group by metric and each of those ratio metrics. I don't know how common it will be for those join paths to be different, but it's not clear to me that making these long chain sequence names is worth it. That said, it just occurred to me that we only allow this in the where filter. As long as we don't expose this naming scheme in the output table schemas it doesn't matter much one way or the other - but we'd need to make sure we remove it before allowing group by usage.
The snapshots are still showing the same issue. The comment has the expanded alias, but the SQL itself does not. Is that expected? |
@@ -24,7 +24,7 @@ FROM ( | |||
FROM ***************************.dim_lux_listing_id_mapping lux_listing_mapping_src_28000 | |||
) subq_4 | |||
FULL OUTER JOIN ( | |||
-- Pass Only Elements: ['listing', 'bookings'] | |||
-- Pass Only Elements: ['listing', 'listing__bookings'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like here we have the link, but on line 30 we do not. I don't think that's common with FilterElementsNode accesses - it typically matches, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry I haven't been able to update snapshots yet! Slowly rebasing up the stack 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok good catch on this!! I wasn't using the qualified name as the column name for group by metrics. Wasn't breaking anything, except our typical column naming convention 😉 I added a commit to fix that.
78beac2
to
b0a1f16
Compare
4ba6968
to
837d175
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, LGTM, and nice find on the weird entity link omission pieces.
Let's run with this updated naming for now while we sort out the aliasing situation and whatever all else might be needed to enable using these things in group by statements. I don't think we want the nomenclature here exposed to end users on the output side, but that isn't going to happen just yet.
entity_link_names=tuple(x.element_name for x in dimension_spec.entity_links), | ||
element_name=dimension_spec.element_name, | ||
).qualified_name, | ||
column_name=dimension_spec.qualified_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I had just ripped this whole class out when I first noticed it none of this would've happened. Lesson learned. Thanks for making the property access nature of this class even more obvious. :)
b0a1f16
to
05a685f
Compare
Previously we were duplicating this logic, which made it easy to overlook updating the column name to use the qualified name. I updated this for all specs so that it will update automatically in the future. You can see in the snapshots that this changed nothing for most specs, except group by metrics (as expected).
fad0af9
to
bf1312c
Compare
Description
Typically, a spec's qualified name is
<entity_links>__<element_name>
. In the case ofGroupByMetricSpecs
, this name isn't guaranteed to be unique. That's because themetric_subquery_entity_links
are not accounted for. This will also be a problem in SQL generation, since you could end up with a query that references two different group by metrics by the same column name. To avoid that, here I added some deduping logic for the qualified name.entity_links
matchesmetric_subquery_entity_links
, use the standard qualified name.<entity_links>__<metric_subquery_entity_links>__<element_name>
. You can still tell where one path ends and the next begins because they will both end with the same entity link, which is the last link before the element name.This is definitely ugly!! I am suggesting any alternate suggestions to make this work.