-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Group by resolution for group by metrics #1139
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
b9e443f
to
04a90a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this change seems reasonable but I don't understand why the snapshot outputs are changing so drastically. It looks like we were cutting off a lot of multi-hop joins as well. (Edit to add - actually it looks like maybe we're allowing fanout implicitly in the first step of a multi-hop entity join leading to a metric join.)
What worries me about this, apart from my general driving need to understand how things work, is that the multi-hop case might be overly broad. For example, do we have any way of asserting that we are, in fact, excluding intermediate fanout join steps?
kind: Fixes | ||
body: Refactor group by resolution for group by metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is almost absurdly nitty, but maybe this should be Under the Hood? Or else have some indication of what this fixes?
) -> Sequence[SemanticModel]: | ||
# May switch to non-cached implementation. | ||
semantic_models = self._entity_to_semantic_model[entity_reference.element_name] | ||
semantic_models = self._entity_to_semantic_model[join_by_entity_reference.element_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to your changes, but I'm baffled by that comment. Why on earth would we do that? I don't know, but clearly it was on our minds at some point!
self, semantic_model: SemanticModel, using_join_path: Optional[SemanticModelJoinPath] = None | ||
) -> LinkableElementSet: | ||
"""Get the set of linkable metrics that can be joined to this semantic model.""" | ||
properties = frozenset({LinkableElementProperty.METRIC, LinkableElementProperty.JOINED}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JOINED is new, right? Seems appropriate.
@@ -1064,12 +1062,12 @@ def create_linkable_element_set_from_join_path( | |||
|
|||
for entity in semantic_model.entities: | |||
# Avoid creating "booking_id__booking_id" | |||
if entity.reference != entity_links[-1]: | |||
if entity.reference != join_path.last_entity_link: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL, love this change.
# Avoid creating "booking_id__booking_id" | ||
if entity.reference != join_path.last_entity_link: | ||
# Avoid creating a cycle | ||
if entity.reference not in join_path.entity_links: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. Did this work before because we have a MAX_JOIN_HOPS of 2 at the moment? Or are we cleaning up some edge cases around existing multi-hop joins here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first option! I don't think there was any behavior change here due to our join hop limit.
for _ in range(self._max_entity_links): | ||
next_semantic_models_and_join_paths: List[Tuple[SemanticModel, Optional[SemanticModelJoinPath]]] = [] | ||
for semantic_model, join_path in semantic_models_and_join_paths: | ||
new_linkable_metrics = self.get_metrics_directly_joinable_to_semantic_model( | ||
semantic_model=semantic_model, using_join_path=join_path | ||
) | ||
linkable_metrics = LinkableElementSet.merge_by_path_key([linkable_metrics, new_linkable_metrics]) | ||
|
||
# Get next set of semantic models & join paths to iterate through. | ||
for entity in semantic_model.entities: | ||
for next_semantic_model in self._entity_to_semantic_model[entity.name]: | ||
if next_semantic_model.name == semantic_model.name or ( | ||
join_path and entity in join_path.entity_links | ||
): | ||
continue | ||
join_path_elements = join_path.path_elements if join_path else () | ||
new_join_path_element = SemanticModelJoinPathElement( | ||
semantic_model_reference=next_semantic_model.reference, join_on_entity=entity.reference | ||
) | ||
next_semantic_models_and_join_paths.append( | ||
(next_semantic_model, SemanticModelJoinPath(join_path_elements + (new_join_path_element,))) | ||
) | ||
semantic_models_and_join_paths = next_semantic_models_and_join_paths | ||
|
||
return linkable_metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My head hurts. Why do joins gotta be so complicaaaaaated........
"""Given the current path, generate the respective linkable elements from the last semantic model in the path.""" | ||
"""Given the current path, generate the respective linkable elements from the last semantic model in the path. | ||
|
||
Does not include LinkableMetrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? Would be good to explain that here.
expected_single_hop_metrics = [ | ||
linkable_metric | ||
for linkable_metrics in metric_lookup._linkable_spec_resolver.get_metrics_directly_joinable_to_semantic_model( | ||
semantic_model=measure_semantic_model | ||
).path_key_to_linkable_metrics.values() | ||
for linkable_metric in linkable_metrics | ||
] | ||
|
||
for expected_metric in expected_single_hop_metrics: | ||
assert expected_metric in actual_metrics | ||
assert expected_metric not in already_seen | ||
already_seen.add(expected_metric) | ||
actual_metrics.remove(expected_metric) | ||
|
||
expected_multi_hop_metrics = [] | ||
for entity in measure_semantic_model.entities: | ||
next_semantic_models = metric_lookup._linkable_spec_resolver._entity_to_semantic_model[entity.name] | ||
for next_semantic_model in next_semantic_models: | ||
if next_semantic_model.name == measure_semantic_model.name: | ||
continue | ||
join_path = SemanticModelJoinPath.from_single_element( | ||
semantic_model_reference=next_semantic_model.reference, join_on_entity=entity.reference | ||
) | ||
for ( | ||
linkable_metrics | ||
) in metric_lookup._linkable_spec_resolver.get_metrics_directly_joinable_to_semantic_model( | ||
semantic_model=next_semantic_model, | ||
using_join_path=join_path, | ||
).path_key_to_linkable_metrics.values(): | ||
for linkable_metric in linkable_metrics: | ||
expected_multi_hop_metrics.append(linkable_metric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is too closely coupled to the existing implementation of the LinkableSpecResolver for my liking.
I don't have a great answer for how to go about asserting these boundaries in a more implementation-agnostic way. The only thing I can come up with is to create a semantic model just for this (or to find one that is suitably simple, the multi-hop model might work) and essentially hand-craft the expected output and assert against that.
This would be more robust to the inevitable refactoring (or replacement) awaiting the LinkableSpecResolver, so I think ti's probably worth doing something at this point.
Alternatively we can probably remove this test or replace it with yet another snapshot entry.
...t_semantic_model_container.py/tuple/test_linkable_elements_for_no_metrics_query__result0.txt
Outdated
Show resolved
Hide resolved
for entity in semantic_model.entities: | ||
for next_semantic_model in self._entity_to_semantic_model[entity.name]: | ||
if next_semantic_model.name == semantic_model.name or ( | ||
join_path and entity in join_path.entity_links | ||
): | ||
continue | ||
join_path_elements = join_path.path_elements if join_path else () | ||
new_join_path_element = SemanticModelJoinPathElement( | ||
semantic_model_reference=next_semantic_model.reference, join_on_entity=entity.reference | ||
) | ||
next_semantic_models_and_join_paths.append( | ||
(next_semantic_model, SemanticModelJoinPath(join_path_elements + (new_join_path_element,))) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this open the door to fanout on the intermediate step? Like I have a measure and that joins to the foreign listing
entity via account
in some other semantic model, and then we end up with account__listing__some_listing_metric
but there's fanout between account and listing so our metrics aren't necessarily correct.
I don't see any checks for join validity within the entity block here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right - I'll go back and fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My brain realized this problem just when I was trying to fall asleep last night 🙃
8108f1c
to
96408a8
Compare
@tlento this is ready for re-review! |
…specific group by resolution
468a7d9
to
a5e405e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this seems reasonable but I know you've got a monster update coming.
It'd be good to have the final PR with the change explain what we expect to see in the snapshot updates based on what's changing in the logic. It's really hard for me to visualize what's happening inside of the LinkableSpecResolver in general, so it'll be good to have some documentation in the PR description around what we're expecting to see and why this change is happening.
@@ -544,6 +544,7 @@ def __init__( | |||
self._entity_to_semantic_model[entity.reference.element_name].append(semantic_model) | |||
|
|||
self._metric_to_linkable_element_sets: Dict[str, List[LinkableElementSet]] = {} | |||
self._joinable_metrics_for_entities: Dict[EntityReference, Set[MetricReference]] = defaultdict(set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the location of this matters at all but if this is causing circular logic errors that's a good sign that we've put waaaaaay too much code in this initializer. Maybe we'll get rid of this class before we need to refactor this....
if using_join_path and entity_reference in using_join_path.entity_links: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because the there's an extra hop and this entity points through that path, so we skip the direct join to the metric? Might be worth a clarification, or maybe it's possible to illustrate this behavior in a test somehow.
We could do this with a variable name, like:
is_multi_hop_entity = using_join_path and entity_reference in using_join_path.entity_links
if is_multi_hop_entity:
continue
@@ -521,17 +500,11 @@ def _get_linkable_element_set_for_measure( | |||
measure_semantic_model = self._get_semantic_model_for_measure(measure_reference) | |||
|
|||
elements_in_semantic_model = self._get_elements_in_semantic_model(measure_semantic_model) | |||
metrics_linked_to_semantic_model = self._get_joinable_metrics_for_semantic_model(measure_semantic_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ok this has been moved into get_joined_elements. That makes sense.
Description
Refactor group by resolution for
LinkableMetrics
. This gets more explicit about what the join paths will should look like forLinkableMetrics
and updates that logic accordingly.LinkableMetrics
are different from other linkable elements for a couple of reasons because there is no such thing as a "local"LinkableMetric
. There is always a join required to get to the metric subquery. To make this clear, here we reclassifyLinkableMetrics
as either single-hop or muti-hop, where previously they were considered either local or single-hop.There was also a bug here related to how we resolved what linkable metrics are available. Previously, the code assumed that if a metric could be queried with one entity in a semantic model, it could be queried with any metric in that semantic model, but that's not true because (1) some entities might cause potential fan-out and (2) derived metrics need all input measures to be joinable to the entity. Here, we fix that bug by updating the logic to be based on entities, not semantic models.
Those are the main changes in this PR, but reviewing by commit would be helpful to get more granular context.
Coming after this: Query tests for metric filters with single hop and multi-hop joins + ensure DFP is using expected join paths.