Group by resolution for group by metrics #1139

courtneyholcomb · 2024-04-15T21:09:50Z

Description

Refactor group by resolution for LinkableMetrics. This gets more explicit about what the join paths will should look like for LinkableMetrics and updates that logic accordingly. LinkableMetrics are different from other linkable elements for a couple of reasons because there is no such thing as a "local" LinkableMetric. There is always a join required to get to the metric subquery. To make this clear, here we reclassify LinkableMetrics as either single-hop or muti-hop, where previously they were considered either local or single-hop.

There was also a bug here related to how we resolved what linkable metrics are available. Previously, the code assumed that if a metric could be queried with one entity in a semantic model, it could be queried with any metric in that semantic model, but that's not true because (1) some entities might cause potential fan-out and (2) derived metrics need all input measures to be joinable to the entity. Here, we fix that bug by updating the logic to be based on entities, not semantic models.

Those are the main changes in this PR, but reviewing by commit would be helpful to get more granular context.

Coming after this: Query tests for metric filters with single hop and multi-hop joins + ensure DFP is using expected join paths.

github-actions · 2024-04-15T21:10:06Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

tlento

Overall this change seems reasonable but I don't understand why the snapshot outputs are changing so drastically. It looks like we were cutting off a lot of multi-hop joins as well. (Edit to add - actually it looks like maybe we're allowing fanout implicitly in the first step of a multi-hop entity join leading to a metric join.)

What worries me about this, apart from my general driving need to understand how things work, is that the multi-hop case might be overly broad. For example, do we have any way of asserting that we are, in fact, excluding intermediate fanout join steps?

tlento · 2024-04-17T05:07:51Z

.changes/unreleased/Fixes-20240416-202600.yaml

+kind: Fixes
+body: Refactor group by resolution for group by metrics.


This is almost absurdly nitty, but maybe this should be Under the Hood? Or else have some indication of what this fixes?

tlento · 2024-04-17T05:15:31Z

metricflow/model/semantics/linkable_spec_resolver.py

    ) -> Sequence[SemanticModel]:
        # May switch to non-cached implementation.
-        semantic_models = self._entity_to_semantic_model[entity_reference.element_name]
+        semantic_models = self._entity_to_semantic_model[join_by_entity_reference.element_name]


Unrelated to your changes, but I'm baffled by that comment. Why on earth would we do that? I don't know, but clearly it was on our minds at some point!

tlento · 2024-04-17T05:21:45Z

metricflow/model/semantics/linkable_spec_resolver.py

+        self, semantic_model: SemanticModel, using_join_path: Optional[SemanticModelJoinPath] = None
+    ) -> LinkableElementSet:
+        """Get the set of linkable metrics that can be joined to this semantic model."""
+        properties = frozenset({LinkableElementProperty.METRIC, LinkableElementProperty.JOINED})


JOINED is new, right? Seems appropriate.

tlento · 2024-04-17T05:25:33Z

metricflow/model/semantics/linkable_spec_resolver.py

@@ -1064,12 +1062,12 @@ def create_linkable_element_set_from_join_path(

        for entity in semantic_model.entities:
            # Avoid creating "booking_id__booking_id"
-            if entity.reference != entity_links[-1]:
+            if entity.reference != join_path.last_entity_link:


LOL, love this change.

tlento · 2024-04-17T05:27:42Z

metricflow/model/semantics/linkable_spec_resolver.py

-            # Avoid creating "booking_id__booking_id"
-            if entity.reference != join_path.last_entity_link:
+            # Avoid creating a cycle
+            if entity.reference not in join_path.entity_links:


Huh. Did this work before because we have a MAX_JOIN_HOPS of 2 at the moment? Or are we cleaning up some edge cases around existing multi-hop joins here?

The first option! I don't think there was any behavior change here due to our join hop limit.

tlento · 2024-04-17T05:43:12Z

metricflow/model/semantics/linkable_spec_resolver.py

+        for _ in range(self._max_entity_links):
+            next_semantic_models_and_join_paths: List[Tuple[SemanticModel, Optional[SemanticModelJoinPath]]] = []
+            for semantic_model, join_path in semantic_models_and_join_paths:
+                new_linkable_metrics = self.get_metrics_directly_joinable_to_semantic_model(
+                    semantic_model=semantic_model, using_join_path=join_path
+                )
+                linkable_metrics = LinkableElementSet.merge_by_path_key([linkable_metrics, new_linkable_metrics])
+
+                # Get next set of semantic models & join paths to iterate through.
+                for entity in semantic_model.entities:
+                    for next_semantic_model in self._entity_to_semantic_model[entity.name]:
+                        if next_semantic_model.name == semantic_model.name or (
+                            join_path and entity in join_path.entity_links
+                        ):
+                            continue
+                        join_path_elements = join_path.path_elements if join_path else ()
+                        new_join_path_element = SemanticModelJoinPathElement(
+                            semantic_model_reference=next_semantic_model.reference, join_on_entity=entity.reference
+                        )
+                        next_semantic_models_and_join_paths.append(
+                            (next_semantic_model, SemanticModelJoinPath(join_path_elements + (new_join_path_element,)))
+                        )
+            semantic_models_and_join_paths = next_semantic_models_and_join_paths
+
+        return linkable_metrics


My head hurts. Why do joins gotta be so complicaaaaaated........

tlento · 2024-04-17T05:44:05Z

metricflow/model/semantics/linkable_spec_resolver.py

-        """Given the current path, generate the respective linkable elements from the last semantic model in the path."""
+        """Given the current path, generate the respective linkable elements from the last semantic model in the path.
+
+        Does not include LinkableMetrics.


Why not? Would be good to explain that here.

tlento · 2024-04-17T05:50:03Z

tests/model/test_semantic_model_container.py

+    expected_single_hop_metrics = [
+        linkable_metric
+        for linkable_metrics in metric_lookup._linkable_spec_resolver.get_metrics_directly_joinable_to_semantic_model(
+            semantic_model=measure_semantic_model
+        ).path_key_to_linkable_metrics.values()
+        for linkable_metric in linkable_metrics
+    ]
+
+    for expected_metric in expected_single_hop_metrics:
+        assert expected_metric in actual_metrics
+        assert expected_metric not in already_seen
+        already_seen.add(expected_metric)
+        actual_metrics.remove(expected_metric)
+
+    expected_multi_hop_metrics = []
+    for entity in measure_semantic_model.entities:
+        next_semantic_models = metric_lookup._linkable_spec_resolver._entity_to_semantic_model[entity.name]
+        for next_semantic_model in next_semantic_models:
+            if next_semantic_model.name == measure_semantic_model.name:
+                continue
+            join_path = SemanticModelJoinPath.from_single_element(
+                semantic_model_reference=next_semantic_model.reference, join_on_entity=entity.reference
+            )
+            for (
+                linkable_metrics
+            ) in metric_lookup._linkable_spec_resolver.get_metrics_directly_joinable_to_semantic_model(
+                semantic_model=next_semantic_model,
+                using_join_path=join_path,
+            ).path_key_to_linkable_metrics.values():
+                for linkable_metric in linkable_metrics:
+                    expected_multi_hop_metrics.append(linkable_metric)


This test is too closely coupled to the existing implementation of the LinkableSpecResolver for my liking.

I don't have a great answer for how to go about asserting these boundaries in a more implementation-agnostic way. The only thing I can come up with is to create a semantic model just for this (or to find one that is suitably simple, the multi-hop model might work) and essentially hand-craft the expected output and assert against that.

This would be more robust to the inevitable refactoring (or replacement) awaiting the LinkableSpecResolver, so I think ti's probably worth doing something at this point.

Alternatively we can probably remove this test or replace it with yet another snapshot entry.

...t_semantic_model_container.py/tuple/test_linkable_elements_for_no_metrics_query__result0.txt

tlento · 2024-04-17T06:27:26Z

metricflow/model/semantics/linkable_spec_resolver.py

+                for entity in semantic_model.entities:
+                    for next_semantic_model in self._entity_to_semantic_model[entity.name]:
+                        if next_semantic_model.name == semantic_model.name or (
+                            join_path and entity in join_path.entity_links
+                        ):
+                            continue
+                        join_path_elements = join_path.path_elements if join_path else ()
+                        new_join_path_element = SemanticModelJoinPathElement(
+                            semantic_model_reference=next_semantic_model.reference, join_on_entity=entity.reference
+                        )
+                        next_semantic_models_and_join_paths.append(
+                            (next_semantic_model, SemanticModelJoinPath(join_path_elements + (new_join_path_element,)))
+                        )


Does this open the door to fanout on the intermediate step? Like I have a measure and that joins to the foreign listing entity via account in some other semantic model, and then we end up with account__listing__some_listing_metric but there's fanout between account and listing so our metrics aren't necessarily correct.

I don't see any checks for join validity within the entity block here.

Yeah you're right - I'll go back and fix that.

My brain realized this problem just when I was trying to fall asleep last night 🙃

courtneyholcomb · 2024-04-23T03:43:16Z

@tlento this is ready for re-review!
You already reviewed some of the commits - new commits start at Move setup dict to avoid circular logic error if you want to start there.

…ancies

…specific group by resolution

…antics)

tlento

Ok this seems reasonable but I know you've got a monster update coming.

It'd be good to have the final PR with the change explain what we expect to see in the snapshot updates based on what's changing in the logic. It's really hard for me to visualize what's happening inside of the LinkableSpecResolver in general, so it'll be good to have some documentation in the PR description around what we're expecting to see and why this change is happening.

tlento · 2024-04-24T20:49:29Z

metricflow/model/semantics/linkable_spec_resolver.py

@@ -544,6 +544,7 @@ def __init__(
                self._entity_to_semantic_model[entity.reference.element_name].append(semantic_model)

        self._metric_to_linkable_element_sets: Dict[str, List[LinkableElementSet]] = {}
+        self._joinable_metrics_for_entities: Dict[EntityReference, Set[MetricReference]] = defaultdict(set)


I don't understand why the location of this matters at all but if this is causing circular logic errors that's a good sign that we've put waaaaaay too much code in this initializer. Maybe we'll get rid of this class before we need to refactor this....

tlento · 2024-04-29T18:13:28Z

metricflow/model/semantics/linkable_spec_resolver.py

+            if using_join_path and entity_reference in using_join_path.entity_links:
+                continue


Is this because the there's an extra hop and this entity points through that path, so we skip the direct join to the metric? Might be worth a clarification, or maybe it's possible to illustrate this behavior in a test somehow.

We could do this with a variable name, like:

is_multi_hop_entity = using_join_path and entity_reference in using_join_path.entity_links if is_multi_hop_entity: continue

tlento · 2024-04-30T00:18:01Z

metricflow-semantics/metricflow_semantics/model/semantics/linkable_spec_resolver.py

@@ -521,17 +500,11 @@ def _get_linkable_element_set_for_measure(
        measure_semantic_model = self._get_semantic_model_for_measure(measure_reference)

        elements_in_semantic_model = self._get_elements_in_semantic_model(measure_semantic_model)
-        metrics_linked_to_semantic_model = self._get_joinable_metrics_for_semantic_model(measure_semantic_model)


Oh ok this has been moved into get_joined_elements. That makes sense.

cla-bot bot added the cla:yes label Apr 15, 2024

courtneyholcomb force-pushed the court/gb-reso-metric branch 2 times, most recently from b9e443f to 04a90a6 Compare April 17, 2024 03:15

courtneyholcomb marked this pull request as ready for review April 17, 2024 03:27

courtneyholcomb requested review from tlento and plypaul April 17, 2024 03:27

tlento reviewed Apr 17, 2024

View reviewed changes

courtneyholcomb force-pushed the court/gb-reso-metric branch 3 times, most recently from 8108f1c to 96408a8 Compare April 23, 2024 03:20

courtneyholcomb requested a review from tlento April 23, 2024 03:43

courtneyholcomb added 17 commits April 29, 2024 15:23

Add helper method SemanticModelJoinPath.from_single_element()

191ba8c

Add MetricSubqueryJoinPath class

875229f

Rename method to specify 'valid' join (as opposed to fan-out join)

37df848

Update LinkableMetric to use MetricSubqueryJoinPath and remove redund…

6360c7b

…ancies

Map joinable metrics to entities instead of semantic models for more …

0c0298a

…specific group by resolution

Use helper properties

ff9bcef

Get more specific about avoiding cycle creation

8aba0c1

Fix bug in snapshot generation

6377cf4

Changelog

0fdfc36

Move setup dict to avoid circular logic error

a04da70

Privatize method

d71be14

Set JOINED and MULTI_HOP properties from join_path length (cleanup)

008e674

Move logic for directly linkable metrics to _get_joined_elements (sem…

91c933b

…antics)

Update snapshots

1e6103c

Add test with multi-hop join manifest

5ace993

Update documentation

668d23f

Update changelog to be more specific

73976e3

courtneyholcomb added 3 commits April 29, 2024 16:38

Add test for filtering by the same metric you're querying

bcdcd87

Update SQL engine snapshots

6f8b6c4

Update tests broken during rebase

a5e405e

courtneyholcomb force-pushed the court/gb-reso-metric branch from 468a7d9 to a5e405e Compare April 30, 2024 00:09

Remove comment

92b5212

tlento reviewed Apr 30, 2024

View reviewed changes

courtneyholcomb closed this Apr 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Group by resolution for group by metrics #1139

Group by resolution for group by metrics #1139

courtneyholcomb commented Apr 15, 2024 •

edited

Loading

github-actions bot commented Apr 15, 2024

tlento left a comment •

edited

Loading

tlento Apr 17, 2024

tlento Apr 17, 2024

tlento Apr 17, 2024

tlento Apr 17, 2024

tlento Apr 17, 2024

courtneyholcomb Apr 17, 2024

tlento Apr 17, 2024

tlento Apr 17, 2024

tlento Apr 17, 2024

tlento Apr 17, 2024

courtneyholcomb Apr 17, 2024

courtneyholcomb Apr 17, 2024

courtneyholcomb commented Apr 23, 2024

tlento left a comment

tlento Apr 24, 2024

tlento Apr 29, 2024

tlento Apr 30, 2024

		kind: Fixes
		body: Refactor group by resolution for group by metrics.

		if using_join_path and entity_reference in using_join_path.entity_links:
		continue

Group by resolution for group by metrics #1139

Group by resolution for group by metrics #1139

Conversation

courtneyholcomb commented Apr 15, 2024 • edited Loading

Description

github-actions bot commented Apr 15, 2024

tlento left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

courtneyholcomb commented Apr 23, 2024

tlento left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

courtneyholcomb commented Apr 15, 2024 •

edited

Loading

tlento left a comment •

edited

Loading