Fill nulls for multi-metric queries #850

courtneyholcomb · 2023-11-06T22:23:46Z

Resolves #SL-1173

Description

Updates the logic when filling nulls for input measures. When joining multiple metrics, you may end up with null rows post-join if 1) there are dimensions in the query and 2) one of the metrics has dimension values that the other metric does not have. In that case, if the metric's input measure fills nulls, we want to fill those nulls with the requested value. This PR implements that behavior.
Tangentially, this updates MetricInstance.defined_from to contain a single MetricModelReference instead of multiple. It doesn't seem like there is ever a time we would want multiple (multiple are never used in the codebase), and we need to know just one to get the correct element name.

…of Tuple

github-actions · 2023-11-06T22:24:02Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

tlento

I don't know if this works, but if it does, can we either:

Have an accessor on the metric_lookup object that only returns the immediate measure inputs for the metric in question?
Have an is_derived or similar property that is based on the metric type rather than the number of measure inputs?

I'm concerned about edge case bugs creeping into the rendering here, and it'll be hard to reason about them because the typechecker can't help out at all. Like if I add a new metric type and we have logic that does an assert_values_exhausted if/else block switching over all metric types, that'll fail the typechecker.

tlento · 2023-11-06T23:05:02Z

metricflow/plan_conversion/dataflow_to_sql.py

@@ -706,7 +706,7 @@ def visit_compute_metrics_node(self, node: ComputeMetricsNode) -> SqlDataSet:
            metric_instances.append(
                MetricInstance(
                    associated_columns=(output_column_association,),
-                    defined_from=(MetricModelReference(metric_name=metric_spec.element_name),),
+                    defined_from=MetricModelReference(metric_name=metric_spec.element_name),


LOL, thanks. We have so many of these kind of "let's allow a sequence of things and then have exactly one callsite that manually enforces there's only one" interfaces lying around...

tlento · 2023-11-06T23:05:37Z

metricflow/specs/specs.py

@@ -436,7 +436,7 @@ def from_reference(reference: MetricReference) -> MetricSpec:

    @property
    def alias_spec(self) -> MetricSpec:
-        """Returns a MetricSpec represneting the alias state."""
+        """Returns a MetricSpec representing the alias state."""


tlento · 2023-11-06T23:09:04Z

metricflow/test/integration/test_cases/itest_metrics.yaml

+  group_by_objs: [{"name": "metric_time"}]
+  check_query: |
+    SELECT
+      COALESCE(subq_7.metric_time__day, subq_12.metric_time__day) AS metric_time__day


Are there missing values in this date sequence for the bookings_fill_nulls_with metric? If so that's great. An output test would be nice but at least we can inspect the data frame from here and see it.

If not, it might be worth adding a dimension that has a gap for a given metric_time, dimension_value pair.

Added a new test case & updated the source data to make sure there were some nulls filled for this case.
Will plan to add some output tests tomorrow!

metricflow/plan_conversion/dataflow_to_sql.py

tlento · 2023-11-06T23:28:10Z

metricflow/plan_conversion/dataflow_to_sql.py

+                input_measures = self._metric_lookup.measures_for_metric(
+                    metric_reference=MetricReference(metric_name),
+                    column_association_resolver=self._column_association_resolver,
+                )


Another way to deal with the issue below is to add a different, internal API here that only gives the measures that are direct inputs into the metric. That way, if we add another metric type that takes measures as input anything derived from it won't have anything to do here, and we ensure we only do the coalesce once within a derived metric chain.

I like that idea! I'll add that.

Added that!

courtneyholcomb · 2023-11-07T22:42:08Z

@tlento This should be good to go, with the exception of updating Databricks snapshots. Wrapping that up now - repopulating the source schema is just taking forever.

courtneyholcomb · 2023-11-07T22:44:21Z

@tlento This should be good to go, with the exception of updating Databricks snapshots. Wrapping that up now - repopulating the source schema is just taking forever.

And now done! Kicking off SQL engine tests now

tlento

Let's go! Thank you!

tlento · 2023-11-08T18:17:21Z

metricflow/model/semantics/metric_lookup.py

@@ -105,6 +106,23 @@ def add_metric(self, metric: Metric) -> None:
                )
        self._metrics[metric_reference] = metric

+    def yaml_input_measure_for_metric(self, metric_reference: MetricReference) -> Optional[MetricInputMeasure]:


Nice! Super nitty but can we rename this away from yaml? YAML makes me grumpy. More importantly, there's no guarantee that the input is defined in YAML (although in practice that is how it works today).

Maybe something like direct_input_measure_for_metric or configured_input_measure_for_metric?

tlento · 2023-11-08T18:19:30Z

metricflow/model/semantics/metric_lookup.py

+        """Get input measure defined in the metric YAML, if exists.
+
+        When SemanticModel is constructed, input measures from input metrics are added to the list of input measures
+        for a metric. Here, use rules about metric types to determine which input measures were defined in the YAML:


Then here we can replace YAML with config or something.

tlento · 2023-11-08T18:21:51Z

metricflow/test/integration/test_cases/itest_metrics.yaml

@@ -275,7 +275,7 @@ integration_test:
  check_query: |
    SELECT
      CAST(bookings AS {{ double_data_type_name }}) / NULLIF(views, 0) AS bookings_per_view
-      , groupby_8cbdaa28.ds AS metric_time__day
+      , COALESCE(groupby_8cbdaa28.ds, groupby_68058b0b.ds) AS metric_time__day


Those aliases though 🤣

tlento · 2023-11-08T20:45:01Z

metricflow/test/integration/query_output/test_fill_nulls_with_0.py

+from metricflow.test.snapshot_utils import assert_object_snapshot_equal
+
+
+@pytest.mark.sql_engine_snapshot


These snapshot tests are great, thanks for doing this!

courtneyholcomb added 2 commits November 6, 2023 13:51

Update MetricInstance to take a singular defined_from object instead …

4de5d57

…of Tuple

Spelling fix

5cd869e

cla-bot bot added the cla:yes label Nov 6, 2023

Fill nulls post-metric join for multi-metric queries

e8e1b17

courtneyholcomb force-pushed the court/fill-nulls-multi-metric branch from da2653a to e8e1b17 Compare November 6, 2023 22:57

Changelog

304b730

courtneyholcomb changed the title ~~Update MetricInstance to take a singular defined_from object instead …~~ Fill nulls for multi-metric queries Nov 6, 2023

courtneyholcomb marked this pull request as ready for review November 6, 2023 23:01

courtneyholcomb requested review from tlento, WilliamDee and plypaul November 6, 2023 23:02

tlento reviewed Nov 6, 2023

View reviewed changes

courtneyholcomb added 3 commits November 6, 2023 18:17

Add method to get YAML-defined input measures

6abdf1c

Add new test case & update source data

affea74

Write output tests for filling nulls

b8bc8dc

courtneyholcomb requested a review from tlento November 7, 2023 22:41

Update Databricks snapshots

aa32059

courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 7, 2023

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 7, 2023 22:44 — with GitHub Actions Inactive

courtneyholcomb had a problem deploying to DW_INTEGRATION_TESTS November 7, 2023 22:44 — with GitHub Actions Failure

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 7, 2023 22:44 — with GitHub Actions Inactive

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 7, 2023 22:56 — with GitHub Actions Inactive

courtneyholcomb had a problem deploying to DW_INTEGRATION_TESTS November 7, 2023 22:56 — with GitHub Actions Failure

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 7, 2023 22:56 — with GitHub Actions Inactive

courtneyholcomb added the Reload Test Data in SQL Engines Should be run when test data changes label Nov 7, 2023

courtneyholcomb had a problem deploying to DW_INTEGRATION_TESTS November 7, 2023 23:00 — with GitHub Actions Error

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 7, 2023 23:01 — with GitHub Actions Inactive

courtneyholcomb removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 7, 2023

github-actions bot removed the Reload Test Data in SQL Engines Should be run when test data changes label Nov 7, 2023

courtneyholcomb added Reload Test Data in SQL Engines Should be run when test data changes Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment labels Nov 8, 2023

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 8, 2023 00:52 — with GitHub Actions Inactive

courtneyholcomb had a problem deploying to DW_INTEGRATION_TESTS November 8, 2023 00:52 — with GitHub Actions Failure

github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 8, 2023

courtneyholcomb added Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment and removed Reload Test Data in SQL Engines Should be run when test data changes labels Nov 8, 2023

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 8, 2023 01:10 — with GitHub Actions Inactive

github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 8, 2023

tlento approved these changes Nov 8, 2023

View reviewed changes

Rename YAML -> config

6545bc1

courtneyholcomb merged commit dc076a1 into main Nov 8, 2023
8 checks passed

courtneyholcomb deleted the court/fill-nulls-multi-metric branch November 8, 2023 23:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fill nulls for multi-metric queries #850

Fill nulls for multi-metric queries #850

courtneyholcomb commented Nov 6, 2023 •

edited

Loading

github-actions bot commented Nov 6, 2023

tlento left a comment

tlento Nov 6, 2023

tlento Nov 6, 2023

tlento Nov 6, 2023

courtneyholcomb Nov 7, 2023

tlento Nov 6, 2023

courtneyholcomb Nov 7, 2023

courtneyholcomb Nov 7, 2023

courtneyholcomb commented Nov 7, 2023

courtneyholcomb commented Nov 7, 2023

tlento left a comment

tlento Nov 8, 2023

courtneyholcomb Nov 8, 2023

tlento Nov 8, 2023

tlento Nov 8, 2023

tlento Nov 8, 2023

		from metricflow.test.snapshot_utils import assert_object_snapshot_equal


		@pytest.mark.sql_engine_snapshot

Fill nulls for multi-metric queries #850

Fill nulls for multi-metric queries #850

Conversation

courtneyholcomb commented Nov 6, 2023 • edited Loading

Description

github-actions bot commented Nov 6, 2023

tlento left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

courtneyholcomb commented Nov 7, 2023

courtneyholcomb commented Nov 7, 2023

tlento left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

courtneyholcomb commented Nov 6, 2023 •

edited

Loading