Bug fix: querying multiple granularities with offset metrics #1054

courtneyholcomb · 2024-02-28T02:13:03Z

Resolves SL-1777
Resolves SL-1810
Resolves #1053
Resolves #1052

Description

Resolves 2 related issues:

If you query a time offset metric with multiple instances of metric_time/agg_time_dimension, only one of the instances will be offset. All of them should be.
If you query a time offset metric with one instance of metric_time/agg_time_dimension but filter by a different one, the query will fail

These issues were caused by issues in the JoinToTimeSpineNode dataflow to SQL code. The node was only designed to offset one time dimension. It was also filtering out columns prematurely, when that filtering should instead be happening in the FilterElementsNode. This PR fixes both issues.

I would recommend reviewing this PR by commit.

The first commit (after changelog) adds tests that show what the SQL previously looked like, as well as the scenarios that trigger an error before SQL can be generated.
Next commit updates the dataflow to SQL logic.
Third commit updates the snapshots so you can clearly see how the logic change impacts the SQL.

linear · 2024-02-28T02:13:06Z

SL-1777 Bug: derived offset metric error when querying with one grain & filtering with another

SL-1810 [Bug] Querying a time offset metric with multiple `metric_time` dimensions returns incorrect results

…shots

tlento

This seems like what we want to happen instead of whatever was happening before, but I do wonder about the invariant handling I mentioned inline - there's an implicit expectation of a granularity relationship between the time spine and the selected columns but right now if that expectation isn't met we'll render a date trunc to a coarser granularity, which doesn't seem like what we really want to do.

Should we just put an assertion in there? Or is there some other mechanism enforcing this?

tlento · 2024-02-29T01:49:35Z

metricflow/plan_conversion/dataflow_to_sql.py

+                if time_dimension_spec.time_granularity == time_spine_dim_instance.spec.time_granularity
+                else SqlDateTruncExpression(
+                    time_granularity=time_dimension_spec.time_granularity, arg=time_spine_column_select_expr


This implies the time_spine_dim_instance.spec.time_granularity will always be equal to or finer (<= I think) than the time_dimension_spec.time_granularity.

I'm pretty sure that's true at this point, but is it enforced anywhere? I'm not sure what'll happen if we allow different granularities in the time spine, or if we allow the time spine to be DAY while the base granularity moves to MILLISECOND or whatever.

Good point! I feel like the time spine SHOULD be set with the smallest possible grain in order to fill all granularity periods we support.
In practice, though, it might not be - especially when we transition to supporting smaller grains. Users might need to update their time spines.
I'll add an assertion for now and a comment to explain!

tlento · 2024-02-29T02:08:10Z

...st_offset_to_grain_metric_filter_and_query_have_different_granularities__plan0_optimized.sql

+    metric_time__month
+    , SUM(bookings) AS bookings_start_of_month
+  FROM (
+    -- Join to Time Spine Dataset
+    -- Pass Only Elements: ['bookings', 'metric_time__month', 'metric_time__day']
+    SELECT
+      subq_13.ds AS metric_time__day
+      , DATE_TRUNC('month', subq_13.ds) AS metric_time__month
+      , subq_11.bookings AS bookings
+    FROM ***************************.mf_time_spine subq_13
+    INNER JOIN (
+      -- Read Elements From Semantic Model 'bookings_source'
+      -- Metric Time Dimension 'ds'
+      SELECT
+        DATE_TRUNC('day', ds) AS metric_time__day
+        , 1 AS bookings
+      FROM ***************************.fct_bookings bookings_source_src_28000
+    ) subq_11
+    ON
+      DATE_TRUNC('month', subq_13.ds) = subq_11.metric_time__day
+    WHERE DATE_TRUNC('month', subq_13.ds) = subq_13.ds
+  ) subq_15
+  WHERE metric_time__day = '2020-01-01'


This all feels very convoluted but maybe that's just how we have to do offset to grain calculations? Like if this grouped by metric_time__day I guess we'd drop the where filter and just have a repeated bookings_start_of_month value collected alongside each day.

I'm actually not sure how I'd write the SQL for this to make it less confusing. For this case I'd probably do the filter + aggregation on the month grains and then join to time spine to fill out other granularities, but that doesn't generalize.

Anyway, something for us to think about later - it really feels like there has to be a cleaner way to render this, but maybe there isn't and that's another reason why MetricFlow is useful.

offset_to_grain ends up with the weirdest edge cases. We might just want to be more restrictive with what we allow for those metrics. It's already so weird when there are multiple instances of metric_time queried with an offset_to_grain metric because the where filter that we add by default doesn't totally do its job anymore. And I have no idea why someone would want to make that query. But alas, for now, we can just give the people what they asked for.

But alas, for now, we can just give the people what they asked for.

LOL, this is me every time I have to add a feature of any kind.

tlento · 2024-02-29T02:11:53Z

metricflow/test/integration/query_output/test_offset_metrics.py

@@ -21,6 +21,7 @@ def test_offset_to_grain_with_single_granularity(  # noqa: D
        MetricFlowQueryRequest.create_with_random_request_id(
            metric_names=["bookings_at_start_of_month"],
            group_by_names=["metric_time__day"],
+            order_by_names=["metric_time__day"],


Yay! Thank you!

courtneyholcomb · 2024-02-29T17:43:47Z

@tlento When I added that assertion, it revealed a sneaky bug related to a duplicate variable name. I added an additional commit to fix that bug so you might want to take a look at that before I merge. It's this commit which also updates basic snapshots.

tlento

Sneaky sneaky variable naming bugs. We probably need a refactor in there to make us less dependent on how we name things, but I don't think it's worth the time to do that today.

Looks good!

tlento · 2024-02-29T19:08:34Z

...ng.py/SqlQueryPlan/DuckDB/test_cumulative_time_offset_metric_with_time_constraint__plan0.sql

@@ -30,7 +30,7 @@ FROM (
            , DATE_TRUNC('month', subq_5.metric_time__day) AS metric_time__month
            , DATE_TRUNC('quarter', subq_5.metric_time__day) AS metric_time__quarter
            , DATE_TRUNC('year', subq_5.metric_time__day) AS metric_time__year
-            , EXTRACT(year FROM DATE_TRUNC('day', subq_5.metric_time__day)) AS metric_time__extract_year
+            , EXTRACT(year FROM subq_5.metric_time__day) AS metric_time__extract_year


Oh wow I didn't even notice that.

tlento · 2024-02-29T19:10:55Z

metricflow/plan_conversion/dataflow_to_sql.py

            )
            time_spine_dim_instance = TimeDimensionInstance(
-                defined_from=time_spine_dim_instance.defined_from,
+                defined_from=original_time_spine_dim_instance.defined_from,


Yikes, this was inside the loop!

Changelog

043d194

cla-bot bot added the cla:yes label Feb 28, 2024

courtneyholcomb added 5 commits February 27, 2024 18:19

Write tests for existing functionality (some failing) & generate snap…

6a39795

…shots

Update dataflow to SQL to allow multiple agg_time_dimensions

026b382

Generate snapshots with changes

df5e0d9

Add order_by to output tests to ensure consistency

f4c9abd

Update SQL engine snapshots

3f7d816

courtneyholcomb marked this pull request as ready for review February 28, 2024 05:43

courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Feb 28, 2024

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS February 28, 2024 05:44 — with GitHub Actions Inactive

courtneyholcomb requested review from tlento and plypaul February 28, 2024 05:44

github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Feb 28, 2024

Merge main

9ea7a27

tlento approved these changes Feb 29, 2024

View reviewed changes

courtneyholcomb added 2 commits February 29, 2024 08:59

Merge main

894b540

Add assertion & fix variable name bug

dc60c11

courtneyholcomb requested a review from tlento February 29, 2024 17:44

Update SQL engine snapshots

89d2c2c

tlento approved these changes Feb 29, 2024

View reviewed changes

courtneyholcomb merged commit ce7f194 into main Feb 29, 2024
9 checks passed

courtneyholcomb deleted the court/offset-multiple-grain branch February 29, 2024 19:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fix: querying multiple granularities with offset metrics #1054

Bug fix: querying multiple granularities with offset metrics #1054

courtneyholcomb commented Feb 28, 2024 •

edited

Loading

linear bot commented Feb 28, 2024

tlento left a comment

tlento Feb 29, 2024

courtneyholcomb Feb 29, 2024

tlento Feb 29, 2024

courtneyholcomb Feb 29, 2024

tlento Feb 29, 2024

tlento Feb 29, 2024

courtneyholcomb commented Feb 29, 2024 •

edited

Loading

tlento left a comment

tlento Feb 29, 2024

tlento Feb 29, 2024

Bug fix: querying multiple granularities with offset metrics #1054

Bug fix: querying multiple granularities with offset metrics #1054

Conversation

courtneyholcomb commented Feb 28, 2024 • edited Loading

Description

linear bot commented Feb 28, 2024

tlento left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

courtneyholcomb commented Feb 29, 2024 • edited Loading

tlento left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

courtneyholcomb commented Feb 28, 2024 •

edited

Loading

courtneyholcomb commented Feb 29, 2024 •

edited

Loading