Fix incorrect cumulative metric output when start-time and end-time are used #871

tlento · 2023-11-15T01:12:25Z

When users submit CLI queries with the --start-time and --end-time
parameters, MetricFlow will automatically adjust the time constraint
window to include all relevant cumulative metric input.

Unfortunately, we got over-zealous with fencepost adjustments and
introduced an off-by-one error that meant cumulative metric
queries would only be correct when the specified granularity was
DAILY.

We never caught this issue because the error in the rendered SQL is
subtle, and all of our test input data on the integration tests
happened to fit within the artificially shortened constraint windows
in all of the cases where we were testing non-daily granularity
queries.

This change updates the integration test configurations to ensure
boundary conditions will be caught in the future. In addition, the
fix alters the values of the query output snapshot tests, which,
when compared with the relevant input test data, will reveal the
nature of the original bug and the effects of this fix.

Cumulative metric queries are very difficult to reason about, especially when we do automated date sliding to include all of the necessary input data. This commit adds some output tests, which can be compared against the input data config files for fct_bookings and fct_revenue for correctness.

When users submit CLI queries with the --start-time and --end-time parameters, MetricFlow will automatically adjust the time constraint window to include all relevant cumulative metric input. Unfortunately, we got over-zealous with fencepost adjustments and introduced an off-by-one error that meant cumulative metric queries would only be correct when the specified granularity was DAILY. We never caught this issue because the error in the rendered SQL is subtle, and all of our test input data on the integration tests happened to fit within the artificially shortened constraint windows in all of the cases where we were testing non-daily granularity queries. This change updates the integration test configurations to ensure boundary conditions will be caught in the future. In addition, the fix alters the values of the query output snapshot tests, which, when compared with the relevant input test data, will reveal the nature of the original bug and the effects of this fix.

tlento · 2023-11-15T01:28:59Z

metricflow/filters/time_constraint.py

@@ -80,7 +80,7 @@ def adjust_time_constraint_for_cumulative_metric(
    ) -> TimeRangeConstraint:
        """Given a time constraint for the overall query, adjust it to cover the time range for this metric."""
        if granularity is not None:
-            return self._adjust_time_constraint_start_by_window(granularity, count - 1)
+            return self._adjust_time_constraint_start_by_window(granularity, count)


This is the actual fix.

tlento · 2023-11-15T01:30:18Z

metricflow/test/integration/test_cases/itest_cumulative_metric.yaml

@@ -6,7 +6,7 @@ integration_test:
  metrics: ["trailing_2_months_revenue"]
  group_bys: ["metric_time"]
  order_bys: ["metric_time"]
-  time_constraint: ["2020-01-05", "2021-01-04"]
+  time_constraint: ["2020-03-05", "2021-01-04"]


I left these tests in place for a couple of reasons, but one is to illustrate how the constraint adjustments need to be applied here vs in the where clause.

A number of these were producing correct results on inspection of input data only because the input data didn't start until within the too-short filter window.

tlento · 2023-11-15T01:31:45Z

...st_cumulative_metrics.py/str/BigQuery/test_grain_to_date_cumulative_metric__query_output.txt

-1       2021-01-04         2000
-2       2021-01-05         2000
-3       2021-01-06         2000
+0       2021-01-03         3000


These updated numbers are correct, while the old ones were missing inputs. See https://github.com/dbt-labs/metricflow/blob/main/metricflow/test/fixtures/source_table_snapshots/simple_model/fct_revenue.yaml#L15-L18

courtneyholcomb

Nice find 🙏 🙏 Thank you for fixing!!

tlento added 2 commits November 14, 2023 17:10

cla-bot bot added the cla:yes label Nov 15, 2023

tlento commented Nov 15, 2023

View reviewed changes

tlento requested review from plypaul and courtneyholcomb November 15, 2023 01:32

tlento linked an issue Nov 15, 2023 that may be closed by this pull request

[Bug] --start-time / --end-time with cumulative windows broken #869

Closed

2 tasks

courtneyholcomb approved these changes Nov 15, 2023

View reviewed changes

tlento merged commit 8e69cda into main Nov 15, 2023
22 checks passed

tlento deleted the fix-incorrect-cumulative-metric-output branch November 15, 2023 03:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect cumulative metric output when start-time and end-time are used #871

Fix incorrect cumulative metric output when start-time and end-time are used #871

tlento commented Nov 15, 2023 •

edited

Loading

tlento Nov 15, 2023

tlento Nov 15, 2023

tlento Nov 15, 2023

courtneyholcomb left a comment

Fix incorrect cumulative metric output when start-time and end-time are used #871

Fix incorrect cumulative metric output when start-time and end-time are used #871

Conversation

tlento commented Nov 15, 2023 • edited Loading

tlento Nov 15, 2023

Choose a reason for hiding this comment

tlento Nov 15, 2023

Choose a reason for hiding this comment

tlento Nov 15, 2023

Choose a reason for hiding this comment

courtneyholcomb left a comment

Choose a reason for hiding this comment

tlento commented Nov 15, 2023 •

edited

Loading