-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect cumulative metric output when start-time and end-time are used #871
Conversation
Cumulative metric queries are very difficult to reason about, especially when we do automated date sliding to include all of the necessary input data. This commit adds some output tests, which can be compared against the input data config files for fct_bookings and fct_revenue for correctness.
When users submit CLI queries with the --start-time and --end-time parameters, MetricFlow will automatically adjust the time constraint window to include all relevant cumulative metric input. Unfortunately, we got over-zealous with fencepost adjustments and introduced an off-by-one error that meant cumulative metric queries would only be correct when the specified granularity was DAILY. We never caught this issue because the error in the rendered SQL is subtle, and all of our test input data on the integration tests happened to fit within the artificially shortened constraint windows in all of the cases where we were testing non-daily granularity queries. This change updates the integration test configurations to ensure boundary conditions will be caught in the future. In addition, the fix alters the values of the query output snapshot tests, which, when compared with the relevant input test data, will reveal the nature of the original bug and the effects of this fix.
@@ -80,7 +80,7 @@ def adjust_time_constraint_for_cumulative_metric( | |||
) -> TimeRangeConstraint: | |||
"""Given a time constraint for the overall query, adjust it to cover the time range for this metric.""" | |||
if granularity is not None: | |||
return self._adjust_time_constraint_start_by_window(granularity, count - 1) | |||
return self._adjust_time_constraint_start_by_window(granularity, count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual fix.
@@ -6,7 +6,7 @@ integration_test: | |||
metrics: ["trailing_2_months_revenue"] | |||
group_bys: ["metric_time"] | |||
order_bys: ["metric_time"] | |||
time_constraint: ["2020-01-05", "2021-01-04"] | |||
time_constraint: ["2020-03-05", "2021-01-04"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left these tests in place for a couple of reasons, but one is to illustrate how the constraint adjustments need to be applied here vs in the where clause.
A number of these were producing correct results on inspection of input data only because the input data didn't start until within the too-short filter window.
1 2021-01-04 2000 | ||
2 2021-01-05 2000 | ||
3 2021-01-06 2000 | ||
0 2021-01-03 3000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These updated numbers are correct, while the old ones were missing inputs. See https://github.com/dbt-labs/metricflow/blob/main/metricflow/test/fixtures/source_table_snapshots/simple_model/fct_revenue.yaml#L15-L18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice find 🙏 🙏 Thank you for fixing!!
When users submit CLI queries with the --start-time and --end-time
parameters, MetricFlow will automatically adjust the time constraint
window to include all relevant cumulative metric input.
Unfortunately, we got over-zealous with fencepost adjustments and
introduced an off-by-one error that meant cumulative metric
queries would only be correct when the specified granularity was
DAILY.
We never caught this issue because the error in the rendered SQL is
subtle, and all of our test input data on the integration tests
happened to fit within the artificially shortened constraint windows
in all of the cases where we were testing non-daily granularity
queries.
This change updates the integration test configurations to ensure
boundary conditions will be caught in the future. In addition, the
fix alters the values of the query output snapshot tests, which,
when compared with the relevant input test data, will reveal the
nature of the original bug and the effects of this fix.