Bug fix: querying multiple granularities with offset metrics #1054
Conversation
This seems like what we want to happen instead of whatever was happening before, but I do wonder about the invariant handling I mentioned inline: there's an implicit expectation of a granularity relationship between the time spine and the selected columns, but right now, if that expectation isn't met, we'll render a DATE_TRUNC to a coarser granularity, which doesn't seem like what we really want to do.
Should we just put an assertion in there? Or is there some other mechanism enforcing this?
if time_dimension_spec.time_granularity == time_spine_dim_instance.spec.time_granularity
else SqlDateTruncExpression(
    time_granularity=time_dimension_spec.time_granularity, arg=time_spine_column_select_expr
This implies the time_spine_dim_instance.spec.time_granularity will always be equal to or finer (<=, I think) than the time_dimension_spec.time_granularity.
I'm pretty sure that's true at this point, but is it enforced anywhere? I'm not sure what'll happen if we allow different granularities in the time spine, or if we allow the time spine to be DAY while the base granularity moves to MILLISECOND or whatever.
Good point! I feel like the time spine SHOULD be set with the smallest possible grain in order to fill all granularity periods we support.
In practice, though, it might not be - especially when we transition to supporting smaller grains. Users might need to update their time spines.
I'll add an assertion for now and a comment to explain!
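For illustration, here's a minimal, self-contained sketch of the kind of assertion being discussed. The enum and function names are hypothetical stand-ins rather than MetricFlow's actual classes; the only assumption is that granularities can be ordered from finest to coarsest:

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Hypothetical stand-in for a time granularity enum, ordered finest to coarsest."""

    DAY = 1
    MONTH = 2
    YEAR = 3


def assert_spine_is_fine_enough(spine: Granularity, requested: Granularity) -> None:
    # The invariant from the discussion above: the time spine must be at least as
    # fine as the granularity being rendered, otherwise the DATE_TRUNC would be
    # truncating to a coarser grain than the spine actually contains.
    assert spine <= requested, (
        f"Time spine granularity {spine.name} is coarser than requested granularity "
        f"{requested.name}."
    )


assert_spine_is_fine_enough(Granularity.DAY, Granularity.MONTH)  # passes
# assert_spine_is_fine_enough(Granularity.MONTH, Granularity.DAY)  # would raise AssertionError
```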
  metric_time__month
  , SUM(bookings) AS bookings_start_of_month
FROM (
  -- Join to Time Spine Dataset
  -- Pass Only Elements: ['bookings', 'metric_time__month', 'metric_time__day']
  SELECT
    subq_13.ds AS metric_time__day
    , DATE_TRUNC('month', subq_13.ds) AS metric_time__month
    , subq_11.bookings AS bookings
  FROM ***************************.mf_time_spine subq_13
  INNER JOIN (
    -- Read Elements From Semantic Model 'bookings_source'
    -- Metric Time Dimension 'ds'
    SELECT
      DATE_TRUNC('day', ds) AS metric_time__day
      , 1 AS bookings
    FROM ***************************.fct_bookings bookings_source_src_28000
  ) subq_11
  ON
    DATE_TRUNC('month', subq_13.ds) = subq_11.metric_time__day
  WHERE DATE_TRUNC('month', subq_13.ds) = subq_13.ds
) subq_15
WHERE metric_time__day = '2020-01-01'
This all feels very convoluted, but maybe that's just how we have to do offset-to-grain calculations? Like, if this grouped by metric_time__day, I guess we'd drop the WHERE filter and just have a repeated bookings_start_of_month value collected alongside each day.
I'm actually not sure how I'd write the SQL for this to make it less confusing. For this case I'd probably do the filter + aggregation on the month grains and then join to time spine to fill out other granularities, but that doesn't generalize.
Anyway, something for us to think about later - it really feels like there has to be a cleaner way to render this, but maybe there isn't and that's another reason why MetricFlow is useful.
offset_to_grain ends up with the weirdest edge cases. We might just want to be more restrictive with what we allow for those metrics. It's already so weird when there are multiple instances of metric_time queried with an offset_to_grain metric, because the where filter that we add by default doesn't totally do its job anymore. And I have no idea why someone would want to make that query. But alas, for now, we can just give the people what they asked for.
But alas, for now, we can just give the people what they asked for.
LOL, this is me every time I have to add a feature of any kind.
@@ -21,6 +21,7 @@ def test_offset_to_grain_with_single_granularity( # noqa: D
     MetricFlowQueryRequest.create_with_random_request_id(
         metric_names=["bookings_at_start_of_month"],
         group_by_names=["metric_time__day"],
+        order_by_names=["metric_time__day"],
Yay! Thank you!
@tlento When I added that assertion, it revealed a sneaky bug related to a duplicate variable name. I added an additional commit to fix that bug, so you might want to take a look at that before I merge. It's this commit, which also updates basic snapshots.
Sneaky sneaky variable naming bugs. We probably need a refactor in there to make us less dependent on how we name things, but I don't think it's worth the time to do that today.
Looks good!
@@ -30,7 +30,7 @@ FROM (
   , DATE_TRUNC('month', subq_5.metric_time__day) AS metric_time__month
   , DATE_TRUNC('quarter', subq_5.metric_time__day) AS metric_time__quarter
   , DATE_TRUNC('year', subq_5.metric_time__day) AS metric_time__year
-  , EXTRACT(year FROM DATE_TRUNC('day', subq_5.metric_time__day)) AS metric_time__extract_year
+  , EXTRACT(year FROM subq_5.metric_time__day) AS metric_time__extract_year
Oh wow I didn't even notice that.
 )
 time_spine_dim_instance = TimeDimensionInstance(
-    defined_from=time_spine_dim_instance.defined_from,
+    defined_from=original_time_spine_dim_instance.defined_from,
Yikes, this was inside the loop!
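For readers following along, the underlying bug pattern is rebinding a name inside a loop, so later iterations derive their new instance from the previous iteration's result instead of from the original. A minimal, hypothetical illustration (plain dicts stand in for TimeDimensionInstance; this is not the actual MetricFlow code):

```python
# Original instance that every derived instance should be built from.
original = {"name": "time_spine_src.ds", "granularity": "day"}

buggy = []
instance = original
for granularity in ["month", "quarter", "year"]:
    # Bug: `instance` is rebound on every pass, so the next pass reads the
    # *previous* derived instance's name instead of the original's.
    instance = {"name": f"{instance['name']}__{granularity}", "granularity": granularity}
    buggy.append(instance)

# Fix, mirroring the diff above: always derive from the original instance.
fixed = [
    {"name": f"{original['name']}__{granularity}", "granularity": granularity}
    for granularity in ["month", "quarter", "year"]
]

print(buggy[-1]["name"])  # time_spine_src.ds__month__quarter__year  <- wrong, compounded
print(fixed[-1]["name"])  # time_spine_src.ds__year                  <- what we wanted
```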
Resolves SL-1777
Resolves SL-1810
Resolves #1053
Resolves #1052
Description
Resolves 2 related issues:
- If you query multiple granularities of metric_time/agg_time_dimension, only one of the instances will be offset. All of them should be.
- If you query one granularity of metric_time/agg_time_dimension but filter by a different one, the query will fail.

These issues were caused by issues in the JoinToTimeSpineNode dataflow-to-SQL code. The node was only designed to offset one time dimension. It was also filtering out columns prematurely, when that filtering should instead be happening in the FilterElementsNode. This PR fixes both issues.

I would recommend reviewing this PR by commit.
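For a concrete picture of the first case, here is a hedged sketch built from the test snippet shown earlier in this thread; the import path and the second group-by entry are assumptions, everything else mirrors that snippet:

```python
# Assumed import path; may differ across MetricFlow versions.
from metricflow.engine.metricflow_engine import MetricFlowQueryRequest

# Case 1: multiple granularities of metric_time requested alongside an
# offset-to-grain metric. After this fix, every requested granularity is offset.
request = MetricFlowQueryRequest.create_with_random_request_id(
    metric_names=["bookings_at_start_of_month"],
    group_by_names=["metric_time__day", "metric_time__month"],
    order_by_names=["metric_time__day"],
)

# Case 2 (described above, not shown here): grouping by one granularity, e.g.
# metric_time__month, while filtering on another, e.g. metric_time__day, which
# previously failed to render.
```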