-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix: selective where filters after join to time spine #1120
Conversation
c17025f
to
ad98912
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is interesting, so our current expected behavior on filters and time spine joins is, basically, the query filter applies before the time spine join for fill nulls, and we only do a secondary restriction to filter out nulls. I think that's right. Two questions:
- What happens with a more deeply nested metric with a filter? Like if I have a filter on
bookings__is_instant
in the definition of an input metric to something that has fill nulls with, what happens? I'm assuming this fits into the group by case for the inner metric, then we fill nulls, then they get collapsed away? It's sort of interesting if the default is 1 instead of zero, or if the outer metric is an average. - What happens to time filters if I do
where metric_time__day = '2024-01-01' and bookings__is_instant
but I only group by metric_time__day? I think that will still over-select, correct? Meaning we'll get all the metric_times but there will be null metric values in the output. Is that right?
--- | ||
integration_test: | ||
name: join_to_time_spine_with_queried_time_constraint | ||
description: Test join to time spine with time constraint that isn't in group by. Should apply constraint twice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: that is in the group by
I think the current behavior in this scenario is expected, too. If the input metric filters to
I wouldn't say they get collapsed away. The nulls get filled, and that gets used in the next stage of the query. It is weird if the default is 1, but I'm struggling to think of a scenario where you would want that 🤔 If the outer metric is an average, I think this is pretty normal. In the ratio metric example I mentioned above, the input metric filling nulls would result in 0 on some days, and that's what you want for the average calculation.
There might be null metric values in the output, yes, unless they apply a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this all sounds reasonable and the behavior you describe aligns with what I was expecting, I just wasn't sure if this PR might affect that.
Thanks for making this fix so readable! It's an awful lot of complicated SQL for such a small adjustment. 😭
Resolves SL-1987
Description
When an input metric specifies
join_to_timespine: true
in its YAML definition, we apply special behavior for filters. Before this PR, we were applying all filters twice - before and after the time spine join. This is because you might end up with new rows after the time spine join that should be filtered out. This resulted in a bug sometimes if you filtered by a spec that was not in the group by, since the filter column no longer existed at that level of the query.This fixes that bug by only applying the post-join filter if the filtered spec is also in the group by. The specs in the group by should be the only ones changed after the time spine join, anyway.