-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SQL rendering logic for custom granularities #1408
Conversation
0c12518
to
a01653f
Compare
7ebed0f
to
2a1775f
Compare
a01653f
to
f636c63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions inline. My main worries are the assertions, which maybe are addressed upstack, and the thing about the date_trunc. I think this works as-is, but I'm always skittish about relying on date/time operations behaving sensibly. Let's sync up about it really quickly and if we need changes we can figure out if it's easier to merge this as-is and apply them on top of the stack or if we can fix up here and rebase.
assert ( | ||
not agg_time_grain.is_custom_granularity | ||
), "Custom time granularities are not yet supported for all queries." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because this block is for direct references to agg_time_dimension_instances, and we are currently only supporting this for metric_time? Or is this just not hooked up yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just not hooked up yet!! (agg_time_dimension_instances
here might include metric_time
.) This was more just an error for me to encounter intentionally when working on custom granularity queries that encounter the JoinToTimeSpineNode
or JoinOverTimeRangeNode
, which I wanted to test more thoroughly. Also didn't want users to encounter a more obscure error if they were playing around with this before it was ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - I haven't encountered this in my basic testing thus far, but I have a task up to make sure this gets handled/removed before we finish the production-ready feature.
time_spine_alias = self._next_unique_table_alias() | ||
custom_granularity_name = node.time_dimension_spec.time_granularity.name | ||
time_spine_source = self._get_time_spine_for_custom_granularity(custom_granularity_name) | ||
left_expr_for_join: SqlExpressionNode = SqlColumnReferenceExpression.from_table_and_column_names( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was chatting with @theyostalservice about why we merged nodes instead of keeping them separate and I realized it's partly because this always has to be a left outer join, right? Given this, it's always safe for things like predicate pushdown to push any predicate past this node, since no rows will be filtered out by the internal join here.
More generally, it's probably worth it for us to document the reasoning for deviating from our previous patterns in the PR description before merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good call!
if parent_time_dimension_instance.spec.time_granularity.base_granularity != time_spine_source.base_granularity: | ||
# If needed, apply DATE_TRUNC to parent column match the time spine spine that's column being joined to. | ||
left_expr_for_join = SqlDateTruncExpression.create( | ||
time_granularity=time_spine_source.base_granularity, arg=left_expr_for_join | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we certain that the parent grain is always finer than the time spine source grain here? Do we need to switch on the .to_int()
cases so we date_trunc to whatever is smaller?
Or does it matter? Maybe we only need to date_trunc parent grain since it'll always resolve to the 00 moment, like 2024-01-01T00:00:00? If so then this is fine but maybe we add a comment or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at this, I don't think this DATE_TRUNC
is ever actually necessary. I think there was just a bug somewhere in my code when I added it that made me think it was necessary.
The reason it's not necessary is that on line 1439, we choose this parent instance BECAUSE it is the instance that matches the base grain.
I tested removing this logic at the top of the stack and nothing change, so seems safe to remove. Will remove it here since I need to add some comments to this PR anyway!
spec=node.time_dimension_spec, | ||
) | ||
time_spine_instance_set = InstanceSet(time_dimension_instances=(time_spine_instance,)) | ||
time_spine_select_columns = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only does one, so if I request metric_time__martian_year
and metric_time__martian_quarter
we'll do multiple joins. I think that's ok for now, we can optimize it later if we feel like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I haven't thought through the scenarios that would make it acceptable to render multiple custom grains from the same join. It seems like Paul's CTE changes would likely handle that. But we'll see!
730cc49
to
39cf91d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Render SQL for custom granularities.
The
JoinToCustomGranularityNode
will build a time spine dataset andLEFT OUTER JOIN
it to the parent dataset, without a subquery. This is done by joining the time spine table directly, instead of creating a subquery that selects from the time spine table and then joins. That makes this the first node that does not create its own subquery. This join should be safe to do without a subquery since it is always aLEFT OUTER JOIN
, ensuring the number of rows will not be changed by the join, making it relatively safe for optimizers to handle.This behavior was discussed with @tlento and agreed upon in order to ensure optimization and readability.
You can see examples of what this SQL will look like in a testing PR higher up the stack.