Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL rendering logic for custom granularities #1408

Merged
merged 2 commits into from
Sep 24, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Sep 18, 2024

Render SQL for custom granularities.
The JoinToCustomGranularityNode will build a time spine dataset and LEFT OUTER JOIN it to the parent dataset, without a subquery. This is done by joining the time spine table directly, instead of creating a subquery that selects from the time spine table and then joins. That makes this the first node that does not create its own subquery. This join should be safe to do without a subquery since it is always a LEFT OUTER JOIN, ensuring the number of rows will not be changed by the join, making it relatively safe for optimizers to handle.
This behavior was discussed with @tlento and agreed upon in order to ensure optimization and readability.
You can see examples of what this SQL will look like in a testing PR higher up the stack.

@cla-bot cla-bot bot added the cla:yes label Sep 18, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Sep 18, 2024
@courtneyholcomb courtneyholcomb marked this pull request as ready for review September 18, 2024 21:48
@courtneyholcomb courtneyholcomb force-pushed the court/custom-grain-dfp-4 branch 2 times, most recently from 7ebed0f to 2a1775f Compare September 19, 2024 23:56
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions inline. My main worries are the assertions, which maybe are addressed upstack, and the thing about the date_trunc. I think this works as-is, but I'm always skittish about relying on date/time operations behaving sensibly. Let's sync up about it really quickly and if we need changes we can figure out if it's easier to merge this as-is and apply them on top of the stack or if we can fix up here and rebase.

Comment on lines +269 to +271
assert (
not agg_time_grain.is_custom_granularity
), "Custom time granularities are not yet supported for all queries."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because this block is for direct references to agg_time_dimension_instances, and we are currently only supporting this for metric_time? Or is this just not hooked up yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just not hooked up yet!! (agg_time_dimension_instances here might include metric_time.) This was more just an error for me to encounter intentionally when working on custom granularity queries that encounter the JoinToTimeSpineNode or JoinOverTimeRangeNode, which I wanted to test more thoroughly. Also didn't want users to encounter a more obscure error if they were playing around with this before it was ready.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - I haven't encountered this in my basic testing thus far, but I have a task up to make sure this gets handled/removed before we finish the production-ready feature.

time_spine_alias = self._next_unique_table_alias()
custom_granularity_name = node.time_dimension_spec.time_granularity.name
time_spine_source = self._get_time_spine_for_custom_granularity(custom_granularity_name)
left_expr_for_join: SqlExpressionNode = SqlColumnReferenceExpression.from_table_and_column_names(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was chatting with @theyostalservice about why we merged nodes instead of keeping them separate and I realized it's partly because this always has to be a left outer join, right? Given this, it's always safe for things like predicate pushdown to push any predicate past this node, since no rows will be filtered out by the internal join here.

More generally, it's probably worth it for us to document the reasoning for deviating from our previous patterns in the PR description before merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good call!

Comment on lines 1454 to 1458
if parent_time_dimension_instance.spec.time_granularity.base_granularity != time_spine_source.base_granularity:
# If needed, apply DATE_TRUNC to parent column match the time spine spine that's column being joined to.
left_expr_for_join = SqlDateTruncExpression.create(
time_granularity=time_spine_source.base_granularity, arg=left_expr_for_join
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain that the parent grain is always finer than the time spine source grain here? Do we need to switch on the .to_int() cases so we date_trunc to whatever is smaller?

Or does it matter? Maybe we only need to date_trunc parent grain since it'll always resolve to the 00 moment, like 2024-01-01T00:00:00? If so then this is fine but maybe we add a comment or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at this, I don't think this DATE_TRUNC is ever actually necessary. I think there was just a bug somewhere in my code when I added it that made me think it was necessary.
The reason it's not necessary is that on line 1439, we choose this parent instance BECAUSE it is the instance that matches the base grain.
I tested removing this logic at the top of the stack and nothing change, so seems safe to remove. Will remove it here since I need to add some comments to this PR anyway!

spec=node.time_dimension_spec,
)
time_spine_instance_set = InstanceSet(time_dimension_instances=(time_spine_instance,))
time_spine_select_columns = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only does one, so if I request metric_time__martian_year and metric_time__martian_quarter we'll do multiple joins. I think that's ok for now, we can optimize it later if we feel like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I haven't thought through the scenarios that would make it acceptable to render multiple custom grains from the same join. It seems like Paul's CTE changes would likely handle that. But we'll see!

Base automatically changed from court/custom-grain-dfp-4 to main September 24, 2024 13:15
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sure did forget to approve this one....

ted lasso oops

@courtneyholcomb courtneyholcomb merged commit 51664a5 into main Sep 24, 2024
15 checks passed
@courtneyholcomb courtneyholcomb deleted the court/custom-grain-dfp-5 branch September 24, 2024 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants