Add SQL rendering logic for custom granularities #1408

courtneyholcomb · 2024-09-18T20:49:00Z

Render SQL for custom granularities.
The JoinToCustomGranularityNode will build a time spine dataset and LEFT OUTER JOIN it to the parent dataset, without a subquery. This is done by joining the time spine table directly, instead of creating a subquery that selects from the time spine table and then joins. That makes this the first node that does not create its own subquery. This join should be safe to do without a subquery since it is always a LEFT OUTER JOIN, ensuring the number of rows will not be changed by the join, making it relatively safe for optimizers to handle.
This behavior was discussed with @tlento and agreed upon in order to ensure optimization and readability.
You can see examples of what this SQL will look like in a testing PR higher up the stack.

tlento

I have a few questions inline. My main worries are the assertions, which maybe are addressed upstack, and the thing about the date_trunc. I think this works as-is, but I'm always skittish about relying on date/time operations behaving sensibly. Let's sync up about it really quickly and if we need changes we can figure out if it's easier to merge this as-is and apply them on top of the stack or if we can fix up here and rebase.

tlento · 2024-09-23T23:36:39Z

metricflow/plan_conversion/dataflow_to_sql.py

+            assert (
+                not agg_time_grain.is_custom_granularity
+            ), "Custom time granularities are not yet supported for all queries."


Is this because this block is for direct references to agg_time_dimension_instances, and we are currently only supporting this for metric_time? Or is this just not hooked up yet?

Just not hooked up yet!! (agg_time_dimension_instances here might include metric_time.) This was more just an error for me to encounter intentionally when working on custom granularity queries that encounter the JoinToTimeSpineNode or JoinOverTimeRangeNode, which I wanted to test more thoroughly. Also didn't want users to encounter a more obscure error if they were playing around with this before it was ready.

Also - I haven't encountered this in my basic testing thus far, but I have a task up to make sure this gets handled/removed before we finish the production-ready feature.

tlento · 2024-09-23T23:44:23Z

metricflow/plan_conversion/dataflow_to_sql.py

+        time_spine_alias = self._next_unique_table_alias()
+        custom_granularity_name = node.time_dimension_spec.time_granularity.name
+        time_spine_source = self._get_time_spine_for_custom_granularity(custom_granularity_name)
+        left_expr_for_join: SqlExpressionNode = SqlColumnReferenceExpression.from_table_and_column_names(


Was chatting with @theyostalservice about why we merged nodes instead of keeping them separate and I realized it's partly because this always has to be a left outer join, right? Given this, it's always safe for things like predicate pushdown to push any predicate past this node, since no rows will be filtered out by the internal join here.

More generally, it's probably worth it for us to document the reasoning for deviating from our previous patterns in the PR description before merge.

Yes, good call!

tlento · 2024-09-23T23:47:38Z

metricflow/plan_conversion/dataflow_to_sql.py

+        if parent_time_dimension_instance.spec.time_granularity.base_granularity != time_spine_source.base_granularity:
+            # If needed, apply DATE_TRUNC to parent column match the time spine spine that's column being joined to.
+            left_expr_for_join = SqlDateTruncExpression.create(
+                time_granularity=time_spine_source.base_granularity, arg=left_expr_for_join
+            )


Are we certain that the parent grain is always finer than the time spine source grain here? Do we need to switch on the .to_int() cases so we date_trunc to whatever is smaller?

Or does it matter? Maybe we only need to date_trunc parent grain since it'll always resolve to the 00 moment, like 2024-01-01T00:00:00? If so then this is fine but maybe we add a comment or something?

Looking back at this, I don't think this DATE_TRUNC is ever actually necessary. I think there was just a bug somewhere in my code when I added it that made me think it was necessary.
The reason it's not necessary is that on line 1439, we choose this parent instance BECAUSE it is the instance that matches the base grain.
I tested removing this logic at the top of the stack and nothing change, so seems safe to remove. Will remove it here since I need to add some comments to this PR anyway!

tlento · 2024-09-23T23:57:13Z

metricflow/plan_conversion/dataflow_to_sql.py

+            spec=node.time_dimension_spec,
+        )
+        time_spine_instance_set = InstanceSet(time_dimension_instances=(time_spine_instance,))
+        time_spine_select_columns = (


This only does one, so if I request metric_time__martian_year and metric_time__martian_quarter we'll do multiple joins. I think that's ok for now, we can optimize it later if we feel like it.

Yep. I haven't thought through the scenarios that would make it acceptable to render multiple custom grains from the same join. It seems like Paul's CTE changes would likely handle that. But we'll see!

tlento

I sure did forget to approve this one....

cla-bot bot added the cla:yes label Sep 18, 2024

courtneyholcomb added the Skip Changelog label Sep 18, 2024

dbt-labs deleted a comment from github-actions bot Sep 18, 2024

courtneyholcomb requested review from tlento and plypaul September 18, 2024 21:48

courtneyholcomb marked this pull request as ready for review September 18, 2024 21:48

courtneyholcomb force-pushed the court/custom-grain-dfp-5 branch from 0c12518 to a01653f Compare September 18, 2024 22:17

courtneyholcomb force-pushed the court/custom-grain-dfp-4 branch 2 times, most recently from 7ebed0f to 2a1775f Compare September 19, 2024 23:56

courtneyholcomb force-pushed the court/custom-grain-dfp-5 branch from a01653f to f636c63 Compare September 19, 2024 23:57

tlento reviewed Sep 23, 2024

View reviewed changes

Base automatically changed from court/custom-grain-dfp-4 to main September 24, 2024 13:15

courtneyholcomb added 2 commits September 24, 2024 06:46

Add SQL rendering logic for custom granularities

ad038f0

Remove unneeded DATE_TRUNC logic for custom grain join

39cf91d

courtneyholcomb force-pushed the court/custom-grain-dfp-5 branch from 730cc49 to 39cf91d Compare September 24, 2024 13:46

courtneyholcomb requested a review from tlento September 24, 2024 13:47

tlento approved these changes Sep 24, 2024

View reviewed changes

courtneyholcomb merged commit 51664a5 into main Sep 24, 2024
15 checks passed

courtneyholcomb deleted the court/custom-grain-dfp-5 branch September 24, 2024 21:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SQL rendering logic for custom granularities #1408

Add SQL rendering logic for custom granularities #1408

courtneyholcomb commented Sep 18, 2024 •

edited

Loading

tlento left a comment

tlento Sep 23, 2024

courtneyholcomb Sep 24, 2024

courtneyholcomb Sep 24, 2024

tlento Sep 23, 2024

courtneyholcomb Sep 24, 2024

tlento Sep 23, 2024

courtneyholcomb Sep 24, 2024

tlento Sep 23, 2024

courtneyholcomb Sep 24, 2024

tlento left a comment

Add SQL rendering logic for custom granularities #1408

Add SQL rendering logic for custom granularities #1408

Conversation

courtneyholcomb commented Sep 18, 2024 • edited Loading

tlento left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tlento left a comment

Choose a reason for hiding this comment

courtneyholcomb commented Sep 18, 2024 •

edited

Loading