Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new dataflow plan nodes for custom offset windows #1579

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Dec 18, 2024

Adds two new nodes - CustomGranularityBoundsNode and OffsetByCustomGranularityNode. These will be used to build the SQL for custom offset windows.

@cla-bot cla-bot bot added the cla:yes label Dec 18, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Dec 18, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Dec 18, 2024
@courtneyholcomb courtneyholcomb marked this pull request as ready for review December 18, 2024 23:27
This includes a CASE expression, an integer expression, and some updates to the add time expression & the window function expression.
Not totally relevant to this PR, but these seem to have been removed somehow.
@courtneyholcomb courtneyholcomb force-pushed the court/custom-offset5 branch 2 times, most recently from 6e642e4 to a9dc916 Compare December 19, 2024 06:51
… in FilterElementsNode

This will more accurately match the column name since it includes more information about the spec
This will allow us to track which specs have had a window function applied between DataflowPlan nodes
We weren't tracking the parent nodes properly, which resulted in improper optimization and nodes missing when displaying the plan. This should not impact the output data, but will hopefully improve query efficiency now that more CTEs are enabled.
Copy link
Contributor

@plypaul plypaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summarizing a chat:

  • Combining the two nodes into a single node would be more useful and allow for a more straightforward functional description.
  • However, that is blocked on some updates to the optimizer, which I'll make. @courtneyholcomb is planning to refactor once that change is in.
  • Rows are not duplicated in the join with the CASE statement when computing the offset. However, if the time interval has a different number of days, (e.g. quarters with different days in each), the last day could be an aggregation of multiple days. @courtneyholcomb is verifying that this is inline with product expectations.

Other notes:

What helped during review was an example of the SQL that the nodes would generate along with a few rows that show the respective values, especially for a boundary.

Could you include a test that includes those using a minimal DAG? pytest.mark.duckdb_only makes sense to use given that more extensive tests are present later in the stack.

@@ -56,6 +56,8 @@ class StaticIdPrefix(IdPrefix, Enum, metaclass=EnumMetaClassHelper):
DATAFLOW_NODE_JOIN_CONVERSION_EVENTS_PREFIX = "jce"
DATAFLOW_NODE_WINDOW_REAGGREGATION_ID_PREFIX = "wr"
DATAFLOW_NODE_ALIAS_SPECS_ID_PREFIX = "as"
DATAFLOW_NODE_CUSTOM_GRANULARITY_BOUNDS_ID_PREFIX = "cgb"
DATAFLOW_NODE_OFFSET_BY_CUSTOMG_GRANULARITY_ID_PREFIX = "obcg"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Typo.

custom_granularity_bounds_node: Optional[CustomGranularityBoundsNode] = None
filter_elements_node: Optional[FilterElementsNode] = None
for parent_node in new_parent_nodes:
if isinstance(parent_node, CustomGranularityBoundsNode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combined node should allow this to be removed.

Base automatically changed from court/custom-offset4 to main December 21, 2024 02:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants