-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new dataflow plan nodes for custom offset windows #1579
base: main
Are you sure you want to change the base?
Conversation
f267e57
to
57ba604
Compare
74c5cf7
to
1eb8615
Compare
57ba604
to
71cef04
Compare
This includes a CASE expression, an integer expression, and some updates to the add time expression & the window function expression.
Not totally relevant to this PR, but these seem to have been removed somehow.
1eb8615
to
f24f9fa
Compare
6e642e4
to
a9dc916
Compare
… in FilterElementsNode This will more accurately match the column name since it includes more information about the spec
This will allow us to track which specs have had a window function applied between DataflowPlan nodes
We weren't tracking the parent nodes properly, which resulted in improper optimization and nodes missing when displaying the plan. This should not impact the output data, but will hopefully improve query efficiency now that more CTEs are enabled.
f24f9fa
to
c79153d
Compare
a9dc916
to
7de2bd3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summarizing a chat:
- Combining the two nodes into a single node would be more useful and allow for a more straightforward functional description.
- However, that is blocked on some updates to the optimizer, which I'll make. @courtneyholcomb is planning to refactor once that change is in.
- Rows are not duplicated in the join with the
CASE
statement when computing the offset. However, if the time interval has a different number of days, (e.g. quarters with different days in each), the last day could be an aggregation of multiple days. @courtneyholcomb is verifying that this is inline with product expectations.
Other notes:
What helped during review was an example of the SQL that the nodes would generate along with a few rows that show the respective values, especially for a boundary.
Could you include a test that includes those using a minimal DAG? pytest.mark.duckdb_only
makes sense to use given that more extensive tests are present later in the stack.
@@ -56,6 +56,8 @@ class StaticIdPrefix(IdPrefix, Enum, metaclass=EnumMetaClassHelper): | |||
DATAFLOW_NODE_JOIN_CONVERSION_EVENTS_PREFIX = "jce" | |||
DATAFLOW_NODE_WINDOW_REAGGREGATION_ID_PREFIX = "wr" | |||
DATAFLOW_NODE_ALIAS_SPECS_ID_PREFIX = "as" | |||
DATAFLOW_NODE_CUSTOM_GRANULARITY_BOUNDS_ID_PREFIX = "cgb" | |||
DATAFLOW_NODE_OFFSET_BY_CUSTOMG_GRANULARITY_ID_PREFIX = "obcg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Typo.
custom_granularity_bounds_node: Optional[CustomGranularityBoundsNode] = None | ||
filter_elements_node: Optional[FilterElementsNode] = None | ||
for parent_node in new_parent_nodes: | ||
if isinstance(parent_node, CustomGranularityBoundsNode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The combined node should allow this to be removed.
c79153d
to
147c9d2
Compare
Adds two new nodes -
CustomGranularityBoundsNode
andOffsetByCustomGranularityNode
. These will be used to build the SQL for custom offset windows.