Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precompute / Cache Outputs for Nodes in SourceNodeSet #1030

Merged
merged 5 commits into from
Feb 15, 2024

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Feb 9, 2024

Description

When building a DataflowPlan, nodes from SourceNodeSet are used as common building blocks for different queries using the same semantic manifest. When the output for a DataflowPlanNode is computed, the output is lazily computed and then cached since the output will be required many times for the same nodes (e.g. between queries, and between common join candidates). The output contains generated IDs, and so precomputing the output nodes for nodes from SourceNodeSet will have the following effects:

  • The runtime of building the DataflowPlan should be a little more consistent between queries at the expense of initialization time.
  • The IDs in the generated output will be consistent regardless of the order queries seen by the DataflowPlanBuilder.

@cla-bot cla-bot bot added the cla:yes label Feb 9, 2024
@plypaul plypaul marked this pull request as ready for review February 9, 2024 00:15
Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -294,8 +295,10 @@ def explain_get_dimension_values( # noqa: D
"""Returns the SQL query for get_dimension_values.

Args:
metric_name: Names of metrics that contain the group_by.
metric_names: Names of metrics that contain the group_by.
metrics: Similar to `metric_names`, but specified via parameter objects.,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove comma at the end of this line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@plypaul plypaul force-pushed the plypaul--88.2.1--source-node-set branch from b4899bd to a322a2d Compare February 12, 2024 21:36
@plypaul plypaul force-pushed the plypaul--88.2.2--cache-source-node-output branch from 5c652ff to 4e060a7 Compare February 12, 2024 21:36
@plypaul plypaul force-pushed the plypaul--88.2.1--source-node-set branch from a322a2d to 89a9e39 Compare February 15, 2024 19:17
Base automatically changed from plypaul--88.2.1--source-node-set to main February 15, 2024 19:21
@plypaul plypaul force-pushed the plypaul--88.2.2--cache-source-node-output branch from 4e060a7 to afe1d19 Compare February 15, 2024 19:22
@plypaul plypaul force-pushed the plypaul--88.2.2--cache-source-node-output branch from afe1d19 to 7840539 Compare February 15, 2024 19:28
@plypaul plypaul enabled auto-merge (rebase) February 15, 2024 19:28
@plypaul plypaul merged commit 6d8accf into main Feb 15, 2024
9 checks passed
@plypaul plypaul deleted the plypaul--88.2.2--cache-source-node-output branch February 15, 2024 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants