Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include Time Spine Nodes in Dataflow Plan #1548

Merged
merged 11 commits into from
Dec 11, 2024
Merged

Include Time Spine Nodes in Dataflow Plan #1548

merged 11 commits into from
Dec 11, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Nov 23, 2024

Previously, time spine nodes were built on the fly in the dataflow to SQL logic. This came with several limitations, including:

  • We can't apply where constraints and time constraints using the standard dataflow plan logic. This limitation was the main driver for this PR stack.
  • We can't track time spines as source nodes. We might want to add these to the dbt DAG at some point.

This PR adds logic to build time spine nodes in the dataflow plan, instead.

@courtneyholcomb courtneyholcomb changed the base branch from main to court/simp7 November 23, 2024 06:01
@courtneyholcomb courtneyholcomb changed the title Court/simp8 Build Time Spine Nodes in Dataflow Plan Nov 23, 2024
@courtneyholcomb courtneyholcomb changed the title Build Time Spine Nodes in Dataflow Plan Include Time Spine Nodes in Dataflow Plan Nov 23, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Nov 23, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Nov 23, 2024
@courtneyholcomb courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 23, 2024
@courtneyholcomb courtneyholcomb marked this pull request as ready for review November 23, 2024 06:46
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 23, 2024
@courtneyholcomb courtneyholcomb force-pushed the court/simp8 branch 2 times, most recently from c0b009e to fe9a711 Compare November 25, 2024 23:12
Base automatically changed from court/simp7 to main December 9, 2024 22:27
@cla-bot cla-bot bot added the cla:yes label Dec 9, 2024
@courtneyholcomb courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Dec 11, 2024
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Dec 11, 2024
Copy link
Contributor

@plypaul plypaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but left some minor comments.

metricflow/dataflow/nodes/transform_time_dimensions.py Outdated Show resolved Hide resolved
metricflow/dataflow/nodes/transform_time_dimensions.py Outdated Show resolved Hide resolved
@@ -144,7 +145,7 @@ def instances_for_time_dimensions(
return instances_to_return

def instance_for_time_dimension(self, time_dimension_spec: TimeDimensionSpec) -> TimeDimensionInstance:
"""Given the name of the time dimension, return the instance associated with it in the data set."""
"""Given ta time dimension spec, return the instance associated with it in the data set."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: typo

for instance in instances:
if instance.spec == spec:
return instance
raise RuntimeError(f"Did not find instance matching spec in dataset. Spec: {spec}\nInstances: {instances}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better formatting, you can do:

# Easier autocomplete.
raise RuntimeError(str(LazyFormat("Did not find instance matching spec in dataset.", spec=spec, instances=instances))

# Or call formatting function.
raise RuntimeError(mf_pformat_dict("Did not find instance matching spec in dataset.", {"spec": spec, "instances": instances))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good call! I need to change this habit.

class AliasSpecsNode(DataflowPlanNode, ABC):
"""Change the columns matching the key specs to match the value specs."""

change_specs: Sequence[Tuple[InstanceSpec, InstanceSpec]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this class is frozen, it's preferable to keep the field types immutable as well. i.e.

change_specs: Tuple[...]

Also helpful to have a dataclass for Tuple[InstanceSpec, InstanceSpec] so that you don't have to remember what index is what.

… for time spines

The metric time nodes are used for resolving metric_time without metrics. The read SQL nodes will be used for time spine joins.
@courtneyholcomb courtneyholcomb enabled auto-merge (squash) December 11, 2024 19:31
@courtneyholcomb courtneyholcomb merged commit 7782067 into main Dec 11, 2024
11 checks passed
@courtneyholcomb courtneyholcomb deleted the court/simp8 branch December 11, 2024 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants