Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow plan for custom granularities #1409

Merged
merged 4 commits into from
Sep 24, 2024
Merged

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Sep 18, 2024

Update the DataflowPlanBuilder to support custom granularities. Steps included:

  • For each custom granularity requested, add the appropriate JoinToCustomGranularityNode to the DataflowPlan.
  • When looking for nodes that can satisfy a given linkable spec, check for the ability to satisfy the spec's base granularity, not the custom granularity requested. That will be joined in later.

@cla-bot cla-bot bot added the cla:yes label Sep 18, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Sep 18, 2024
@courtneyholcomb courtneyholcomb marked this pull request as ready for review September 18, 2024 22:17
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@theyostalservice and I are both a bit confused by the conversion operations we're doing on the TimeDimensionSpecs (see inlines for where the confusion arises).

It feels like the custom granularity specs should be checkable via simple equality in their own right, making the conversion to base grain unnecessary. Then when we encounter one we should be able to get the base grain we need to construct the join just at that moment in the DataflowToSQLQueryPlanConverter logic.

If that isn't the case it seems like there's some construct we're missing someplace. It's possible I'm the one missing something, though, so please do let us know. If it's easier to explain in real time let's jump on a call.

metricflow/dataflow/builder/node_evaluator.py Outdated Show resolved Hide resolved
Comment on lines +902 to +918
for linkable_spec in linkable_specs.as_tuple:
if isinstance(linkable_spec, TimeDimensionSpec) and linkable_spec.time_granularity.is_custom_granularity:
linkable_spec_with_base_grain = linkable_spec.with_grain(
ExpandedTimeGranularity.from_time_granularity(linkable_spec.time_granularity.base_granularity)
)
linkable_specs_set_with_base_granularities.add(linkable_spec_with_base_grain)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why we're doing these granularity transformations here and below (and I think maybe in another place in the overall stack).

@courtneyholcomb
Copy link
Contributor Author

@tlento @theyostalservice we should have a huddle about this one today, if possible!
Essentially, source nodes can't satisfy custom grains, because those require a join. I agree that this mid-DataflowPlan grain conversion logic is a bit flimsy, but I haven't been able to come up with a better solution thus far and would love to hear if you have an idea.

linkable_specs_set_with_base_granularities: Set[LinkableInstanceSpec] = set()
# TODO: Add support for no-metrics queries for custom grains without a join (i.e., select directly from time spine).
for linkable_spec in linkable_specs.as_tuple:
if isinstance(linkable_spec, TimeDimensionSpec) and linkable_spec.time_granularity.is_custom_granularity:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: remove type refinement (here and elsewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use helper to get all custom time dims from linkable spec set

@@ -406,6 +407,10 @@ def evaluate_node(
logger.debug(f"Candidate spec set is:\n{mf_pformat(candidate_spec_set)}")

data_set_linkable_specs = candidate_spec_set.linkable_specs
# Look for which nodes can satisfy the linkable specs at their base grains. Custom grains will be joined later.
required_linkable_specs_with_base_grains = [
spec.with_base_grain if isinstance(spec, TimeDimensionSpec) else spec for spec in required_linkable_specs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this function to take a LinkableSpecSet instead of a Sequence

@@ -1020,10 +1044,12 @@ def _find_dataflow_recipe(
metric_time_dimension_reference=self._metric_time_dimension_reference,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this function to do the conversion from custom to base grain, keep that change contained inside it
and remove related logic from elsewhere

Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! Per earlier discussion:

  1. We'll move the custom grain -> base grain conversion logic into _find_dataflow_recipe, since that's where it's needed, and remove the inlined bits from the node evaluator and other locations
  2. We'll use the LinkableSpecSet for managing the relevant type refinement instead of inline isinstance checks
  3. The calls to do these can have comments explaining why they're happening as appropriate. I think it's most important in the doc block for _find_dataflow_recipe, since we'll be doing these operations at the entry point to that method.

As discussed @courtneyholcomb will make these changes at the top of the stack.

InstanceSpecSet.create_from_specs(required_linkable_specs.as_tuple),
InstanceSpecSet.create_from_specs(
[
spec.with_base_grain() if isinstance(spec, TimeDimensionSpec) else spec
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this will also get consolidated like the other sections, so it'll be based on the LinkableSpecSet operation rather than this inline type refinement.

Base automatically changed from court/custom-grain-dfp-5 to main September 24, 2024 21:02
@courtneyholcomb courtneyholcomb enabled auto-merge (squash) September 24, 2024 21:06
@courtneyholcomb courtneyholcomb merged commit 51d781b into main Sep 24, 2024
15 checks passed
@courtneyholcomb courtneyholcomb deleted the court/custom-grain-dfp-6 branch September 24, 2024 21:08
courtneyholcomb added a commit that referenced this pull request Sep 26, 2024
Based on feedback in #1409,
this PR makes some updates to the dataflow plan builder. These are only
refactoring changes, there should be no behavioral changes. Changes
include renaming some things and simplifying the code related to
converting custom granularities to base granularities in source node
resolution. Reviewing by commit is recommended - full details in commit
descriptions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants