Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved source node logic for custom granularities #1427

Merged
merged 2 commits into from
Sep 26, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Sep 24, 2024

Based on feedback in #1409, this PR makes some updates to the dataflow plan builder. These are only refactoring changes, there should be no behavioral changes. Changes include renaming some things and simplifying the code related to converting custom granularities to base granularities in source node resolution. Reviewing by commit is recommended - full details in commit descriptions.

@cla-bot cla-bot bot added the cla:yes label Sep 24, 2024
@courtneyholcomb courtneyholcomb changed the title Court/top of stack Improved source node logic for custom granularities Sep 24, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Sep 24, 2024
@courtneyholcomb courtneyholcomb force-pushed the court/custom-grain-where-2 branch from 50f3ca7 to dc20356 Compare September 24, 2024 23:38
@courtneyholcomb courtneyholcomb force-pushed the court/custom-grain-where-2 branch from dc20356 to 3feccdc Compare September 24, 2024 23:55
@courtneyholcomb courtneyholcomb force-pushed the court/custom-grain-where-2 branch from 06fec60 to f035679 Compare September 25, 2024 19:15
Comment on lines +987 to +989
# Replace any custom granularities with their base granularities. The custom granularity will be joined in
# later, since custom granularities cannot be satisfied by source nodes. But we will need the dimension at
# base granularity from the source node in order to join to the appropriate time spine later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computer kid thumbs up - "quality work!"

@@ -102,7 +102,7 @@


@dataclass(frozen=True)
class DataflowRecipe:
class SourceNodeRecipe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woman making hand gesture for 'very nice'

spec.with_base_grain() if isinstance(spec, TimeDimensionSpec) else spec
for spec in required_linkable_specs.as_tuple
]
required_linkable_specs.replace_custom_granularity_with_base_granularity().as_tuple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this class is fairly complex, so I'm going to give up and just ask - I thought that by moving one of the replace_custom_granularity_with_base_granularity calls to _find_source_node_recipe, we obviated the need to do this again anywhere else. Was that impression wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that was the plan! I hadn't realized this use case was outside of that function.
I took a stab at seeing if we could add this spec set as a property to the SourceNodeRecipe to avoid calculating it again here, which would clean things up a bit. It works everywhere except for conversion metrics, which have a separate SourceNodeRecipe that they build. I tried to mess around with that code to make it work but I'm not familiar enough with conversion metrics to make any significant changes without consulting someone so I scrapped it.
Basically the blocker is that we haven't consolidated this code enough to make this simple. I would love to clean up the whole DataflowPlanBuilder to remove redundancies, and I we would certainly be able to clean this piece up then, but I think that's too big a task to block this feature.

@theyostalservice
Copy link
Contributor

Please take a look at #1427 (comment) before committing.

@courtneyholcomb courtneyholcomb force-pushed the court/custom-grain-where-2 branch from f035679 to 6f78489 Compare September 26, 2024 01:23
Base automatically changed from court/custom-grain-where-2 to main September 26, 2024 19:35
The dataflow recipe is essentially the best choice for the source node to select from and how to join any other source nodes required to satisfy all requested specs. The name is confusing, though, since the dataflow PLAN is the entire DAG, whichextends far beyond the source nodes. Rename to source node recipe in order to make it more clear that this part of the dataflow plan is only dealing with source nodes.
Consolidates that logic into _find_source_node_recipe() as much as possible, removes isinstance() checks, and clarifies comments.
@courtneyholcomb courtneyholcomb enabled auto-merge (squash) September 26, 2024 19:36
@courtneyholcomb courtneyholcomb merged commit ca92929 into main Sep 26, 2024
15 checks passed
@courtneyholcomb courtneyholcomb deleted the court/top-of-stack branch September 26, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants