Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow CombineMetricsNode to combine the outputs from aggregated measures #858

Merged
merged 6 commits into from
Nov 16, 2023

Conversation

WilliamDee
Copy link
Contributor

@WilliamDee WilliamDee commented Nov 9, 2023

Description

Currently, all metrics have only 1 input_measure linked to it, this was a result from ratio metric switching to use input_metrics rather than input_measures. This resulted in the removal of JoinAggregatedMeasuresByGroupByColumnsNode which was then unused. However, this node is now useful again for conversion metrics as we need to have the ability to join multiple aggregated measures node together. After a few discussions, tweaking CombineMetricsNode to work for joining metrics or measures outputs was a better alternative than bringing back JoinAggregatedMeasuresByGroupByColumnsNode.

Changes

  • Added fill_nulls_with to MeasureSpec
    • populate the fill_nulls_with during measure aggregation such that nodes upstream (such as the combine node) can know about what values should be filled for nulls
  • Added the ability to combine AggregateMeasuresNode in CombineMetricsNode
  • Rename CombineMetricsNode to CombineAggregatedOutputsNode

@cla-bot cla-bot bot added the cla:yes label Nov 9, 2023
Copy link

github-actions bot commented Nov 9, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think combining the logic makes sense but let's add a Protocol type so (or, since the properties are the same, we can just take a Union[] and add a single switch on an isinstance() check - I prefer the protocol type as it eliminates the need for an isinstance() check and makes it a little more obvious which properties need to change in lock-step).

metricflow/plan_conversion/dataflow_to_sql.py Outdated Show resolved Hide resolved
metricflow/plan_conversion/dataflow_to_sql.py Outdated Show resolved Hide resolved
@WilliamDee WilliamDee force-pushed the will/combine-metrics branch 4 times, most recently from cbac772 to 6bf4dd7 Compare November 15, 2023 07:51
@WilliamDee WilliamDee marked this pull request as ready for review November 15, 2023 07:52
@WilliamDee WilliamDee requested a review from tlento November 15, 2023 07:58
@WilliamDee WilliamDee assigned plypaul and unassigned plypaul Nov 15, 2023
Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! With the note that I would love for us to establish some clear logic about when to use instance converters.

metricflow/protocols/semantics.py Show resolved Hide resolved
@@ -586,6 +593,47 @@ def transform(self, instance_set: InstanceSet) -> InstanceSet: # noqa: D
)


class UpdateMeasureFillNullsWith(InstanceSetTransform[InstanceSet]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally relevant to the PR, but this whole "instance converter" pattern is something I need to understand better... i.e., when is it appropriate to add a new instance converter vs. just putting that code in dataflow_to_sql or wherever it's relevant? Maybe a question for @tlento!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea abit hazy on that as well. I had it here just cause it felt cleaner. My take on this is that we would put it in the converter when we're mutating the instance spec set itself. But I guess there's also cases where we;re converting the spec set into select columns, etc...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@courtneyholcomb The original context on that is - let's say we created a new type of instance and had to update the code base. How do we find all the cases that need to be updated? The thought was that centralizing that logic into these classes would make it easy to do so. We haven't had to add a new instance in a while though. However, dataflow_to_sql is already large so separating things into smaller functional chucks is preferred.

@courtneyholcomb
Copy link
Contributor

Also, why aren't checks running for this PR 🤔

@WilliamDee
Copy link
Contributor Author

Also, why aren't checks running for this PR 🤔

@courtneyholcomb there's been some discussion on how we wanted to do this, so I wanted an initial review on the implementation before I rename the node itself and then run the tests

@courtneyholcomb courtneyholcomb self-requested a review November 15, 2023 22:23
@courtneyholcomb courtneyholcomb dismissed their stale review November 15, 2023 22:23

PR still pending

@plypaul
Copy link
Contributor

plypaul commented Nov 16, 2023

single switch on an isinstance() check

Let's not do that.

@WilliamDee
Copy link
Contributor Author

single switch on an isinstance() check

Let's not do that.

@plypaul there's no need to do this anymore since we added the fill_nulls_with property to MeasureSpec.

@WilliamDee WilliamDee added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 16, 2023
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 16, 2023 22:39 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 16, 2023 22:39 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 16, 2023 22:39 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 16, 2023 22:39 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 16, 2023
@WilliamDee WilliamDee merged commit 063ea3f into main Nov 16, 2023
34 of 35 checks passed
@WilliamDee WilliamDee deleted the will/combine-metrics branch November 16, 2023 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants