-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow CombineMetricsNode to combine the outputs from aggregated measures #858
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think combining the logic makes sense but let's add a Protocol type so (or, since the properties are the same, we can just take a Union[] and add a single switch on an isinstance() check - I prefer the protocol type as it eliminates the need for an isinstance() check and makes it a little more obvious which properties need to change in lock-step).
cbac772
to
6bf4dd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! With the note that I would love for us to establish some clear logic about when to use instance converters.
@@ -586,6 +593,47 @@ def transform(self, instance_set: InstanceSet) -> InstanceSet: # noqa: D | |||
) | |||
|
|||
|
|||
class UpdateMeasureFillNullsWith(InstanceSetTransform[InstanceSet]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not totally relevant to the PR, but this whole "instance converter" pattern is something I need to understand better... i.e., when is it appropriate to add a new instance converter vs. just putting that code in dataflow_to_sql
or wherever it's relevant? Maybe a question for @tlento!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea abit hazy on that as well. I had it here just cause it felt cleaner. My take on this is that we would put it in the converter when we're mutating the instance spec set itself. But I guess there's also cases where we;re converting the spec set into select columns, etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@courtneyholcomb The original context on that is - let's say we created a new type of instance and had to update the code base. How do we find all the cases that need to be updated? The thought was that centralizing that logic into these classes would make it easy to do so. We haven't had to add a new instance in a while though. However, dataflow_to_sql
is already large so separating things into smaller functional chucks is preferred.
Also, why aren't checks running for this PR 🤔 |
@courtneyholcomb there's been some discussion on how we wanted to do this, so I wanted an initial review on the implementation before I rename the node itself and then run the tests |
6bf4dd7
to
c117e8e
Compare
c117e8e
to
09da49f
Compare
09da49f
to
7da7b49
Compare
Let's not do that. |
@plypaul there's no need to do this anymore since we added the fill_nulls_with property to MeasureSpec. |
Description
Currently, all metrics have only 1 input_measure linked to it, this was a result from ratio metric switching to use input_metrics rather than input_measures. This resulted in the removal of
JoinAggregatedMeasuresByGroupByColumnsNode
which was then unused. However, this node is now useful again for conversion metrics as we need to have the ability to join multiple aggregated measures node together. After a few discussions, tweakingCombineMetricsNode
to work for joining metrics or measures outputs was a better alternative than bringing backJoinAggregatedMeasuresByGroupByColumnsNode
.Changes
fill_nulls_with
toMeasureSpec
fill_nulls_with
during measure aggregation such that nodes upstream (such as the combine node) can know about what values should be filled for nullsAggregateMeasuresNode
inCombineMetricsNode
CombineMetricsNode
toCombineAggregatedOutputsNode