Fill nulls for multi-metric queries #850
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
da2653a to e8e1b17
I don't know if this works, but if it does, can we either:
- Have an accessor on the metric_lookup object that only returns the immediate measure inputs for the metric in question?
- Have an is_derived or similar property that is based on the metric type rather than the number of measure inputs?
I'm concerned about edge-case bugs creeping into the rendering here, and it'll be hard to reason about them because the typechecker can't help out at all. By contrast, if I add a new metric type and we have logic that switches over all metric types in an assert_values_exhausted if/else block, that will fail the typechecker and force the new case to be handled.
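For illustration, here is a minimal, self-contained sketch of that exhaustive-switch idea. The MetricType members, the assert_values_exhausted helper, and the choice of which types count as derived are simplified stand-ins, not MetricFlow's actual definitions:

from enum import Enum
from typing import NoReturn


class MetricType(Enum):
    """Stand-in for the project's metric-type enum."""

    SIMPLE = "simple"
    RATIO = "ratio"
    CUMULATIVE = "cumulative"
    DERIVED = "derived"


def assert_values_exhausted(value: NoReturn) -> NoReturn:
    """Stand-in exhaustiveness helper: type checkers flag any call site that can still reach this."""
    raise AssertionError(f"Unhandled enum value: {value}")


def is_derived(metric_type: MetricType) -> bool:
    """Exhaustive switch over metric types; adding a new MetricType member becomes a type error here."""
    if metric_type is MetricType.DERIVED or metric_type is MetricType.RATIO:
        # Treating ratio metrics as derived is an assumption for this sketch.
        return True
    elif metric_type is MetricType.SIMPLE or metric_type is MetricType.CUMULATIVE:
        return False
    else:
        return assert_values_exhausted(metric_type)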
@@ -706,7 +706,7 @@ def visit_compute_metrics_node(self, node: ComputeMetricsNode) -> SqlDataSet:
            metric_instances.append(
                MetricInstance(
                    associated_columns=(output_column_association,),
-                    defined_from=(MetricModelReference(metric_name=metric_spec.element_name),),
+                    defined_from=MetricModelReference(metric_name=metric_spec.element_name),
LOL, thanks. We have so many of these kinds of "let's allow a sequence of things and then have exactly one callsite that manually enforces there's only one" interfaces lying around...
@@ -436,7 +436,7 @@ def from_reference(reference: MetricReference) -> MetricSpec:

    @property
    def alias_spec(self) -> MetricSpec:
-        """Returns a MetricSpec represneting the alias state."""
+        """Returns a MetricSpec representing the alias state."""
😸
group_by_objs: [{"name": "metric_time"}]
check_query: |
  SELECT
    COALESCE(subq_7.metric_time__day, subq_12.metric_time__day) AS metric_time__day
Are there missing values in this date sequence for the bookings_fill_nulls_with metric? If so, that's great. An output test would be nice, but at least we can inspect the data frame from here and see it.
If not, it might be worth adding a dimension that has a gap for a given metric_time, dimension_value pair.
Added a new test case & updated the source data to make sure there were some nulls filled for this case.
Will plan to add some output tests tomorrow!
input_measures = self._metric_lookup.measures_for_metric(
    metric_reference=MetricReference(metric_name),
    column_association_resolver=self._column_association_resolver,
)
Another way to deal with the issue below is to add a different, internal API here that only gives the measures that are direct inputs into the metric. That way, if we add another metric type that takes measures as input, anything derived from it won't have anything to do here, and we ensure we only do the coalesce once within a derived metric chain.
I like that idea! I'll add that.
Added that!
@tlento This should be good to go, with the exception of updating Databricks snapshots. Wrapping that up now - repopulating the source schema is just taking forever.
And now done! Kicking off SQL engine tests now.
Let's go! Thank you!
@@ -105,6 +106,23 @@ def add_metric(self, metric: Metric) -> None:
            )
        self._metrics[metric_reference] = metric

+    def yaml_input_measure_for_metric(self, metric_reference: MetricReference) -> Optional[MetricInputMeasure]:
Nice! Super nitty, but can we rename this away from yaml? YAML makes me grumpy. More importantly, there's no guarantee that the input is defined in YAML (although in practice that is how it works today).
Maybe something like direct_input_measure_for_metric or configured_input_measure_for_metric?
Updated!
"""Get input measure defined in the metric YAML, if exists. | ||
|
||
When SemanticModel is constructed, input measures from input metrics are added to the list of input measures | ||
for a metric. Here, use rules about metric types to determine which input measures were defined in the YAML: |
Then here we can replace YAML with config or something.
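A rough sketch of what a config-based lookup along these lines could look like, written with stand-in types so it runs on its own; the method name, signature, and the exact per-type rules are assumptions for illustration rather than the merged implementation:

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MetricType(Enum):
    """Stand-in for the project's metric-type enum, as in the earlier sketch."""

    SIMPLE = "simple"
    RATIO = "ratio"
    CUMULATIVE = "cumulative"
    DERIVED = "derived"


@dataclass(frozen=True)
class MetricInputMeasure:
    """Stand-in for the measure input configured on a metric."""

    name: str


@dataclass
class Metric:
    """Stand-in carrying only the fields this sketch needs."""

    name: str
    type: MetricType
    input_measures: List[MetricInputMeasure] = field(default_factory=list)


def configured_input_measure_for_metric(metric: Metric) -> Optional[MetricInputMeasure]:
    """Return only the measure configured directly on the metric, if there is one.

    Derived and ratio metrics are configured with input metrics rather than measures, so the
    measures attached to them later (pulled in from their input metrics) are not returned here.
    """
    if metric.type is MetricType.SIMPLE or metric.type is MetricType.CUMULATIVE:
        # Assumption for this sketch: these types are configured with exactly one measure.
        assert len(metric.input_measures) == 1
        return metric.input_measures[0]
    return None

With a lookup shaped like this, the null-filling coalesce only fires for metrics that own a configured measure, which lines up with the goal above of doing the coalesce once within a derived metric chain.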
@@ -275,7 +275,7 @@ integration_test:
      check_query: |
        SELECT
          CAST(bookings AS {{ double_data_type_name }}) / NULLIF(views, 0) AS bookings_per_view
-          , groupby_8cbdaa28.ds AS metric_time__day
+          , COALESCE(groupby_8cbdaa28.ds, groupby_68058b0b.ds) AS metric_time__day
Those aliases though 🤣
from metricflow.test.snapshot_utils import assert_object_snapshot_equal


@pytest.mark.sql_engine_snapshot
These snapshot tests are great, thanks for doing this!
Resolves #SL-1173
Description
Changes MetricInstance.defined_from to contain a single MetricModelReference instead of multiple. It doesn't seem like there is ever a time we would want multiple (multiple are never used in the codebase), and we need to know just one to get the correct element name.
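As a rough illustration of the shape change described above (stand-in dataclasses, not the project's exact definitions):

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricModelReference:
    """Stand-in reference to a metric in the model."""

    metric_name: str


@dataclass(frozen=True)
class MetricInstance:
    """Stand-in showing only the field under discussion."""

    # Previously: defined_from was a tuple of MetricModelReference objects, even though
    # callsites only ever created and read a single element.
    # Now: a single reference, so callers no longer need to index [0] or assert length 1.
    defined_from: MetricModelReference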