Join to Time Spine & Fill Nulls #832

courtneyholcomb · 2023-10-31T22:59:10Z

Resolves #SL-783

Description

We have two new attributes on MetricInputMeasures - join_to_timespine and fill_nulls_with. This PR implements the expected behavior for those features.

join_to_timespine - when true for an input measure, we should join the already-aggregated measure to the time spine table, ensuring that any missing date periods in the underlying data are filled in with null rows. We do this by adding our existing JoinToTimeSpineNode to the Dataflow plan.
fill_nulls_with accepts an optional integer input. When not null, any rows that are null for the post-aggregated measure should be filled with the requested integer. We implement this with a COALESCE statement.
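
To make the expected behavior concrete, here is a minimal sketch of the two steps using SQLite from Python. It is not the SQL MetricFlow generates; the table names (time_spine, bookings_source) and output column names are made up for illustration. The measure is aggregated first, the aggregate is then LEFT OUTER JOINed to the time spine so missing dates show up as NULL rows, and COALESCE replaces those NULLs with the requested integer.

```python
import sqlite3


def time_spine_fill_example():
    """Return (ds, joined value, filled value) rows for a tiny hypothetical dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical time spine: one row per day.
        CREATE TABLE time_spine (ds TEXT);
        INSERT INTO time_spine VALUES ('2023-01-01'), ('2023-01-02'), ('2023-01-03');

        -- Hypothetical source data with no bookings on 2023-01-02.
        CREATE TABLE bookings_source (ds TEXT, booking_id INTEGER);
        INSERT INTO bookings_source VALUES
            ('2023-01-01', 1), ('2023-01-01', 2), ('2023-01-03', 3);
        """
    )
    rows = conn.execute(
        """
        SELECT
          ts.ds
          , agg.bookings AS bookings_join_to_timespine
          , COALESCE(agg.bookings, 0) AS bookings_fill_nulls_with_0
        FROM time_spine ts
        LEFT OUTER JOIN (
          -- Aggregate the measure before joining to the time spine.
          SELECT ds, COUNT(booking_id) AS bookings
          FROM bookings_source
          GROUP BY ds
        ) agg
        ON ts.ds = agg.ds
        ORDER BY ts.ds
        """
    ).fetchall()
    conn.close()
    return rows


for row in time_spine_fill_example():
    print(row)
```

With join_to_timespine alone, 2023-01-02 appears with a NULL measure; with fill_nulls_with set to 0 as well, that NULL becomes 0.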

tlento

tl;dr: I think we should do the following:

Make sure the code runs and test cases pass for this PR as it is, and document it as allowing filling nulls with whatever for time spine values only (since that is what it does).
Add a follow up PR to allow filling nulls with zero (or whatever) for derived metrics/multi-metric queries. That will require a change in our compute metrics join semantics and tests for the multi-metric scenario.
Add some documentation including expected input and output.... somewhere. The test cases don't exactly do this because they're just SQL queries.

Long commentary follows:

Whew. This feature is super gnarly. In hindsight I think we should've built the coalesce without time spine first and then added the time spine in after.

This PR does not appear to address cases where there are joined in dimensions and multiple metrics, either as input to derived metrics or as general metric output. Right now, that is an INNER JOIN and so we will still filter out measure values from those input sets.

I think there's a bit of an open question about what the user should expect in these cases, so we can chat about that separately, but for derived metrics in particular it's a problem that we throw away entire rows of output just because something that should be represented as a 0 doesn't produce any join output values.

I also think addressing them might be a pretty gnarly PR in and of itself, since we need to change our join semantics on the final compute metrics join steps. This also begins to get into the LEFT vs RIGHT outer join type on our dimension joins, which currently all originate from the measure aggregation input rather than allowing for dimension values to have NULL (replaced with zero) values.
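
A rough illustration of the row-dropping problem, in plain Python rather than SQL. The metric names and values are hypothetical, and the outer-join variant is only one possible shape for the follow-up, not something this PR or the comment above commits to. Each dict stands in for one metric's output keyed by date.

```python
from typing import Dict


def derived_metric_inner_join(
    bookings: Dict[str, int], cancellations: Dict[str, int]
) -> Dict[str, int]:
    """Mimic an INNER JOIN between two metric outputs: only shared dates survive."""
    return {
        ds: bookings[ds] - cancellations[ds]
        for ds in sorted(bookings)
        if ds in cancellations
    }


def derived_metric_outer_join_with_fill(
    bookings: Dict[str, int], cancellations: Dict[str, int], fill_value: int = 0
) -> Dict[str, int]:
    """Mimic a FULL OUTER JOIN plus COALESCE: missing inputs become fill_value."""
    all_ds = sorted(set(bookings) | set(cancellations))
    return {
        ds: bookings.get(ds, fill_value) - cancellations.get(ds, fill_value)
        for ds in all_ds
    }


# Hypothetical metric outputs: no cancellations were recorded on 2023-01-02.
BOOKINGS = {"2023-01-01": 2, "2023-01-02": 5}
CANCELLATIONS = {"2023-01-01": 1}

print(derived_metric_inner_join(BOOKINGS, CANCELLATIONS))
print(derived_metric_outer_join_with_fill(BOOKINGS, CANCELLATIONS))
```

The inner-join version loses the 2023-01-02 row entirely, even though a cancellations value of 0 would give a perfectly sensible derived value of 5 for that date.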

metricflow/test/generate_snapshots.py

metricflow/test/fixtures/semantic_manifest_yamls/simple_manifest/metrics.yaml

metricflow/dataflow/builder/dataflow_plan_builder.py

metricflow/plan_conversion/dataflow_to_sql.py

metricflow/test/plan_conversion/test_dataflow_to_sql_plan.py

tlento · 2023-11-02T00:26:01Z

...an.py/SqlQueryPlan/DuckDB/test_simple_fill_0_with_categorical_dimension__plan0_optimized.sql

+  booking__is_instant
+  , COALESCE(bookings, 0) AS bookings_fill_0


I don't think we've talked about this at all, but I was thinking about it while reading the code. I believe this is the correct approach, but we need to document it carefully.

This does not fill values for missing dimension values. If, for some reason, there were 0 bookings with is_instant = true we would render one row - false, <number>.

In cases where users are doing some kind of full scope group by, or using those rows as inputs into a derived metric, they would want the full value set represented. However, in cases where they were filtering out those values, they would not.

Therefore, we cannot fill 0s for all dimension combinations; we can only ensure that wherever an actual NULL shows up for a measure (rather than a total absence of values), the NULL is coalesced to the specified fill value. This seems appropriate given the name of the property, but it's good for us to be clear about it in docs and so forth.
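
A small sketch of that distinction, again using SQLite from Python with a made-up bookings_source table and column names (not the actual test fixture or generated SQL). A group whose aggregate is NULL gets coalesced; a dimension value with no rows at all simply never appears in the output.

```python
import sqlite3
from typing import List, Optional, Tuple


def grouped_fill(
    rows: List[Tuple[int, Optional[int]]]
) -> List[Tuple[int, int]]:
    """Group hypothetical booking values by is_instant and coalesce NULL sums to 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings_source (is_instant INTEGER, booking_value INTEGER)"
    )
    conn.executemany("INSERT INTO bookings_source VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT
          is_instant AS booking__is_instant
          , COALESCE(SUM(booking_value), 0) AS booking_value_fill_nulls_with_0
        FROM bookings_source
        GROUP BY is_instant
        ORDER BY is_instant
        """
    ).fetchall()
    conn.close()
    return result


# An is_instant = 1 row exists but its value is NULL: SUM is NULL, so it is filled.
print(grouped_fill([(0, 10), (0, 5), (1, None)]))

# No is_instant = 1 rows at all: there is no NULL to coalesce, so no row is rendered.
print(grouped_fill([(0, 10), (0, 5)]))
```

Filling the second case would need something like a spine of dimension values to join against, which is outside what the COALESCE can do.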

courtneyholcomb

I think we did discuss this at standup yesterday! i.e., the fact that we obviously don't have a "dimension values spine" table to join to, so we won't be filling those missing rows. So product is aware of this limitation (at least Nick is). But definitely agree that we need to document this clearly!!

metricflow/test/integration/test_cases/itest_metrics.yaml

courtneyholcomb · 2023-11-02T18:51:52Z

> document it as allowing filling nulls with whatever for time spine values only (since that is what it does).

I'll add here that this isn't ALL it does. If there are nulls in the underlying data, it fills those too. Not sure how common that is, though.

> Add some documentation including expected input and output.... somewhere. The test cases don't exactly do this because they're just SQL queries.

Do we currently do this anywhere?

courtneyholcomb · 2023-11-02T21:11:58Z

Snowflake test failures are unrelated to this PR. I put a fix up for those here: #835

tlento

> I'll add here that this isn't ALL it does. If there are nulls in the underlying data, it fills those too. Not sure how common that is, though.

True! My bad, too much sweeping generalization in there.

As for documenting things in other places.... I don't think we do, but now that I think about it, I feel like the official dbt docs are the place to do it, so maybe we just open a PR over there after the next update is ready to roll out.

Thanks for doing the renaming, saves me some time!

courtneyholcomb added 6 commits October 31, 2023 15:27

Add join_to_timespine and fill_nulls_with to MetricInputMeasureSpec

d3f1cdc

Update JoinToTimeSpineNode attributes

501fb0b

Add Dataflow plan step to join aggregated measure to time spine

e50cbca

Generate SQL to coalesce nulls when fill_nulls_with is requested

4c679d1

Integration test cases

74c2209

Changelog

ce55fb1

cla-bot bot added the cla:yes label Oct 31, 2023

Regenerate SQL engine test snapshots (except Snowflake)

e050b31

courtneyholcomb added the run_mf_sql_engine_tests label Oct 31, 2023

courtneyholcomb requested review from tlento and plypaul October 31, 2023 23:42

courtneyholcomb added 2 commits October 31, 2023 16:50

Render engine-specific test case SQL

155794e

Update Snowflake snapshots

6e42eb2

courtneyholcomb added run_mf_sql_engine_tests and removed run_mf_sql_engine_tests labels Nov 1, 2023

tlento reviewed Nov 2, 2023

tlento added Run Tests With Other SQL Engines and removed run_mf_sql_engine_tests labels Nov 2, 2023

tlento temporarily deployed to DW_INTEGRATION_TESTS November 2, 2023 00:43 — with GitHub Actions Inactive

tlento had a problem deploying to DW_INTEGRATION_TESTS November 2, 2023 00:43 — with GitHub Actions Failure

tlento mentioned this pull request Nov 2, 2023

COALESCE and CONCAT should not be tagged as aggregate functions #833

Open

courtneyholcomb added 3 commits November 2, 2023 11:10

Remove TODO

a537af9

Render date trunc engine-agnostically

fc330e5

Rename fill_0 to fill_nulls_with_0

26b3044

courtneyholcomb added Run Tests With Other SQL Engines and removed Run Tests With Other SQL Engines labels Nov 2, 2023

courtneyholcomb had a problem deploying to DW_INTEGRATION_TESTS November 2, 2023 18:55 — with GitHub Actions Error

Update SQL engine snapshots

0dcc69d

courtneyholcomb added Run Tests With Other SQL Engines and removed Run Tests With Other SQL Engines labels Nov 2, 2023

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 2, 2023 19:14 — with GitHub Actions Inactive

courtneyholcomb had a problem deploying to DW_INTEGRATION_TESTS November 2, 2023 19:14 — with GitHub Actions Failure

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS November 2, 2023 19:14 — with GitHub Actions Inactive

tlento approved these changes Nov 2, 2023

courtneyholcomb merged commit 7bbb704 into main Nov 2, 2023

courtneyholcomb deleted the court/fill-nulls branch November 2, 2023 23:15
