Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test time spines for sub-daily granularity #1358

Merged
merged 6 commits into from
Jul 30, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,40 @@ project_configuration:
- location: $source_schema.mf_time_spine
column_name: ds
grain: day
time_spines:
- node_relation:
alias: mf_time_spine_nanosecond
schema_name: $source_schema
primary_column:
name: ts
time_granularity: nanosecond
- node_relation:
alias: mf_time_spine_microsecond
schema_name: $source_schema
primary_column:
name: ts
time_granularity: microsecond
- node_relation:
alias: mf_time_spine_millisecond
schema_name: $source_schema
primary_column:
name: ts
time_granularity: millisecond
- node_relation:
alias: mf_time_spine_second
schema_name: $source_schema
primary_column:
name: ts
time_granularity: second
- node_relation:
alias: mf_time_spine_minute
schema_name: $source_schema
primary_column:
name: ts
time_granularity: minute
- node_relation:
alias: mf_time_spine_hour
schema_name: $source_schema
primary_column:
name: ts
time_granularity: hour
Original file line number Diff line number Diff line change
Expand Up @@ -783,22 +783,3 @@ metric:
name: listings
filter: "{{ Metric('views', ['listing']) }} > 10"
time_granularity: week
---
metric:
name: bookings_before_dec_20_2019
description: Bookings up to the end of 2022
type: simple
type_params:
measure: bookings
filter: "{{ TimeDimension('metric_time') }} < '2012-12-20'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I know why this is. Timestamp literals in Trino have to take the form TIMESTAMP <literal> so we'd need to do some substitution somewhere.

If the error is due to the filter it means we have to do custom rendering against the filter expr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I couldn't come up with an easy way to fix it here and wasn't sure it was worth doing the hard fix for an engine we barely use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think the right way to deal with this is via a more holistic approach to filter expression inputs. In the meantime, having a small gap in Trino test coverage - particularly one where the main difference essentially boils down to how we go about pasting in the user-provided expression at render time - seems fine to me.

---
metric:
name: bookings_between_dec_18_2019_and_dec_20_2019
description: Bookings starting in 2020. Used to test a metric with different types of ambiguous filters in on its input metric.
type: derived
type_params:
expr: bookings_before_dec_20_2019
metrics:
- name: bookings_before_dec_20_2019
filter: "{{ TimeDimension('metric_time') }} > '2019-12-18'"
time_granularity: week
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,6 @@
'listing__booking__listing__booking_value_sub_instant',
'listing__booking__listing__booking_value_sub_instant_add_10',
'listing__booking__listing__bookings',
'listing__booking__listing__bookings_before_dec_20_2019',
'listing__booking__listing__bookings_between_dec_18_2019_and_dec_20_2019',
'listing__booking__listing__bookings_fill_nulls_with_0',
'listing__booking__listing__bookings_fill_nulls_with_0_without_time_spine',
'listing__booking__listing__bookings_join_to_time_spine',
Expand Down Expand Up @@ -56,8 +54,6 @@
'listing__booking_value_sub_instant',
'listing__booking_value_sub_instant_add_10',
'listing__bookings',
'listing__bookings_before_dec_20_2019',
'listing__bookings_between_dec_18_2019_and_dec_20_2019',
'listing__bookings_fill_nulls_with_0',
'listing__bookings_fill_nulls_with_0_without_time_spine',
'listing__bookings_join_to_time_spine',
Expand Down Expand Up @@ -266,8 +262,6 @@
'user__listing__user__booking_value_sub_instant',
'user__listing__user__booking_value_sub_instant_add_10',
'user__listing__user__bookings',
'user__listing__user__bookings_before_dec_20_2019',
'user__listing__user__bookings_between_dec_18_2019_and_dec_20_2019',
'user__listing__user__bookings_fill_nulls_with_0',
'user__listing__user__bookings_fill_nulls_with_0_without_time_spine',
'user__listing__user__bookings_join_to_time_spine',
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,6 @@ Model Join-Path Entity Links
('listings_latest',) ("('listing',)", "('booking', 'listing')") booking_value_sub_instant ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") booking_value_sub_instant_add_10 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") bookings ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") bookings_before_dec_20_2019 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") bookings_between_dec_18_2019_and_dec_20_2019 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") bookings_fill_nulls_with_0 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") bookings_fill_nulls_with_0_without_time_spine ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('booking', 'listing')") bookings_join_to_time_spine ['JOINED', 'METRIC']
Expand Down Expand Up @@ -79,8 +77,6 @@ Model Join-Path Entity Links
('listings_latest',) ("('listing',)", "('listing',)") booking_value_sub_instant ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") booking_value_sub_instant_add_10 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") bookings ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") bookings_before_dec_20_2019 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") bookings_between_dec_18_2019_and_dec_20_2019 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") bookings_fill_nulls_with_0 ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") bookings_fill_nulls_with_0_without_time_spine ['JOINED', 'METRIC']
('listings_latest',) ("('listing',)", "('listing',)") bookings_join_to_time_spine ['JOINED', 'METRIC']
Expand Down Expand Up @@ -140,8 +136,6 @@ Model Join-Path Entity Links
('listings_latest',) ("('user',)", "('listing', 'user')") booking_value_sub_instant ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") booking_value_sub_instant_add_10 ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") bookings ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") bookings_before_dec_20_2019 ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") bookings_between_dec_18_2019_and_dec_20_2019 ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") bookings_fill_nulls_with_0 ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") bookings_fill_nulls_with_0_without_time_spine ['JOINED', 'METRIC']
('listings_latest',) ("('user',)", "('listing', 'user')") bookings_join_to_time_spine ['JOINED', 'METRIC']
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,6 @@
'company__listing__user__company__booking_value_sub_instant',
'company__listing__user__company__booking_value_sub_instant_add_10',
'company__listing__user__company__bookings',
'company__listing__user__company__bookings_before_dec_20_2019',
'company__listing__user__company__bookings_between_dec_18_2019_and_dec_20_2019',
'company__listing__user__company__bookings_fill_nulls_with_0',
'company__listing__user__company__bookings_fill_nulls_with_0_without_time_spine',
'company__listing__user__company__bookings_join_to_time_spine',
Expand Down Expand Up @@ -127,8 +125,6 @@
'guest__booking__guest__booking_value_sub_instant',
'guest__booking__guest__booking_value_sub_instant_add_10',
'guest__booking__guest__bookings',
'guest__booking__guest__bookings_before_dec_20_2019',
'guest__booking__guest__bookings_between_dec_18_2019_and_dec_20_2019',
'guest__booking__guest__bookings_fill_nulls_with_0',
'guest__booking__guest__bookings_fill_nulls_with_0_without_time_spine',
'guest__booking__guest__bookings_join_to_time_spine',
Expand Down Expand Up @@ -162,8 +158,6 @@
'guest__booking_value_sub_instant',
'guest__booking_value_sub_instant_add_10',
'guest__bookings',
'guest__bookings_before_dec_20_2019',
'guest__bookings_between_dec_18_2019_and_dec_20_2019',
'guest__bookings_fill_nulls_with_0',
'guest__bookings_fill_nulls_with_0_without_time_spine',
'guest__bookings_join_to_time_spine',
Expand Down Expand Up @@ -208,8 +202,6 @@
'host__booking__host__booking_value_sub_instant',
'host__booking__host__booking_value_sub_instant_add_10',
'host__booking__host__bookings',
'host__booking__host__bookings_before_dec_20_2019',
'host__booking__host__bookings_between_dec_18_2019_and_dec_20_2019',
'host__booking__host__bookings_fill_nulls_with_0',
'host__booking__host__bookings_fill_nulls_with_0_without_time_spine',
'host__booking__host__bookings_join_to_time_spine',
Expand Down Expand Up @@ -243,8 +235,6 @@
'host__booking_value_sub_instant',
'host__booking_value_sub_instant_add_10',
'host__bookings',
'host__bookings_before_dec_20_2019',
'host__bookings_between_dec_18_2019_and_dec_20_2019',
'host__bookings_fill_nulls_with_0',
'host__bookings_fill_nulls_with_0_without_time_spine',
'host__bookings_join_to_time_spine',
Expand Down Expand Up @@ -290,8 +280,6 @@
'listing__booking__listing__booking_value_sub_instant',
'listing__booking__listing__booking_value_sub_instant_add_10',
'listing__booking__listing__bookings',
'listing__booking__listing__bookings_before_dec_20_2019',
'listing__booking__listing__bookings_between_dec_18_2019_and_dec_20_2019',
'listing__booking__listing__bookings_fill_nulls_with_0',
'listing__booking__listing__bookings_fill_nulls_with_0_without_time_spine',
'listing__booking__listing__bookings_join_to_time_spine',
Expand Down Expand Up @@ -326,8 +314,6 @@
'listing__booking_value_sub_instant',
'listing__booking_value_sub_instant_add_10',
'listing__bookings',
'listing__bookings_before_dec_20_2019',
'listing__bookings_between_dec_18_2019_and_dec_20_2019',
'listing__bookings_fill_nulls_with_0',
'listing__bookings_fill_nulls_with_0_without_time_spine',
'listing__bookings_join_to_time_spine',
Expand Down Expand Up @@ -399,8 +385,6 @@
'lux_listing__listing__lux_listing__booking_value_sub_instant',
'lux_listing__listing__lux_listing__booking_value_sub_instant_add_10',
'lux_listing__listing__lux_listing__bookings',
'lux_listing__listing__lux_listing__bookings_before_dec_20_2019',
'lux_listing__listing__lux_listing__bookings_between_dec_18_2019_and_dec_20_2019',
'lux_listing__listing__lux_listing__bookings_fill_nulls_with_0',
'lux_listing__listing__lux_listing__bookings_fill_nulls_with_0_without_time_spine',
'lux_listing__listing__lux_listing__bookings_join_to_time_spine',
Expand Down Expand Up @@ -512,8 +496,6 @@
'user__listing__user__booking_value_sub_instant',
'user__listing__user__booking_value_sub_instant_add_10',
'user__listing__user__bookings',
'user__listing__user__bookings_before_dec_20_2019',
'user__listing__user__bookings_between_dec_18_2019_and_dec_20_2019',
'user__listing__user__bookings_fill_nulls_with_0',
'user__listing__user__bookings_fill_nulls_with_0_without_time_spine',
'user__listing__user__bookings_join_to_time_spine',
Expand Down
5 changes: 3 additions & 2 deletions tests_metricflow/dataflow/builder/test_node_data_set.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@

from _pytest.fixtures import FixtureRequest
from dbt_semantic_interfaces.references import SemanticModelElementReference
from dbt_semantic_interfaces.type_enums.time_granularity import TimeGranularity
from metricflow_semantics.aggregation_properties import AggregationState
from metricflow_semantics.instances import (
InstanceSet,
Expand Down Expand Up @@ -38,7 +39,7 @@

def test_no_parent_node_data_set(
simple_semantic_manifest_lookup: SemanticManifestLookup,
time_spine_source: TimeSpineSource,
time_spine_sources: Mapping[TimeGranularity, TimeSpineSource],
) -> None:
"""Tests getting the data set from a single node."""
resolver: DataflowPlanNodeOutputDataSetResolver = DataflowPlanNodeOutputDataSetResolver(
Expand Down Expand Up @@ -93,7 +94,7 @@ def test_joined_node_data_set(
mf_test_configuration: MetricFlowTestConfiguration,
mf_engine_test_fixture_mapping: Mapping[SemanticManifestSetup, MetricFlowEngineTestFixture],
simple_semantic_manifest_lookup: SemanticManifestLookup,
time_spine_source: TimeSpineSource,
time_spine_sources: Mapping[TimeGranularity, TimeSpineSource],
) -> None:
"""Tests getting the data set from a dataflow plan with a join."""
resolver: DataflowPlanNodeOutputDataSetResolver = DataflowPlanNodeOutputDataSetResolver(
Expand Down
26 changes: 23 additions & 3 deletions tests_metricflow/fixtures/dataflow_fixtures.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
from typing import Mapping

import pytest
from dbt_semantic_interfaces.type_enums.time_granularity import TimeGranularity
from metricflow_semantics.query.query_parser import MetricFlowQueryParser
from metricflow_semantics.specs.column_assoc import ColumnAssociationResolver
from metricflow_semantics.test_helpers.config_helpers import MetricFlowTestConfiguration
Expand Down Expand Up @@ -90,7 +91,26 @@ def scd_query_parser( # noqa: D103


@pytest.fixture(scope="session")
def time_spine_source( # noqa: D103
def time_spine_sources( # noqa: D103
sql_client: SqlClient, mf_test_configuration: MetricFlowTestConfiguration # noqa: F811
) -> TimeSpineSource:
return TimeSpineSource(schema_name=mf_test_configuration.mf_source_schema, table_name="mf_time_spine")
) -> Mapping[TimeGranularity, TimeSpineSource]:
legacy_time_spine_grain = TimeGranularity.DAY
time_spine_base_table_name = "mf_time_spine"
print("expected schema name:", mf_test_configuration.mf_source_schema)
# Legacy time spine
time_spine_sources = {
legacy_time_spine_grain: TimeSpineSource(
schema_name=mf_test_configuration.mf_source_schema, table_name=time_spine_base_table_name
)
}
Comment on lines +101 to +105
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, does this mean that we're no longer triggering the fallback behavior in the runtime, because we're overriding it with an explicit time spine input?

What happens if I define an HOUR grain time spine but leave the DAY grain one in the original configuration? Does that fail unceremoniously, do we raise an informative error, or do we just use the HOUR spine for everything?

I'm actually fine with raising an informative error or using the HOUR spine, especially for now, I just realized I'm just not clear on what happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy time spine config will only be included in your manifest if you have the metricflow_time_spine model in your project. It will be overridden if you add configs pointing to a time spine with DAY. The legacy time spine will be treated like any other time spine you have configured.
If you have a new time spine with HOUR grain in addition to the legacy DAY time spine, we'll use the DAY time spine if you query something with DAY or higher. In that case we would only use the HOUR time spine if you query with HOUR. We'll choose the time spine with the largest grain available that can resolve the requested grain.

# Current time spines
for granularity in TimeGranularity:
if granularity.to_int() < legacy_time_spine_grain.to_int():
time_spine_sources[granularity] = TimeSpineSource(
schema_name=mf_test_configuration.mf_source_schema,
table_name=f"{time_spine_base_table_name}_{granularity.value}",
time_column_name="ts",
time_column_granularity=granularity,
)

return time_spine_sources
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
table_snapshot:
table_name: mf_time_spine_hour
column_definitions:
- name: ts
type: TIME
rows:
- ["2020-01-01 01:00:00"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I just realized, are we going to date_trunc the time spine input to the specified grain? I'm pretty sure we don't do it today, but there's a type for it (DATE). The spec calls for the end user to configure that correctly, so I'm inclined not to date_trunc right now, but it might be something we need to do.

Presumably most people are using packages to build these things so maybe we just rely on that. If we're worried but not very worried about this we could also set up a best-effort warehouse validation, or release a validation package for time spine models that people can use if they wish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't apply that DATE_TRUNC in JoinToTimeSpineNode or JoinOverTimeRangeNode, But we do in ReadSqlSourceNode & MetricTimeDimensionTransformNode. This feature doesn't change that behavior so far, but we could discuss if we want to change it.
I think the warehouse validations will be a good idea so we can have more efficient queries.

- ["2020-01-01 02:00:00"]
- ["2020-01-01 03:00:00"]
- ["2020-01-01 04:00:00"]
- ["2020-01-01 05:00:00"]
- ["2020-01-01 06:00:00"]
- ["2020-01-01 07:00:00"]
- ["2020-01-01 08:00:00"]
- ["2020-01-01 09:00:00"]
- ["2020-01-01 010:00:00"]
- ["2020-01-01 11:00:00"]
- ["2020-01-01 12:00:00"]
- ["2020-01-02 01:00:00"]
- ["2020-01-02 02:00:00"]
- ["2020-01-02 03:00:00"]
- ["2020-01-02 04:00:00"]
- ["2020-01-02 05:00:00"]
- ["2020-01-02 06:00:00"]
- ["2020-01-02 07:00:00"]
- ["2020-01-02 08:00:00"]
- ["2020-01-02 09:00:00"]
- ["2020-01-02 010:00:00"]
- ["2020-01-02 11:00:00"]
- ["2020-01-02 12:00:00"]
- ["2020-01-03 01:00:00"]
- ["2020-01-03 02:00:00"]
- ["2020-01-03 03:00:00"]
- ["2020-01-03 04:00:00"]
- ["2020-01-03 05:00:00"]
- ["2020-01-03 06:00:00"]
- ["2020-01-03 07:00:00"]
- ["2020-01-03 08:00:00"]
- ["2020-01-03 09:00:00"]
- ["2020-01-03 010:00:00"]
- ["2020-01-03 11:00:00"]
- ["2020-01-03 12:00:00"]
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
table_snapshot:
table_name: mf_time_spine_microsecond
column_definitions:
- name: ts
type: TIME
rows:
- ["2020-01-01 00:00:00.000000"]
- ["2020-01-01 00:00:00.000001"]
- ["2020-01-01 00:00:00.000002"]
- ["2020-01-01 00:00:00.000003"]
- ["2020-01-01 00:00:00.000004"]
- ["2020-01-01 00:00:00.000005"]
- ["2020-01-01 00:00:00.000006"]
- ["2020-01-01 00:00:00.000007"]
- ["2020-01-01 00:00:00.000008"]
- ["2020-01-01 00:00:00.000009"]
- ["2020-01-01 00:00:00.000010"]
- ["2020-01-01 00:00:00.000011"]
- ["2020-01-01 00:00:00.000012"]
- ["2020-01-01 00:00:00.000013"]
- ["2020-01-01 00:00:00.000014"]
- ["2020-01-01 00:00:00.000015"]
- ["2020-01-01 00:00:00.000016"]
- ["2020-01-01 00:00:00.000017"]
- ["2020-01-01 00:00:00.000018"]
- ["2020-01-01 00:00:00.000019"]
- ["2020-01-01 00:00:00.000020"]
- ["2020-01-01 00:00:00.000021"]
- ["2020-01-01 00:00:00.000022"]
- ["2020-01-01 00:00:00.000023"]
- ["2020-01-01 00:00:00.000024"]
- ["2020-01-01 00:00:00.000025"]
- ["2020-01-01 00:00:00.000026"]
- ["2020-01-01 00:00:00.000027"]
- ["2020-01-01 00:00:00.000028"]
- ["2020-01-01 00:00:00.000029"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want values from multiple days, even though they won't be contiguous in the input.

I'm not sure if this really matters but I'm always wary of having test data pegged to a boundary (in this case, a year boundary).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough - I can update that tomorrow! Shouldn't impact any of the tests, just will need to repopulate the source schemas.

- ["2020-01-01 00:00:00.000030"]
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
table_snapshot:
table_name: mf_time_spine_millisecond
column_definitions:
- name: ts
type: TIME
rows:
- ["2020-01-01 00:00:00.001"]
- ["2020-01-01 00:00:00.002"]
- ["2020-01-01 00:00:00.003"]
- ["2020-01-01 00:00:00.004"]
- ["2020-01-01 00:00:00.005"]
- ["2020-01-01 00:00:00.006"]
- ["2020-01-01 00:00:00.007"]
- ["2020-01-01 00:00:00.008"]
- ["2020-01-01 00:00:00.009"]
- ["2020-01-01 00:00:00.010"]
- ["2020-01-01 00:00:00.011"]
- ["2020-01-01 00:00:00.012"]
- ["2020-01-01 00:00:00.013"]
- ["2020-01-01 00:00:00.014"]
- ["2020-01-01 00:00:00.015"]
- ["2020-01-01 00:00:00.016"]
- ["2020-01-01 00:00:00.017"]
- ["2020-01-01 00:00:00.018"]
- ["2020-01-01 00:00:00.019"]
- ["2020-01-01 00:00:00.020"]
- ["2020-01-01 00:00:00.021"]
- ["2020-01-01 00:00:00.022"]
- ["2020-01-01 00:00:00.023"]
- ["2020-01-01 00:00:00.024"]
- ["2020-01-01 00:00:00.025"]
- ["2020-01-01 00:00:00.026"]
- ["2020-01-01 00:00:00.027"]
- ["2020-01-01 00:00:00.028"]
- ["2020-01-01 00:00:00.029"]
- ["2020-01-01 00:00:00.030"]
Loading
Loading