Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce time granularity to configured value in all cases #797

Merged
merged 2 commits into from
Oct 6, 2023

Conversation

tlento
Copy link
Contributor

@tlento tlento commented Oct 5, 2023

MetricFlow's initial time dimension select statement rendering
was built on the assumption that the granularity configred for a given
time dimension would always match the granularity of the values stored.
For example, if a time dimension was defined with granularity DAY, we
assumed that the values in the warehouse would always correspond to
daily granularity data, and would never be offset from a day boundary.

This assumption is flawed for a variety of reasons - yearly or quarterly
granularities might use different offset boundaries, users may
misconfigure dimensions or allow offset values into the dataset which
could be safely corrected, etc.

This change updates our rendering to apply the date_trunc operation on
the base time dimension column, just as we do for the other, coarser
granularities. The one exception is for the base granularity column
for validity window time dimensions, which must retain their underlying
granularity in order to remain viable as validity window boundaries
for any DAILY granularity contexts, as we do not support smaller than
daily granularities at this time.

Note this may cause queries with start and end time constraints to
no longer trigger partition pruning, an issue which will be addressed
in subsequent updates.

The time dimension conversion logic inside of our semantic model
to data set converter class has expanded to the point where it's
difficult to reason through where or how to best make changes.

In order to at least keep added layers of logic isolated to the
specific task of converting time dimensions, this commit moves
that logic into a separate private method.
@cla-bot cla-bot bot added the cla:yes label Oct 5, 2023
Copy link
Contributor Author

tlento commented Oct 5, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@github-actions
Copy link

github-actions bot commented Oct 5, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS October 5, 2023 18:47 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS October 5, 2023 18:47 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS October 5, 2023 18:47 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS October 5, 2023 18:47 — with GitHub Actions Inactive
@tlento tlento temporarily deployed to DW_INTEGRATION_TESTS October 5, 2023 18:47 — with GitHub Actions Inactive
@tlento tlento linked an issue Oct 5, 2023 that may be closed by this pull request
2 tasks
MetricFlow's initial time dimension select statement rendering
was built on the assumption that the granularity configred for a given
time dimension would always match the granularity of the values stored.
For example, if a time dimension was defined with granularity DAY, we
assumed that the values in the warehouse would always correspond to
daily granularity data, and would never be offset from a day boundary.

This assumption is flawed for a variety of reasons - yearly or quarterly
granularities might use different offset boundaries, users may
misconfigure dimensions or allow offset values into the dataset which
could be safely corrected, etc.

This change updates our rendering to apply the date_trunc operation on
the base time dimension column, just as we do for the other, coarser
granularities. The one exception is for the base granularity column
for validity window time dimensions, which must retain their underlying
granularity in order to remain viable as validity window boundaries
for any DAILY granularity contexts, as we do not support smaller than
daily granularities at this time.

Note this may cause queries with start and end time constraints to
no longer trigger partition pruning, an issue which will be addressed
in subsequent updates.
@tlento tlento force-pushed the coerce-date-time-granularities branch from ef76ee3 to 6260b7a Compare October 5, 2023 19:47
@tlento
Copy link
Contributor Author

tlento commented Oct 5, 2023

@tlento tlento requested a review from plypaul October 5, 2023 19:48
Base automatically changed from clean-up-sql-time-interval-math-expression to main October 5, 2023 22:20
@tlento tlento merged commit acbeeef into main Oct 6, 2023
7 checks passed
@tlento tlento deleted the coerce-date-time-granularities branch October 6, 2023 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] MetricFlow does not coerce specified time granularity in all cases
2 participants