Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JoinToTimeSpineNode bug fix #1541

Merged
merged 6 commits into from
Dec 9, 2024
Merged

JoinToTimeSpineNode bug fix #1541

merged 6 commits into from
Dec 9, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Nov 20, 2024

We had previously made a product decision about the behavior of the JoinToTimeSpineNode that we later decided was not correct. This is specifically related to what happens if you query with both metric_time and the agg_time_dimension (e.g., booking__ds). If you query with just one or the other, we would use the time spine to fulfill that dimension. But if you queried with both, there was different behavior.

Previous behavior when querying with both:

  • Treat metric_time as the time spine dimension and fulfill those values with a time spine column
  • Treat agg_time_dimension as any other dimension, and fulfill those values from the semantic model source table

This behavior felt inconsistent, since you might get different values when querying the agg_time_dimension depending on if you included metric_time in the query or not. This could be especially confusing if you included agg_time_dimension only in the group by, and metric_time in the where filter.

New behavior when querying with both:

  • Treat metric_time and agg_time_dimension the same. Fulfill both dimensions with values from the time spine.

PR notes:

  • I recommend reviewing by commit.
  • I tried to isolate the changes related to the bug fix, but it was not as easy as I'd hoped. Hopefully it isn't too difficult to understand the changes, but let me know if you want to sync to discuss.
  • There are a bunch of SQL changes here, but only a few are in optimized snapshots. The optimized ones more clearly demonstrate the actual behavior change.

@courtneyholcomb courtneyholcomb changed the title Court/simp3 Simplify JoinToTimeSpineNode Nov 21, 2024
@courtneyholcomb courtneyholcomb changed the base branch from main to court/simp2 November 21, 2024 02:42
@courtneyholcomb courtneyholcomb changed the title Simplify JoinToTimeSpineNode JoinToTimeSpineNode bug fix Nov 21, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Nov 21, 2024
@@ -646,9 +646,6 @@ def _build_derived_metric_output_node(
queried_agg_time_dimension_specs = queried_linkable_specs.included_agg_time_dimension_specs_for_metric(
metric_reference=metric_spec.reference, metric_lookup=self._metric_lookup
)
assert (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is handled in the __post_init__ for JoinToTimeSpineNode, so I removed the duplication.

):
agg_time_dimension_instances.append(instance)
# Select the dimension for the join from the parent node because it may not have been included in the request.
# Default to using metric_time for the join if it was requested, otherwise use the agg_time_dimension.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the one place we still have some differentiation between metric_time and agg_time. If you request both, we have to choose one of them to use as the join column, and we default to using metric_time.

subq_12.ds AS metric_time__day
, DATE_TRUNC('month', subq_12.ds) AS metric_time__month
DATE_TRUNC('month', subq_12.ds) AS metric_time__month
, subq_10.metric_time__day AS metric_time__day
Copy link
Contributor Author

@courtneyholcomb courtneyholcomb Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might notice on this line we're selecting metric_time__day from the parent instead of the time spine. This is only true for columns used in the join and never anywhere else, and it is fine because the column should be equivalent to the time spine column since it's the one used in the join.
This is true in a couple of snapshots. It's not a problem functionally but might be confusing to some users reading the SQL. This gets fixed farther up the stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of the fix in the next PR here: c6d6323#r1852366470

@courtneyholcomb courtneyholcomb marked this pull request as ready for review November 21, 2024 15:13
@courtneyholcomb courtneyholcomb added Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment and removed Skip Changelog labels Nov 21, 2024
@dbt-labs dbt-labs deleted a comment from github-actions bot Nov 21, 2024
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Nov 21, 2024
@@ -244,3 +244,16 @@ def with_entity_prefix(self, entity_prefix: EntityReference) -> TimeDimensionSpe
date_part=self.date_part,
aggregation_state=self.aggregation_state,
)

@staticmethod
def with_base_grains(time_dimension_specs: Sequence[TimeDimensionSpec]) -> List[TimeDimensionSpec]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General preference has been to return immutable types, or a more abstract collection type. Similar to https://docs.python.org/3/library/typing.html#typing.List

@@ -145,6 +145,12 @@ def with_entity_prefix(
spec=transformed_spec,
)

def with_new_defined_from(self, defined_from: Tuple[SemanticModelElementReference, ...]) -> TimeDimensionInstance:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@cla-bot cla-bot bot added the cla:yes label Dec 9, 2024
Base automatically changed from court/simp2 to main December 9, 2024 19:27
…grains

This does not change any query behavior, but will allow us to remove
 this logic from the SQL conversion logic, which is important for future commits.
These snapshot changes primarily show changes from the bug fix mentioned in the last commit. The optimized snapshots show the actual behavior change.
@courtneyholcomb courtneyholcomb enabled auto-merge (squash) December 9, 2024 19:41
@courtneyholcomb courtneyholcomb merged commit 385d3cf into main Dec 9, 2024
11 checks passed
@courtneyholcomb courtneyholcomb deleted the court/simp3 branch December 9, 2024 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants