Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Dimension.grain(...) in where/filter #152

Merged
merged 17 commits into from
Sep 20, 2023

Conversation

DevonFulcher
Copy link
Contributor

Resolves #SL-631 (internal to dbt Labs)

Description

The object of this PR is to allow for syntax like this in the where parameter & filter field {{ Dimension('metric_time').grain('WEEK') }} > 2023-09-18. To accomplish this, I refactored WhereFilterParser to use the QueryInterface* classes such as QueryInterfaceDimension. This has been done in MF already for WhereSpecFactory.

Now that MF, MFS, and DSI all implement the same QueryInterface* classes, there will be a lessened likelihood of drift between these codebases. I plan on removing the QueryInterface* classes from MF and having that codebase reference the protocols in the codebase instead.

Checklist

def grain(self, time_granularity: str) -> QueryInterfaceDimension:
"""The time granularity."""
self.time_granularity = TimeGranularity(time_granularity)
self._time_dimension_call_parameter_sets.append(
Copy link
Collaborator

@QMalcolm QMalcolm Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if I'm getting out pocket on this 🙃 I'm likely biased from my personal preference for functional programming, but I found chain of passing the pointer for time_dimension_factory.time_dimension_call_parameter_sets odd. Additionally, until I followed the chain I was confused as to why in the where_filter_parser dimension_parameter_sets were getting skipped unless they didn't have a time_granularity specified. That is to say it appeared that the dimension_parameter_sets with time_granularity specified were getting trashed. I understand how it works now, and it's elegant in an OOP way. I do worry it's more complex than it needs to be though and that it will be harder for those unfamiliar with the code to approach. Additionally, thinking more about it, a question in my mind came up of what would happen if someone did the following

{{ Dimension('<some_dimension>').grain('day').grain('week') }}

This chain wouldn't break on parsing and it would add the dimension to the time_dimension_call_parameter_sets twice, once with granularity DAY and once with granularity WEEK. Maybe that is what we want, but I don't think we do? My assumption here is that we'd actually just want it added once with granularity WEEK.

A way to simplify things and to solve the duplication problem would be to not have WhereFilterDimensions directly modify the time _dimension_call_parameter_sets. Instead when grain is called it could just generate a TimeDimensionCallParameterSet and store it on a self property (maybe something like self.time_dimension_call_parameter_set which is defaulted to None on instantiation of the WhereFilterDimsnion. Then in parse_call_parameter_sets when we iterate over dimension_factory.created we'd separate the created parameter sets into two lists, one of DimensionCallParameterSets and the other of TimeDimensionCallParameterSet. The latter of which would be joined to the time_dimension_factory.created in the return statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused as to why in the where_filter_parser dimension_parameter_sets were getting skipped unless they didn't have a time_granularity specified

I will add comments to this section and look for other opportunities to add comments.

{{ Dimension('<some_dimension>').grain('day').grain('week') }}

Good call. I hadn't considered that edge case.

A way simplify things and to solve the duplication problem...

Good idea. I had implemented something similar to this, but I wasn't sure if the order of time_dimension_call_parameter_sets mattered, so I refactored to this current approach. If we did what you are suggesting, then the TimeDimensionCallParameterSets that are generated from this syntax Dimension('metric_time').grain('day') would always be appended after the ones that are generated from the TimeDimension syntax. Paul mentioned that the order doesn't matter (but I had already completed the refactor), so I think I will go ahead and implement what you have suggested.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah +1 to @QMalcolm here, looking forward to the update!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I have addressed all of your concerns. Let me know if you would prefer a different approach.

Copy link
Collaborator

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I love the consolidation.

attention @plypaul this seems relevant to your work on saved queries.

def grain(self, time_granularity: str) -> QueryInterfaceDimension:
"""The time granularity."""
self.time_granularity = TimeGranularity(time_granularity)
self._time_dimension_call_parameter_sets.append(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah +1 to @QMalcolm here, looking forward to the update!

Comment on lines 31 to 53
time_dimension_factory = WhereFilterTimeDimensionFactory()
dimension_factory = WhereFilterDimensionFactory(time_dimension_factory.time_dimension_call_parameter_sets)
entity_factory = WhereFilterEntityFactory()

try:
# the string that the sandbox renders is unused
SandboxedEnvironment(undefined=StrictUndefined).from_string(where_sql_template).render(
Dimension=dimension_factory.create,
TimeDimension=time_dimension_factory.create,
Entity=entity_factory.create,
)
except (UndefinedError, TemplateSyntaxError, SecurityError) as e:
raise ParseWhereFilterException(f"Error while parsing Jinja template:\n{where_sql_template}") from e

dimension_parameter_sets = []
for dimension in dimension_factory.created:
if not dimension.time_granularity:
param_set = ParameterSetFactory.create_dimension(dimension.name, dimension.entity_path)
dimension_parameter_sets.append(param_set)

return FilterCallParameterSets(
dimension_call_parameter_sets=tuple(dimension_parameter_sets),
time_dimension_call_parameter_sets=tuple(time_dimension_factory.time_dimension_call_parameter_sets),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this reference passing and hidden update extensions be addressed by the approach @QMalcolm suggests? I think it will. If so that's pretty nice, because this is confusing - it looks as if we skip any time dimensions defined through the Dimension() syntax, but that isn't the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have fixed your concern, but please let me know if you prefer a different approach. I don't think there is a way to implement this without the mutable WhereFilterTimeDimensionFactory.time_dimension_call_parameter_sets, WhereFilterEntityFactory.entity_call_parameter_sets, & WhereFilterDimensionFactory.created properties given the protocols that we want to implement.

Copy link
Collaborator

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this work as a whole and the updates! Looks good to me 🙂

@DevonFulcher DevonFulcher merged commit 8c00508 into main Sep 20, 2023
9 checks passed
@DevonFulcher DevonFulcher deleted the dimension_dot_grain_support branch September 20, 2023 23:39
Copy link
Collaborator

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, I think the new logic is easier to follow too!

@@ -16,7 +16,7 @@


class EntityStub(ProtocolHint[QueryInterfaceEntity]):
"""An Entity implementation that does nothing to satisfy the protocol.
"""An Entity implementation that just satisfies the protocol.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. 😁

@DevonFulcher DevonFulcher restored the dimension_dot_grain_support branch September 29, 2023 15:00
@DevonFulcher DevonFulcher deleted the dimension_dot_grain_support branch September 29, 2023 15:01
DevonFulcher added a commit that referenced this pull request Oct 3, 2023
Due to MetricFlow's reliance on version 0.2.x of DSI, the updates to support Dimension(...).grain(...) need to be made to this branch as well. I just combined the work from these PRs into this one #156, #154, #152
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants