Enable custom granularity inputs for filters #346
Conversation
We are enabling support for custom granularity, which means users will need to be able to reference them in where filters and other contexts where their input representations of their manifest elements will need to be parsed. Historically, we have restricted time granularities to an enumerated set of standard supported grains. With the addition of custom granularities we must now admit the possibility of arbitrary string names representing valid time granularities. This change updates the interfaces for parsing time dimensions out of filter expressions to allow for arbitrary string inputs representing custom granularity names. For various reasons - mainly having to do with the filter parser not having access to the semantic manifest - we will NOT be able to support custom granularity references via dundered string names. So something like `metric_time__martian_year` will not work, but the explicit syntax of `Dimension('metric_time').grain('martian_year')` will be accepted and processed.
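The distinction the PR draws can be illustrated with a minimal sketch (all names hypothetical, and the grain set reduced for brevity): without the semantic manifest, a parser can only recognize a trailing dunder segment as a grain if it belongs to the fixed standard set, whereas the explicit `.grain(...)` syntax is unambiguous for any string.

```python
from typing import Optional

# Illustrative subset of the standard grains; the real enum lives in DSI.
STANDARD_GRAINS = {"day", "week", "month", "quarter", "year"}


def grain_from_dundered_name(name: str) -> Optional[str]:
    """Treat the last dunder segment as a grain only if it is a standard one.

    Without manifest access, the parser cannot tell whether an arbitrary
    trailing segment like 'martian_year' is a custom grain or just part of
    the dimension name, so it must fall back to None.
    """
    parts = name.split("__")
    if len(parts) > 1 and parts[-1] in STANDARD_GRAINS:
        return parts[-1]
    return None


class Dimension:
    """Toy stand-in for the object-builder syntax: an explicit grain is unambiguous."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.granularity: Optional[str] = None

    def grain(self, granularity: str) -> "Dimension":
        self.granularity = granularity  # any string is accepted here
        return self
```

So `grain_from_dundered_name("metric_time__martian_year")` yields `None`, while `Dimension("metric_time").grain("martian_year")` carries the custom grain through explicitly.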
```python
raise InvalidQuerySyntax(
    f"{self.time_granularity_name!r} is not a valid time granularity. Valid values are"
    f" {valid_time_granularity_names}"
)
```
I am not very familiar with custom time granularities. However, it seems like we could still validate that the time granularity used at query time matches a custom one defined by the analytics engineer, right?
I would love to do this too, is it too complex based on the need to pass around the manifest lookup @tlento?
I would also like validation here. I dug into this further and the tl;dr is:
- This is sufficiently difficult to do from within DSI that I don't think it's worth it.
- I might be wrong, so if either of you have suggestions for an approach that I'm missing please let me know!
Details below.....
This class stack is complicated in a way that makes this look very self-contained when it's actually exposed directly underneath the external protocol interface for the `WhereFilter` objects themselves. Specifically, this class is initialized in the `ObjectBuildingRenderingHelper`, which is initialized in the `ObjectBuilderTextProcessor`, which is initialized in the `WhereFilterParser.parse_item_descriptions` method. That method is used in the `call_parameter_sets` property inside of the `PydanticWhereFilter`, which is the implementation of the `WhereFilter` protocol for the semantic layer spec. The `WhereFilter` protocol requires the `call_parameter_sets` property and marks it as read-only, meaning the signature does not allow for inputs.
So the only way to add a mechanism for doing this check from within DSI is to either:
- Alter the WhereFilter protocol to incorporate a custom granularities attribute
- Update the interface for extracting call parameter sets to allow for passing a list of granularities into the method so we can thread it through to this point in the call stack
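Option 2 could be sketched roughly as follows (a toy stand-in, not the real DSI classes: the class name, the `grains_in_template` constructor, and the signature are all illustrative assumptions). The key change is that `call_parameter_sets` becomes a method so callers can thread the custom grain names through to the validation point.

```python
from typing import Optional, Sequence, Tuple

# Illustrative subset of the standard grains; the real enum lives in DSI.
STANDARD_GRAINS = ("day", "week", "month", "quarter", "year")


class PydanticWhereFilterSketch:
    """Toy stand-in: validates grain names found in a filter template."""

    def __init__(self, grains_in_template: Sequence[str]) -> None:
        self._grains = tuple(grains_in_template)

    def call_parameter_sets(
        self, custom_granularity_names: Optional[Sequence[str]] = None
    ) -> Tuple[str, ...]:
        # With the custom grains passed in, validation is possible at parse time.
        valid = set(STANDARD_GRAINS) | set(custom_granularity_names or ())
        unknown = [g for g in self._grains if g not in valid]
        if unknown:
            raise ValueError(f"{unknown} are not valid time granularities")
        return self._grains
```

The cost, as noted below, is that every call site against `call_parameter_sets` now has to supply that list.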
Option 1 won't allow us to do an effective validation check at this level without restructuring our entire parsing process - we'd basically need to do a parsing pass to extract the custom granularity set and then do a second pass where we make use of that list. This is likely incompatible with the parsing implementation in dbt-core, although I admit I have not looked.
Option 2 is a breaking change because we'd need to change the `WhereFilter` and `WhereFilterIntersection` protocols. We could remove the `call_parameter_sets` property and have some external component derive that set of objects. This is awkward since the external component would need to implicitly depend on every WhereFilter's string `where_sql_template` property having the same formatting. We could also change the `call_parameter_sets` property to a method that accepts an optional list of custom time granularities. That affects every call site against `call_parameter_sets` and effectively requires that we thread a list of granularity names through each time we access it. This is theoretically possible but it might become quite ugly.
Is it worth all of the effort to alter the call_parameter_sets interface? Today, we basically parse where filters in two circumstances:
- During yaml parse time, where we only really need to use them for direct extraction of element references and validation.
- At query time, when we generally have access to MetricFlow (or, at least, to a SemanticManifestLookup) and can do more robust processing around the call_parameter_sets we extract to ensure custom granularities are properly specified. We can do this in terms of both the name reference and compatibility with requested metric_time grains and whatever else, which is a validation we have to do at query time anyway.
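The query-time post-processing check described in the second bullet can be sketched like this (hypothetical function and parameter names): a component that does have manifest access takes the grain names extracted from the parameter sets and validates them against the custom grains defined in the manifest.

```python
from typing import Iterable, Set

# Illustrative subset of the standard grains; the real enum lives in DSI.
STANDARD_GRAINS = {"day", "week", "month", "quarter", "year"}


def validate_extracted_grains(
    extracted_grains: Iterable[str], custom_grains_from_manifest: Set[str]
) -> None:
    """Reject any grain name that is neither standard nor defined in the manifest.

    This runs after parsing, so it has access to the SemanticManifestLookup
    (or equivalent) that the filter parser itself lacks.
    """
    for grain in extracted_grains:
        if grain not in STANDARD_GRAINS and grain not in custom_grains_from_manifest:
            raise ValueError(
                f"{grain!r} is not a standard or custom time granularity"
            )
```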
This means, effectively, that we can do nearly as good a job of validation in post-processing without changing the interface. The drawback is we cannot accept a where filter reference of the form `Dimension('shuttle__launch_time__martian_day')`, but I don't know if we really allow where filters like `Dimension('shuttle_launch_time__day')` today anyway. I always thought we'd throw an error somewhere or other in that case, but it may be that it works. 🤷
Assuming I'm not missing anything here, I'm inclined to keep this PR as-is - it's just not clear to me that the added validation on what is ultimately an arbitrary string name is worth the weight it'd add to the interface. But then I might be missing something, so please let me know if either of you has any ideas!
Oh, I guess maybe we do allow `Dimension('entity__dimension__granularity')` since we rely on the call parameter sets in most places. This is very confusing, but I'm comfortable saying that only works with standard granularities for now, and we should really get rid of all this Jinja stuff in the medium to long term.
Ok I'm fine with not validating for now and merging as-is, but will put up a task for me to investigate this later and see if I can come up with a non-breaking solution.
```python
grain_parsed_from_name = (
    group_by_item_name.time_granularity.value if group_by_item_name.time_granularity else None
)
grain_from_param = time_granularity_name
```
Is this line necessary? Why not just use `time_granularity_name` instead of using a different variable with the same value?
It is not necessary, but I found this more readable given the assertion comparison -> coalesce -> type refinement flow we have to go through here.
I have an idea for us to get the best of both worlds, I'll update separately.
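That flow might look something like the following sketch (hypothetical function name and structure, just to show why the alias aids readability): the two sources are compared when both are set, then coalesced into a single value.

```python
from typing import Optional


def resolve_grain(
    grain_parsed_from_name: Optional[str], grain_from_param: Optional[str]
) -> Optional[str]:
    """Compare, coalesce, and refine the two possible grain sources."""
    # Assertion comparison: if both sources are set, they must agree.
    if grain_parsed_from_name and grain_from_param:
        assert grain_parsed_from_name == grain_from_param, (
            f"Conflicting grains: {grain_parsed_from_name!r} vs {grain_from_param!r}"
        )
    # Coalesce: take whichever source is set (or None if neither is).
    return grain_parsed_from_name or grain_from_param
```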
```python
grain_parsed_from_name = (
    group_by_item_name.time_granularity.value if group_by_item_name.time_granularity else None
)
grain_from_param = time_granularity_name
```
I might not have been clear about this in discussion, but I think it would be a much better user experience if we actually could pass the granularity name via dundered syntax. We would need to pass the manifest lookup around, which is annoying. But it should be possible since we're enforcing name uniqueness, right?
"""Creates parameter sets for use in the Jinja sandbox.""" | ||
"""Creates parameter sets for use in the Jinja sandbox. | ||
|
||
This is ONLY to be used for parsing where filters, and then only for the purposes of extracting |
Is this correct? We also parse the Jinja in 1) the JDBC API and 2) saved query group-by inputs. I believe this is used for at least the saved query group-by inputs for some parsing in core, maybe related to node dependencies. The JDBC might use the MF logic, but I'm not positive.
All of those callsites have to be updated in a custom granularity world because this component asserted that we could always reliably determine whether or not <arbitrary_string> refers to a granularity, and that isn't the case anymore. I'll update the docstring to be less prescriptive and more detailed in terms of where this might go wrong, as I think that's more useful in general.
LGTM!
@tlento - re-raising my comment here. I would like for us to be able to pass custom grains via dundered syntax, but is that limited by the same issues raised in the other thread on this PR?
Mentioned some follow ups I want to do but not blocking!
It's the same set of issues, and I think "limited" is the correct term - it's not impossible in all cases, but it's very difficult to do in a reasonable way from within this component inside DSI. We can support the dundered syntax for query parameters today - even where filters - we'd just need to make sure whatever parses them does a conversion from dundered input to what the DSI parser supports. The hard part is the manifest - filter parameters stored in the YAML can show up in any metric, and that means we'd need to have the list of custom grains before we try to do any parsing unless we change the interface requirements and remove all invocations of
I'd avoid the manifest lookup. All we need is the set of custom granularity names. I suggest pushing a breaking change to this API and changing `call_parameter_sets` from a property to a function and passing the set of custom granularity constructs in at that point. This only works if none of our YAML parsers use `call_parameter_sets` in its property format, which I suspect is the case but I'm not 100% sure. I'd use the equivalent of the `ExpandedTimeGranularity` struct instead of string names, because adding that is potentially useful to us in other ways.

One issue with the breaking change: dbt-core 1.9.x is going to roll to beta soon. I'm not sure if it's worth the risk to try to jam this in right before those cutoffs. Worst case we can do a 2-stage shift, where we keep the property but have it only work for standard grains, add a new function to the protocol, cut everything calling it over to the function, and then remove the property. That defers the "breaking change" and allows us to add the new functionality in cloud even if we miss the core 1.9 cut. Source-available users of MetricFlow won't be able to use custom calendar with the dunder syntax until 1.10, but they'll still be able to use the feature as long as they always specify the grain explicitly instead of relying on dunder magic.
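The 2-stage shift could be sketched roughly like this (a toy stand-in; the class name, the `grains_in_template` constructor, and the new method name `get_call_parameter_sets` are all illustrative assumptions, not the real protocol):

```python
from typing import Sequence, Tuple

# Illustrative subset of the standard grains; the real enum lives in DSI.
STANDARD_GRAINS = {"day", "week", "month", "quarter", "year"}


class WhereFilterSketch:
    """Toy stand-in showing the staged property-to-method migration."""

    def __init__(self, grains_in_template: Sequence[str]) -> None:
        self._grains = tuple(grains_in_template)

    @property
    def call_parameter_sets(self) -> Tuple[str, ...]:
        # Stage 1: the legacy property stays, but only standard grains resolve.
        return self.get_call_parameter_sets(custom_granularity_names=())

    def get_call_parameter_sets(
        self, custom_granularity_names: Sequence[str]
    ) -> Tuple[str, ...]:
        # Stage 2: callers migrate to this method and pass custom grains in;
        # the property above is removed in a later release.
        valid = STANDARD_GRAINS | set(custom_granularity_names)
        unknown = [g for g in self._grains if g not in valid]
        if unknown:
            raise ValueError(f"{unknown} are not valid time granularities")
        return self._grains
```

Under this staging, existing property callers keep working for standard grains while migrated callers gain custom-grain support, which is what defers the breaking change.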
Resolves #345