Enable custom granularity inputs for filters #346
New changelog entry:

```diff
@@ -0,0 +1,6 @@
+kind: Features
+body: Allow filter parameters to accept custom granularities
+time: 2024-09-08T22:17:57.865557-07:00
+custom:
+  Author: tlento
+  Issue: "345"
```
```diff
@@ -16,12 +16,19 @@
     MetricReference,
     TimeDimensionReference,
 )
 from dbt_semantic_interfaces.type_enums import TimeGranularity
 from dbt_semantic_interfaces.type_enums.date_part import DatePart


 class ParameterSetFactory:
-    """Creates parameter sets for use in the Jinja sandbox."""
+    """Creates parameter sets for use in the Jinja sandbox.
+
+    This is ONLY to be used for parsing where filters, and then only for the purposes of extracting
+    some basic information about which elements are being accessed in the filter expression in the
+    small number of contexts where a more complete semantic layer implementation is not available.
+
+    In practice, today, this is used by the dbt core parser, which cannot take on a MetricFlow
+    dependency, in order to provide some DAG annotations around elements referenced in where filters.
+    """

     @staticmethod
     def _exception_message_for_incorrect_format(element_name: str) -> str:
```

Review comment (on the new docstring):

Is this correct? We also parse the jinja in 1) the JDBC and 2) saved query group by inputs. I believe this is used for at least the saved query group by inputs for some parsing in core, maybe related to node dependencies. The JDBC might use the MF logic, but I'm not positive.

Reply:

All of those callsites have to be updated in a custom granularity world, because this component asserted that we could always reliably determine whether or not `<arbitrary_string>` refers to a granularity, and that isn't the case anymore. I'll update the docstring to be less prescriptive and more detailed in terms of where this might go wrong, as I think that's more useful in general.
```diff
@@ -37,26 +44,46 @@ def create_time_dimension(
         entity_path: Sequence[str] = (),
         date_part_name: Optional[str] = None,
     ) -> TimeDimensionCallParameterSet:
-        """Gets called by Jinja when rendering {{ TimeDimension(...) }}."""
+        """Gets called by Jinja when rendering {{ TimeDimension(...) }}.
+
+        There is a lot of strangeness around the time granularity specification here. Historically,
+        we accepted time dimension names of the form `metric_time__week` or `customer__registration_date__month`
+        in this interface. We have not yet fully deprecated this, and it's unclear if we ever will.
+
+        Key points to note:
+        1. The time dimension name parsing only accepts standard time granularities. This will not change.
+        2. The time granularity parameter is what we want everybody to use because it's more explicit.
+        3. The time granularity parameter will support custom granularities, so that's nice.
+
+        While this all may seem pretty bad, it's not as terrible as all that - this class is only used
+        for parsing where filters. When we solve the problems with our current where filter spec this will
+        persist as a backwards compatibility model, but nothing more.
+        """
         group_by_item_name = DunderedNameFormatter.parse_name(time_dimension_name)
         if len(group_by_item_name.entity_links) != 1 and not is_metric_time_name(group_by_item_name.element_name):
             raise ParseWhereFilterException(
                 ParameterSetFactory._exception_message_for_incorrect_format(time_dimension_name)
             )
-        grain_parsed_from_name = group_by_item_name.time_granularity
-        grain_from_param = TimeGranularity(time_granularity_name) if time_granularity_name else None
+        grain_parsed_from_name = (
+            group_by_item_name.time_granularity.value if group_by_item_name.time_granularity else None
+        )
+        grain_from_param = time_granularity_name

         if grain_parsed_from_name and grain_from_param and grain_from_param != grain_parsed_from_name:
             raise ParseWhereFilterException(
                 f"Received different grains in `time_dimension_name` parameter ('{time_dimension_name}') "
-                f"and `time_granularity_name` parameter ('{time_granularity_name}')."
+                f"and `time_granularity_name` parameter ('{time_granularity_name}'). Remove the grain suffix "
+                f"(`{grain_parsed_from_name}`) from the time dimension name and use the `time_granularity_name` "
+                "parameter to specify the intended grain."
             )

+        time_granularity_name = grain_parsed_from_name or grain_from_param
+
         return TimeDimensionCallParameterSet(
             time_dimension_reference=TimeDimensionReference(element_name=group_by_item_name.element_name),
             entity_path=(
                 tuple(EntityReference(element_name=arg) for arg in entity_path) + group_by_item_name.entity_links
             ),
-            time_granularity=grain_parsed_from_name or grain_from_param,
+            time_granularity_name=time_granularity_name.lower() if time_granularity_name else None,
             date_part=DatePart(date_part_name.lower()) if date_part_name else None,
         )
```

Review comment (on `grain_from_param = time_granularity_name`):

Is this line necessary? Why not just use `time_granularity_name`?

Reply:

It is not necessary, but I found this more readable due to the assertion comparison -> coalesce -> type refinement flow we have to go through here. I have an idea for us to get the best of both worlds; I'll update separately.
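The name-parsing behavior described in the docstring above (point 1: only standard granularities are recognized in dundered names) can be illustrated with a toy sketch. This is a hypothetical simplification, not the real `DunderedNameFormatter`:

```python
from typing import Optional, Tuple

# Standard grains that dundered-name parsing can recognize. Custom grains
# (e.g. 'martian_day') are not in this fixed set, so they cannot be
# detected here without a manifest lookup.
STANDARD_GRANULARITIES = {"day", "week", "month", "quarter", "year"}


def parse_dundered_name(name: str) -> Tuple[Tuple[str, ...], str, Optional[str]]:
    """Split 'entity__dimension__grain' into (entity_links, element_name, grain).

    The trailing segment is treated as a grain only when it is a standard
    granularity, which is why custom grains must go through the explicit
    `time_granularity_name` parameter instead.
    """
    parts = name.split("__")
    grain: Optional[str] = None
    if len(parts) > 1 and parts[-1] in STANDARD_GRANULARITIES:
        grain = parts.pop()
    return tuple(parts[:-1]), parts[-1], grain
```

In this sketch, `customer__registration_date__week` splits off the grain `week`, but in `shuttle__launch_time__martian_day` the unrecognized suffix is not split off at all and parses as part of the name, which is exactly the ambiguity the error-message and docstring changes in this diff are warning about.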
Review comment:

I am not very familiar with custom time granularities. However, it seems like we could still validate that the time granularity used at query time matches a custom one defined by the analytics engineer, right?

Review comment:

I would love to do this too, is it too complex based on the need to pass around the manifest lookup @tlento?
Reply:

I would also like validation here. I dug into this further and the tl;dr is: doing it from within this interface requires either a second parsing pass or a breaking change to the `WhereFilter` protocols, and I don't think it's worth it. Details below.

This class stack is complicated in a way that makes this look very self-contained when it's actually exposed directly underneath the external protocol interface for the `WhereFilter` objects themselves. Specifically, this class is initialized in the `ObjectBuildingRenderingHelper`, which is initialized in the `ObjectBuilderTextProcessor`, which is initialized in the `WhereFilterParser.parse_item_descriptions` method. That method is used in the `call_parameter_sets` property inside of the `PydanticWhereFilter`, which is the implementation of the `WhereFilter` protocol for the semantic layer spec. The `WhereFilter` protocol requires the `call_parameter_sets` property, and marks it as a read-only property, meaning the signature does not allow for inputs. So the only way to add a mechanism for doing this check from within DSI is to either:

1. do the validation within the existing parsing pass behind `call_parameter_sets`, or
2. change the `WhereFilter` and `WhereFilterIntersection` protocols so the custom granularities can be passed in.

Option 1 won't allow us to do an effective validation check at this level without restructuring our entire parsing process - we'd basically need to do a parsing pass to extract the custom granularity set and then do a second pass where we make use of that list. This is likely incompatible with the parsing implementation in dbt-core, although I admit I have not looked.

Option 2 is a breaking change because we'd need to change the `WhereFilter` and `WhereFilterIntersection` protocols. We could remove the `call_parameter_sets` property and have some external component derive that set of objects. This is awkward, since the external component would need to implicitly depend on every WhereFilter's string `where_sql_template` property having the same formatting. We could also change the `call_parameter_sets` property to a method that accepts an optional list of custom time granularities. That affects every call site against `call_parameter_sets` and effectively requires that we thread a list of granularity names through each time we access it. This is theoretically possible, but it might become quite ugly.

Is it worth all of the effort to alter the `call_parameter_sets` interface? Today, we basically parse where filters in two circumstances: in the dbt core parser, which cannot take on a MetricFlow dependency, and at query time, where a more complete semantic layer implementation is available. This means, effectively, that we can do nearly as good a job of validation in post-processing without changing the interface. The drawback is we cannot accept a where filter reference of the form `Dimension('shuttle__launch_time__martian_day')`, but I don't know if we really allow where filters like `Dimension('shuttle_launch_time__day')` today anyway. I always thought we'd throw an error somewhere or other in that case, but it may be that it works. 🤷

Assuming I'm not missing anything here, I'm inclined to keep this PR as-is - it's just not clear to me that the added validation on what is ultimately an arbitrary string name is worth the weight it'd add to the interface. But then I might be missing something, so please let me know if either of you have got any ideas!
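The interface constraint at the heart of this comment can be sketched concretely. This is a hypothetical, simplified rendering - the class names echo the thread, but the types are stand-ins, not the real dbt-semantic-interfaces definitions:

```python
from typing import Optional, Protocol, Sequence


class WhereFilter(Protocol):
    """Simplified stand-in for the current protocol: call_parameter_sets is a
    read-only property, so there is no signature to pass granularities through."""

    where_sql_template: str

    @property
    def call_parameter_sets(self) -> Sequence[str]: ...


class PydanticLikeWhereFilter:
    """A concrete filter matching the current (property-based) shape."""

    def __init__(self, where_sql_template: str) -> None:
        self.where_sql_template = where_sql_template

    @property
    def call_parameter_sets(self) -> Sequence[str]:
        # Parsing runs here with no access to custom granularity names.
        return tuple(self.where_sql_template.split("__"))


class WhereFilterOption2:
    """Option 2 from the thread: the property becomes a method. This is a
    breaking change, since every existing `f.call_parameter_sets` call site
    must change and thread the granularity list through."""

    def __init__(self, where_sql_template: str) -> None:
        self.where_sql_template = where_sql_template

    def call_parameter_sets(
        self, custom_granularities: Optional[Sequence[str]] = None
    ) -> Sequence[str]:
        parts = self.where_sql_template.split("__")
        # With granularities threaded in, a trailing custom grain can be
        # recognized instead of being mistaken for part of the name.
        if custom_granularities and parts[-1] in set(custom_granularities):
            return tuple(parts[:-1]) + (f"grain:{parts[-1]}",)
        return tuple(parts)
```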
Reply:

Oh, I guess maybe we do allow `Dimension('entity__dimension__granularity')`, since we rely on the call parameter sets in most places. This is very confusing, but I'm comfortable saying that only works with standard granularities for now, and we should really get rid of all this jinja stuff in the medium to long term.

Reply:

Ok, I'm fine with not validating for now and merging as-is, but I will put up a task to investigate this later and see if I can come up with a non-breaking solution.
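The non-breaking alternative the thread converges on - validating custom grains in post-processing, where a manifest is available - could look roughly like this. Everything here is a hypothetical sketch; the real parameter set classes live in dbt-semantic-interfaces and carry more fields:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

STANDARD_GRANULARITIES = {"day", "week", "month", "quarter", "year"}


@dataclass(frozen=True)
class TimeDimensionCallParameterSet:
    """Simplified stand-in: after this PR the grain travels as a plain string
    name (time_granularity_name) rather than a TimeGranularity enum value."""

    element_name: str
    time_granularity_name: Optional[str] = None


def validate_granularities(
    parameter_sets: Sequence[TimeDimensionCallParameterSet],
    custom_granularities: Sequence[str],
) -> List[str]:
    """Return error messages for grains that are neither standard nor custom.

    This runs after parsing, in a context where the manifest's user-defined
    granularities are known, so call_parameter_sets itself never has to
    accept extra inputs.
    """
    known = STANDARD_GRANULARITIES | set(custom_granularities)
    return [
        f"Unknown granularity '{p.time_granularity_name}' for '{p.element_name}'"
        for p in parameter_sets
        if p.time_granularity_name is not None and p.time_granularity_name not in known
    ]
```

The trade-off noted in the thread still applies: a grain embedded in a dundered name (`Dimension('shuttle__launch_time__martian_day')`) never reaches this check, because the parser folds unrecognized suffixes into the name itself.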