Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SpecPattern Classes to Model Ambiguous Group-By-Item Inputs #888

Merged
merged 1 commit into from
Nov 27, 2023

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Nov 18, 2023

Description

This PR implements the pattern classes for ambiguous group-by-items as described in https://github.com/dbt-labs/metricflow/pull/new/plypaul--58.7--spec-pattern.

@cla-bot cla-bot bot added the cla:yes label Nov 18, 2023
@plypaul plypaul force-pushed the plypaul--58.6--time-dimension-spec-comparison branch 2 times, most recently from f4c20db to 86b41cc Compare November 20, 2023 18:29
Base automatically changed from plypaul--58.6--time-dimension-spec-comparison to main November 20, 2023 18:40
@plypaul plypaul force-pushed the plypaul--58.7--spec-pattern branch 2 times, most recently from f654067 to aba93d0 Compare November 21, 2023 19:05
@plypaul plypaul marked this pull request as ready for review November 21, 2023 19:09
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable. See inlines for small improvement requests.

Going forward, please do write out brief commit messages and PR descriptions explaining what is changing and why - pointing to issue links is more indirection for readers to wade through when they often just need a local summary.

Comment on lines 703 to 720
def transform( # noqa: D
self, transform_function: LinkableSpecSetTransform[LinkableSpecSetTransformOutputT]
) -> LinkableSpecSetTransformOutputT:
return transform_function.transform(self)


class LinkableSpecSetTransform(Generic[LinkableSpecSetTransformOutputT], ABC):
"""Function to use for transforming LinkableSpecSets.

Similar to InstanceSpecSetTransform, but for LinkableSpecSets. TBD: There's a lot of similarity between
InstanceSpecSet and LinkableSpecSet, so there's probably a better class structure so that we don't need to duplicate
this.
"""

@abstractmethod
def transform(self, spec_set: LinkableSpecSet) -> LinkableSpecSetTransformOutputT: # noqa: D
pass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine in this instance, it's a pattern we have in the codebase, etc., but what are we going to use this for, exactly?

We have this habit of producing broadly generic interfaces for doing arbitrary things and then using them for exactly one concrete application. This adds low grade, everyday friction to working in the codebase. It also encourages adding the broadest possible interface for every operation because that's the pattern we've been following, which then further adds to that friction.

In this PR stack we're essentially doing custom string serialization for converting to various naming representations. What other operations will we perform? If that's kind of all we've got in mind here, I recommend making a more specific API for allowing different forms of string serialization and then generalizing it later if we find a need to do so.

More broadly, I think "add an interface to do the specific thing we're trying to do unless we have a clear need for broader generalization" is a default posture that will serve us well going forward. Broad methods with generic return types like transform(...) -> TransformOutpuT and accept(....) -> VisitorOutputT are the enemy of readability, and we use a lot of them to hide the fact that what we really mean is render_name(...) -> str or convert_to_sql_plan(....) -> SqlQueryPlanNode or render_sql(...) -> SqlRenderResult.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this interface as later in the stack, this turned out to not be necessary.

@@ -0,0 +1,174 @@
from __future__ import annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picky but personally important thing - as a person who runs git log in a terminal on a regular basis and reads commit messages to see what's going on and why at a given point in the evolution of a codebase, having commit messages that contain only pointers to some github link is super annoying.

I know it feels like duplicated effort, but adding 2 sentences describing what this commit is doing and why it's here would be super handy. That's especially true in the case where this is one commit of many targeted at resolving an issue consisting of 1500 words of explanation complete with diagrams.

Don't get me wrong, I love that issue writeup, I just don't want to have to wade through all of it when I'm looking at the codebase and thinking "wait, why is this like this? Oh it was always this way, let me see if the commit message helps me out at all." Often enough the 2 sentence thing is adequate, and it saves me time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded the commit message, but also left the link in there as I thought it would still be useful. It's now:

Add SpecPattern classes.

These pattern classes are used to model user inputs when group-by items are
specified through the query interface or specified in a filter. These patterns
allow for ambiguous user inputs of group-by items e.g. a time dimension with
an unknown grain. For that case, the ambiguous input makes it easier for the
user to author queries as figuring out the time grain requires inspection of
the configs. For more details, please see:

https://github.com/dbt-labs/metricflow/issues/887

from metricflow.specs.specs import LinkableInstanceSpec


class SpecPattern(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: LinkableSpecPattern? It only works on LinkableInstanceSpecs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this was updated later in the stack to apply for any spec. Moved those changes into this commit.


for spec in candidate_specs:
spec_values = tuple(
(getattr(spec, key_to_check) if hasattr(spec, key_to_check) else None) for key_to_check in keys_to_check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr allows for a default return:

getattr(spec, key_to_check, default=None) for key_to_check in keys_to_check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines 134 to 138
ParameterSetField.DATE_PART,
]

if time_dimension_call_parameter_set.time_granularity is not None:
fields_to_compare.append(ParameterSetField.TIME_GRANULARITY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little puzzling to me why one of these is using an is not None check but not the other. Is there any reason not to just compare both of these every time? Why do we care if the granularity is set for the purposes of the match call?

It appears the reason is so we can use None as a value for "any time granularity" here. Regardless, adding a docstring to the method explaining this instead of suppressing lint would be great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the case to follow the behavior that's defined for how the time grain / date part are used in the object builder naming scheme. Added a comment.

@plypaul plypaul force-pushed the plypaul--58.7--spec-pattern branch from aba93d0 to a12e578 Compare November 27, 2023 21:21
These pattern classes are used to model user inputs when group-by items are
specified through the query interface or specified in a filter. These patterns
allow for ambiguous user inputs of group-by items e.g. a time dimension with
an unknown grain. For that case, the ambiguous input makes it easier for the
user to author queries as figuring out the time grain requires inspection of
the configs. For more details, please see:

#887
@plypaul plypaul force-pushed the plypaul--58.7--spec-pattern branch from a12e578 to 1db958e Compare November 27, 2023 21:38
@plypaul plypaul merged commit da40fb9 into main Nov 27, 2023
9 checks passed
@plypaul plypaul deleted the plypaul--58.7--spec-pattern branch November 27, 2023 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants