Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Naming Schemes to Represent Different Input Formats #893

Merged
merged 1 commit into from
Nov 28, 2023

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Nov 21, 2023

Description

These naming scheme classes will be used in the query parser to convert string inputs into patterns. The patterns will be used later to resolve ambiguous group-by-items.

@cla-bot cla-bot bot added the cla:yes label Nov 21, 2023
Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@plypaul plypaul force-pushed the plypaul--58.8--naming-scheme branch from 49ea7af to 03f45dc Compare November 21, 2023 19:17
@plypaul plypaul marked this pull request as ready for review November 21, 2023 19:26
@plypaul plypaul force-pushed the plypaul--58.8--naming-scheme branch from 03f45dc to 6897942 Compare November 22, 2023 05:30
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable! I left a bunch of minor comments. At the very least, please do clean up the error types before merging.

Comment on lines 24 to 25
If this scheme cannot accommodate the spec, return None. This is needed to handle a case with DatePart in
DunderNamingScheme, but naming schemes should otherwise be complete.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect the DunderNamingScheme itself to fall further and further behind due to its inherent limitations.

It would be nice to just get rid of it but I suspect that's never going to happen.

Suggested change
If this scheme cannot accommodate the spec, return None. This is needed to handle a case with DatePart in
DunderNamingScheme, but naming schemes should otherwise be complete.
If this scheme cannot accommodate the spec, return None. This is needed to handle unsupported cases in
DunderNamingScheme, such as DatePart, but naming schemes should otherwise be complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

#
if time_dimension_spec.date_part is not None:
return None
names = _DunderNameTransform().transform(LinkableSpecSet.from_specs((instance_spec,)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
names = _DunderNameTransform().transform(LinkableSpecSet.from_specs((instance_spec,)))
names = _DunderNameTransform().transform(spec_set)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines +42 to +48
for time_dimension_spec in spec_set.time_dimension_specs:
# From existing comment in StructuredLinkableSpecName:
#
# Dunder syntax not supported for querying date_part
#
if time_dimension_spec.date_part is not None:
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an incredibly roundabout way of getting this value. I guess it's ok for now while we think about how to improve the spec class interfaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. However, I haven't been able to come up with a better one. Have ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an idea a while back but it pushed too much stuff into the common interface. We'll come up with something.

@override
def spec_pattern(self, input_str: str) -> EntityLinkPattern:
if not self.input_str_follows_scheme(input_str):
raise RuntimeError(f"{repr(input_str)} does not follow this scheme.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exception gets thrown in this case today? I assume this is almost always going to be a query input error, which means we may want to throw an exception we can handle and re-cast for things like alert management.

If we don't have a custom exception we can go to here I'd tag this as ValueError since it's an invalid value. RuntimeError is really broad. Also, we use ValueError below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, this case should not be hit as we would call input_str_follows_scheme before running that input through this method. If the input does not follow the scheme, we would create an error issue instead of raising exceptions. Updated to ValueError though. Also updated the docstring about it.

@override
def spec_pattern(self, input_str: str) -> SpecPattern:
if not self.input_str_follows_scheme(input_str):
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, ValueError. Same question about a custom exception applies, and we should be consistent here.

pass

@abstractmethod
def input_str_follows_scheme(self, input_str: str) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth adding an enforcing version of this that's implemented to raise a consistent error, since mismatches are likely to all share the same root cause and error type/response messaging, something like:

def assert_input_str_follows_scheme(self, input_str: str) -> None:
    if not self.input_str_follows_scheme(input_str):
        raise ....

Then the implementations can just call the assert method when they need it instead of handling the exception info itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In later commits of this set, we avoid raising exceptions in favor of creating query issues so that all errors can be collected and displayed to the user.


@override
def input_str_follows_scheme(self, input_str: str) -> bool:
if ObjectBuilderNamingScheme._NAME_REGEX.match(input_str) is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this regex, or can we jump to the parsing? It's not clear to me what the regex is adding, honestly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was a remnant of an earlier implementation. Removed.

Comment on lines +150 to +168
initializer_parameters = []
entity_link_names = list(entity_link.element_name for entity_link in entity_links)
if len(entity_link_names) > 0:
initializer_parameters.append(repr(entity_link_names[-1] + DUNDER + element_name))
else:
initializer_parameters.append(repr(element_name))
if time_granularity is not None:
initializer_parameters.append(
f"'{time_granularity.value}'",
)
if date_part is not None:
initializer_parameters.append(f"date_part_name={repr(date_part.value)}")
if len(entity_link_names) > 1:
initializer_parameters.append(f"entity_path={repr(entity_link_names[:-1])}")

return ", ".join(initializer_parameters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there has to be a better way to do this but without overhauling our spec property interfaces I don't know what it is. The logic here is a little brittle because of the assumption that all the parameters passed are correct for the spec type but since it's all in this one small class I think it's fine.

return EntityLinkPattern(
EntityLinkPatternParameterSet(
element_name=entity_call_parameter_set.entity_reference.element_name,
entity_links=entity_call_parameter_set.entity_path,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangentially related - do we actually allow linked entities, like user__listing, in a query? Or is this forward-looking for things we might add, or for using this for internal cases where we may need that entity path resolution in an intermediate query step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do allow queries for linked entities like user__listing right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? TIL. I don't think I've ever seen it.

return DunderNamingScheme()


def test_input_str(dunder_naming_scheme: DunderNamingScheme) -> None: # noqa: D
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm not a fan of tests that do multi-step assertions against different logic triggers, I'd rather they be parameterized on input/output or else split into separate methods - that way when I inevitably change something that breaks a bunch of stuff I don't have to go through this run test, fix bug one, run test, fix bug 2, run test, fix bug 3 workflow because all of the errors are exposed on the first run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. But given the simplicity of these cases, I ended up bunching them. Could change them later though.

@plypaul plypaul force-pushed the plypaul--58.7--spec-pattern branch 2 times, most recently from a12e578 to 1db958e Compare November 27, 2023 21:38
Base automatically changed from plypaul--58.7--spec-pattern to main November 27, 2023 22:04
@plypaul plypaul force-pushed the plypaul--58.8--naming-scheme branch from 6897942 to 51ea8bb Compare November 27, 2023 22:10
These naming scheme classes will be used in the query parser to convert string
inputs into patterns. The patterns will be used later to resolve ambiguous
group-by-items.
@plypaul plypaul force-pushed the plypaul--58.8--naming-scheme branch from 51ea8bb to 7992b74 Compare November 27, 2023 22:17
@plypaul plypaul merged commit 955b897 into main Nov 28, 2023
8 checks passed
@plypaul plypaul deleted the plypaul--58.8--naming-scheme branch November 28, 2023 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants