Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support label attribute #158

Merged
merged 11 commits into from
Oct 6, 2023
Merged

Conversation

QMalcolm
Copy link
Collaborator

Resolves #143

Description

In this PR we

  1. Add optional label attribute to the following protocols:

    • Metric
    • SavedQuery
    • SemanticModel
    • Dimension
    • Entity
    • Measure
  2. Begin supporting optional label attribute on the corresponding pydantic implementations and parsing

  3. Add validations which ensure

    • metric labels are unique to other metrics
    • saved query labels are unique to other saved queries
    • semantic model labels are unique to other semantic models
    • dimension labels on a semantic model are unique to other dimensions on the same semantic model
    • entity labels on a semantic model are unique to other entities on the same semantic model
    • measure labels on a semantic model are unique to other measures on the same semantic model
    • entities with the same name across semantic models have the same label (or are None)

Checklist

This probably should have been part of #148, but we forgot. I'm adding
these tests here because I plan to add `label` to the `SavedQuery`
protocol in the coming commits. With that, I'll want to updated the
parsing tests to check it. To do that, the tests need to exist.
@QMalcolm QMalcolm added enhancement New feature or request Protocol Spec Change labels Sep 27, 2023
@QMalcolm QMalcolm requested review from tlento and plypaul September 27, 2023 00:36
@cla-bot cla-bot bot added the cla:yes label Sep 27, 2023
@QMalcolm QMalcolm force-pushed the qmalcolm--143-support-label-attribute branch from e9dc17d to a96638c Compare September 27, 2023 00:40
Copy link
Collaborator

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done but I figured you might want to look at the cat gif earlier....

tests/parsing/test_semantic_model_parsing.py Outdated Show resolved Hide resolved
tests/parsing/test_semantic_model_parsing.py Outdated Show resolved Hide resolved
@@ -334,6 +420,37 @@ def test_semantic_model_primary_time_dimension_parsing() -> None:
assert dimension.type_params is not None


def test_base_semantic_model_dimension_parsing() -> None:
"""Test parsing base attributes of PydanticDimension object."""
description = "Test sematic_model dimension description"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cutest copy/paste gif I could find.....

cat on copier

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not marking comment as resolved because cat gif 🐈

# assert isinstance(categorical_dim, RuntimeCheckableDimension)
@given(builds(PydanticDimension))
def test_dimension_protocol(dimesnion: PydanticDimension) -> None: # noqa: D
assert isinstance(dimesnion, RuntimeCheckableDimension)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this not subject to the same problem with categorical dimension types as the old test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good question. So I ran the test a few different ways after seeing your comment in this review.

import pdb
@given(builds(PydanticDimension))
def test_dimension_protocol(dimension: PydanticDimension) -> None:  # noqa: D
    if dimension.type == DimensionType.CATEGORICAL:
        pdb.set_trace()
    assert isinstance(dimension, RuntimeCheckableDimension)
import pdb
@given(builds(PydanticDimension))
def test_dimension_protocol(dimension: PydanticDimension) -> None:  # noqa: D
    if dimension.type == DimensionType.TIME:
        pdb.set_trace()
    assert isinstance(dimension, RuntimeCheckableDimension)
import pdb
@given(builds(PydanticDimension))
def test_dimension_protocol(dimension: PydanticDimension) -> None:  # noqa: D
    if dimension.type_params is not None:
        pdb.set_trace()
    assert isinstance(dimension, RuntimeCheckableDimension)

Variation (1) and (2) hit the break point when the test was ran, but (3) never was. Thus it appears the type_params aren't getting automatically generated by hypothesis. I'm gonna do some more investigating and see if this is happening to all nested objects. If so I'll have to build better strategies. I haven't found a setting in hypothesis to handle this better by default. It looks like we might have be able to register strategies for custom types, which is maybe what we have to do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging in further, this might be a pydantic class defaulting issue HypothesisWorks/hypothesis#3218. Which makes sense why I was seeing it on some things, but not on others.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I improved our strategies, but that was a separate fix. Why the tests no longer have the issue with categorical dimensions is we removed the assertion causing the issue in this commit 83cc44d

Comment on lines 42 to 52
SIMPLE_METRIC_STRATEGY = builds(
PydanticMetric,
type=just(MetricType.SIMPLE),
type_params=builds(PydanticMetricTypeParams, measure=builds(PydanticMetricInputMeasure)),
)

SEMANTIC_MODEL_STRATEGY = builds(
PydanticSemanticModel,
dimensions=lists(builds(PydanticDimension)),
entities=lists(builds(PydanticEntity)),
measures=lists(builds(PydanticMeasure)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh neat, it infers values based on type hints, I guess recursively all the way down.

Copy link
Collaborator Author

@QMalcolm QMalcolm Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does it pretty well! Although it sometimes chokes on nested lists of structured objects and just does an empty list instead. Hence why I explicitly tell it to generate lists of dimensions/entities/measures

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also might be choking on nested objects in general. Or maybe it's only nested optional objects 🤔

@@ -0,0 +1,43 @@
import logging
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got to here..... will look over validations tomorrow!

@QMalcolm
Copy link
Collaborator Author

Gonna do commits to address comments. I'll fix them up into the relevant commits before merging, but after reviewing is finished 🙂

Copy link
Collaborator

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I have one open question about the dimension uniqueness rule.

issues.append(
ValidationError(
context=FileContext.from_metadata(semantic_model.metadata),
message=f"Dimension labels must be unique within a semantic model. The label `{label}` was "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want them to be globally unique for dimensions? I can see people setting user__country and sales_rep__country to both be Country, which isn't great.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those following along, we discussed this in a call. The decision for now was to not require labels for dimensions or measures to be globally unique. If people end up consistently running into disambiguation problems, we'll look into being stricter

Copy link
Collaborator

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion today the current validation looks good!

@QMalcolm
Copy link
Collaborator Author

QMalcolm commented Oct 6, 2023

Rebasing to fixup commit 9373115 into 59357a1

Specifically we added the attribute to the following protocols:
- Metric
- SavedQuery
- SemanticModel
- Dimension
- Entity
- Measure
To really do protocol satisfaction testing well, we have to write
really complex instantiations of the implementations. We want to cover
when things are optional vs not. Some implementations require certain
attributes to be set depending on paraent attributes during instantiation.
And the protocols are still growing (new attributes are still getting
added often). This makes how we were during these tests really brittle,
and we weren't testing all the cases we wanted to. By using hypothesis
each test will be ran many times with different generated instantiations
of the main built object. Additionally, hypothesis does shrinking, such
that it will try to find the minimal failing case. This gives us better
guarantees that we're actually hitting all possible variations.
…action

So pydantic 1.x doesn't play nice with hypothesis in that all defaulted
and optional fields won't have strategies automatically generated for them.
Therefore we need to explicitly set strategies for these properties
unfortunately. I tried to reduce as much duplication as possible by
composing strategies as much as possible.
@QMalcolm QMalcolm force-pushed the qmalcolm--143-support-label-attribute branch from d807b48 to ff2d7e5 Compare October 6, 2023 03:29
@QMalcolm QMalcolm merged commit b306a5b into main Oct 6, 2023
9 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--143-support-label-attribute branch October 6, 2023 03:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add optional label attribute protocols
2 participants