Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create metricflow-semantics Package #1151

Merged
merged 80 commits into from
Apr 26, 2024
Merged

Create metricflow-semantics Package #1151

merged 80 commits into from
Apr 26, 2024

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Apr 24, 2024

Resolves #1150

Description

This PR carves out semantic-related modules from the metricflow package into the metricflow-semantics package.

When viewing by commit, commit messages prefixed with Move... indicate that it only contains changes that move modules / objects with associated import changes. Other commit messages indicate a potential code change.

The publish workflow will be updated in a later PR.

@plypaul plypaul force-pushed the plypaul--11--repackage branch from 0cea41f to e626208 Compare April 24, 2024 22:43
@plypaul plypaul force-pushed the plypaul--11--repackage branch from e626208 to 4afbd09 Compare April 24, 2024 22:51
@@ -57,11 +57,12 @@ features = [

[tool.hatch.build.targets.sdist]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we use exclude vs include?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I suspect this is a holdover from the original poetry configurations in the internal Transform repo pre-MetricFlow.

That said, I tried to switch to include when I split out the dbt dependencies and that kept producing broken builds for some reason, so I went back to exclude just to move on with my life.

Also, I don't think this applies right here right now, but there's stuff we're supposed to include in any distributions for legal reasons, and it's really easy to forget to update the packaging if we move one of those files or something.

@plypaul
Copy link
Contributor Author

plypaul commented Apr 24, 2024

Heads up - need to check the packaging setup for metricflow.

Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the effort to separate out the mechanical changes by type - apart from one auto-format things in the middle this was quite reviewable. Watching the adding and removing of the DataflowPlanBuilder and so forth was quite a ride. The end state streamlined production dependencies are a thing of beauty. I have two concerns:

  1. I'm not convinced the production MetricFlow package will build and run correctly. I see you commented that you're working on this, so I won't approve until that is confirmed to be viable.
  2. There's an inline about this but we have some copy/paste code held over in the prod packages. This can be dealt with in a follow-up PR.

The remaining inlines are all minor things but probably worth attending to even if it's just to say "yeah cool let's do that later." I took out all the fun ones because this PR is too big for that, it takes forever to do anything with GitHub here.

This repo encapsulates the modules needed for semantic resolution for MetricFlow queries.

## Repo use cases
- Resolving the dbt model dependencies of a (saved) query in MetricFlow for DAG dependency management.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. We might want to make this more general. If you've got an idea now that's great otherwise we can revisit later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No ideas at the moment.


# Welcome to metricflow-semantics

This repo encapsulates the modules needed for semantic resolution for MetricFlow queries.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a package, not a repo, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -57,11 +57,12 @@ features = [

[tool.hatch.build.targets.sdist]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I suspect this is a holdover from the original poetry configurations in the internal Transform repo pre-MetricFlow.

That said, I tried to switch to include when I split out the dbt dependencies and that kept producing broken builds for some reason, so I went back to exclude just to move on with my life.

Also, I don't think this applies right here right now, but there's stuff we're supposed to include in any distributions for legal reasons, and it's really easy to forget to update the packaging if we move one of those files or something.

"python-dateutil>=2.8.2, <2.9.0",
"rapidfuzz>=3.0, <4.0",
"ruamel.yaml>=0.17.21, <0.18.0",
"tabulate>=0.8.9",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need tabulate? I think we only use it in tests.

Doesn't matter much, I have a todo item in my brain to go through and streamline our dependencies, so I'll deal with it then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a dev dependency.

@@ -23,3 +24,23 @@ class MetricFlowTestConfiguration(SnapshotConfiguration):
# is created and persisted between runs. The source schema name includes a hash of the tables that should be in
# the schema, so
use_persistent_source_schema: bool


class DirectoryPathAnchor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing a wrapper class around a builtin one-liner is cutting against one of Python's core design principles. Just make the constant use pathlib directly. pathlib.Path(__file__).parent.directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Actually, I was looking to get rid of __file__ to make it easier to use. See the added commit that does this. Could be made into a function though.
  • I wanted to use these anchors for each of the semantic manifests. In getting the new package test configuration setup, not being able to auto-complete and relying on strings was error-prone. This is the n-th iteration that we've had to adjust this.

PLACEHOLDER_CHAR_FOR_INCOMPARABLE_STRINGS = "*"


def make_schema_replacement_function(system_schema: str, source_schema: str) -> Callable[[str], str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is all just a straight copy/paste with maybe some autoformatting applied, but nothing else is changed. If that isn't correct I should probably read these files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's correct.

from tests_metricflow.fixtures.manifest_fixtures import MetricFlowEngineTestFixture, SemanticManifestSetup


def test_natural_entity_instance_set_validation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were all a straight copy apart from the imports and calls to the new validator, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Comment on lines +24 to +57
def _semantic_model_of_entity_in_instance_set(
instance_set: InstanceSet,
entity_reference: EntityReference,
) -> SemanticModelReference:
"""Return the semantic model where the entity was defined in the instance set."""
matching_instances: List[EntityInstance] = []
for entity_instance in instance_set.entity_instances:
assert len(entity_instance.defined_from) == 1
if len(entity_instance.spec.entity_links) == 0 and entity_instance.spec.reference == entity_reference:
matching_instances.append(entity_instance)

assert len(matching_instances) == 1, (
f"Not exactly 1 matching entity instances found: {matching_instances} for {entity_reference} in "
f"{mf_pformat(instance_set)}"
)
return matching_instances[0].origin_semantic_model_reference.semantic_model_reference

def is_valid_instance_set_join(
self,
left_instance_set: InstanceSet,
right_instance_set: InstanceSet,
on_entity_reference: EntityReference,
) -> bool:
"""Return true if the instance sets can be joined using the given entity."""
return self._join_evaluator.is_valid_semantic_model_join(
left_semantic_model_reference=JoinDataflowOutputValidator._semantic_model_of_entity_in_instance_set(
instance_set=left_instance_set, entity_reference=on_entity_reference
),
right_semantic_model_reference=JoinDataflowOutputValidator._semantic_model_of_entity_in_instance_set(
instance_set=right_instance_set,
entity_reference=on_entity_reference,
),
on_entity_reference=on_entity_reference,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be deleting these from the SemanticModelJoinEvaluator now that they've been moved here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I deleted that - let me remove it.

@@ -36,7 +36,8 @@ def __init__(

@staticmethod
def _is_pydantic_base_model(obj: Any): # type:ignore
return isinstance(obj, BaseModel)
# Checking the attribute as the BaseModel check fails for newer version of Pydantic.
return isinstance(obj, BaseModel) or hasattr(obj, "__config__")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's risky but I guess it's ok in this context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out a good solution after a few minutes of searching, but yeah, I'm looking for a better solution.

@@ -0,0 +1,6 @@
kind: Features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro-nit: I'd call this Under the Hood since that's what it is from the perspective of MetricFlow users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@tlento
Copy link
Contributor

tlento commented Apr 25, 2024

One other thing from Slack - when the time comes please use a merge commit. Squash will be awful in the repo for git diff, and rebase-style will be bad because there are definitely broken commits in here and we may need to be able to use the merge commit to bisect past these or whatever at some point.

@plypaul
Copy link
Contributor Author

plypaul commented Apr 25, 2024

@tlento Updated the build configuration to an include-package style. Looks good to me when I examine them.

  • The include-package style flattens the directory structure in the artifact.
  • When I build a distribution from main, the result contained a few things that we weren't supposed to (e.g. .ruff_cache), so the include-package style does help to keep it tidy more easily.
  • The wheel build configuration wasn't there, so that artifact was including tests. Added a section for that.
  • Do you know which files other than LICENSE required?

@plypaul plypaul force-pushed the plypaul--11--repackage branch from 9aee2e6 to 1df0aca Compare April 25, 2024 17:03
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! Love the new packaging layout, it's so clean and simple and I wish I'd found that in the docs when I was digging around last month. As discussed:

  1. When you're confident in the packaging, merge away!
  2. Squash and merge or a merge commit is fine, just don't rebase-merge. Thank you!

@plypaul plypaul force-pushed the plypaul--11--repackage branch 2 times, most recently from 80c3abd to d62f62c Compare April 25, 2024 20:32
@plypaul plypaul force-pushed the plypaul--11--repackage branch from d62f62c to 47d7057 Compare April 26, 2024 09:59
@plypaul plypaul merged commit ff139fb into main Apr 26, 2024
15 checks passed
@plypaul plypaul deleted the plypaul--11--repackage branch April 26, 2024 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SL-2103] [Feature] Create metricflow-semantics Package
2 participants