-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create metricflow-semantics
Package
#1151
Conversation
0cea41f
to
e626208
Compare
e626208
to
4afbd09
Compare
@@ -57,11 +57,12 @@ features = [ | |||
|
|||
[tool.hatch.build.targets.sdist] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we use exclude vs include?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I suspect this is a holdover from the original poetry configurations in the internal Transform repo pre-MetricFlow.
That said, I tried to switch to include when I split out the dbt dependencies and that kept producing broken builds for some reason, so I went back to exclude just to move on with my life.
Also, I don't think this applies right here right now, but there's stuff we're supposed to include in any distributions for legal reasons, and it's really easy to forget to update the packaging if we move one of those files or something.
Heads up - need to check the packaging setup for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the effort to separate out the mechanical changes by type - apart from one auto-format things in the middle this was quite reviewable. Watching the adding and removing of the DataflowPlanBuilder and so forth was quite a ride. The end state streamlined production dependencies are a thing of beauty. I have two concerns:
- I'm not convinced the production MetricFlow package will build and run correctly. I see you commented that you're working on this, so I won't approve until that is confirmed to be viable.
- There's an inline about this but we have some copy/paste code held over in the prod packages. This can be dealt with in a follow-up PR.
The remaining inlines are all minor things but probably worth attending to even if it's just to say "yeah cool let's do that later." I took out all the fun ones because this PR is too big for that, it takes forever to do anything with GitHub here.
This repo encapsulates the modules needed for semantic resolution for MetricFlow queries. | ||
|
||
## Repo use cases | ||
- Resolving the dbt model dependencies of a (saved) query in MetricFlow for DAG dependency management. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. We might want to make this more general. If you've got an idea now that's great otherwise we can revisit later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No ideas at the moment.
metricflow-semantics/README.md
Outdated
|
||
# Welcome to metricflow-semantics | ||
|
||
This repo encapsulates the modules needed for semantic resolution for MetricFlow queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a package, not a repo, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -57,11 +57,12 @@ features = [ | |||
|
|||
[tool.hatch.build.targets.sdist] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, I suspect this is a holdover from the original poetry configurations in the internal Transform repo pre-MetricFlow.
That said, I tried to switch to include when I split out the dbt dependencies and that kept producing broken builds for some reason, so I went back to exclude just to move on with my life.
Also, I don't think this applies right here right now, but there's stuff we're supposed to include in any distributions for legal reasons, and it's really easy to forget to update the packaging if we move one of those files or something.
"python-dateutil>=2.8.2, <2.9.0", | ||
"rapidfuzz>=3.0, <4.0", | ||
"ruamel.yaml>=0.17.21, <0.18.0", | ||
"tabulate>=0.8.9", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need tabulate? I think we only use it in tests.
Doesn't matter much, I have a todo item in my brain to go through and streamline our dependencies, so I'll deal with it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a dev dependency.
@@ -23,3 +24,23 @@ class MetricFlowTestConfiguration(SnapshotConfiguration): | |||
# is created and persisted between runs. The source schema name includes a hash of the tables that should be in | |||
# the schema, so | |||
use_persistent_source_schema: bool | |||
|
|||
|
|||
class DirectoryPathAnchor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing a wrapper class around a builtin one-liner is cutting against one of Python's core design principles. Just make the constant use pathlib directly. pathlib.Path(__file__).parent.directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Actually, I was looking to get rid of
__file__
to make it easier to use. See the added commit that does this. Could be made into a function though. - I wanted to use these anchors for each of the semantic manifests. In getting the new package test configuration setup, not being able to auto-complete and relying on strings was error-prone. This is the n-th iteration that we've had to adjust this.
PLACEHOLDER_CHAR_FOR_INCOMPARABLE_STRINGS = "*" | ||
|
||
|
||
def make_schema_replacement_function(system_schema: str, source_schema: str) -> Callable[[str], str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is all just a straight copy/paste with maybe some autoformatting applied, but nothing else is changed. If that isn't correct I should probably read these files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's correct.
from tests_metricflow.fixtures.manifest_fixtures import MetricFlowEngineTestFixture, SemanticManifestSetup | ||
|
||
|
||
def test_natural_entity_instance_set_validation( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were all a straight copy apart from the imports and calls to the new validator, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
def _semantic_model_of_entity_in_instance_set( | ||
instance_set: InstanceSet, | ||
entity_reference: EntityReference, | ||
) -> SemanticModelReference: | ||
"""Return the semantic model where the entity was defined in the instance set.""" | ||
matching_instances: List[EntityInstance] = [] | ||
for entity_instance in instance_set.entity_instances: | ||
assert len(entity_instance.defined_from) == 1 | ||
if len(entity_instance.spec.entity_links) == 0 and entity_instance.spec.reference == entity_reference: | ||
matching_instances.append(entity_instance) | ||
|
||
assert len(matching_instances) == 1, ( | ||
f"Not exactly 1 matching entity instances found: {matching_instances} for {entity_reference} in " | ||
f"{mf_pformat(instance_set)}" | ||
) | ||
return matching_instances[0].origin_semantic_model_reference.semantic_model_reference | ||
|
||
def is_valid_instance_set_join( | ||
self, | ||
left_instance_set: InstanceSet, | ||
right_instance_set: InstanceSet, | ||
on_entity_reference: EntityReference, | ||
) -> bool: | ||
"""Return true if the instance sets can be joined using the given entity.""" | ||
return self._join_evaluator.is_valid_semantic_model_join( | ||
left_semantic_model_reference=JoinDataflowOutputValidator._semantic_model_of_entity_in_instance_set( | ||
instance_set=left_instance_set, entity_reference=on_entity_reference | ||
), | ||
right_semantic_model_reference=JoinDataflowOutputValidator._semantic_model_of_entity_in_instance_set( | ||
instance_set=right_instance_set, | ||
entity_reference=on_entity_reference, | ||
), | ||
on_entity_reference=on_entity_reference, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be deleting these from the SemanticModelJoinEvaluator now that they've been moved here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought I deleted that - let me remove it.
@@ -36,7 +36,8 @@ def __init__( | |||
|
|||
@staticmethod | |||
def _is_pydantic_base_model(obj: Any): # type:ignore | |||
return isinstance(obj, BaseModel) | |||
# Checking the attribute as the BaseModel check fails for newer version of Pydantic. | |||
return isinstance(obj, BaseModel) or hasattr(obj, "__config__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's risky but I guess it's ok in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out a good solution after a few minutes of searching, but yeah, I'm looking for a better solution.
@@ -0,0 +1,6 @@ | |||
kind: Features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
micro-nit: I'd call this Under the Hood since that's what it is from the perspective of MetricFlow users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
One other thing from Slack - when the time comes please use a merge commit. Squash will be awful in the repo for git diff, and rebase-style will be bad because there are definitely broken commits in here and we may need to be able to use the merge commit to bisect past these or whatever at some point. |
@tlento Updated the build configuration to an include-package style. Looks good to me when I examine them.
|
9aee2e6
to
1df0aca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you! Love the new packaging layout, it's so clean and simple and I wish I'd found that in the docs when I was digging around last month. As discussed:
- When you're confident in the packaging, merge away!
- Squash and merge or a merge commit is fine, just don't rebase-merge. Thank you!
80c3abd
to
d62f62c
Compare
d62f62c
to
47d7057
Compare
Resolves #1150
Description
This PR carves out semantic-related modules from the
metricflow
package into themetricflow-semantics
package.When viewing by commit, commit messages prefixed with
Move...
indicate that it only contains changes that move modules / objects with associated import changes. Other commit messages indicate a potential code change.The publish workflow will be updated in a later PR.