Create metricflow-semantics Package #1151

Merged
merged 80 commits into from
Apr 26, 2024
34bb6ff
Add missing `__init__.py` files in `tests`.
plypaul Apr 23, 2024
86420c6
Add semantics module.
plypaul Apr 23, 2024
f208ee1
Rename errors.
plypaul Apr 23, 2024
c75352d
Rename dataset.
plypaul Apr 23, 2024
678b993
Rename specs.
plypaul Apr 23, 2024
e50a1e5
Add missing `__init__.py` in `inference`.
plypaul Apr 23, 2024
8929eec
Move `base_time_grain.py`.
plypaul Apr 23, 2024
614f0e7
Move patterns.
plypaul Apr 23, 2024
2b2ea34
Move spec_classes.py
plypaul Apr 23, 2024
8657abf
Move specs.
plypaul Apr 23, 2024
38f689e
Move naming.
plypaul Apr 23, 2024
e9e04f4
Move filters.
plypaul Apr 23, 2024
91529ec
Move model.
plypaul Apr 23, 2024
248474b
Move dag.
plypaul Apr 23, 2024
c952eb0
Move dataset.
plypaul Apr 23, 2024
0cc46a3
Move dataflow plan sub-modules.
plypaul Apr 23, 2024
4f28202
Move the rest of dataflow.
plypaul Apr 23, 2024
64cad82
Move some files out of plan_conversion.
plypaul Apr 23, 2024
fb53b89
Move plan_conversion.
plypaul Apr 23, 2024
d6cbc72
Move errors.
plypaul Apr 23, 2024
02eda1b
Move protocols.
plypaul Apr 23, 2024
e1160dd
Move query.
plypaul Apr 23, 2024
d80ebc3
Move sql.
plypaul Apr 23, 2024
246b3b0
Move time.
plypaul Apr 23, 2024
c7f6a04
Move top level.
plypaul Apr 23, 2024
298b45e
Move collection_helpers.
plypaul Apr 23, 2024
d2cd7be
Move mf_logging.
plypaul Apr 23, 2024
5de0abd
Move out time.
plypaul Apr 23, 2024
be48a6d
Move out sql.
plypaul Apr 23, 2024
68875ee
Move out protocols/sql_client.
plypaul Apr 23, 2024
8724f02
Move out plan_conversion.
plypaul Apr 23, 2024
9c094dc
Move out dataset.
plypaul Apr 23, 2024
73580a4
Move out dataflow.
plypaul Apr 23, 2024
249cd1b
Move out time.
plypaul Apr 23, 2024
89bef6f
Move out `data_warehouse_model_validator.py`.
plypaul Apr 24, 2024
e629e9d
Move `sql_bind_parameters.py`.
plypaul Apr 24, 2024
471a7bd
Update `linkable_spec_resolver.py` to use metric time from DSI.
plypaul Apr 24, 2024
33752c7
Separate and move `SqlJoinType`.
plypaul Apr 24, 2024
fce72e9
Move `sql_join_type.py`.
plypaul Apr 24, 2024
2ef2163
Remove `SemanticManifestLookup.time_spine_source`.
plypaul Apr 24, 2024
77bd647
Move semantic tests.
plypaul Apr 24, 2024
cfc2eef
Add `metricflow-semantics` package skeleton.
plypaul Apr 24, 2024
58f631c
Move metricflow.semantics.
plypaul Apr 24, 2024
51431b2
Add `tests_metricflow_semantics`.
plypaul Apr 24, 2024
9173de7
Update `snapshot_path_prefix` to handle new test directories.
plypaul Apr 24, 2024
f88ad4e
Add `py.typed` for `metricflow_semantics`.
plypaul Apr 24, 2024
be48ed5
Move fixtures from `setup_fixtures.py` to `sql_client_fixtures.py`.
plypaul Apr 24, 2024
d2094c0
Rename tests -> tests_metricflow.
plypaul Apr 24, 2024
7c17c1e
Update test module name in tests.
plypaul Apr 24, 2024
f755143
Split `test_helpers.py` into separate files.
plypaul Apr 24, 2024
5426a61
Move helpers into `test_helpers` module.
plypaul Apr 24, 2024
3d9aba3
Change signature of `assert_snapshot_text_equal` to use `SnapshotConf…
plypaul Apr 24, 2024
2b87c74
Move `load_semantic_manifest` to `manifest_helpers.py`.
plypaul Apr 24, 2024
0313849
Move `semantic_manifest_yamls` to `test_helpers`.
plypaul Apr 24, 2024
d6d3cfc
Add `DirectoryAnchor` and use new manifest YAML dir.
plypaul Apr 24, 2024
d58a953
Move `assert_*_snapshot*` to `snapshot_helpers`.
plypaul Apr 24, 2024
4c9dda2
Add snapshot methods that don't depend on a SQL client.
plypaul Apr 24, 2024
3e349ff
Change signature of `assert*` methods to use `SnapshotConfiguration`.
plypaul Apr 24, 2024
7e9ee2c
Initial configuration for `metricflow-semantics` tests.
plypaul Apr 24, 2024
00129ac
Move a few tests to new locations.
plypaul Apr 24, 2024
6f21bb7
Remove `DunderColumnAssociationResolver` from `test_suggestions.py`.
plypaul Apr 24, 2024
b988db5
Move `metric_time_dimension.py` to `test_helpers`.
plypaul Apr 24, 2024
3da1773
Remove `DataSet` dependency from `metric_time_dimension.py`
plypaul Apr 24, 2024
76119b6
Separate dataflow validation from `SemanticModelJoinEvaluator`.
plypaul Apr 24, 2024
ec9d092
Move semantic tests to `tests_metricflow_semantics`.
plypaul Apr 24, 2024
9b468a9
Move `DunderColumnAssociationResolver` to `metricflow-semantics`.
plypaul Apr 24, 2024
d201599
Add `column_association_resolver` fixture.
plypaul Apr 24, 2024
ae485c0
Add missing `query_parser` fixture.
plypaul Apr 24, 2024
742b85b
Update `pyproject.toml`.
plypaul Apr 24, 2024
d80bf8a
Move tests to `metricflow-semantics`.
plypaul Apr 24, 2024
1699579
Fix pretty_printing for newer Pydantic versions.
plypaul Apr 24, 2024
dbf35f8
Update various build-related files.
plypaul Apr 24, 2024
f9d3544
Move ID-related objects to `test_helpers`.
plypaul Apr 24, 2024
b853369
Move snapshots.
plypaul Apr 24, 2024
d54a4ad
Add change log for #1150.
plypaul Apr 24, 2024
4afbd09
Lint fixes.
plypaul Apr 24, 2024
f6f64cb
Address comments.
plypaul Apr 25, 2024
9e2c131
Update `DirectoryPathAnchor` to not require `__file__`.
plypaul Apr 25, 2024
60773a9
Update / cleanup build configuration.
plypaul Apr 25, 2024
47d7057
Add package CI tests.
plypaul Apr 25, 2024
5 changes: 3 additions & 2 deletions metricflow/plan_conversion/node_processor.py
@@ -7,7 +7,7 @@
 from dbt_semantic_interfaces.references import EntityReference, TimeDimensionReference
 from metricflow_semantics.filters.time_constraint import TimeRangeConstraint
 from metricflow_semantics.mf_logging.pretty_print import mf_pformat
-from metricflow_semantics.model.semantics.semantic_model_join_evaluator import MAX_JOIN_HOPS, SemanticModelJoinEvaluator
+from metricflow_semantics.model.semantics.semantic_model_join_evaluator import MAX_JOIN_HOPS
 from metricflow_semantics.model.semantics.semantic_model_lookup import SemanticModelLookup
 from metricflow_semantics.specs.spec_classes import InstanceSpecSet, LinkableInstanceSpec, LinklessEntitySpec
 from metricflow_semantics.specs.spec_set_transforms import ToElementNameSet
@@ -22,6 +22,7 @@
 from metricflow.dataflow.nodes.filter_elements import FilterElementsNode
 from metricflow.dataflow.nodes.join_to_base import JoinDescription, JoinToBaseOutputNode
 from metricflow.dataflow.nodes.metric_time_transform import MetricTimeDimensionTransformNode
+from metricflow.validation.dataflow_join_validator import JoinDataflowOutputValidator
 
 logger = logging.getLogger(__name__)
 
@@ -84,7 +85,7 @@ def __init__(  # noqa: D107
         self._node_data_set_resolver = node_data_set_resolver
         self._partition_resolver = PartitionJoinResolver(semantic_model_lookup)
         self._semantic_model_lookup = semantic_model_lookup
-        self._join_evaluator = SemanticModelJoinEvaluator(semantic_model_lookup)
+        self._join_evaluator = JoinDataflowOutputValidator(semantic_model_lookup)
 
     def add_time_range_constraint(
         self,
57 changes: 57 additions & 0 deletions metricflow/validation/dataflow_join_validator.py
@@ -0,0 +1,57 @@
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, List
+
+from dbt_semantic_interfaces.references import (
+    EntityReference,
+    SemanticModelReference,
+)
+from metricflow_semantics.instances import EntityInstance, InstanceSet
+from metricflow_semantics.mf_logging.pretty_print import mf_pformat
+from metricflow_semantics.model.semantics.semantic_model_join_evaluator import SemanticModelJoinEvaluator
+
+if TYPE_CHECKING:
+    from metricflow_semantics.model.semantics.semantic_model_lookup import SemanticModelLookup
+
+
+class JoinDataflowOutputValidator:
+    """Checks that the instances in the output of a join dataflow node is valid."""
+
+    def __init__(self, semantic_model_lookup: SemanticModelLookup) -> None:  # noqa: D107
+        self._join_evaluator = SemanticModelJoinEvaluator(semantic_model_lookup)
+
+    @staticmethod
+    def _semantic_model_of_entity_in_instance_set(
+        instance_set: InstanceSet,
+        entity_reference: EntityReference,
+    ) -> SemanticModelReference:
+        """Return the semantic model where the entity was defined in the instance set."""
+        matching_instances: List[EntityInstance] = []
+        for entity_instance in instance_set.entity_instances:
+            assert len(entity_instance.defined_from) == 1
+            if len(entity_instance.spec.entity_links) == 0 and entity_instance.spec.reference == entity_reference:
+                matching_instances.append(entity_instance)
+
+        assert len(matching_instances) == 1, (
+            f"Not exactly 1 matching entity instances found: {matching_instances} for {entity_reference} in "
+            f"{mf_pformat(instance_set)}"
+        )
+        return matching_instances[0].origin_semantic_model_reference.semantic_model_reference
+
+    def is_valid_instance_set_join(
+        self,
+        left_instance_set: InstanceSet,
+        right_instance_set: InstanceSet,
+        on_entity_reference: EntityReference,
+    ) -> bool:
+        """Return true if the instance sets can be joined using the given entity."""
+        return self._join_evaluator.is_valid_semantic_model_join(
+            left_semantic_model_reference=JoinDataflowOutputValidator._semantic_model_of_entity_in_instance_set(
+                instance_set=left_instance_set, entity_reference=on_entity_reference
+            ),
+            right_semantic_model_reference=JoinDataflowOutputValidator._semantic_model_of_entity_in_instance_set(
+                instance_set=right_instance_set,
+                entity_reference=on_entity_reference,
+            ),
+            on_entity_reference=on_entity_reference,
+        )
Comment on lines +24 to +57
Contributor

Shouldn't we be deleting these from the SemanticModelJoinEvaluator now that they've been moved here?

Contributor Author

Thought I deleted that - let me remove it.
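To make the lookup in `_semantic_model_of_entity_in_instance_set` concrete, here is a minimal sketch that uses hypothetical frozen dataclasses (`EntityInstanceStub`, `InstanceSetStub`) in place of MetricFlow's `EntityInstance` and `InstanceSet`, and plain strings in place of the reference types. It keeps only the rule visible in the diff: among the entity instances with no entity links, exactly one must match the join entity, and its origin semantic model is returned. It omits the per-instance `defined_from` check and is not the MetricFlow code itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EntityInstanceStub:
    """Hypothetical stand-in for MetricFlow's EntityInstance."""

    entity_name: str  # plays the role of `spec.reference`
    entity_links: Tuple[str, ...]  # empty means the entity is local, not reached through a join
    origin_semantic_model: str  # plays the role of `origin_semantic_model_reference`


@dataclass(frozen=True)
class InstanceSetStub:
    """Hypothetical stand-in for MetricFlow's InstanceSet."""

    entity_instances: Tuple[EntityInstanceStub, ...]


def semantic_model_of_entity(instance_set: InstanceSetStub, entity_name: str) -> str:
    """Return the semantic model that defines `entity_name` with no entity links."""
    matching: List[EntityInstanceStub] = [
        instance
        for instance in instance_set.entity_instances
        if len(instance.entity_links) == 0 and instance.entity_name == entity_name
    ]
    # Same invariant as the original: exactly one unlinked match must exist.
    assert len(matching) == 1, f"Not exactly 1 matching entity instance found for {entity_name!r}: {matching}"
    return matching[0].origin_semantic_model


# Example: a listings data set with a local `user` entity and a `company` entity reached via `user`.
listings = InstanceSetStub(
    entity_instances=(
        EntityInstanceStub(entity_name="user", entity_links=(), origin_semantic_model="listings_latest"),
        EntityInstanceStub(entity_name="company", entity_links=("user",), origin_semantic_model="companies"),
    )
)
print(semantic_model_of_entity(listings, "user"))  # listings_latest
```

In the real class, the two semantic models resolved this way (one per side of the join) are handed to `SemanticModelJoinEvaluator.is_valid_semantic_model_join`, so the instance-set check reduces to the existing semantic-model-level check.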

@@ -1,6 +1,6 @@
 from __future__ import annotations
 
-from typing import Dict, Mapping, Sequence
+from typing import Dict, Sequence
 
 from dbt_semantic_interfaces.enum_extension import assert_values_exhausted
 from dbt_semantic_interfaces.protocols.entity import EntityType
@@ -13,8 +13,6 @@
     SemanticModelLink,
 )
 
-from tests_metricflow.fixtures.manifest_fixtures import MetricFlowEngineTestFixture, SemanticManifestSetup
-
 
 def _get_join_types_for_entity_type(entity_type: EntityType) -> Sequence[SemanticModelEntityJoinType]:
     """Exhaustively evaluate entity types and return a sequence of all possible join type pairs.
@@ -207,124 +205,6 @@ def test_semantic_model_join_validation_on_missing_entity(
     )
 
 
-def test_distinct_target_instance_set_join_validation(
-    mf_engine_test_fixture_mapping: Mapping[SemanticManifestSetup, MetricFlowEngineTestFixture],
-    simple_semantic_manifest_lookup: SemanticManifestLookup,
-) -> None:
-    """Tests instance set join validation to a PRIMARY or UNIQUE entity."""
-    foreign_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SIMPLE_MANIFEST]
-        .data_set_mapping["listings_latest"]
-        .instance_set
-    )
-    primary_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SIMPLE_MANIFEST]
-        .data_set_mapping["users_latest"]
-        .instance_set
-    )
-    unique_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SIMPLE_MANIFEST].data_set_mapping["companies"].instance_set
-    )
-    user_entity_reference = EntityReference(element_name="user")
-    join_evaluator = SemanticModelJoinEvaluator(
-        semantic_model_lookup=simple_semantic_manifest_lookup.semantic_model_lookup
-    )
-
-    foreign_primary = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=foreign_user_instance_set,
-        right_instance_set=primary_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    primary_primary = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=primary_user_instance_set,
-        right_instance_set=primary_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    unique_primary = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=unique_user_instance_set,
-        right_instance_set=primary_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    foreign_unique = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=foreign_user_instance_set,
-        right_instance_set=unique_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    primary_unique = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=primary_user_instance_set,
-        right_instance_set=unique_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    unique_unique = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=unique_user_instance_set,
-        right_instance_set=unique_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-
-    results = {
-        "foreign to primary": foreign_primary,
-        "primary to primary": primary_primary,
-        "unique to primary": unique_primary,
-        "foreign to unique": foreign_unique,
-        "primary to unique": primary_unique,
-        "unique to unique": unique_unique,
-    }
-    assert all(results.values()), (
-        f"All instance set level join types for primary and unique targets should be valid, but we found "
-        f"at least one that was not! Incorrectly failing types: {[k for k,v in results.items() if not v]}."
-    )
-
-
-def test_foreign_target_instance_set_join_validation(
-    mf_engine_test_fixture_mapping: Mapping[SemanticManifestSetup, MetricFlowEngineTestFixture],
-    simple_semantic_manifest_lookup: SemanticManifestLookup,
-) -> None:
-    """Tests semantic model join validation to FOREIGN entity types."""
-    foreign_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SIMPLE_MANIFEST]
-        .data_set_mapping["listings_latest"]
-        .instance_set
-    )
-    primary_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SIMPLE_MANIFEST]
-        .data_set_mapping["users_latest"]
-        .instance_set
-    )
-    unique_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SIMPLE_MANIFEST].data_set_mapping["companies"].instance_set
-    )
-    user_entity_reference = EntityReference(element_name="user")
-    join_evaluator = SemanticModelJoinEvaluator(
-        semantic_model_lookup=simple_semantic_manifest_lookup.semantic_model_lookup
-    )
-
-    foreign_foreign = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=foreign_user_instance_set,
-        right_instance_set=foreign_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    primary_foreign = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=primary_user_instance_set,
-        right_instance_set=foreign_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    unique_foreign = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=unique_user_instance_set,
-        right_instance_set=foreign_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-
-    results = {
-        "foreign to foreign": foreign_foreign,
-        "primary to foreign": primary_foreign,
-        "unique to foreign": unique_foreign,
-    }
-    assert not any(results.values()), (
-        f"All semantic model level joins against foreign targets should be invalid, but we found at least one "
-        f"that was not! Incorrectly passing types: {[k for k,v in results.items() if v]}."
-    )
-
-
 def test_get_joinable_semantic_models_single_hop(  # noqa: D103
     partitioned_multi_hop_join_semantic_manifest_lookup: SemanticManifestLookup,
 ) -> None:
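The two tests removed in this hunk together cover a 3-by-3 grid of (left, right) entity types for instance-set joins on `user`. As a minimal sketch, that grid can be restated as data. The string labels are hypothetical stand-ins for the `EntityType` values, and the rule in the comprehension only summarizes what the tests assert; it is not how `SemanticModelJoinEvaluator` actually decides.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical labels for the entity types exercised by the two removed tests.
ENTITY_TYPES = ("foreign", "primary", "unique")

# Expected validity keyed by (left entity type, right entity type), transcribed from the tests:
# every left type may join to a PRIMARY or UNIQUE target; no left type may join to a FOREIGN target.
EXPECTED: Dict[Tuple[str, str], bool] = {
    (left, right): right in ("primary", "unique") for left, right in product(ENTITY_TYPES, repeat=2)
}

# Print the grid using the tests' "left to right" wording.
for (left, right), valid in EXPECTED.items():
    print(f"{left} to {right}: {'valid' if valid else 'invalid'}")
```

The nine cells correspond one-to-one to the nine `is_valid_instance_set_join` calls in the two tests: six against PRIMARY/UNIQUE targets and three against FOREIGN targets.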
@@ -498,88 +378,3 @@ def test_natural_entity_semantic_model_validation(scd_semantic_manifest_lookup:
         f"joins marked invalid: {[k for k,v in valid_joins.items() if not v]}. Invalid joins marked valid: "
         f"{[k for k, v in invalid_joins.items() if v]}."
     )
-
-
-def test_natural_entity_instance_set_validation(
-    mf_engine_test_fixture_mapping: Mapping[SemanticManifestSetup, MetricFlowEngineTestFixture],
-    scd_semantic_manifest_lookup: SemanticManifestLookup,
-) -> None:
-    """Tests instance set validation for NATURAL target entity types.
-
-    These tests rely on the scd_semantic_manifest_lookup, which makes extensive use of NATURAL key types.
-    """
-    natural_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SCD_MANIFEST]
-        .data_set_mapping["primary_accounts"]
-        .instance_set
-    )
-    primary_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SCD_MANIFEST].data_set_mapping["users_latest"].instance_set
-    )
-    foreign_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SCD_MANIFEST]
-        .data_set_mapping["bookings_source"]
-        .instance_set
-    )
-    unique_user_instance_set = (
-        mf_engine_test_fixture_mapping[SemanticManifestSetup.SCD_MANIFEST].data_set_mapping["companies"].instance_set
-    )
-    user_entity_reference = EntityReference(element_name="user")
-    join_evaluator = SemanticModelJoinEvaluator(
-        semantic_model_lookup=scd_semantic_manifest_lookup.semantic_model_lookup
-    )
-
-    # Valid cases
-    natural_primary = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=natural_user_instance_set,
-        right_instance_set=primary_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    natural_unique = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=natural_user_instance_set,
-        right_instance_set=unique_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    foreign_natural = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=foreign_user_instance_set,
-        right_instance_set=natural_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    primary_natural = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=primary_user_instance_set,
-        right_instance_set=natural_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    unique_natural = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=unique_user_instance_set,
-        right_instance_set=natural_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    # Invalid cases
-    natural_foreign = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=natural_user_instance_set,
-        right_instance_set=foreign_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-    natural_natural = join_evaluator.is_valid_instance_set_join(
-        left_instance_set=natural_user_instance_set,
-        right_instance_set=natural_user_instance_set,
-        on_entity_reference=user_entity_reference,
-    )
-
-    valid_joins = {
-        "natural to primary": natural_primary,
-        "natural to unique": natural_unique,
-        "foreign to natural": foreign_natural,
-        "primary to natural": primary_natural,
-        "unique to natural": unique_natural,
-    }
-    invalid_joins = {
-        "natural to foreign": natural_foreign,
-        "natural to natural": natural_natural,
-    }
-    assert all(valid_joins.values()) and not any(invalid_joins.values()), (
-        f"Found unexpected join validator results when validating joins involving natural key comparisons! Valid "
-        f"joins marked invalid: {[k for k,v in valid_joins.items() if not v]}. Invalid joins marked valid: "
-        f"{[k for k, v in invalid_joins.items() if v]}."
-    )
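The removed natural-key test uses a different shape from the two tests above: it splits the pairs into expected-valid and expected-invalid groups and reports both kinds of mismatch in a single assertion. A minimal sketch of that bookkeeping follows, with the seven expectations transcribed from the removed test (note that NATURAL-to-NATURAL is expected to be invalid). The string labels, `NATURAL_EXPECTATIONS`, and `mismatches` are illustrative names, not MetricFlow code.

```python
from typing import Dict, List, Tuple

JoinPair = Tuple[str, str]  # (left entity type, right entity type)

# Expectations transcribed from test_natural_entity_instance_set_validation (SCD manifest).
NATURAL_EXPECTATIONS: Dict[JoinPair, bool] = {
    ("natural", "primary"): True,
    ("natural", "unique"): True,
    ("foreign", "natural"): True,
    ("primary", "natural"): True,
    ("unique", "natural"): True,
    ("natural", "foreign"): False,
    ("natural", "natural"): False,
}


def mismatches(observed: Dict[JoinPair, bool]) -> Tuple[List[JoinPair], List[JoinPair]]:
    """Return (valid joins marked invalid, invalid joins marked valid), like the test's failure message."""
    valid_marked_invalid = [
        pair for pair, expected in NATURAL_EXPECTATIONS.items() if expected and not observed[pair]
    ]
    invalid_marked_valid = [
        pair for pair, expected in NATURAL_EXPECTATIONS.items() if not expected and observed[pair]
    ]
    return valid_marked_invalid, invalid_marked_valid


# Simulate a validator that wrongly accepts a NATURAL-to-NATURAL join.
observed = dict(NATURAL_EXPECTATIONS)
observed[("natural", "natural")] = True
print(mismatches(observed))  # ([], [('natural', 'natural')])
```

Reporting both lists at once, as the original assertion message does, shows every wrong cell in one failure instead of stopping at the first.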