Move SemanticModel data artifacts to dbt/artifacts (#9485)
* Move `SemanticModel` sub dataclasses to dbt/artifacts

* Move `NodeRelation` to dbt/artifacts

* Move `SemanticModelConfig` to dbt/artifacts

* Move data portion of `SemanticModel` to dbt/artifacts

* Add contextual comments to `semantic_model.py` about DSI protocols

* Fixup mypy complaint

* Migrate v12 manifest to use artifact definitions of `SavedQuery`, `Metric`, and `SemanticModel`

* Convert `SemanticModel` and `Metric` resources to full nodes in selector search

In the `search` method in `selector_methods.py`, we were getting object
representations from the incoming writable manifest by unique id. What we
get from the writable manifest though is increasingly the `resource`
(data artifact) part of the node, not the full node. This was problematic
because a number of the selector processes _compare_ the old node to the
new node, but the `resource` representation doesn't have the comparator
methods.

In this commit we dict-ify the resource and then get the full node by
undictifying that. We should probably have a better built-in process on
the full node objects to do this, but this will do for now.
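The dict round trip described above can be sketched with stdlib dataclasses. This is a minimal sketch under assumptions: `MetricResource`, `MetricNode`, `same_contents`, and `to_full_node` are hypothetical names, and `dataclasses.asdict` plus a hand-written `from_dict` stand in for the `to_dict`/`from_dict` methods that dbt's `dbtClassMixin` classes provide.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


# Hypothetical data-only "resource": fields, but no comparator methods.
@dataclass
class MetricResource:
    name: str
    description: str = ""
    refs: List[str] = field(default_factory=list)


# Hypothetical full node: same fields, plus the behavior selectors rely on.
@dataclass
class MetricNode(MetricResource):
    def same_contents(self, old: "MetricNode") -> bool:
        # Comparator used when diffing old vs. new state; the resource lacks it.
        return self.description == old.description and self.refs == old.refs

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "MetricNode":
        return cls(**data)


def to_full_node(resource: MetricResource) -> MetricNode:
    # "Dict-ify" the resource, then "undictify" the dict into the node class.
    return MetricNode.from_dict(asdict(resource))


old = to_full_node(MetricResource(name="revenue", description="v1"))
new = to_full_node(MetricResource(name="revenue", description="v2"))
print(new.same_contents(old))  # False
```

Because `asdict` recurses into nested dataclasses, nested resources come back as plain dicts; a real `from_dict` has to rebuild them, which this flat sketch does not need to do.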

* Add `from_resource` implementation on `BaseNode` to ease resource to node conversion

We want to be able to easily create nodes from their resource
counterparts. It's actually imperative that we can do so. The previous commit
had a manual way to do so where needed. However, we don't want to have
to put `from_dict(.to_dict())` everywhere. So here we added a `from_resource`
class method to `BaseNode`. Everything that inherits from `BaseNode` thus
automatically gets this functionality.

HOWEVER, the implementation currently has a problem. Specifically, the
type for `resource_instance` is `BaseResource`, which means that if one is
calling, say, `Metric.from_resource()`, one could hand it a `SemanticModelResource`
and mypy won't complain. In this case, a semi-cryptic error might get
raised at runtime. Whether or not an error gets raised depends entirely
on whether the dictified resource instance manages to satisfy all
the required attributes of the desired node class. THIS IS VERY BAD.

We should be able to solve this issue in an upcoming (hopefully the next)
commit, wherein we genericize `BaseNode` such that when inheriting from it
you declare it with a resource type. Technically, a runtime error will
still be possible; however, any mix-ups should be caught by mypy in
pre-commit hooks as well as on PRs.
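A minimal sketch of the `from_resource` class method and the hazard described above, assuming stand-in classes: `BaseResource`, `BaseNode`, `Metric`, and the `*Resource` classes here are hypothetical stdlib dataclasses, not dbt's real ones. This `from_dict` drops unknown keys, so, as the commit message describes, success depends only on whether the target's required fields are present.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Type, TypeVar

NodeT = TypeVar("NodeT", bound="BaseNode")


@dataclass
class BaseResource:
    # Hypothetical base for data-only artifact classes.
    name: str


@dataclass
class BaseNode(BaseResource):
    @classmethod
    def from_dict(cls: Type[NodeT], data: Dict[str, Any]) -> NodeT:
        # Keep only keys that are fields of the target class; a missing
        # required field still raises TypeError from the generated __init__.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @classmethod
    def from_resource(cls: Type[NodeT], resource_instance: BaseResource) -> NodeT:
        # Inherited by every node class, so callers no longer write
        # from_dict(resource.to_dict()) by hand.
        return cls.from_dict(asdict(resource_instance))


@dataclass
class MetricResource(BaseResource):
    label: str


@dataclass
class SavedQueryResource(BaseResource):
    label: str


@dataclass
class SemanticModelResource(BaseResource):
    model: str


@dataclass
class Metric(BaseNode, MetricResource):
    pass


metric = Metric.from_resource(MetricResource(name="revenue", label="Revenue"))

# Both calls below type-check, because the parameter is only BaseResource.
# This one silently builds a Metric out of a saved-query resource:
oops = Metric.from_resource(SavedQueryResource(name="weekly", label="Weekly"))
# This one fails at runtime: there is no `label` to satisfy Metric.
try:
    Metric.from_resource(SemanticModelResource(name="orders", model="orders"))
except TypeError as err:
    print(f"runtime error: {err}")
```

The silent case is arguably worse than the crash: the wrong resource produces a plausible-looking `Metric`, and nothing flags it until something downstream misbehaves.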

* Make `BaseNode` a generic that is defined with a `ResourceType`

Turning `BaseNode` into an ABC generic allows the inheriting class to
declare which resource type from artifacts it should be used with.
This gives us added type safety over which resource type can be passed into
`from_resource` when called via `SemanticModel.from_resource(...)`,
`Metric.from_resource(...)`, etc.

NOTE: This only gives us type safety from mypy. If we begin ignoring
mypy errors during development, we can still end up with runtime
errors (it's just harder to do so now).
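A minimal sketch of the generic version, under the same kind of assumptions: every class is a hypothetical stdlib stand-in, and the type variable names `ResourceT` and `NodeT` are illustrative rather than taken from dbt-core. Each node class fixes `ResourceT` in its base-class declaration, which is intended to let a type checker such as mypy reject `Metric.from_resource(SemanticModelResource(...))`; as the NOTE says, nothing here checks the type at runtime.

```python
from abc import ABC
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Generic, Type, TypeVar

# Hypothetical type variables: the resource a node wraps, and the node itself.
ResourceT = TypeVar("ResourceT", bound="BaseResource")
NodeT = TypeVar("NodeT", bound="BaseNode[Any]")


@dataclass
class BaseResource:
    name: str


@dataclass
class BaseNode(ABC, Generic[ResourceT], BaseResource):
    @classmethod
    def from_dict(cls: Type[NodeT], data: Dict[str, Any]) -> NodeT:
        # Keep only keys that are fields of the target node class.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})

    @classmethod
    def from_resource(cls: Type[NodeT], resource_instance: ResourceT) -> NodeT:
        # ResourceT is pinned by each subclass's base declaration, so a type
        # checker can flag a mismatched resource. No runtime check happens here.
        return cls.from_dict(asdict(resource_instance))


@dataclass
class MetricResource(BaseResource):
    label: str


@dataclass
class SemanticModelResource(BaseResource):
    model: str


# Each node declares which resource type it is built from.
@dataclass
class Metric(BaseNode[MetricResource], MetricResource):
    pass


@dataclass
class SemanticModel(BaseNode[SemanticModelResource], SemanticModelResource):
    pass


metric = Metric.from_resource(MetricResource(name="revenue", label="Revenue"))
model = SemanticModel.from_resource(SemanticModelResource(name="orders", model="orders"))
# A type checker is expected to reject the next line (wrong resource type):
# Metric.from_resource(SemanticModelResource(name="orders", model="orders"))
```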
QMalcolm authored Feb 1, 2024
1 parent 8717828 commit 0836095
Showing 16 changed files with 361 additions and 306 deletions.
6 changes: 6 additions & 0 deletions .changes/unreleased/Under the Hood-20240129-163800.yaml
@@ -0,0 +1,6 @@
kind: Under the Hood
body: Move data portion of `SemanticModel` to dbt/artifacts
time: 2024-01-29T16:38:00.245253-08:00
custom:
  Author: QMalcolm
  Issue: "9387"
13 changes: 13 additions & 0 deletions core/dbt/artifacts/resources/__init__.py
@@ -31,3 +31,16 @@
     WhereFilter,
     WhereFilterIntersection,
 )
+from dbt.artifacts.resources.v1.semantic_model import (
+    Defaults,
+    Dimension,
+    DimensionTypeParams,
+    DimensionValidityParams,
+    Entity,
+    Measure,
+    MeasureAggregationParameters,
+    NodeRelation,
+    NonAdditiveDimension,
+    SemanticModel,
+    SemanticModelConfig,
+)
273 changes: 273 additions & 0 deletions core/dbt/artifacts/resources/v1/semantic_model.py
@@ -0,0 +1,273 @@
import time

from dataclasses import dataclass, field
from dbt.artifacts.resources.base import GraphResource
from dbt.artifacts.resources.v1.components import DependsOn, RefArgs
from dbt_common.contracts.config.base import BaseConfig, CompareBehavior, MergeBehavior
from dbt_common.dataclass_schema import dbtClassMixin
from dbt_semantic_interfaces.references import (
    DimensionReference,
    EntityReference,
    LinkableElementReference,
    MeasureReference,
    SemanticModelReference,
    TimeDimensionReference,
)
from dbt_semantic_interfaces.type_enums import (
    AggregationType,
    DimensionType,
    EntityType,
    TimeGranularity,
)
from dbt.artifacts.resources import SourceFileMetadata
from typing import Any, Dict, List, Optional, Sequence


"""
The classes in this file are dataclasses which are used to construct the Semantic
Model node in dbt-core. Additionally, these classes need to at a minimum support
what is specified in their protocol definitions in dbt-semantic-interfaces.
Their protocol definitions can be found here:
https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/semantic_model.py
"""


@dataclass
class Defaults(dbtClassMixin):
    agg_time_dimension: Optional[str] = None


@dataclass
class NodeRelation(dbtClassMixin):
    alias: str
    schema_name: str  # TODO: Could this be called simply "schema" so we could reuse StateRelation?
    database: Optional[str] = None
    relation_name: Optional[str] = None


# ====================================
# Dimension objects
# Dimension protocols: https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/dimension.py
# ====================================


@dataclass
class DimensionValidityParams(dbtClassMixin):
    is_start: bool = False
    is_end: bool = False


@dataclass
class DimensionTypeParams(dbtClassMixin):
    time_granularity: TimeGranularity
    validity_params: Optional[DimensionValidityParams] = None


@dataclass
class Dimension(dbtClassMixin):
    name: str
    type: DimensionType
    description: Optional[str] = None
    label: Optional[str] = None
    is_partition: bool = False
    type_params: Optional[DimensionTypeParams] = None
    expr: Optional[str] = None
    metadata: Optional[SourceFileMetadata] = None

    @property
    def reference(self) -> DimensionReference:
        return DimensionReference(element_name=self.name)

    @property
    def time_dimension_reference(self) -> Optional[TimeDimensionReference]:
        if self.type == DimensionType.TIME:
            return TimeDimensionReference(element_name=self.name)
        else:
            return None

    @property
    def validity_params(self) -> Optional[DimensionValidityParams]:
        if self.type_params:
            return self.type_params.validity_params
        else:
            return None


# ====================================
# Entity objects
# Entity protocols: https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/entity.py
# ====================================


@dataclass
class Entity(dbtClassMixin):
    name: str
    type: EntityType
    description: Optional[str] = None
    label: Optional[str] = None
    role: Optional[str] = None
    expr: Optional[str] = None

    @property
    def reference(self) -> EntityReference:
        return EntityReference(element_name=self.name)

    @property
    def is_linkable_entity_type(self) -> bool:
        return self.type in (EntityType.PRIMARY, EntityType.UNIQUE, EntityType.NATURAL)


# ====================================
# Measure objects
# Measure protocols: https://github.com/dbt-labs/dbt-semantic-interfaces/blob/main/dbt_semantic_interfaces/protocols/measure.py
# ====================================


@dataclass
class MeasureAggregationParameters(dbtClassMixin):
    percentile: Optional[float] = None
    use_discrete_percentile: bool = False
    use_approximate_percentile: bool = False


@dataclass
class NonAdditiveDimension(dbtClassMixin):
    name: str
    window_choice: AggregationType
    window_groupings: List[str]


@dataclass
class Measure(dbtClassMixin):
    name: str
    agg: AggregationType
    description: Optional[str] = None
    label: Optional[str] = None
    create_metric: bool = False
    expr: Optional[str] = None
    agg_params: Optional[MeasureAggregationParameters] = None
    non_additive_dimension: Optional[NonAdditiveDimension] = None
    agg_time_dimension: Optional[str] = None

    @property
    def reference(self) -> MeasureReference:
        return MeasureReference(element_name=self.name)


# ====================================
# SemanticModel final parts
# ====================================


@dataclass
class SemanticModelConfig(BaseConfig):
    enabled: bool = True
    group: Optional[str] = field(
        default=None,
        metadata=CompareBehavior.Exclude.meta(),
    )
    meta: Dict[str, Any] = field(
        default_factory=dict,
        metadata=MergeBehavior.Update.meta(),
    )


@dataclass
class SemanticModel(GraphResource):
    model: str
    node_relation: Optional[NodeRelation]
    description: Optional[str] = None
    label: Optional[str] = None
    defaults: Optional[Defaults] = None
    entities: Sequence[Entity] = field(default_factory=list)
    measures: Sequence[Measure] = field(default_factory=list)
    dimensions: Sequence[Dimension] = field(default_factory=list)
    metadata: Optional[SourceFileMetadata] = None
    depends_on: DependsOn = field(default_factory=DependsOn)
    refs: List[RefArgs] = field(default_factory=list)
    created_at: float = field(default_factory=lambda: time.time())
    config: SemanticModelConfig = field(default_factory=SemanticModelConfig)
    unrendered_config: Dict[str, Any] = field(default_factory=dict)
    primary_entity: Optional[str] = None
    group: Optional[str] = None

    @property
    def entity_references(self) -> List[LinkableElementReference]:
        return [entity.reference for entity in self.entities]

    @property
    def dimension_references(self) -> List[LinkableElementReference]:
        return [dimension.reference for dimension in self.dimensions]

    @property
    def measure_references(self) -> List[MeasureReference]:
        return [measure.reference for measure in self.measures]

    @property
    def has_validity_dimensions(self) -> bool:
        return any([dim.validity_params is not None for dim in self.dimensions])

    @property
    def validity_start_dimension(self) -> Optional[Dimension]:
        validity_start_dims = [
            dim for dim in self.dimensions if dim.validity_params and dim.validity_params.is_start
        ]
        if not validity_start_dims:
            return None
        return validity_start_dims[0]

    @property
    def validity_end_dimension(self) -> Optional[Dimension]:
        validity_end_dims = [
            dim for dim in self.dimensions if dim.validity_params and dim.validity_params.is_end
        ]
        if not validity_end_dims:
            return None
        return validity_end_dims[0]

    @property
    def partitions(self) -> List[Dimension]:  # noqa: D
        return [dim for dim in self.dimensions or [] if dim.is_partition]

    @property
    def partition(self) -> Optional[Dimension]:
        partitions = self.partitions
        if not partitions:
            return None
        return partitions[0]

    @property
    def reference(self) -> SemanticModelReference:
        return SemanticModelReference(semantic_model_name=self.name)

    def checked_agg_time_dimension_for_measure(
        self, measure_reference: MeasureReference
    ) -> TimeDimensionReference:
        measure: Optional[Measure] = None
        for measure in self.measures:
            if measure.reference == measure_reference:
                measure = measure

        assert (
            measure is not None
        ), f"No measure with name ({measure_reference.element_name}) in semantic_model with name ({self.name})"

        default_agg_time_dimension = (
            self.defaults.agg_time_dimension if self.defaults is not None else None
        )

        agg_time_dimension_name = measure.agg_time_dimension or default_agg_time_dimension
        assert agg_time_dimension_name is not None, (
            f"Aggregation time dimension for measure {measure.name} on semantic model {self.name} is not set! "
            "To fix this either specify a default `agg_time_dimension` for the semantic model or define an "
            "`agg_time_dimension` on the measure directly."
        )
        return TimeDimensionReference(element_name=agg_time_dimension_name)

    @property
    def primary_entity_reference(self) -> Optional[EntityReference]:
        return (
            EntityReference(element_name=self.primary_entity)
            if self.primary_entity is not None
            else None
        )
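One detail in `checked_agg_time_dimension_for_measure` is worth flagging: the `for measure in self.measures` loop rebinds `measure` on every iteration and `measure = measure` is a no-op, so after the loop `measure` holds the last element of `self.measures` (it is `None` only when the list is empty) rather than the matching measure. A lookup that stops at the first match avoids this. The sketch below is a simplified stand-in, not dbt-core code: `Defaults`, `Measure`, and `SemanticModel` are cut-down stdlib dataclasses, plain strings replace the `dbt_semantic_interfaces` reference types, `agg_time_dimension_for` is a hypothetical method name, and it raises `ValueError` instead of using `assert` (asserts are skipped under `python -O`).

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Simplified stand-ins: plain strings replace the dbt_semantic_interfaces
# MeasureReference / TimeDimensionReference types.
@dataclass
class Defaults:
    agg_time_dimension: Optional[str] = None


@dataclass
class Measure:
    name: str
    agg_time_dimension: Optional[str] = None


@dataclass
class SemanticModel:
    name: str
    measures: List[Measure] = field(default_factory=list)
    defaults: Optional[Defaults] = None

    def agg_time_dimension_for(self, measure_name: str) -> str:
        # next() returns the first matching measure, or None if none match.
        measure = next((m for m in self.measures if m.name == measure_name), None)
        if measure is None:
            raise ValueError(
                f"No measure with name ({measure_name}) in semantic_model with name ({self.name})"
            )
        default = self.defaults.agg_time_dimension if self.defaults is not None else None
        agg_time_dimension = measure.agg_time_dimension or default
        if agg_time_dimension is None:
            raise ValueError(
                f"Aggregation time dimension for measure {measure.name} "
                f"on semantic model {self.name} is not set!"
            )
        return agg_time_dimension


orders = SemanticModel(
    name="orders",
    measures=[Measure("order_total", "ordered_at"), Measure("order_count")],
    defaults=Defaults(agg_time_dimension="created_at"),
)
print(orders.agg_time_dimension_for("order_total"))  # ordered_at
print(orders.agg_time_dimension_for("order_count"))  # created_at
```

With the loop as listed in the diff, asking for `order_total` in this example would pick up `order_count`, the last measure, and fall back to the default `created_at`.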
12 changes: 8 additions & 4 deletions core/dbt/artifacts/schemas/manifest/v12/manifest.py
@@ -9,7 +9,14 @@
     get_artifact_schema_version,
 )
 from dbt.artifacts.schemas.upgrades import upgrade_manifest_json
-from dbt.artifacts.resources import Documentation, Group, Macro
+from dbt.artifacts.resources import (
+    Documentation,
+    Group,
+    Macro,
+    Metric,
+    SavedQuery,
+    SemanticModel,
+)

 # TODO: remove usage of dbt modules other than dbt.artifacts
 from dbt import tracking
@@ -18,9 +25,6 @@
     Exposure,
     GraphMemberNode,
     ManifestNode,
-    Metric,
-    SavedQuery,
-    SemanticModel,
     SourceDefinition,
     UnitTestDefinition,
 )
19 changes: 5 additions & 14 deletions core/dbt/contracts/graph/model_config.py
@@ -2,7 +2,11 @@
 from typing import Any, List, Optional, Dict, Union, Type
 from typing_extensions import Annotated

-from dbt.artifacts.resources import MetricConfig, SavedQueryConfig
+from dbt.artifacts.resources import (
+    MetricConfig,
+    SavedQueryConfig,
+    SemanticModelConfig,
+)
 from dbt_common.contracts.config.base import BaseConfig, MergeBehavior, CompareBehavior
 from dbt_common.contracts.config.materialization import OnConfigurationChangeOption
 from dbt_common.contracts.config.metadata import Metadata, ShowBehavior
@@ -49,19 +53,6 @@ class Hook(dbtClassMixin, Replaceable):
     index: Optional[int] = None


-@dataclass
-class SemanticModelConfig(BaseConfig):
-    enabled: bool = True
-    group: Optional[str] = field(
-        default=None,
-        metadata=CompareBehavior.Exclude.meta(),
-    )
-    meta: Dict[str, Any] = field(
-        default_factory=dict,
-        metadata=MergeBehavior.Update.meta(),
-    )
-
-
 @dataclass
 class ExposureConfig(BaseConfig):
     enabled: bool = True