Skip to content

Commit

Permalink
Move node dataset resolver subquery ids into a separate namespace
Browse files Browse the repository at this point in the history
The DataflowPlanNodeOutputDataSetResolver class extends the
DataflowToSqlQueryPlanConverter class. As a result, both classes
effectively do a full conversion from a DataflowPlan to a SqlQueryPlan,
complete with SqlQueryPlan node_id generation. These shared the
same ID prefix, which meant any call that might trigger an additional
invocation of the node dataset resolver subclass could cause shifts
in the subquery IDs in our SqlQueryPlan outputs even if the invocation
had no material impact on the query in question.

In order to reduce snapshot thrash and make our subquery IDs easier to
reason about on read-through of the generated SQL, this change splits
the node dataset resolver ID prefix into a dedicated value.
  • Loading branch information
tlento committed Jun 14, 2024
1 parent fda9d9b commit 00b75a7
Show file tree
Hide file tree
Showing 69 changed files with 9,006 additions and 8,994 deletions.
1 change: 1 addition & 0 deletions metricflow-semantics/metricflow_semantics/dag/id_prefix.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ class StaticIdPrefix(IdPrefix, Enum, metaclass=EnumMetaClassHelper):

TIME_SPINE_SOURCE = "time_spine_src"
SUB_QUERY = "subq"
NODE_RESOLVER_SUB_QUERY = "nr_subq"

@property
@override
Expand Down
11 changes: 11 additions & 0 deletions metricflow/dataflow/builder/node_data_set.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,11 @@

from typing import TYPE_CHECKING, Dict, Optional, Sequence

from metricflow_semantics.dag.id_prefix import StaticIdPrefix
from metricflow_semantics.dag.sequential_id import SequentialIdGenerator
from metricflow_semantics.mf_logging.runtime import log_block_runtime
from metricflow_semantics.specs.column_assoc import ColumnAssociationResolver
from typing_extensions import override

from metricflow.dataflow.dataflow_plan import (
DataflowPlanNode,
Expand Down Expand Up @@ -93,3 +96,11 @@ def copy(self) -> DataflowPlanNodeOutputDataSetResolver:
semantic_manifest_lookup=self._semantic_manifest_lookup,
_node_to_output_data_set=dict(self._node_to_output_data_set),
)

@override
def _next_unique_table_alias(self) -> str:
"""Return the next unique table alias to use in generating queries.
This uses a separate prefix in order to minimize subquery ID thrash.
"""
return SequentialIdGenerator.create_next_id(StaticIdPrefix.NODE_RESOLVER_SUB_QUERY).str_value
Loading

0 comments on commit 00b75a7

Please sign in to comment.