Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added SqlWindowFunctionExpression and MetadataSpec/Instance #355

Merged
merged 8 commits into from
Nov 30, 2022

Conversation

WilliamDee
Copy link
Contributor

@WilliamDee WilliamDee commented Nov 22, 2022

Context

In aid of Conversion Metrics #352, It was best to break it out into multiple PRs. This PR is used to add the sql rendering node for window functions and the dataflow plan node for appending a row number column. The reason this was broken out into a separate PR was due to the nature of adding a row number column into a data set. Meaning there was no where to define this column in the current framework as this was never defined from anywhere (ie., in the config). In this PR, we introduce a new spec/instance type called MetadataSpec and MetadataInstance respectively. This is used for places that requires an additional spec or instance.

Changes

  • SqlFunctionExpression is now a parent object that is inherited by all sql function objects
  • SqlFunctionExpression -> SqlAggregationFunctionExpressionNode
  • Added SqlWindowFunctionExpressionNode
  • Added MetadataSpec and MetadataInstance to SpecSets
  • Added AppendRowNumberColumnNode

Note

After further discussion, AppendRowNumberColumnNode is unnecessary, but everything else could still be useful so keeping those in.

input_data_set: SqlDataSet = node.parent_node.accept(self)
input_data_set_alias = self._next_unique_table_alias()

row_number_spec = ExtraSpec.from_name("mf_row_number")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how should I name this?

Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have time to read this thoroughly. Directionally it seems fine to me. A couple of notes:

  1. ExtraSpec does need a new name, especially in the InstanceSpec interface. Maybe internal_specs or added_metadata_spec (as I believe @plypaul suggested) - something a bit more obvious that it's a framework internal thing.
  2. You mentioned a new identifier type (internal_key or row_number or something). It's hard for me to see how that will shake out - on the one hand it eliminates the need for a special container for extra specs. On the other hand, it mixes internal elements with model elements in the instance spec interface, and requires runtime checking to see if one of these things exists anywhere in there. My instinct is to stick with what you have here, but an internal identifier type is an idea worth exploring.

metricflow/sql/sql_exprs.py Outdated Show resolved Hide resolved
Copy link
Contributor

@plypaul plypaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me how the row number is used again?

metricflow/sql/sql_exprs.py Outdated Show resolved Hide resolved
def matches(self, other: SqlExpressionNode) -> bool: # noqa: D
if not isinstance(other, SqlWindowFunctionExpression):
return False
order_by_matches = all(x == y for x, y in itertools.zip_longest(self.order_by_args, other.order_by_args))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why zip_longest? Shouldn't we check for same length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the check to do len(x) == len(y) and set(x) == set(y).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, we should actually just do an equality check since ordering matters for this. So I just made it a simple self.order_by_args == other.order_by_args

row_number_select_column = SqlSelectColumn(
expr=SqlWindowFunctionExpression(
sql_function=SqlWindowFunction.ROW_NUMBER,
order_by_args=[SqlWindowOrderByArg(expr=x) for x in input_sql_column_references],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will have to be a FilterElementsNode before this node so we don't have a long list of columns here.

@WilliamDee WilliamDee force-pushed the will/add-row-number branch 2 times, most recently from f4f71b2 to 9dddf28 Compare November 30, 2022 18:27
@WilliamDee WilliamDee changed the title Added SqlWindowFunctionExpression and AppendRowNumberColumnNode Added SqlWindowFunctionExpression and MetadataSpec/Instance Nov 30, 2022
Copy link
Contributor

@plypaul plypaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -107,6 +111,24 @@ def qualified_name(self) -> str:
raise NotImplementedError()


@dataclass(frozen=True)
class MetadataSpec(InstanceSpec):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine for now, but there will likely be different types of metadata later on so there might need to be subclasses of this. I'm also thinking whether to make this Visitable (also for later).

@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 20:59 Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 20:59 Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 20:59 Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 22:03 Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 22:03 Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 22:03 Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS November 30, 2022 22:03 Inactive
@WilliamDee WilliamDee merged commit ca2385c into main Nov 30, 2022
@WilliamDee WilliamDee deleted the will/add-row-number branch November 30, 2022 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants