-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split column pruner into two phases #1501
Merged
+934
−675
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
b418dae
/* PR_START p--cte 06 */ Split column pruner into two phases.
plypaul 51abaf6
Rename to `NodeToColumnAliasMapping`.
plypaul c41b7bb
Create a copy when reusing SELECT nodes.
plypaul b7e7ee1
Update snapshots due to re-created nodes.
plypaul c1fe86c
Change log call from debug to error.
plypaul File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
from collections import defaultdict | ||
from typing import Dict, FrozenSet, Iterable, Set | ||
|
||
from metricflow.sql.sql_plan import ( | ||
SqlQueryPlanNode, | ||
) | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
|
||
class NodeToColumnAliasMapping: | ||
"""Mutable class for mapping a SQL node to an arbitrary set of column aliases for that node. | ||
|
||
* Alternatively, this can be described as mapping a location in the SQL query plan to a set of column aliases. | ||
* See `SqlMapRequiredColumnAliasesVisitor` for the main use case for this class. | ||
* This is a thin wrapper over a dict to aid readability. | ||
""" | ||
|
||
def __init__(self) -> None: # noqa: D107 | ||
self._node_to_tagged_aliases: Dict[SqlQueryPlanNode, Set[str]] = defaultdict(set) | ||
|
||
def get_aliases(self, node: SqlQueryPlanNode) -> FrozenSet[str]: | ||
"""Return the column aliases added for the given SQL node.""" | ||
return frozenset(self._node_to_tagged_aliases[node]) | ||
|
||
def add_alias(self, node: SqlQueryPlanNode, column_alias: str) -> None: # noqa: D102 | ||
return self._node_to_tagged_aliases[node].add(column_alias) | ||
|
||
def add_aliases(self, node: SqlQueryPlanNode, column_aliases: Iterable[str]) -> None: # noqa: D102 | ||
self._node_to_tagged_aliases[node].update(column_aliases) |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tangential, but I've frequently read this code and found this variable name confusing (
pruned_select_columns
). We frequently refer to "pruned columns" when we mean the ones that have been removed, but in this case we mean the columns that have been kept. I think the word pruned can technically be used both ways, but it typically is used to refer to what has been removed. Can we change this to a more clear variable name?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this could be renamed. To understand where you're coming from, you mention that
[pruned] typically is used to refer to what has been removed
. What examples were you thinking?When I think of pruned, I think about an overgrown tree. Once I prune it, I would call it a pruned tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's true - the tree has been pruned. But if you're referring to the branches, I think the "pruned" branches would typically refer to the ones removed. In this case the tree is the SQL node and the columns are the branches.
I just did a quick search through the code to see how we use this word, and there are a couple places where we use the opposite meaning of prune:
metricflow/metricflow/sql/optimizer/column_pruner.py
Line 37 in d01cbea
metricflow/tests_metricflow/sql/optimizer/test_column_pruner.py
Lines 349 to 350 in d01cbea
metricflow/tests_metricflow/sql/optimizer/test_column_pruner.py
Lines 394 to 395 in d01cbea
And this is silly but I just did a quick google search for a gut check here and it does look like the pruned leaves are the ones that have been removed:
Not totally related to this PR so this isn't blocking! But if you don't update the naming here I probably will the next time I come across it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I'm thinking we use different terms like "removed" and "retained" then, but will have to handle in a follow up.