forked from apache/spark
Optimize Parquet read performance #77
Comments
hn5092 added a commit that referenced this issue on Nov 10, 2019
7mming7 pushed a commit that referenced this issue on Nov 4, 2020
### What changes were proposed in this pull request?

This PR added a physical rule to remove redundant project nodes. A `ProjectExec` is redundant when

1. It has the same output attributes and order as its child's output when ordering of these attributes is required.
2. It has the same output attributes as its child's output when attribute output ordering is not required.

For example:

After Filter:

```
== Physical Plan ==
*(1) Project [a#14L, b#15L, c#16, key#17]
+- *(1) Filter (isnotnull(a#14L) AND (a#14L > 5))
   +- *(1) ColumnarToRow
      +- FileScan parquet [a#14L,b#15L,c#16,key#17]
```

The `Project a#14L, b#15L, c#16, key#17` is redundant because its output is exactly the same as filter's output.

Before Aggregate:

```
== Physical Plan ==
*(2) HashAggregate(keys=[key#17], functions=[sum(a#14L), last(b#15L, false)], output=[sum_a#39L, key#17, last_b#41L])
+- Exchange hashpartitioning(key#17, 5), true, [id=#77]
   +- *(1) HashAggregate(keys=[key#17], functions=[partial_sum(a#14L), partial_last(b#15L, false)], output=[key#17, sum#49L, last#50L, valueSet#51])
      +- *(1) Project [key#17, a#14L, b#15L]
         +- *(1) Filter (isnotnull(a#14L) AND (a#14L > 100))
            +- *(1) ColumnarToRow
               +- FileScan parquet [a#14L,b#15L,key#17]
```

The `Project key#17, a#14L, b#15L` is redundant because hash aggregate doesn't require child plan's output to be in a specific order.

### Why are the changes needed?

It removes unnecessary query nodes and makes query plan cleaner.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes apache#29031 from allisonwang-db/remove-project.

Lead-authored-by: allisonwang-db <[email protected]>
Co-authored-by: allisonwang-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
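The two redundancy conditions can be illustrated with a small toy model. This is only a sketch in Python, not Spark's actual rule: the `Node` class, `is_redundant`, `remove_redundant_projects`, and `names` are hypothetical names, plans are simplified to a single-child chain, and the way the ordering requirement is propagated (dropped below `HashAggregate` and below a kept `Project`, forwarded unchanged elsewhere) is an assumption made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical single-child plan node: operator name plus output attributes."""
    name: str
    output: Tuple[str, ...]
    child: Optional["Node"] = None


def is_redundant(project: Node, require_ordering: bool) -> bool:
    """Apply the two conditions from the commit message to a Project node."""
    child = project.child
    if require_ordering:
        # Condition 1: same attributes in the same order.
        return project.output == child.output
    # Condition 2: same attributes, order ignored.
    return sorted(project.output) == sorted(child.output)


def remove_redundant_projects(plan: Optional[Node],
                              require_ordering: bool = True) -> Optional[Node]:
    """Return a copy of the plan with redundant Project nodes dropped."""
    if plan is None:
        return None
    if (plan.name == "Project" and plan.child is not None
            and is_redundant(plan, require_ordering)):
        return remove_redundant_projects(plan.child, require_ordering)
    # Assumption: HashAggregate and a kept Project pick attributes themselves,
    # so they do not need ordered input; other operators forward the requirement.
    child_requires = (False if plan.name in ("HashAggregate", "Project")
                      else require_ordering)
    return replace(plan, child=remove_redundant_projects(plan.child, child_requires))


def names(plan: Optional[Node]) -> List[str]:
    """List operator names from the root of the plan down to the leaf."""
    out = []
    while plan is not None:
        out.append(plan.name)
        plan = plan.child
    return out


# "After Filter" example: Project repeats the Filter output in the same order.
scan1 = Node("FileScan", ("a", "b", "c", "key"))
filt1 = Node("Filter", ("a", "b", "c", "key"), Node("ColumnarToRow", scan1.output, scan1))
plan1 = Node("Project", ("a", "b", "c", "key"), filt1)

# "Before Aggregate" example: Project only reorders attributes below an aggregate.
scan2 = Node("FileScan", ("a", "b", "key"))
filt2 = Node("Filter", ("a", "b", "key"), Node("ColumnarToRow", scan2.output, scan2))
proj2 = Node("Project", ("key", "a", "b"), filt2)
partial = Node("HashAggregate", ("key", "sum", "last", "valueSet"), proj2)
plan2 = Node("HashAggregate", ("sum_a", "key", "last_b"),
             Node("Exchange", partial.output, partial))

print(names(remove_redundant_projects(plan1)))
print(names(remove_redundant_projects(plan2)))
```

In this toy model both `Project` nodes are dropped, while the same reordering `Project` placed at the root of a plan (where the output order is required) would be kept.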
https://github.com/Kyligence/KAP/issues/15841
Kyligence/parquet-mr#8