Add the alias function week of weekofyear #59
Comments
zheniantoushipashi added a commit that referenced this issue on Oct 11, 2019:
#59 Add the alias function week of weekofyear
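For context, a minimal sketch of the intended behavior (assuming the alias registered by this commit is available; `week` simply mirrors the existing `weekofyear` built-in):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("week-alias-demo").getOrCreate()

// Both calls should return the same ISO week number for the given date:
spark.sql("SELECT weekofyear('2019-10-11')").show()  // existing built-in
spark.sql("SELECT week('2019-10-11')").show()        // alias added by this change
```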
7mming7 pushed a commit that referenced this issue on Nov 4, 2020:
### What changes were proposed in this pull request?
1. Make more expressions extend `NullIntolerant`.
2. Add a checker (in `ExpressionInfoSuite`) to identify whether the expression is `NullIntolerant`.

### Why are the changes needed?
Avoid skew joins if the join column has many null values, which can improve query performance. For example:
```sql
CREATE TABLE t1(c1 string, c2 string) USING parquet;
CREATE TABLE t2(c1 string, c2 string) USING parquet;
EXPLAIN SELECT t1.* FROM t1 JOIN t2 ON upper(t1.c1) = upper(t2.c1);
```
Before and after this PR:
```sql
== Physical Plan ==
*(2) Project [c1#5, c2#6]
+- *(2) BroadcastHashJoin [upper(c1#5)], [upper(c1#7)], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(upper(input[0, string, true]))), [id=#41]
   :  +- *(1) ColumnarToRow
   :     +- FileScan parquet default.t1[c1#5,c2#6]
   +- *(2) ColumnarToRow
      +- FileScan parquet default.t2[c1#7]

== Physical Plan ==
*(2) Project [c1#5, c2#6]
+- *(2) BroadcastHashJoin [upper(c1#5)], [upper(c1#7)], Inner, BuildRight
   :- *(2) Project [c1#5, c2#6]
   :  +- *(2) Filter isnotnull(c1#5)
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.t1[c1#5,c2#6]
   +- BroadcastExchange HashedRelationBroadcastMode(List(upper(input[0, string, true]))), [id=#59]
      +- *(1) Project [c1#7]
         +- *(1) Filter isnotnull(c1#7)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.t2[c1#7]
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test.

Closes apache#28626 from wangyum/SPARK-28481.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
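To make the mechanism concrete, here is a minimal sketch of what extending `NullIntolerant` means, written against the Spark 3.0/3.1-era Catalyst API; `MyUpper` is a hypothetical expression used only for illustration, not one touched by this commit:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, NullIntolerant, UnaryExpression}
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// "Null-intolerant" means null in => null out. Once an expression carries the
// NullIntolerant trait, the optimizer can infer IsNotNull filters on its
// children from predicates such as upper(t1.c1) = upper(t2.c1), which is what
// adds the `Filter isnotnull(...)` nodes in the second plan above.
case class MyUpper(child: Expression)
  extends UnaryExpression with NullIntolerant with CodegenFallback {

  override def dataType: DataType = StringType

  // Only invoked for non-null input; UnaryExpression.eval already returns
  // null when the child evaluates to null.
  override protected def nullSafeEval(input: Any): Any =
    input.asInstanceOf[UTF8String].toUpperCase
}
```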
lxian pushed a commit to lxian/spark that referenced this issue on Sep 24, 2021:
…query

### What changes were proposed in this pull request?
Remove redundant aliases after `RewritePredicateSubquery`. For example:
```scala
sql("CREATE TABLE t1 USING parquet AS SELECT id AS a, id AS b, id AS c FROM range(10)")
sql("CREATE TABLE t2 USING parquet AS SELECT id AS x, id AS y FROM range(8)")
sql(
  """
    |SELECT *
    |FROM t1
    |WHERE a IN (SELECT x
    |            FROM (SELECT x AS x,
    |                         Rank() OVER (partition BY x ORDER BY Sum(y) DESC) AS ranking
    |                  FROM t2
    |                  GROUP BY x) tmp1
    |            WHERE ranking <= 5)
    |""".stripMargin).explain
```

Before this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [a#10L], [x#7L], LeftSemi, BuildRight, false
   :- FileScan parquet default.t1[a#10L,b#11L,c#12L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#68]
      +- Project [x#7L]
         +- Filter (ranking#8 <= 5)
            +- Window [rank(_w2#25L) windowspecdefinition(x#15L, _w2#25L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS ranking#8], [x#15L], [_w2#25L DESC NULLS LAST]
               +- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0
                  +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#62]
                     +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)])
                        +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59]
                           +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)])
                              +- FileScan parquet default.t2[x#15L,y#16L]
```

After this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [a#10L], [x#15L], LeftSemi, BuildRight, false
   :- FileScan parquet default.t1[a#10L,b#11L,c#12L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#67]
      +- Project [x#15L]
         +- Filter (ranking#8 <= 5)
            +- Window [rank(_w2#25L) windowspecdefinition(x#15L, _w2#25L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS ranking#8], [x#15L], [_w2#25L DESC NULLS LAST]
               +- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0
                  +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)])
                     +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59]
                        +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)])
                           +- FileScan parquet default.t2[x#15L,y#16L]
```

### Why are the changes needed?
Reduce shuffle to improve query performance. This change can benefit TPC-DS q70.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test.

Closes apache#33509 from wangyum/SPARK-36280.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
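As an illustration of the kind of node this cleanup targets, here is a hedged sketch with hypothetical attribute names; `isRedundant` is not an actual Spark helper, just a compact way to state the condition:

```scala
import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, NamedExpression}

// After RewritePredicateSubquery splices the subquery into a LeftSemi join,
// the plan can still carry a Project entry like `x#15L AS x#7L`: an Alias whose
// child is just an attribute with the same name. The alias only introduces a
// new expression ID, which can stop the planner from recognising that the data
// below is already hash-partitioned on that column, hence the extra Exchange
// in the "before" plan above.
def isRedundant(e: NamedExpression): Boolean = e match {
  case Alias(attr: Attribute, name) => attr.name == name
  case _ => false
}
```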
Related KAP issue: https://github.com/Kyligence/KAP/issues/13871