[GLUTEN-7267][CORE][CH] Support nested column pruning for HiveTableScan json/parquet/orc format #7268
Performance test table schema: test_tbl (a STRING, b STRUCT<x1: STRING, x2: STRING, x3: STRING, x4: STRING, x5: STRING>)
Average time before optimization:
Average time after optimization:
@rui-mo, could you take a look at this PR and say whether the Velox backend needs this feature?
@KevinyhZou In the Velox backend, this is handled by the reader. For the case you mentioned below, the Velox reader does not output the 'a2' and 'a3' columns. We are also working on some enhancements to schema pruning in Velox, see facebookincubator/velox#5962.
Struct<a1 string, a2 string, a3 string>, when querying s.a1 from the table
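To make the quoted case concrete, here is a minimal sketch of nested column pruning on a plain-dictionary schema model. It is not Gluten's or Velox's implementation: the `prune_schema` helper, the dict-based schema representation, and the column name `s` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Union

# Hypothetical schema model: field name -> type name (str) or nested struct (dict).
Schema = Dict[str, Union[str, "Schema"]]


def prune_schema(schema: Schema, accessed_paths: Iterable[str]) -> Schema:
    """Keep only the fields reachable through the dotted paths a query accesses."""
    # Group the remainder of each path under its top-level field name.
    by_head: Dict[str, List[str]] = {}
    for path in accessed_paths:
        head, _, rest = path.partition(".")
        if head not in schema:
            raise KeyError(f"unknown field: {head}")
        by_head.setdefault(head, []).append(rest)

    pruned: Schema = {}
    # Iterate over the original schema so the surviving fields keep their order.
    for name, field_type in schema.items():
        if name not in by_head:
            continue
        rests = by_head[name]
        if isinstance(field_type, dict) and all(rests):
            # Every access goes into the struct, so only the used subfields are kept.
            pruned[name] = prune_schema(field_type, rests)
        elif any(rests) and not isinstance(field_type, dict):
            raise KeyError(f"{name} is not a struct")
        else:
            # The whole field is referenced somewhere, so it is kept as-is.
            pruned[name] = field_type
    return pruned


# The case from the comment above: only s.a1 is queried.
table_schema = {"s": {"a1": "string", "a2": "string", "a3": "string"}}
print(prune_schema(table_schema, ["s.a1"]))  # {'s': {'a1': 'string'}}
```

With this model, a scan that only needs `s.a1` would request `struct<a1 string>` instead of the full three-field struct.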
I'm a little confused on the issue this PR is addressing. Are we supporting cases like those in the below suite?
Yes, we do this to only read the fields that we need from the complex type, like
@KevinyhZou Thanks for your feedback. We have enabled 'GlutenParquetSchemaSuite' on Velox backend. The way we enable it is: Gluten passes user-specified schema to Velox, and the Velox reader handles the schema mismatch. For example, Gluten passes 'struct<a1>' as the output type of Velox scan node, and Velox reader handles the schema pruning internally. I wonder if CH needs this PR because the CH reader cannot handle schema pruning. Would you clarify? Thanks.
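The reader-side handling described above can be pictured with a small sketch: the scan is given a requested (pruned) schema, and each record read with the full file schema is projected onto it. This is only a conceptual model, not the Velox or ClickHouse reader; `project_record`, the dict-based record and schema shapes, and the choice to fill fields absent from the file with `None` are assumptions.

```python
from typing import Any, Dict


def project_record(record: Dict[str, Any], requested_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Restrict a record to the fields named in the requested schema."""
    projected: Dict[str, Any] = {}
    for name, field_type in requested_schema.items():
        # Assumption: a requested field that the file does not contain reads as None.
        value = record.get(name)
        if isinstance(field_type, dict) and isinstance(value, dict):
            # Nested struct: recurse so unrequested subfields are dropped.
            projected[name] = project_record(value, field_type)
        else:
            projected[name] = value
    return projected


# The file stores struct<a1, a2, a3>, but the scan only requests struct<a1>.
file_record = {"s": {"a1": "x", "a2": "y", "a3": "z"}}
requested = {"s": {"a1": "string"}}
print(project_record(file_record, requested))  # {'s': {'a1': 'x'}}
```

A real columnar reader would avoid decoding the unrequested subfields at all rather than dropping them after the fact; the sketch only shows the resulting output shape.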
OK, I see. This feature is already enabled for the ClickHouse backend.
LGTM
What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
(Fixes: #7267)
How was this patch tested?
By unit tests (UT).