
[GLUTEN-8094][CH][Part-1] Support reading data from the iceberg with CH backend #8095

Merged: 2 commits into apache:main on Nov 29, 2024

Conversation

@zzcclp (Contributor) commented on Nov 29, 2024:

What changes were proposed in this pull request?

Support reading data from Iceberg with the CH backend:

  • basic Iceberg scan transformer
  • read from Iceberg tables in copy-on-write mode (a usage sketch follows this list)
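
For context, a minimal usage sketch of the kind of query this targets, assuming a standard Iceberg Spark catalog configuration; the catalog name, warehouse path, table name, and the Gluten plugin setting shown here are illustrative assumptions, not taken from this PR:

  import org.apache.spark.sql.SparkSession

  // Illustrative configuration only; the keys and values are assumptions for this sketch.
  val spark = SparkSession.builder()
    .appName("iceberg-ch-read-sketch")
    .config("spark.plugins", "org.apache.gluten.GlutenPlugin")
    .config("spark.sql.catalog.iceberg_cat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg_cat.type", "hadoop")
    .config("spark.sql.catalog.iceberg_cat.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()

  // Copy-on-write is the write mode this part of the PR targets.
  spark.sql("CREATE NAMESPACE IF NOT EXISTS iceberg_cat.db")
  spark.sql(
    """CREATE TABLE iceberg_cat.db.tbl (id BIGINT, name STRING)
      |USING iceberg
      |TBLPROPERTIES ('write.delete.mode' = 'copy-on-write')""".stripMargin)
  spark.sql("INSERT INTO iceberg_cat.db.tbl VALUES (1, 'a'), (2, 'b')")

  // This scan is what the new Iceberg scan transformer is meant to offload to the
  // ClickHouse backend instead of falling back to vanilla Spark.
  spark.sql("SELECT id, name FROM iceberg_cat.db.tbl WHERE id > 1").show()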

How was this patch tested?



Referenced issue: #8094


Run Gluten Clickhouse CI on x86

@zhztheplayer (Member) left a comment:

+1 on the common change, thanks.

@@ -76,4 +76,7 @@ trait TransformerApi {
def invalidateSQLExecutionResource(executionId: String): Unit = {}

def genWriteParameters(fileFormat: FileFormat, writeOptions: Map[String, String]): Any

/** use Hadoop Path class to encode the file path */
def encodeFilePathIfNeed(filePath: String): String = filePath
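
As a rough sketch of what encoding through Hadoop's Path gives you (the helper name and example path below are made up; the actual CH-side override in this PR may differ):

  import org.apache.hadoop.fs.Path

  // Hypothetical helper: round-trip the raw path through Hadoop's Path so that
  // characters such as spaces come back percent-encoded before the path is handed
  // to the native reader.
  def encodeFilePath(filePath: String): String =
    new Path(filePath).toUri.toString

  // encodeFilePath("/warehouse/db/part=a b/data.parquet")
  //   returns "/warehouse/db/part=a%20b/data.parquet"
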
A reviewer (Member) commented on this diff:

It turns out that a similar difference exists on the regular scan path:

VL:

  paths.add(
    GlutenURLDecoder
      .decode(file.filePath.toString, StandardCharsets.UTF_8.name()))

CH:

Maybe the two code paths should be consolidated at some point in the future; I am not sure.
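
For illustration, the decode direction on a percent-encoded path looks roughly like this; java.net.URLDecoder stands in here for Gluten's GlutenURLDecoder, which is assumed to behave similarly for %-escapes:

  import java.net.URLDecoder
  import java.nio.charset.StandardCharsets

  // Decode the percent-encoded file path back to its raw form, as the VL snippet
  // above does before adding the path to the scan's path list.
  val encodedPath = "/warehouse/db/part=a%20b/data.parquet"
  val decodedPath = URLDecoder.decode(encodedPath, StandardCharsets.UTF_8.name())
  // decodedPath == "/warehouse/db/part=a b/data.parquet"

  // Caveat: java.net.URLDecoder also turns '+' into a space, which is usually not
  // what you want for file paths; a path-safe decoder would leave '+' alone.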


Run Gluten Clickhouse CI on x86

import org.apache.spark.SparkConf
import org.apache.spark.sql.Row

class ClickHouseIcebergSuite extends GlutenClickHouseWholeStageTransformerSuite {
A reviewer (Contributor) commented:

Is it possible to allow CH and Velox to share this part of the test cases?

@zzcclp (Author) replied:

Will share this part after the CH backend supports the merge-on-read mode for Iceberg; there is also a bug when using the timestamp type as a partition column with the CH backend.
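
One possible shape for that sharing, sketched purely as an assumption (the trait name, test body, and the way the SparkSession is provided are made up, not code from this PR):

  import org.apache.spark.sql.{Row, SparkSession}
  import org.scalatest.funsuite.AnyFunSuite

  // Hypothetical backend-agnostic trait: each backend's suite mixes this in and
  // supplies its own SparkSession (configured for CH or Velox) plus an Iceberg catalog.
  trait IcebergReadTests { self: AnyFunSuite =>
    def spark: SparkSession

    test("read a copy-on-write iceberg table") {
      // Assumes the session's default catalog is configured for Iceberg.
      spark.sql(
        """CREATE TABLE iceberg_cow_tbl (id BIGINT, name STRING) USING iceberg
          |TBLPROPERTIES ('write.delete.mode' = 'copy-on-write')""".stripMargin)
      spark.sql("INSERT INTO iceberg_cow_tbl VALUES (1, 'a'), (2, 'b')")
      val rows = spark.sql("SELECT id, name FROM iceberg_cow_tbl ORDER BY id").collect()
      assert(rows.toSeq == Seq(Row(1L, "a"), Row(2L, "b")))
    }
  }

  // Usage sketch: both suites mix in the shared trait.
  //   class ClickHouseIcebergSuite extends GlutenClickHouseWholeStageTransformerSuite
  //     with IcebergReadTests
  //   class VeloxIcebergSuite extends <the Velox suite base> with IcebergReadTests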

@zzcclp merged commit ea0bcd5 into apache:main on Nov 29, 2024
48 checks passed
Labels: CLICKHOUSE, CORE (works for Gluten Core), DATA_LAKE
3 participants