-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-8094][CH][Part-1] Support reading data from the iceberg with CH backend #8095
Conversation
…CH backend Support reading data from the iceberg with CH backend - basic iceberg scan transformer - read from the iceberg with the copy-on-write mode
Run Gluten Clickhouse CI on x86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the common change, thanks.
@@ -76,4 +76,7 @@ trait TransformerApi { | |||
def invalidateSQLExecutionResource(executionId: String): Unit = {} | |||
|
|||
def genWriteParameters(fileFormat: FileFormat, writeOptions: Map[String, String]): Any | |||
|
|||
/** use Hadoop Path class to encode the file path */ | |||
def encodeFilePathIfNeed(filePath: String): String = filePath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turn out that a similar difference exists on regular scan
VL
Lines 124 to 126 in e56da33
paths.add( | |
GlutenURLDecoder | |
.decode(file.filePath.toString, StandardCharsets.UTF_8.name())) |
CH
Line 166 in e56da33
paths.add(new URI(file.filePath.toString()).toASCIIString) |
Maybe the logics should be somehow consolidated in future, I am not sure.
Run Gluten Clickhouse CI on x86 |
import org.apache.spark.SparkConf | ||
import org.apache.spark.sql.Row | ||
|
||
class ClickHouseIcebergSuite extends GlutenClickHouseWholeStageTransformerSuite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to allow CH and Velox to share this part of the test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will share this part after the CH backend support the merge on read mode for the iceberg, and there is also a bug when using the timestamp type as partition column for the CH backend.
What changes were proposed in this pull request?
Support reading data from the iceberg with CH backend
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)