Feat: Batch ingest iceberg/file source #14742
Just to provide the complete context: the details of the POC user request can be found at https://www.notion.so/risingwave-labs/optimize-parquet-source-for-batch-load-dc498a043d504621bf56461690b14bd7?d=84ebdf5d7469412680278059c5898be8. In short, if implementing the batch Iceberg source takes too much time due to its complexity, a Parquet file source with decent performance is good enough to move the POC forward. The user will consider switching to RW only if RW's Iceberg batch source is fast enough.
Must the file format be Parquet? Could we use a CSV file, which our file source already supports? If testing with CSV first is acceptable, we can support batch read for the file source first and measure its performance. BTW, I tested `INSERT ... SELECT` from one RisingWave table to another last week. Can we compare streaming load from Kafka into a table against `INSERT ... SELECT` from one table to another?
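For reference, a minimal sketch of the two paths being compared. The schema, table names, topic, and broker address are hypothetical placeholders, not the actual benchmark setup:

```sql
-- Path A: streaming load from Kafka into a table.
-- Topic, broker, and schema are placeholders.
CREATE TABLE t_streaming (id INT, payload VARCHAR) WITH (
    connector = 'kafka',
    topic = 'poc_topic',
    properties.bootstrap.server = 'broker:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Path B: batch load from one RisingWave table into another.
CREATE TABLE t_batch (id INT, payload VARCHAR);
INSERT INTO t_batch SELECT id, payload FROM t_source;
```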
Good point. I think testing with CSV is OK: since CSV is a less efficient format than Parquet for both reads and writes, CSV numbers would give a conservative lower bound on performance.
I can try to communicate this first. The closer we stay to the user's real use case, the better, but there is nothing wrong with using what we already have for now. Could you post the link to last week's results? 🙏
Finished.
Is your feature request related to a problem? Please describe.
According to RFC: Combine Historical and Incremental Data, we need to support ingesting data from an external source (e.g. an Iceberg or file source) as historical data. This is a typical bulk-loading scenario, which is expected to be faster than streaming the same data from Kafka.
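A minimal sketch of the intended bulk-load flow, assuming a hypothetical schema and illustrative connector options. Note that batch-reading an external source via `INSERT ... SELECT` is exactly the capability this issue requests, so the last statement is not expected to work today:

```sql
-- Declare a file source over the historical data.
-- Connector name and option keys are illustrative placeholders.
CREATE SOURCE historical_files (id INT, payload VARCHAR) WITH (
    connector = 's3',
    s3.bucket_name = 'poc-bucket',
    match_pattern = 'history/*.parquet'
) FORMAT PLAIN ENCODE PARQUET;

-- Bulk-load the historical data in one batch query; supporting this
-- SELECT over an external source is what this issue asks for.
INSERT INTO target_table SELECT id, payload FROM historical_files;
```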
To support this feature, we need to:
Performance Improvement:
Others:
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response