-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support append data file and add e2e test #349
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ZENOTME for taking a stab at this! I like the e2e_test crate
which could be beneficial to the catalog impl crates as well (less duplicate test setup and infr)
|
||
version: '3.8' | ||
|
||
services: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll guess once this is merged when can reuse this and extend with images for Glue
and HiveMetastore
as well - this would remove duplicates from the specific catalog impl crates and the datafusion/integration crate and avoids having each crate run & define same testing infr.
Hi, I have tried to fix this PR. Some things may not be fixed well now:
|
60d548d
to
6a035a3
Compare
@ZENOTME thanks for getting back to me here. I don't want to be the one holding this up, so I would suggest that we get this in. There are still some gaps I see because I think it is pretty easy to brick a table by adding some incompatible files. I'll raise new issues for that so we can pick up those independently. Any concerns @c-thiel @liurenjie1024 @Xuanwo ? |
@@ -122,6 +172,365 @@ impl<'a> Transaction<'a> { | |||
} | |||
} | |||
|
|||
/// FastAppendAction is a transaction action for fast append data files to the table. | |||
pub struct FastAppendAction<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to move this part to a standalone module under transaction module.
fn process_manifeset(&self, manifests: Vec<ManifestFile>) -> Vec<ManifestFile>; | ||
} | ||
|
||
struct SnapshotProduceAction<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually an action is a transaction action, I think naming it SnapshotProducer
would be enough?
// # TODO | ||
// Fulfill this function | ||
fn summary<OP: SnapshotProduceOperation>(&self, snapshot_produce_operation: &OP) -> Summary { | ||
Summary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need snapshot summary builder.
Hi, @Fokko sorry for late reply, I took a pr and agree that there are still some improvements, and we can resolve them in follow up issues. |
* support append data file and add e2e test * fix typos * refine append action * fix cargo sort * add consistent check for partition value * generate unique snapshot id * avoid to set snapshot id for v2 * refine test * fix unit test * export ports * fix None case for parant_snapshot_id * fix parquect schema check * refactor append action of transaction * refine * refine e2e test * refine commit uuid * fix file format field to uppercase in manifest * refine SnapshotProduceAction * rename e2e_test to integration_tests * fix * use docker-compose.yaml from rest catalog * fix check --------- Co-authored-by: ZENOTME <[email protected]>
This PR is complete #345.
The design of this is based on https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotProducer.java.
I implement a SnapshotProduceAction which will accept a
Vec<ManifestFile>
andSummary
to generate a new snapshot and apply the snapshot to the tx.FastAppendAction will reuse
SnapshotProduceAction
and have its own interface to process the added data files.In the future, we can reuse
SnapshotProduceAction
to implement more append actions with different commit semantics as described in #348.Please let me know if this design has something that can be improved and other things missed.