Tracking issues of iceberg-rust v0.3.0 #348
Comments
@Fokko thanks for your effort here
@marvinlanhenke No problem, thank you for all the work on the project. While compiling this I realized how much work has been done 🚀
Thanks for putting this together @Fokko! It's great to have this clarity on where we're heading. Let's go! 🙌
Hi @Fokko, about the read projection part: currently we can convert Parquet files into Arrow streams, but there are some limitations: it only supports primitive types, and schema evolution is not supported yet. Our discussion is in this issue: #244. And here is the first step of projection by @viirya: #245
Also, as we discussed in this doc, would you mind adding DataFusion integration, Python bindings, and WASM bindings to the future topics?
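For anyone following along, here is a minimal, self-contained sketch of a projected Parquet-to-Arrow read using the parquet crate's Arrow reader; the file path and column indices are made up, and a real table scan would additionally have to map Iceberg field-ids onto Parquet leaves, which is where the schema-evolution gap mentioned above comes in.

```rust
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::arrow::ProjectionMask;

// Sketch only: read a Parquet data file into Arrow record batches with a
// column projection. This shows the underlying parquet crate API, not the
// iceberg-rust read path itself.
fn read_projected(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;

    // Project only the first two leaf columns; a real scan would resolve the
    // projection from Iceberg field-ids instead of hard-coded indices.
    let mask = ProjectionMask::leaves(builder.parquet_schema(), [0, 1]);
    let reader = builder.with_projection(mask).build()?;

    for batch in reader {
        let batch = batch?;
        println!("read {} rows", batch.num_rows());
    }
    Ok(())
}
```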
Thanks for the context, I've just added this to the list.
Ah yes, I forgot to check those boxes, thanks!
Certainly! Great suggestions! I'm less familiar with some of these topics (like DataFusion), so feel free to edit the post if you feel something is missing.
...for DataFusion I have provided a basic design proposal and an implementation of some of the DataFusion traits, like the catalog & schema provider; perhaps we can also move forward on this: #324
Yeah, I'll review it later.
Hi, most of the issues in our 0.3 milestone have been closed. I plan to clean up the remaining issues and initiate the release process. Any ideas or comments? |
@Xuanwo Thanks for driving this. It would be good to get everything that we have on main out to the users 👍 |
I have created #543 to track the release process, please let me know if you think anything is missing.
Iceberg-rust 0.3.0
The main objective of 0.3.0 is to have a working read path (non-exhaustive list :)
- Skipping data on the highest level by pruning away manifests using the field_summary:
  - ManifestEvaluator, used to filter manifests in table scans (#322)
  - Implement manifest filtering in TableScan (#323, in flight by @sdd)
- Skipping data using the 102: partition struct:
  - ExpressionEvaluator (#358): this projects the partition-spec schema to the 102: partition struct and evaluates it.
- Skipping data in the TableScan using the InclusiveMetricsEvaluator (#347)
- Remove partition_filters from ManifestEvaluator (#360)
- fn plan_files() for the TableScan (#362)

Blocking issues:

- org.apache.iceberg:iceberg-spark-runtime-3.5_2.13:1.5.0 (#338)
- field-id's missing in generated Avro files (#353)
- Null instead of -1 (#352)

Nice to have (related to the query plan optimizations above):

- Skipping DELETE manifests that contain unrelated delete files.
- Tracking issues of aligning storage support with iceberg-java (#408)
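To show how these pieces are meant to fit together, here is a rough sketch of the pruning chain behind plan_files(); every type and field below is a placeholder for illustration, not the actual iceberg-rust API.

```rust
// Placeholder types, for illustration only.
struct DataFile {
    partition_match: bool, // outcome of the ExpressionEvaluator
    metrics_match: bool,   // outcome of the InclusiveMetricsEvaluator
    path: String,
}

struct Manifest {
    summary_match: bool, // outcome of the ManifestEvaluator on the field_summary
    entries: Vec<DataFile>,
}

// Rough shape of plan_files(): three layers of pruning before any data is read.
fn plan_files(manifests: Vec<Manifest>) -> Vec<String> {
    manifests
        .into_iter()
        // 1. ManifestEvaluator: prune whole manifests via the manifest-list field_summary (#322).
        .filter(|m| m.summary_match)
        .flat_map(|m| m.entries.into_iter())
        // 2. ExpressionEvaluator: prune entries via the 102: partition struct (#358).
        .filter(|f| f.partition_match)
        // 3. InclusiveMetricsEvaluator: prune data files via their column metrics (#347).
        .filter(|f| f.metrics_match)
        .map(|f| f.path)
        .collect()
}
```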
State of catalog integration:
For the release after that, I think the commit path is going to be important.
Iceberg-rust 0.4.0 and beyond
Nice to have for the 0.3.0 release, but not required. Of course, open for debate.
Commit path
The commit path entails writing a new metadata JSON file and pointing the catalog at it.
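As a rough illustration (not the iceberg-rust API), the commit amounts to writing a new versioned metadata file and then atomically swapping the catalog pointer; if another writer committed in the meantime, the swap fails and the commit is retried on top of the new current metadata.

```rust
use std::fs;

// Sketch only: the catalog compare-and-swap is left as a comment because the
// names here are illustrative, not the iceberg-rust API.
fn commit_metadata(
    table_location: &str,
    current_version: u64,
    new_metadata_json: &str,
) -> std::io::Result<String> {
    // 1. Write the new metadata file under the next version number.
    let new_version = current_version + 1;
    let new_path = format!("{table_location}/metadata/v{new_version}.metadata.json");
    fs::write(&new_path, new_metadata_json)?;

    // 2. Atomically point the catalog from the old metadata file to the new
    //    one; on conflict, re-read the current metadata and retry.
    // catalog.update_table(table_ident, expected_version, &new_path)?;
    Ok(new_path)
}
```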
Metadata tables
Metadata tables are used to inspect the table. Having these tables also makes it easy to implement maintenance procedures, since you can list all the snapshots and expire the ones older than a certain threshold.
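For example, snapshot expiration reduces to a filter over a snapshots listing; the types below are placeholders for illustration, not the iceberg-rust API.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Placeholder for a row of a "snapshots" metadata table.
struct SnapshotEntry {
    snapshot_id: i64,
    timestamp_ms: i64,
}

// Return the ids of snapshots older than `max_age`, which a maintenance
// procedure would then expire.
fn snapshots_to_expire(snapshots: &[SnapshotEntry], max_age: Duration) -> Vec<i64> {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as i64;
    let cutoff = now_ms - max_age.as_millis() as i64;
    snapshots
        .iter()
        .filter(|s| s.timestamp_ms < cutoff)
        .map(|s| s.snapshot_id)
        .collect()
}
```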
Write support
Most of the work in write support is around generating the correct Iceberg metadata. Some scoping decisions can be made, for example initially supporting only fast appends and only V2 metadata.
It is common to have multiple snapshots in a single commit to the catalog. For example, an overwrite operation of a partition can be a delete + append operation. This makes the implementation easier since you can separate the problems and tackle them one by one. It also simplifies the roadmap, since these operations can be developed in parallel.
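A sketch of that decomposition, with placeholder types rather than the actual metadata structs: an overwrite of a partition becomes a delete snapshot followed by an append snapshot, both landing in the same catalog commit.

```rust
// Placeholder types, for illustration only.
enum Operation {
    Delete { removed_files: Vec<String> },
    Append { added_files: Vec<String> },
}

struct Snapshot {
    snapshot_id: i64,
    parent_id: Option<i64>,
    operation: Operation,
}

// Decompose an overwrite into two snapshots that are committed together.
fn overwrite(parent_id: i64, removed: Vec<String>, added: Vec<String>) -> Vec<Snapshot> {
    let delete = Snapshot {
        snapshot_id: parent_id + 1,
        parent_id: Some(parent_id),
        operation: Operation::Delete { removed_files: removed },
    };
    let append = Snapshot {
        snapshot_id: delete.snapshot_id + 1,
        parent_id: Some(delete.snapshot_id),
        operation: Operation::Append { added_files: added },
    };
    vec![delete, append]
}
```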
Future topics

- DataFusion integration
- Python binding
- WASM binding
Contribute
If you want to contribute to the upcoming milestone, feel free to comment on this issue. If there is anything unclear or missing, feel free to reach out here as well 👍