avro source/sink should not use file header #12871

Open
Tracked by #13063
xiangjinwu opened this issue Oct 16, 2023 · 0 comments

xiangjinwu commented Oct 16, 2023

When using Confluent Schema Registry, serialization prepends a 5-byte header (a magic byte followed by a 4-byte schema ID) to each message:
https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#messages-wire-format
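
For reference, a minimal sketch of that framing using only the standard library. The function names `frame_confluent` and `parse_confluent` are hypothetical and not part of any existing codebase, and the schema ID is treated as an unsigned big-endian integer here as an assumption of the sketch:

```rust
/// Magic byte that starts every Confluent-framed message.
const CONFLUENT_MAGIC: u8 = 0;

/// Prepend the 5-byte header (magic byte + big-endian schema ID) to an
/// already-encoded Avro datum. Hypothetical helper for illustration.
fn frame_confluent(schema_id: u32, datum: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + datum.len());
    out.push(CONFLUENT_MAGIC);
    out.extend_from_slice(&schema_id.to_be_bytes());
    out.extend_from_slice(datum);
    out
}

/// Split a framed message into (schema ID, datum bytes).
/// Returns None if the message is shorter than the header or the
/// magic byte does not match.
fn parse_confluent(msg: &[u8]) -> Option<(u32, &[u8])> {
    if msg.len() < 5 || msg[0] != CONFLUENT_MAGIC {
        return None;
    }
    let id = u32::from_be_bytes([msg[1], msg[2], msg[3], msg[4]]);
    Some((id, &msg[5..]))
}
```

The datum bytes after the header are plain Avro binary, so the reader still has to fetch the writer schema by ID before decoding.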

Without Confluent Schema Registry, we have several options:

  • No header, like our current implementation for JSON or Protobuf. (from_avro_datum/to_avro_datum)
  • Avro single-object encoding (ref) header. (GenericSingleObjectReader/GenericSingleObjectWriter)
  • Avro object container file (ref) header. (Reader/Writer)
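
To make the second option concrete: per the Avro specification, the single-object header is the 2-byte marker `C3 01` followed by the 8-byte little-endian CRC-64-AVRO fingerprint of the schema. A rough sketch follows; the helper names are hypothetical, and the fingerprint is assumed to be computed elsewhere:

```rust
/// Two-byte marker that begins every Avro single-object encoded message.
const SINGLE_OBJECT_MARKER: [u8; 2] = [0xC3, 0x01];

/// Build the 10-byte header: marker + little-endian 64-bit fingerprint.
/// `fingerprint` is assumed to be the CRC-64-AVRO fingerprint of the
/// writer schema, computed elsewhere (not shown in this sketch).
fn single_object_header(fingerprint: u64) -> [u8; 10] {
    let mut header = [0u8; 10];
    header[..2].copy_from_slice(&SINGLE_OBJECT_MARKER);
    header[2..].copy_from_slice(&fingerprint.to_le_bytes());
    header
}

/// Split a message into (fingerprint, datum bytes).
/// Returns None if the message is too short or the marker is missing.
fn parse_single_object(msg: &[u8]) -> Option<(u64, &[u8])> {
    if msg.len() < 10 || !msg.starts_with(&SINGLE_OBJECT_MARKER) {
        return None;
    }
    let mut fp = [0u8; 8];
    fp.copy_from_slice(&msg[2..10]);
    Some((u64::from_le_bytes(fp), &msg[10..]))
}
```

Unlike the container file header, this header stays at 10 bytes no matter how large the schema is, because it carries only a fingerprint of the schema rather than the schema itself.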

The current Avro source (where only append-only mode works without a schema registry) and sink (not yet released) use the file header, which is large and intended to be shared by multiple records within the same file. Since messages in a queue are encoded one record at a time, we should switch to one of the first two options.
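
A back-of-the-envelope comparison of the per-message overhead, as a sketch only. The `Framing` enum and `min_header_len` are hypothetical names. The container file figure is a lower bound that counts just the 4-byte magic, the embedded schema JSON, and the 16-byte sync marker, ignoring the metadata map framing and keys:

```rust
/// Per-message framing options discussed in this issue (hypothetical type).
enum Framing {
    /// Raw Avro datum, no header.
    Bare,
    /// Confluent wire format: magic byte + 4-byte schema ID.
    Confluent,
    /// Avro single-object encoding: 2-byte marker + 8-byte fingerprint.
    SingleObject,
    /// Avro object container file header.
    ContainerFile,
}

/// Header bytes added to every message. For `ContainerFile` this is a
/// lower bound: magic (4) + schema JSON + sync marker (16). The real
/// header is larger because the metadata map has its own framing.
fn min_header_len(framing: &Framing, schema_json: &str) -> usize {
    match framing {
        Framing::Bare => 0,
        Framing::Confluent => 5,
        Framing::SingleObject => 10,
        Framing::ContainerFile => 4 + schema_json.len() + 16,
    }
}
```

With the file header, the full schema text is repeated in every message, so the overhead grows with schema size, while the other options stay at a fixed 0, 5, or 10 bytes.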

@github-actions github-actions bot added this to the release-1.4 milestone Oct 16, 2023
@xiangjinwu xiangjinwu self-assigned this Nov 8, 2023
@xiangjinwu xiangjinwu modified the milestones: release-1.4, release-1.5 Nov 8, 2023
@xiangjinwu xiangjinwu modified the milestones: release-1.5, release-1.6 Dec 6, 2023
@xiangjinwu xiangjinwu modified the milestones: release-1.7, release-1.8 Mar 6, 2024
@xiangjinwu xiangjinwu modified the milestones: release-1.8, release-1.9 Apr 8, 2024
@xiangjinwu xiangjinwu modified the milestones: release-1.9, release-1.10 May 13, 2024
@xiangjinwu xiangjinwu modified the milestones: release-1.10, release-1.11 Jul 10, 2024
@xiangjinwu xiangjinwu modified the milestones: release-2.1, release-2.2 Oct 17, 2024
@xiangjinwu xiangjinwu removed this from the release-2.2 milestone Jan 8, 2025
@xiangjinwu xiangjinwu added this to the release-2.3 milestone Jan 8, 2025