feat(sink): ProtoEncoder and AvroEncoder #12425

Merged
merged 2 commits into from
Oct 16, 2023

xiangjinwu
Contributor

@xiangjinwu xiangjinwu commented Sep 19, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

For both Avro and Protobuf, we validate the RisingWave schema against the encoder schema during Encoder::new. Validation (types only) and encoding (actual data) would otherwise duplicate the same logic, mostly an m × n match over combinations of data types. That logic is shared with the help of MaybeData and encode_field. The call graph of the core functions is as follows (dot-src):

[Figure: validate_encode, call graph of the core functions]
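A minimal sketch of how a single match can serve both validation and encoding, assuming a trait shaped roughly like MaybeData: `()` stands for "types only" and a scalar reference stands for "actual data". DataType, FieldType, Scalar, Encoded and the method on_base are simplified, hypothetical stand-ins made up for illustration, not RisingWave's actual definitions.

```rust
// Hypothetical, simplified RisingWave-side types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Varchar,
}

// Hypothetical, simplified encoder-side (proto/avro) field types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Int32,
    Int64,
    Text,
}

// A scalar value, only available at encode time.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(i32),
    Utf8(String),
}

// The encoded output value.
#[derive(Debug, Clone, PartialEq)]
enum Encoded {
    Int32(i32),
    Int64(i64),
    Text(String),
}

/// "Maybe" there is data: `()` means validate types only, `&Scalar` means encode.
trait MaybeData {
    type Out;
    /// Apply `f` to the scalar when data is present; otherwise succeed without calling it.
    fn on_base(self, f: impl FnOnce(&Scalar) -> Result<Encoded, String>) -> Result<Self::Out, String>;
}

impl MaybeData for () {
    type Out = ();
    fn on_base(self, _f: impl FnOnce(&Scalar) -> Result<Encoded, String>) -> Result<(), String> {
        Ok(()) // type pair already accepted by the match arm; nothing to encode
    }
}

impl MaybeData for &Scalar {
    type Out = Encoded;
    fn on_base(self, f: impl FnOnce(&Scalar) -> Result<Encoded, String>) -> Result<Encoded, String> {
        f(self) // actually encode the value
    }
}

/// The one m x n match shared by validation and encoding.
fn encode_field<D: MaybeData>(data: D, rw: DataType, field: FieldType) -> Result<D::Out, String> {
    match (rw, field) {
        (DataType::Int32, FieldType::Int32) => data.on_base(|s| match s {
            Scalar::Int32(v) => Ok(Encoded::Int32(*v)),
            other => Err(format!("expected int32, got {other:?}")),
        }),
        (DataType::Int32, FieldType::Int64) => data.on_base(|s| match s {
            Scalar::Int32(v) => Ok(Encoded::Int64(i64::from(*v))), // lossless widening
            other => Err(format!("expected int32, got {other:?}")),
        }),
        (DataType::Varchar, FieldType::Text) => data.on_base(|s| match s {
            Scalar::Utf8(v) => Ok(Encoded::Text(v.clone())),
            other => Err(format!("expected varchar, got {other:?}")),
        }),
        // Unsupported combinations fail the same way in both paths.
        (rw, field) => Err(format!("cannot encode {rw:?} as {field:?}")),
    }
}
```

In this sketch, `encode_field((), DataType::Int32, FieldType::Int64)` checks the type pair without any data, while `encode_field(&Scalar::Int32(7), DataType::Int32, FieldType::Int64)` walks the same arm and yields `Encoded::Int64(7)`. Supporting a new combination means adding one arm, which both paths pick up.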

There are some encoding-specific complexities:

  • Proto3 allows any field to be omitted. A list (repeated field) cannot be nested and cannot contain nulls.
  • Avro represents an optional field as ["null", T] (a union of null and T). The upstream library is also buggy in some respects, so we have to handle more details properly on our own (see test_encode_avro_lib_bug).
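The two encoding-specific rules above could be checked roughly as follows. This is a hypothetical sketch: AvroSchema and RwType are toy enums, not the apache-avro crate's or RisingWave's types, and optional_inner accepting the null branch in either position is an assumption beyond the ["null", T] form stated here.

```rust
// Toy Avro schema subset (hypothetical; not the apache-avro crate's Schema type).
#[derive(Debug, Clone, PartialEq)]
enum AvroSchema {
    Null,
    Int,
    Union(Vec<AvroSchema>),
}

/// Avro optional field: if `schema` is a two-branch union of "null" and some T,
/// return T. Either branch order is accepted (an assumption of this sketch).
fn optional_inner(schema: &AvroSchema) -> Option<&AvroSchema> {
    match schema {
        AvroSchema::Union(branches) if branches.len() == 2 => match (&branches[0], &branches[1]) {
            (AvroSchema::Null, AvroSchema::Null) => None,
            (AvroSchema::Null, t) | (t, AvroSchema::Null) => Some(t),
            _ => None,
        },
        _ => None,
    }
}

// Toy RisingWave-side types (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum RwType {
    Int32,
    List(Box<RwType>),
}

/// Validation: proto3 `repeated` cannot hold another `repeated`,
/// so a list whose elements are themselves lists is rejected.
fn validate_repeated(list_elem: &RwType) -> Result<(), String> {
    match list_elem {
        RwType::List(_) => Err("nested lists cannot map to a proto3 repeated field".to_string()),
        RwType::Int32 => Ok(()),
    }
}

/// Encoding: a proto3 repeated field cannot carry nulls, so any `None` is an error.
fn encode_repeated(items: &[Option<i32>]) -> Result<Vec<i32>, String> {
    items
        .iter()
        .copied()
        .map(|x| x.ok_or_else(|| "null element in a proto3 repeated field".to_string()))
        .collect() // stops at the first error
}
```

The nesting rule is purely a type-level check, so it belongs in validation, while the null rule can only be detected once actual list data is seen during encoding.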

We will gradually extend support to more data type combinations. For any unsupported RisingWave type, users can simply cast it with ::varchar in RisingWave as a workaround.

Note that this is not usable by users yet. One last step remains: parse format_desc.options in SinkFormatterImpl::new to retrieve the MessageDescriptor/AvroSchema, and then construct these encoders.
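The remaining step could look roughly like the dispatch below. Everything here is hypothetical: the encode names, the option keys "message" and "schema", and EncoderChoice are placeholders, and a real implementation would resolve an actual MessageDescriptor or Avro schema from the options rather than return strings.

```rust
use std::collections::HashMap;

/// Which encoder to build (hypothetical; mirrors the two encoders in this PR).
#[derive(Debug, PartialEq)]
enum EncoderChoice {
    Proto { message: String },
    Avro { schema: String },
}

/// Hypothetical: pick an encoder from format options.
/// The option keys "message" and "schema" are made up for illustration.
fn choose_encoder(encode: &str, options: &HashMap<String, String>) -> Result<EncoderChoice, String> {
    // Look up a required option, reporting which key is missing.
    let get = |k: &str| {
        options
            .get(k)
            .cloned()
            .ok_or_else(|| format!("missing option `{k}`"))
    };
    match encode {
        "protobuf" => Ok(EncoderChoice::Proto { message: get("message")? }),
        "avro" => Ok(EncoderChoice::Avro { schema: get("schema")? }),
        other => Err(format!("unsupported encode: {other}")),
    }
}
```

Failing early on a missing option keeps configuration errors at sink creation time, in the same spirit as validating schemas in Encoder::new.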

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@xiangjinwu xiangjinwu force-pushed the feat-sink-encoder-proto-avro branch 2 times, most recently from 0c75ec6 to c4930c2 October 2, 2023 09:48
@xiangjinwu xiangjinwu force-pushed the feat-sink-encoder-proto-avro branch 3 times, most recently from 2352aac to 7062fdb October 9, 2023 09:33
@xiangjinwu xiangjinwu marked this pull request as ready for review October 9, 2023 09:33

codecov bot commented Oct 9, 2023

Codecov Report

Merging #12425 (e8db4eb) into main (2db6f94) will increase coverage by 0.11%.
Report is 12 commits behind head on main.
The diff coverage is 94.38%.

@@            Coverage Diff             @@
##             main   #12425      +/-   ##
==========================================
+ Coverage   69.26%   69.38%   +0.11%     
==========================================
  Files        1480     1482       +2     
  Lines      243571   244638    +1067     
==========================================
+ Hits       168721   169742    +1021     
- Misses      74850    74896      +46     
Flag Coverage Δ
rust 69.38% <94.38%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
src/connector/src/lib.rs 52.08% <ø> (ø)
src/connector/src/sink/encoder/json.rs 84.63% <100.00%> (ø)
src/connector/src/sink/mod.rs 60.71% <ø> (+0.71%) ⬆️
src/connector/src/sink/encoder/mod.rs 73.68% <95.83%> (+37.96%) ⬆️
src/connector/src/sink/encoder/proto.rs 91.51% <91.51%> (ø)
src/connector/src/sink/encoder/avro.rs 95.65% <95.65%> (ø)

... and 9 files with indirect coverage changes

@xiangjinwu xiangjinwu force-pushed the feat-sink-encoder-proto-avro branch 6 times, most recently from ca3d74b to 8d57d0f October 11, 2023 04:03
@xiangjinwu xiangjinwu changed the title feat(sink): (WIP) ProtoEncoder and AvroEncoder feat(sink): ProtoEncoder and AvroEncoder Oct 11, 2023
@xiangjinwu xiangjinwu force-pushed the feat-sink-encoder-proto-avro branch from 8d57d0f to 2717d5b October 11, 2023 06:38
@xiangjinwu xiangjinwu force-pushed the feat-sink-encoder-proto-avro branch from 2717d5b to 268e581 October 11, 2023 07:58
@xiangjinwu
Contributor Author

Are there still concerns before we merge this?

@xiangjinwu xiangjinwu added this pull request to the merge queue Oct 16, 2023
Merged via the queue into main with commit ea27579 Oct 16, 2023
7 of 8 checks passed
@xiangjinwu xiangjinwu deleted the feat-sink-encoder-proto-avro branch October 16, 2023 08:58
3 participants