Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sink): upsert avro with schema registry #13007

Merged
merged 7 commits into from
Oct 26, 2023

Conversation

xiangjinwu
Copy link
Contributor

@xiangjinwu xiangjinwu commented Oct 23, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Only upsert, not append-only yet
    • Shorter to review. Will refactor to make it less repetitive.
  • Only avro, not protobuf yet
    • Requires more header (doc)
  • Only schema registry, not file / https / s3 yet
  • Only kafka, not kinesis or pulsar
    • Kinesis / Plusar does not allow key to be non-string bytes
    • Pulsar has its own schema registry (doc).

Note that apeend-only protobuf with file / https in kafka has been supported by #12858. We will fill the support matrix gradually.

SinkFormatterImpl::new will be refactored in a followup.

  • topic is used in schema registry name strategy, but does not apply to Redis.
  • db_name and sink_from_name is only useful for kafka connect (incl debezium) metadata.
  • AwsAuthProps is only useful when loading schema definition from s3.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Supports kafka sink with upsert avro using schema registry.

Example create sink command can be found in e2e_test/sink/kafka/avro.slt

Same as source, the following options can be used after format upsert encode avro:

  • schema.registry
  • schema.registry.username
  • schema.registry.password
  • schema.registry.name.strategy (optional)
  • key.message (only when name strategy is set to record_name_strategy or topic_record_name_strategy)
  • message (only when name strategy is set to record_name_strategy or topic_record_name_strategy)

@xiangjinwu xiangjinwu force-pushed the feat-sink-kafka-avro-registry branch from 0cf2f02 to c1d6029 Compare October 23, 2023 12:58
@hzxa21 hzxa21 self-requested a review October 23, 2023 13:14
@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

Merging #13007 (9158f9b) into main (6495d90) will decrease coverage by 0.02%.
Report is 2 commits behind head on main.
The diff coverage is 20.12%.

@@            Coverage Diff             @@
##             main   #13007      +/-   ##
==========================================
- Coverage   68.57%   68.55%   -0.02%     
==========================================
  Files        1496     1497       +1     
  Lines      251405   251538     +133     
==========================================
+ Hits       172389   172447      +58     
- Misses      79016    79091      +75     
Flag Coverage Δ
rust 68.55% <20.12%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/connector/src/schema/mod.rs 0.00% <ø> (ø)
src/connector/src/sink/encoder/mod.rs 89.74% <ø> (ø)
src/connector/src/sink/redis.rs 57.54% <66.66%> (-0.06%) ⬇️
src/frontend/src/handler/create_sink.rs 66.66% <0.00%> (ø)
src/connector/src/sink/kafka.rs 31.62% <0.00%> (-0.15%) ⬇️
src/connector/src/sink/encoder/avro.rs 94.30% <72.50%> (-1.35%) ⬇️
src/connector/src/sink/kinesis.rs 0.00% <0.00%> (ø)
src/connector/src/sink/pulsar.rs 0.00% <0.00%> (ø)
src/connector/src/schema/schema_registry/client.rs 0.63% <0.00%> (-0.05%) ⬇️
src/connector/src/sink/formatter/mod.rs 29.41% <5.00%> (-3.93%) ⬇️
... and 1 more

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@gitguardian

This comment was marked as off-topic.

@xiangjinwu xiangjinwu force-pushed the feat-sink-kafka-avro-registry branch from b32cc4e to 9158f9b Compare October 25, 2023 05:53
@xiangjinwu
Copy link
Contributor Author

Conflicts resolved and ready for review

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally LGTM

Comment on lines +49 to +52
pub async fn fetch_schema(
format_options: &BTreeMap<String, String>,
topic: &str,
) -> Result<(SchemaWithId, SchemaWithId), SchemaFetchError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy from the source part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will be merged eventually.

src/connector/src/sink/encoder/avro.rs Show resolved Hide resolved
src/connector/src/sink/formatter/mod.rs Show resolved Hide resolved
@xiangjinwu xiangjinwu added this pull request to the merge queue Oct 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 26, 2023
@xiangjinwu xiangjinwu added this pull request to the merge queue Oct 26, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 26, 2023
@xiangjinwu xiangjinwu added this pull request to the merge queue Oct 26, 2023
Merged via the queue into main with commit d6e5bec Oct 26, 2023
8 of 9 checks passed
@xiangjinwu xiangjinwu deleted the feat-sink-kafka-avro-registry branch October 26, 2023 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants