Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sink): Support Sinking Enum16 Ids to Clickhouse #14668

Merged
merged 10 commits into from
Feb 26, 2024

Conversation

HurricanKai
Copy link
Contributor

@HurricanKai HurricanKai commented Jan 18, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This is an attempt at solving #14658 it simply allows the Clickhouse connector to sink Int16 / smallint as Enum16s to clickhouse. This "just works" although it's on the user to make sure all data provided is valid.

I've also tested Enum8, but presumably this has slightly different encoding, so causes CH to reject the row.

Checklist

  • I have written necessary rustdoc comments - I believe this is unecessary?
  • I have added necessary unit tests and integration tests - Could add E2E test but not sure, let me know what you want here. The change is pretty minimal.
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)
    TODO Will do this if this change is generally received positively! (Just a tiny change in the Data Type Mapping in the docs).

Release note

Allow sinking Enum16 values to Clickhouse

@HurricanKai HurricanKai changed the title Support Sinking Enum16 Ids to Clickhouse feat(sink): Support Sinking Enum16 Ids to Clickhouse Jan 18, 2024
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. Generally LGTM.

Would you mind adding a e2e test in https://github.com/risingwavelabs/risingwave/blob/9f9294e5957a9d0d89c3e709b8672c6dc5b72e9b/e2e_test/sink/clickhouse_sink.slt ? You need to modify the CK table schema and the result check here accordingly as well.

src/connector/src/sink/clickhouse.rs Show resolved Hide resolved
@HurricanKai
Copy link
Contributor Author

Added E2E tests & a comment. I couldn't get ./risedev d to actually run, it just hangs lld for me. Seems like the E2E tests are covered by CI so 🤞 that passes

Copy link
Contributor

@xxhZs xxhZs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e2e ci is error, please fix and try again

e2e_test/sink/clickhouse_sink.slt Outdated Show resolved Hide resolved
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the ordering of the result check in e2e-clickhouse-sink-test.sh doesn't match the ordering of the INSERT values. That is why you mistakenly put the wrong expected value in $4. Feel free to reorder the check.

ci/scripts/e2e-clickhouse-sink-test.sh Outdated Show resolved Hide resolved
ci/scripts/e2e-clickhouse-sink-test.sh Outdated Show resolved Hide resolved
@HurricanKai
Copy link
Contributor Author

I was away so just got to updating it. Everything seems to be working now! Thanks for the help 😄

@HurricanKai HurricanKai requested review from hzxa21 and xxhZs February 9, 2024 10:50
Copy link

gitguardian bot commented Feb 10, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
- Generic Password bcb7492 ci/scripts/regress-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

Copy link
Member

@stdrc stdrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am personally a bit concerned about this PR. It seems rely heavily on the internal representation/encoding of Enum in ClickHouse, which may be subject to change. As you said, Enum8 does not work as expected because it's encoding may differ from Enum16, but we cannot find any explanation on the difference in ClickHouse docs. So I don't think this is type safe.

Would it be better if we can support Enum by enhancing ClickHouseField? I see the crate clickhouse supports Enum type while I am not sure if it's possible to utilize it in RisingWave.

@HurricanKai
Copy link
Contributor Author

Clickhouse will of course reject any rows that are not possible, when enum values don't exist, imo it's reasonable to ask a user to handle this.

Otherwise, it is type safe. Supporting enum8 is a matter of wiring through the Info that this is an enum and should be written as 8
bits instead.
I didn't find any enum support in the crate, I might've overlooked it. I did ask on the ClickHouse slack, and learned that when writing rows using any of the efficient formats, writing as an integer directly is the only option, and converting between string and integers is the job of the user.
Also the "layout" of enums is well defined. Even if you omit specifying numbers in the definition it is defined that it will be numbered sequentially starting from 1 or the value assigned to the first entry.

@stdrc
Copy link
Member

stdrc commented Feb 19, 2024

I didn't find any enum support in the crate, I might've overlooked it.

On the homepage of the crate on docs.rs:

截屏2024-02-19 13 29 54

But it seems not easy to utilized in this PR's situation.

I did ask on the ClickHouse slack, and learned that when writing rows using any of the efficient formats, writing as an integer directly is the only option, and converting between string and integers is the job of the user.

Then looks good to me. May wait for code owner's approval.

@HurricanKai
Copy link
Contributor Author

I see, thank you! I think that example only works because #[repr(u8)] essentially makes the enum transparently a u8 in rusts compiler, and it'll be automatically converted to u8 when passed to the crate. It's a nice convenience feature when using the crate directly, but as you say, can't make use of that here as the enums are dynamically created by the user.

Copy link
Contributor

@xxhZs xxhZs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@xxhZs xxhZs added this pull request to the merge queue Feb 20, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to Branch Protection failures Feb 20, 2024
You're not authorized to push to this branch. Visit "About protected branches" for more information.
@stdrc stdrc enabled auto-merge February 21, 2024 04:21
auto-merge was automatically disabled February 22, 2024 06:09

Merge queue setting changed

@stdrc stdrc added this pull request to the merge queue Feb 26, 2024
Merged via the queue into risingwavelabs:main with commit 9651f39 Feb 26, 2024
26 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants