refactor(udf): switch to the latest arrow-udf versions #16619

Merged: 32 commits, May 11, 2024

@wangrunji0408 wangrunji0408 commented May 7, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Currently, the decimal and jsonb types are mapped to the Arrow LargeBinary and LargeString types in UDFs. However, now that the UDF code has been decoupled into the arrow-udf project, this design prevents community users from using LargeBinary and LargeString as ordinary Arrow types. (arrow-udf/arrow-udf#16) To solve this problem, we decided to use Arrow extension types for these special types. The migration on the arrow-udf side is complete. (arrow-udf/arrow-udf#17, arrow-udf/arrow-udf#18) This PR completes the remaining work on the RisingWave side: switching to the new arrow-udf versions and handling backward compatibility.
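
To illustrate why the legacy mapping is a problem, here is a minimal, dependency-free sketch that contrasts the two approaches. It is not code from this PR or from arrow-udf. The `Field` class is a simplified stand-in for an Arrow field, the extension names `arrowudf.decimal` and `arrowudf.json` and the `string` storage type are assumptions made for illustration, and only the metadata key `ARROW:extension:name` comes from the Arrow columnar format.

```python
from dataclasses import dataclass, field
from typing import Dict

# Metadata key defined by the Arrow columnar format for extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"

# Assumed extension names, used here for illustration only.
DECIMAL_EXT = "arrowudf.decimal"
JSON_EXT = "arrowudf.json"


@dataclass
class Field:
    """Simplified stand-in for an Arrow field: storage type plus metadata."""
    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def legacy_logical_type(f: Field) -> str:
    """Legacy mapping: the storage type alone decides the logical type."""
    if f.storage_type == "large_binary":
        return "decimal"
    if f.storage_type == "large_string":
        return "jsonb"
    return f.storage_type


def extension_logical_type(f: Field) -> str:
    """Extension-type mapping: metadata decides, so storage types stay usable."""
    ext = f.metadata.get(EXTENSION_NAME_KEY)
    if ext == DECIMAL_EXT:
        return "decimal"
    if ext == JSON_EXT:
        return "jsonb"
    return f.storage_type


plain = Field("payload", "large_string")
tagged = Field("doc", "string", {EXTENSION_NAME_KEY: JSON_EXT})
print(legacy_logical_type(plain))      # jsonb (a plain LargeString is misread)
print(extension_logical_type(plain))   # large_string
print(extension_logical_type(tagged))  # jsonb
```

Under the legacy rule, a user column that really is a LargeString cannot be told apart from jsonb. With the metadata tag, both kinds of column can coexist.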

In detail, this PR:

  • removes the risingwave-udf crate. The code is moved to arrow-udf-flight.
  • removes the Python UDF SDK. The code is moved to arrow-udf-flight/python.
  • removes the Java UDF SDK. The code is moved to arrow-udf-flight/java.
  • updates the arrow-udf-* versions.
  • introduces a new Arrow conversion for the decimal and jsonb types and keeps the old one as legacy. (see arrow_udf.rs)
  • checks the version of the Python/Java/Rust UDF SDK at runtime to determine whether to use the legacy conversion.
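
The runtime version check in the last item could look roughly like the sketch below. This is an illustration, not the logic in this PR: the function names, the version string format, the (0, 2) threshold, and the rule that a missing version means legacy are all assumptions.

```python
from typing import Optional, Tuple

# Assumption: SDKs from this version onward use the new conversion.
FIRST_EXTENSION_TYPE_VERSION: Tuple[int, int] = (0, 2)


def parse_version(text: str) -> Tuple[int, int]:
    """Parse 'MAJOR.MINOR[.PATCH]', optionally prefixed with 'v', into (major, minor)."""
    parts = text.strip().lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def use_legacy_conversion(sdk_version: Optional[str]) -> bool:
    """Return True when the UDF server should get the legacy decimal/jsonb mapping."""
    if sdk_version is None:
        # Assumption: an SDK that reports no version predates the version check.
        return True
    return parse_version(sdk_version) < FIRST_EXTENSION_TYPE_VERSION


print(use_legacy_conversion(None))     # True
print(use_legacy_conversion("0.1.3"))  # True
print(use_legacy_conversion("0.2.0"))  # False
```

Treating an unknown version as legacy keeps already deployed UDF servers working without changes, which matches the backward compatibility goal stated above.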

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features; see "Sqlsmith: Sql feature generation" #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

The Python UDF SDK is updated. If you are using RisingWave 1.10, you are encouraged to update to the latest version by following the migration guide (to be added later). Older versions are still supported but may be deprecated in a future version.

Signed-off-by: Runji Wang <[email protected]>
@wangrunji0408 wangrunji0408 requested a review from a team as a code owner May 7, 2024 14:08
Base automatically changed from wrj/arrow to main May 8, 2024 07:15
@BugenZhao BugenZhao left a comment

LGTM!

@@ -84,7 +90,7 @@ def hex_to_dec(hex: Optional[str]) -> Optional[Decimal]:
     return dec


-@udf(input_types=["FLOAT8"], result_type="DECIMAL")
+@udf(input_types=["FLOAT64"], result_type="DECIMAL")
Is this considered a breaking change that should be notable for existing users?

@wangrunji0408 wangrunji0408 May 9, 2024

Yes, it is a breaking change, but only for the new version (arrow-udf v0.2). Existing users will not be affected unless they try to upgrade to the new version.
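
For users who do upgrade, the type-name change could be applied mechanically. The sketch below is hypothetical and not part of any SDK: `RENAMED_TYPES` and `migrate_type_names` are illustrative names, and the table contains only the single rename visible in the diff under discussion (FLOAT8 to FLOAT64). Any other renames would have to come from the migration guide.

```python
from typing import Dict, List

# Only the rename shown in the diff; extend this from the migration guide.
RENAMED_TYPES: Dict[str, str] = {"FLOAT8": "FLOAT64"}


def migrate_type_names(type_names: List[str]) -> List[str]:
    """Replace old type names with new ones, leaving unknown names untouched."""
    return [RENAMED_TYPES.get(name.upper(), name) for name in type_names]


print(migrate_type_names(["FLOAT8", "DECIMAL"]))  # ['FLOAT64', 'DECIMAL']
```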

@@ -22,7 +22,9 @@ embedded-python-udf = ["arrow-udf-python"]
[dependencies]
anyhow = "1"
arrow-array = { workspace = true }
arrow-flight = "50"
Can arrow-udf-flight re-export this?

gitguardian bot commented May 10, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id: 9425213
GitGuardian status: Triggered
Secret: Generic Password
Commit: ec49c16
Filename: e2e_test/source/cdc/cdc.validate.postgres.slt
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider



@wangrunji0408 wangrunji0408 added this pull request to the merge queue May 11, 2024
Merged via the queue into main with commit 228b9e8 May 11, 2024
26 of 28 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/udf branch May 11, 2024 04:25
2 participants