Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(udf): introduce #[derive(StructType)] for struct types in Rust UDF #15372

Merged
merged 9 commits into from
Mar 14, 2024

Conversation

wangrunji0408
Copy link
Contributor

@wangrunji0408 wangrunji0408 commented Feb 29, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

See background in arrow-udf/arrow-udf#2.

This PR changes the syntax for returning struct types in Rust UDFs, making it easier for users to define complex struct type.

#[derive(StructType)]
struct KeyValue<'a> {
    key: &'a str,
    value: &'a str,
}

#[function("key_value(varchar) -> struct KeyValue")]
fn key_value(kv: &str) -> Option<KeyValue<'_>> {
    let (key, value) = kv.split_once('=')?;
    Some(KeyValue { key, value })
}

It has been released with arrow-udf v0.2. The old syntax in v0.1 is deprecated. But old binaries built with v0.1 are still accepted. We maintain backward compatibility for WASM binaries.

This PR also optimized the speed of the statement above. Previously, it took over 10s because it always required a clean build of the Rust code. Now, we set up a fixed temporary directory for it so that we can reuse the build cache. Building from the second time the can be shortened to about 2s.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

updates the syntax in #14903

Comment on lines 206 to 200
let script = format!("#[arrow_udf::function(\"{identifier}\")]\n{function_body}");
body = Some(function_body.clone());
let script = format!("use arrow_udf::{{function, types::*}};\n{}", script);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the introduction of the new struct syntax, users now need to write #[function(...)] explicitly (previously we would prepend this macro for them).

What about checks whether user uses #[function in the script, and decide whether to added this for them? 🤡 In this way, we can keep the simple old syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds a little hacky. But I like this idea. 🤡

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final rule:

  • if the return type is a struct, they have to write #[function] by themselves.
  • otherwise, we add it for them.

@BugenZhao
Copy link
Member

BugenZhao commented Mar 1, 2024

Due to the introduction of the new struct syntax, users now need to write #[function(...)] explicitly (previously we would prepend this macro for them).

Can you elaborate more on what blocks us to infer the function attribute after this PR?

  • For non-struct return types, I suppose there should be no change on both syntax and implementation.
  • For struct return types, can we identify these cases based on an unrecognized return type of the function (other than well-known types like i32 or bool)?

@wangrunji0408
Copy link
Contributor Author

wangrunji0408 commented Mar 1, 2024

Can you elaborate more on what blocks us to infer the function attribute after this PR?

Previously the attribute is:

#[function("key_value(varchar) -> struct<key:varchar, value:varchar>")]

We have changed it to:

#[function("key_value(varchar) -> struct KeyValue")]

where KeyValue is a struct defined in Rust code.

If we also support user-defined type in SQL:

create type KeyValue as (key varchar, value varchar);

The syntax could be more natural and consistent:

create function key_value(varchar) returns KeyValue language rust as $$
    // generate: #[function("key_value(varchar) -> struct KeyValue")]
    fn key_value(kv: &str) -> Option<KeyValue<'_>> {...}
$$;

For struct return types, can we identify these cases based on an unrecognized return type of the function (other than well-known types like i32 or bool)?

Yes, it's feasible. While we don't support user-defined type, we can still infer its struct schema from the WASM binary. The binary has stored the schema as a string, like KeyValue=struct<key:varchar, value:varchar>. RW can decode this metadata to resolve the real struct type.

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

@wangrunji0408 wangrunji0408 enabled auto-merge March 14, 2024 09:15
@wangrunji0408 wangrunji0408 added this pull request to the merge queue Mar 14, 2024
Merged via the queue into main with commit 2a03233 Mar 14, 2024
27 of 28 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/rust-udf-0.2 branch March 14, 2024 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants