feat: WASM UDF MVP #10910

Closed
wants to merge 5 commits into from

Member

@xxchan xxchan commented Jul 12, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

To test it, just run:

./src/udf/wit_example/create.sh

TODO

  • user-friendly SDK (proc macro)
    • More ambitiously, what about letting users upload Rust source and compiling it for them ...?
  • consider an upgrade procedure if breaking changes happen
  • schema check
  • memory/CPU limitation

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR contains user-facing changes.

Types of user-facing changes

  • SQL commands, functions, and operators

Release note

Support WebAssembly UDF.

Note

Powered by the WebAssembly Component Model, which is still a work in progress, although already usable. RisingWave Wasm UDF is therefore also highly experimental, and we cannot provide any stability guarantee.

Usage

Users can create a WASM component using the WIT file src/udf/wit/udf.wit and Apache Arrow. They can use different programming languages (Rust and Go examples are provided).

To create a function from the compiled WASM component:

-- $encoded is base64-encoded WASM component
CREATE FUNCTION foo(...) RETURNS ... LANGUAGE wasm_v1 USING BASE64 '$encoded';

See src/udf/wit_example for example code & required tools.
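
For illustration, a minimal Rust sketch of how the statement above might be assembled from a compiled component. The helper names base64_encode and create_function_sql, and the foo(int) RETURNS int signature, are hypothetical and not part of this PR; only the CREATE FUNCTION ... LANGUAGE wasm_v1 USING BASE64 shape is taken from the description. The encoder is hand-rolled so that the snippet needs no external crates.

```rust
// Hypothetical helpers; only the SQL shape comes from the PR description.

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (RFC 4648) base64 encoding with `=` padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0];
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = (u32::from(b0) << 16) | (u32::from(b1) << 8) | u32::from(b2);

        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[(n >> 6) as usize & 63] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[n as usize & 63] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Build the CREATE FUNCTION statement for a compiled WASM component.
fn create_function_sql(
    name: &str,
    arg_types: &[&str],
    return_type: &str,
    wasm: &[u8],
) -> String {
    format!(
        "CREATE FUNCTION {}({}) RETURNS {} LANGUAGE wasm_v1 USING BASE64 '{}';",
        name,
        arg_types.join(", "),
        return_type,
        base64_encode(wasm)
    )
}

fn main() {
    // Placeholder bytes (just the wasm magic), standing in for a real component.
    let wasm = b"\0asm";
    println!("{}", create_function_sql("foo", &["int"], "int", wasm));
}
```

In practice the bytes would come from reading the compiled .wasm file (for example with std::fs::read), and the resulting statement can be sent through any SQL client.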

Configuration

One additional system parameter, wasm_storage_url, is added (defaults to fs://@/tmp/risingwave). It specifies where the user-uploaded WASM file and intermediate compilation artifacts are stored.

xxchan added a commit that referenced this pull request Jul 12, 2023
@xxchan xxchan force-pushed the xxchan/wasm-udf branch from 746c432 to b702d0a Compare July 12, 2023 20:53
@xxchan xxchan force-pushed the xxchan/wasm-udf branch 5 times, most recently from 237d240 to 3039d04 Compare July 13, 2023 21:17
@github-actions github-actions bot added the user-facing-changes label Jul 13, 2023
@xxchan xxchan force-pushed the xxchan/wasm-udf branch from 3039d04 to a7d172d Compare July 17, 2023 12:48
Cargo.lock (outdated, resolved)
@wangrunji0408 wangrunji0408 self-requested a review August 2, 2023 10:40
    identifier: &str,
) -> WasmUdfResult<InstantiatedComponent> {
    let object_store = get_wasm_storage(wasm_storage_url).await?;
    let serialized_component = object_store.read(&compiled_path(identifier), None).await?;
Member Author

FIXME: This hangs forever when I use S3. WHY??? 🥵

Member Author

@xxchan xxchan Dec 8, 2023

Update: this was caused by using the futures runtime inside tokio's block_on (or something like that). Already fixed by tokio::task::block_in_place(|| { tokio::runtime::Handle::current().block_on

commit b52a004
Author: xxchan <[email protected]>
Date:   Thu Oct 26 14:25:18 2023 +0800

    update arrow-ipc

commit e94feeb
Author: xxchan <[email protected]>
Date:   Thu Oct 26 06:21:34 2023 +0000

    Fix "cargo-hakari"

commit 08a5601
Merge: 56e6fc4 942e99d
Author: xxchan <[email protected]>
Date:   Thu Oct 26 14:19:34 2023 +0800

    Merge branch 'main' into xxchan/wasm-udf

commit 942e99d
Author: Yufan Song <[email protected]>
Date:   Wed Oct 25 22:10:31 2023 -0700

    fix(nats-connector): change stream into optional string, add replace stream name logic (#13024)

commit 90fb4a3
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Thu Oct 26 04:25:11 2023 +0000

    chore(deps): Bump comfy-table from 7.0.1 to 7.1.0 (#13049)

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit b724be7
Author: jinser <[email protected]>
Date:   Thu Oct 26 00:26:15 2023 +0800

    feat: add `comment on` clause support (#12849)

    Co-authored-by: Richard Chien <[email protected]>
    Co-authored-by: August <[email protected]>

commit 7f791d6
Author: August <[email protected]>
Date:   Wed Oct 25 20:29:16 2023 +0800

    feat: move model_v2 and model_migration into a separate crates (#13058)

commit 7f82929
Author: Noel Kwan <[email protected]>
Date:   Wed Oct 25 16:57:45 2023 +0800

    fix(meta): persist internal tables of `CREATE TABLE` (#13039)

commit 09a67ab
Author: Noel Kwan <[email protected]>
Date:   Wed Oct 25 16:49:08 2023 +0800

    fix: `WAIT` should return error if timeout (#13045)

commit e48547d
Author: Runji Wang <[email protected]>
Date:   Wed Oct 25 16:41:16 2023 +0800

    refactor(type): switch jsonb to flat representation (#12952)

    Signed-off-by: Runji Wang <[email protected]>

commit 56e6fc4
Author: xxchan <[email protected]>
Date:   Wed Oct 25 15:33:36 2023 +0800

    fix merge issue

commit c644361
Merge: fcd6992 2d428b1
Author: xxchan <[email protected]>
Date:   Wed Oct 25 15:23:44 2023 +0800

    Merge remote-tracking branch 'origin/main' into xxchan/wasm-udf

commit fcd6992
Author: xxchan <[email protected]>
Date:   Wed Oct 25 14:28:53 2023 +0800

    fix s3 stuck

commit 21e9740
Author: xxchan <[email protected]>
Date:   Wed Oct 25 12:47:24 2023 +0800

    Revert "fix s3 stuck (why?)"

    This reverts commit f19a6b4.

commit f19a6b4
Author: xxchan <[email protected]>
Date:   Wed Sep 13 14:32:28 2023 +0800

    fix s3 stuck (why?)

commit 019f309
Author: xxchan <[email protected]>
Date:   Tue Sep 12 15:29:52 2023 +0800

    ON_ERROR_STOP=1

commit 6e4ee3c
Author: xxchan <[email protected]>
Date:   Tue Sep 12 15:09:58 2023 +0800

    generate-config

commit b63a1c3
Merge: 2b0cc96 53611bf
Author: xxchan <[email protected]>
Date:   Tue Sep 12 14:53:10 2023 +0800

    Merge remote-tracking branch 'origin/main' into xxchan/wasm-udf

commit 2b0cc96
Author: xxchan <[email protected]>
Date:   Sat Sep 9 23:49:43 2023 +0800

    fix conflicts

commit 6b13fe3
Author: xxchan <[email protected]>
Date:   Sat Sep 9 23:35:50 2023 +0800

    update system param default

commit a273943
Merge: cc34bfe f649aa6
Author: xxchan <[email protected]>
Date:   Sat Sep 9 23:33:38 2023 +0800

    Merge remote-tracking branch 'origin/main' into xxchan/wasm-udf

commit cc34bfe
Author: xxchan <[email protected]>
Date:   Tue Aug 1 17:47:42 2023 +0200

    use count_char as the example

commit f913f63
Merge: 53bf8e0 2637dbd
Author: xxchan <[email protected]>
Date:   Tue Aug 1 17:22:13 2023 +0200

    Merge branch 'main' into xxchan/wasm-udf

commit 53bf8e0
Author: xxchan <[email protected]>
Date:   Mon Jul 31 14:20:07 2023 +0200

    minor update

commit 70cee42
Author: xxchan <[email protected]>
Date:   Mon Jul 17 14:53:29 2023 +0200

    fix arrow_schema into -> try_into

commit a7d172d
Author: xxchan <[email protected]>
Date:   Fri Jul 14 16:31:20 2023 +0200

    buf format

commit 43a3290
Author: xxchan <[email protected]>
Date:   Thu Jul 13 23:04:16 2023 +0200

    add tinygo example & turn on wasi support

commit 61a4998
Author: xxchan <[email protected]>
Date:   Wed Jul 12 11:40:56 2023 +0200

    cleanup

commit 165d4d9
Author: xxchan <[email protected]>
Date:   Wed Jul 12 11:02:44 2023 +0200

    use object store to store wasm

commit 88979e4
Author: xxchan <[email protected]>
Date:   Tue Jul 11 15:32:52 2023 +0200

    add wasm_storage_url system param

commit a897320
Author: xxchan <[email protected]>
Date:   Thu Jul 6 20:04:45 2023 +0200

    Load compiled wasm module in expr 🚀🚀🚀

commit 63b3523
Author: xxchan <[email protected]>
Date:   Sun Jul 2 19:27:22 2023 +0200

    it works (although very slow)
@wangrunji0408 wangrunji0408 force-pushed the xxchan/wasm-udf branch 2 times, most recently from 69a280b to 5b53a7c Compare December 8, 2023 06:52
Contributor

@wangrunji0408 wangrunji0408 left a comment

I have tested it and it works! Great job! Next I'm going to review the code and begin to learn wasm. 🤡

cfg-or-panic = "0.2"
futures-util = "0.3.28"
itertools = "0.11"
risingwave_object_store = { workspace = true }
Contributor

I think it would be better to decouple UDF from object store and put related code into the compute node.

src/udf/Cargo.toml (outdated, resolved)
@@ -25,7 +25,7 @@ tokio = { version = "0.2", package = "madsim-tokio" }
tracing = "0.1"

[dev-dependencies]
-itertools = "0.10.5"
+itertools = "0.11"
Contributor

Suggested change
-itertools = "0.11"
+itertools = "0.12"

src/udf/wit/udf.wit (outdated, resolved)
Comment on lines 16 to 17
# debug: 23557258
# release: 12457072
Contributor

@wangrunji0408 wangrunji0408 Dec 8, 2023

The binary size looks much larger than I imagined. Is there a way to strip the binary? Or is it possible to build on target wasm32-unknown-unknown?

Member Author

is it possible to build on target wasm32-unknown-unknown?

This is not possible, and it's a long story, but I forget the details now. 🤡

Is there a way to strip the binary?

I think so. I remember there are some tools for this purpose. And IIRC currently debuginfo is also included.

Contributor

Please tell me when you recall the story. 🤡

Member Author

Maybe my memory was wrong.

I remember:

  • We can definitely compile a wit module to wasm32-unknown-unknown and run it.
  • Many things don't work in wasm32-unknown-unknown.
  • On the runtime (host) side, the APIs for wasi and non-wasi modules are different.

But I don't remember:

  • (Mainly) Whether the WIT UDF example (especially arrow-rs) works in wasm32-unknown-unknown
  • Whether a non-wasi module can be run with the wasi host API.

Maybe I was using wasi just so that I could println 🤡

Member Author

xxchan commented Dec 8, 2023

Next I'm going to review the code and begin to learn wasm. 🤡

I will also need to relearn it 🤡

WasmUdf(
    #[from]
    #[backtrace]
    risingwave_udf::wasm::WasmUdfError,
Contributor

Can this error be merged into risingwave_udf::Error?

Comment on lines +35 to +36
// for backward compatibility, newly added fields should be optional
pub extra: Option<PbExtra>,
Contributor

I think this struct can be reorganized without worrying about backward compatibility like protobuf messages.

src/udf/wit_example/rust/Cargo.toml (outdated, resolved)
Comment on lines 24 to 25
[profile.release]
debug = 1
Contributor

Without this, the binary size in release mode could be reduced to 3MB.

Suggested change
[profile.release]
debug = 1

}

impl Udf for CountChar {
    fn eval(batch: RecordBatch) -> Result<RecordBatch, EvalErrno> {
Contributor

I'm planning to build another #[function] macro to generate this code. It would be very similar to our internal #[function] macro; the only difference is that it would be adapted to Arrow arrays.

Comment on lines 6 to 8
// TODO: is schema needed? since record-batch already contains schema.
export input-schema: func() -> schema
export output-schema: func() -> schema
Contributor

I think it's needed as a signature of the function.

export input-schema: func() -> schema
export output-schema: func() -> schema

// export init: func(inputs: list<scalar>) -> result<_, init-errno>
Contributor

What's this method for?

src/udf/wit/udf.wit (outdated, resolved)
@@ -48,7 +51,7 @@ pub async fn handle_create_function(
 Some(lang) => {
     let lang = lang.real_value().to_lowercase();
     match &*lang {
-        "python" | "java" => lang,
+        "python" | "java" | "wasm_v1" => lang,
Contributor

The version can be encoded as metadata in the wasm binary so that the language can always be "wasm".
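
A rough sketch of what reading such a version tag could look like, assuming it is stored in a custom section of the wasm binary. The section name rw_udf_abi and the helper names are hypothetical; nothing in this PR defines them. The parser follows the core module layout (8-byte preamble, then sections framed as id, LEB128 size, body); whether component binaries can be handled identically is an assumption not checked here.

```rust
/// Read an unsigned LEB128-encoded u32 from `bytes`, advancing `*pos`.
fn read_leb_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None; // too many continuation bytes for a u32
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Return the payload of the first custom section (id 0) named `name`.
fn find_custom_section<'a>(wasm: &'a [u8], name: &str) -> Option<&'a [u8]> {
    // Preamble: 4-byte magic "\0asm" followed by 4 bytes of version info.
    if wasm.len() < 8 || !wasm.starts_with(b"\0asm") {
        return None;
    }
    let mut pos = 8;
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = read_leb_u32(wasm, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        let body = wasm.get(pos..end)?;
        if id == 0 {
            // Custom section body: length-prefixed name, then raw payload.
            let mut p = 0;
            let name_len = read_leb_u32(body, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            let section_name = body.get(p..name_end)?;
            if section_name == name.as_bytes() {
                return Some(&body[name_end..]);
            }
        }
        pos = end;
    }
    None
}

fn main() {
    // Hand-built module: preamble plus one custom section carrying "1".
    let name = b"rw_udf_abi"; // hypothetical section name
    let payload = b"1";
    let mut wasm = b"\0asm\x01\x00\x00\x00".to_vec();
    wasm.push(0); // custom section id
    wasm.push((1 + name.len() + payload.len()) as u8); // section size
    wasm.push(name.len() as u8);
    wasm.extend_from_slice(name);
    wasm.extend_from_slice(payload);

    let version = find_custom_section(&wasm, "rw_udf_abi")
        .and_then(|p| std::str::from_utf8(p).ok());
    println!("{:?}", version);
}
```

With something along these lines, CREATE FUNCTION could accept a plain "wasm" language and pick the ABI from the binary itself; how the tag would be written at build time is left open here.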

@xxchan xxchan closed this Dec 29, 2023
@xxchan xxchan deleted the xxchan/wasm-udf branch April 18, 2024 09:40
Labels
experimental, type/feature, user-facing-changes

2 participants