feat(udf): add initial support for WASM-based Rust UDF #14271
commit b52a004 Author: xxchan <[email protected]> Date: Thu Oct 26 14:25:18 2023 +0800 update arrow-ipc
commit e94feeb Author: xxchan <[email protected]> Date: Thu Oct 26 06:21:34 2023 +0000 Fix "cargo-hakari"
commit 08a5601 Merge: 56e6fc4 942e99d Author: xxchan <[email protected]> Date: Thu Oct 26 14:19:34 2023 +0800 Merge branch 'main' into xxchan/wasm-udf
commit 942e99d Author: Yufan Song <[email protected]> Date: Wed Oct 25 22:10:31 2023 -0700 fix(nats-connector): change stream into optional string, add replace stream name logic (#13024)
commit 90fb4a3 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu Oct 26 04:25:11 2023 +0000 chore(deps): Bump comfy-table from 7.0.1 to 7.1.0 (#13049) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
commit b724be7 Author: jinser <[email protected]> Date: Thu Oct 26 00:26:15 2023 +0800 feat: add `comment on` clause support (#12849) Co-authored-by: Richard Chien <[email protected]> Co-authored-by: August <[email protected]>
commit 7f791d6 Author: August <[email protected]> Date: Wed Oct 25 20:29:16 2023 +0800 feat: move model_v2 and model_migration into a separate crates (#13058)
commit 7f82929 Author: Noel Kwan <[email protected]> Date: Wed Oct 25 16:57:45 2023 +0800 fix(meta): persist internal tables of `CREATE TABLE` (#13039)
commit 09a67ab Author: Noel Kwan <[email protected]> Date: Wed Oct 25 16:49:08 2023 +0800 fix: `WAIT` should return error if timeout (#13045)
commit e48547d Author: Runji Wang <[email protected]> Date: Wed Oct 25 16:41:16 2023 +0800 refactor(type): switch jsonb to flat representation (#12952) Signed-off-by: Runji Wang <[email protected]>
commit 56e6fc4 Author: xxchan <[email protected]> Date: Wed Oct 25 15:33:36 2023 +0800 fix merge issue
commit c644361 Merge: fcd6992 2d428b1 Author: xxchan <[email protected]> Date: Wed Oct 25 15:23:44 2023 +0800 Merge remote-tracking branch 'origin/main' into xxchan/wasm-udf
commit fcd6992 Author: xxchan <[email protected]> Date: Wed Oct 25 14:28:53 2023 +0800 fix s3 stuck
commit 21e9740 Author: xxchan <[email protected]> Date: Wed Oct 25 12:47:24 2023 +0800 Revert "fix s3 stuck (why?)" This reverts commit f19a6b4.
commit f19a6b4 Author: xxchan <[email protected]> Date: Wed Sep 13 14:32:28 2023 +0800 fix s3 stuck (why?)
commit 019f309 Author: xxchan <[email protected]> Date: Tue Sep 12 15:29:52 2023 +0800 ON_ERROR_STOP=1
commit 6e4ee3c Author: xxchan <[email protected]> Date: Tue Sep 12 15:09:58 2023 +0800 generate-config
commit b63a1c3 Merge: 2b0cc96 53611bf Author: xxchan <[email protected]> Date: Tue Sep 12 14:53:10 2023 +0800 Merge remote-tracking branch 'origin/main' into xxchan/wasm-udf
commit 2b0cc96 Author: xxchan <[email protected]> Date: Sat Sep 9 23:49:43 2023 +0800 fix conflicts
commit 6b13fe3 Author: xxchan <[email protected]> Date: Sat Sep 9 23:35:50 2023 +0800 update system param default
commit a273943 Merge: cc34bfe f649aa6 Author: xxchan <[email protected]> Date: Sat Sep 9 23:33:38 2023 +0800 Merge remote-tracking branch 'origin/main' into xxchan/wasm-udf
commit cc34bfe Author: xxchan <[email protected]> Date: Tue Aug 1 17:47:42 2023 +0200 use count_char as the example
commit f913f63 Merge: 53bf8e0 2637dbd Author: xxchan <[email protected]> Date: Tue Aug 1 17:22:13 2023 +0200 Merge branch 'main' into xxchan/wasm-udf
commit 53bf8e0 Author: xxchan <[email protected]> Date: Mon Jul 31 14:20:07 2023 +0200 minor update
commit 70cee42 Author: xxchan <[email protected]> Date: Mon Jul 17 14:53:29 2023 +0200 fix arrow_schema into -> try_into
commit a7d172d Author: xxchan <[email protected]> Date: Fri Jul 14 16:31:20 2023 +0200 buf format
commit 43a3290 Author: xxchan <[email protected]> Date: Thu Jul 13 23:04:16 2023 +0200 add tinygo example & turn on wasi support
commit 61a4998 Author: xxchan <[email protected]> Date: Wed Jul 12 11:40:56 2023 +0200 cleanup
commit 165d4d9 Author: xxchan <[email protected]> Date: Wed Jul 12 11:02:44 2023 +0200 use object store to store wasm
commit 88979e4 Author: xxchan <[email protected]> Date: Tue Jul 11 15:32:52 2023 +0200 add wasm_storage_url system param
commit a897320 Author: xxchan <[email protected]> Date: Thu Jul 6 20:04:45 2023 +0200 Load compiled wasm module in expr 🚀🚀🚀
commit 63b3523 Author: xxchan <[email protected]> Date: Sun Jul 2 19:27:22 2023 +0200 it works (although very slow)
Signed-off-by: Runji Wang <[email protected]>
@@ -182,3 +182,4 @@ backup_storage_url = "memory"
 backup_storage_directory = "backup"
 max_concurrent_creating_streaming_jobs = 1
 pause_on_next_bootstrap = false
+wasm_storage_url = "fs://.risingwave/data"
I have no idea what the default path should be in a production environment. 🤔
default to none?
The path `.risingwave/data` should be specific to RiseDev. 🤔 Maybe you want to set it with `risedev d`?
9c76f14 to 667bad6
/// Runtimes returned by this function are cached inside for at least 60 seconds.
/// Later calls with the same link will reuse the same runtime.
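To make the caching behavior in this doc comment concrete, here is a minimal sketch of a link-keyed runtime cache with a time-to-live. It is not the code in this PR: `Runtime`, `RuntimeCache`, and `get_or_create` are hypothetical names, the clock is passed in explicitly to keep the sketch deterministic, and eviction is done lazily on access.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a loaded WASM runtime.
struct Runtime {
    link: String,
}

impl Runtime {
    fn link(&self) -> &str {
        &self.link
    }
}

/// Hypothetical cache: one shared runtime per link, kept while recently used.
struct RuntimeCache {
    ttl: Duration,
    entries: HashMap<String, (Arc<Runtime>, Instant)>,
}

impl RuntimeCache {
    fn new(ttl: Duration) -> Self {
        RuntimeCache { ttl, entries: HashMap::new() }
    }

    /// Returns the cached runtime for `link`, creating it if absent.
    /// `now` is a parameter (an assumption of this sketch) so tests can control time.
    fn get_or_create(&mut self, link: &str, now: Instant) -> Arc<Runtime> {
        // Lazily drop entries that have not been used within the TTL.
        let ttl = self.ttl;
        self.entries
            .retain(|_, (_, last_used)| now.duration_since(*last_used) < ttl);

        let entry = self
            .entries
            .entry(link.to_string())
            .or_insert_with(|| (Arc::new(Runtime { link: link.to_string() }), now));
        entry.1 = now; // refresh the last-used timestamp
        Arc::clone(&entry.0)
    }
}
```

Two calls with the same link inside the TTL return the same `Arc`, so the download and compilation cost is paid once; a call after the entry has expired builds a fresh runtime.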
Are we essentially trying to reuse the same runtime for different actors/tasks of the same job? BTW, what are the benefits and overheads of reusing versus not reusing it?
Yes. The runtime supports parallel calls from multiple threads. A WASM instance can only be called from a single thread (for now; multi-threading support is on the way), but we can use an instance pool to share and recycle instances across threads. The benefits of reuse are primarily saving memory and reducing the overhead of duplicate JIT compilation and repeated downloads from the object store.
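The instance-pool idea can be sketched as follows. This is an illustration only, not the arrow-udf implementation: `Instance`, `Runtime`, `acquire`, and `release` are hypothetical names, and the "WASM call" is a placeholder that adds one.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for a WASM instance: usable by one thread at a time.
struct Instance {
    calls: u64,
}

impl Instance {
    fn call(&mut self, x: i64) -> i64 {
        self.calls += 1;
        x + 1 // placeholder for invoking the real UDF
    }
}

/// Hypothetical shared runtime that recycles idle instances across threads.
struct Runtime {
    idle: Mutex<Vec<Instance>>,
    created: AtomicUsize,
}

impl Runtime {
    fn new() -> Self {
        Runtime { idle: Mutex::new(Vec::new()), created: AtomicUsize::new(0) }
    }

    /// Takes an idle instance if one exists, otherwise instantiates a new one.
    fn acquire(&self) -> Instance {
        if let Some(inst) = self.idle.lock().unwrap().pop() {
            return inst;
        }
        self.created.fetch_add(1, Ordering::SeqCst);
        Instance { calls: 0 }
    }

    /// Returns an instance to the pool so another thread can reuse it.
    fn release(&self, inst: Instance) {
        self.idle.lock().unwrap().push(inst);
    }

    /// Runs one call on a pooled instance; safe to invoke from many threads.
    fn call(&self, x: i64) -> i64 {
        let mut inst = self.acquire();
        let out = inst.call(x);
        self.release(inst);
        out
    }

    /// Number of instances instantiated so far.
    fn created(&self) -> usize {
        self.created.load(Ordering::SeqCst)
    }
}
```

Each instance is owned by exactly one caller between `acquire` and `release`, so the single-thread restriction holds, while the number of instances grows only to the peak number of concurrent callers.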
I prefer to merge it without documenting it now.
So we will maintain the doc inside this repo but won't publish it to risingwave.dev?
Why not just store them in the meta store? In most cases the WASM binary won't be very big, and the meta store (which will be migrated to an RDBMS, e.g. PostgreSQL) should be good enough. I prefer to keep it simple and avoid introducing new dependent systems. A related discussion: #12982. I would also recommend the meta store for it.
The WASM binary in the e2e test is about 1.5 MB after stripping and 400 KB after compression. It sounds okay to store them in the meta store.
Let me note it down as an issue as a reminder.
Signed-off-by: Runji Wang <[email protected]> Co-authored-by: xxchan <[email protected]> Co-authored-by: wangrunji0408 <[email protected]>
}

/// Convert a data type to string used in identifier.
fn datatype_name(ty: &DataType) -> String {
Why didn't we use `DataType`'s `Display`? 🤪
The type names defined in the WASM module are slightly different from RW's type names, as the former follow the Arrow specification while the latter follow PostgreSQL's.
But I think Arrow doesn't specify things like `T[]`, and thus the "string representation of a datatype" is still kind of arbitrary IMO.
So does the signature matter in the arrow-udf-wasm crate, or can it be decided by the caller?
> But I think Arrow doesn't specify things like `T[]`, and thus the "string representation of a datatype" is still kind of arbitrary IMO.

Sure. `T[]` is still pg style and should be changed to `list<T>` in my opinion.

> So does the signature matter in the arrow-udf-wasm crate, or can it be decided by the caller?

The signature format is a part of the `arrow-udf` API and may vary between versions (e.g. from API version 2 to 3, the type name of `bigint` was changed from `int8` to `int64`, in order to be consistent with Arrow and avoid ambiguity). The caller needs to use the correct name according to the API version declared by the WASM module.
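As a rough illustration of version-dependent naming, the sketch below models only the `bigint` rename mentioned in this thread and the current `T[]` list style. The reduced `DataType` enum and the `api_version` parameter are assumptions made for the example; they are not RisingWave's or arrow-udf's actual types or signatures.

```rust
/// Hypothetical, reduced data type enum for illustration only.
enum DataType {
    Int64,
    List(Box<DataType>),
}

/// Name of `ty` as used in a function signature for a given API version.
/// Only the `int8` -> `int64` rename (API version 2 -> 3) is modeled here.
fn datatype_name(ty: &DataType, api_version: u32) -> String {
    match ty {
        DataType::Int64 if api_version >= 3 => "int64".to_string(),
        DataType::Int64 => "int8".to_string(),
        // Lists keep the pg-style `T[]` suffix discussed above.
        DataType::List(inner) => format!("{}[]", datatype_name(inner, api_version)),
    }
}
```

A caller would first read the API version declared by the module and then build the signature with the matching names, so the lookup of the exported symbol does not silently miss.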
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR adds initial support for Rust UDF, based on @xxchan's #10910.
Please see `src/expr/udf/README.md` for usage and current limitations.

The core implementation of UDF is maintained in an independent crate, arrow-udf. It depends only on arrow-rs and is intended to be shared by the Rust community. However, its API design is mostly inherited from our built-in functions and will be kept as consistent as possible. Example:
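A minimal sketch of the kind of scalar function this describes, assuming an attribute-macro style similar to RisingWave's built-in functions. The attribute is shown only as a comment so that the block compiles without the crate; its exact syntax and type names are assumptions and should be checked against the arrow-udf documentation for the API version in use.

```rust
// Assumed annotation when building with the arrow-udf crate:
// #[function("gcd(int, int) -> int")]
/// Greatest common divisor via the Euclidean algorithm.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}
```

The function body is plain Rust over scalar values; the macro is what would generate the Arrow array-level wrapper and the exported symbol.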
On the RW side, we provide two ways to create functions from a WASM module:
`wasm_storage_url`.

You may have noticed that `AS <identifier>` is no longer required, because this time we define a unique identifier for each combination of function name and data types. These identifiers are base64-encoded as function symbols and can be decoded by the UDF runtime when loading WASM modules.

Checklist
`./risedev check` (or alias, `./risedev c`)
Documentation
Release note
Add experimental support for Rust UDF.
See src/expr/udf/README.md for a draft document.
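The description above says that function identifiers are base64-encoded into symbols and decoded again by the runtime. The sketch below shows that round trip using only the standard library. The URL-safe, unpadded alphabet and the helper names `encode_symbol` and `decode_symbol` are assumptions for illustration; the alphabet and any symbol prefix actually used by arrow-udf may differ.

```rust
/// URL-safe base64 alphabet (an assumption; not confirmed by the PR).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encodes a signature string as an unpadded base64 symbol.
fn encode_symbol(signature: &str) -> String {
    let mut out = String::new();
    for chunk in signature.as_bytes().chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        // A chunk of k input bytes yields k + 1 output characters.
        for i in 0..=chunk.len() {
            let idx = (n >> (18 - 6 * i)) & 0x3f;
            out.push(ALPHABET[idx as usize] as char);
        }
    }
    out
}

/// Decodes a symbol back to the signature; `None` on invalid input.
fn decode_symbol(symbol: &str) -> Option<String> {
    let mut bytes = Vec::new();
    let mut acc: u32 = 0;
    let mut bits = 0;
    for c in symbol.bytes() {
        let v = ALPHABET.iter().position(|&a| a == c)? as u32;
        acc = (acc << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            bytes.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    String::from_utf8(bytes).ok()
}
```

Because the symbol carries the full signature, a runtime can enumerate a module's exports, decode each one, and match it against the function name and argument types from `CREATE FUNCTION`, which is why a separate `AS <identifier>` clause becomes unnecessary.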