Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(udf): Add deno as UDF language #15719

Merged
merged 46 commits into from
Apr 10, 2024

Conversation

bakjos
Copy link
Contributor

@bakjos bakjos commented Mar 15, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR add the option to use deno as udf language, it includes fetch, web and crypto API support.

CREATE FUNCTION digest( t string ) RETURNS bytea LANGUAGE javascript RUNTIME deno AS $$
    const subtle = crypto.subtle; 
    const key = await subtle.generateKey({
        name: 'HMAC',
        hash: 'SHA-256',
        length: 256,
        }, true, ['sign', 'verify']);
    
        const enc = new TextEncoder();
        const message = enc.encode(t);
    
        const result = await subtle.sign({
        name: 'HMAC',
        }, key, message);
        console.log('result', result);
        return result;
$$ ASYNC;
CREATE FUNCTION delay_response()
    RETURNS TABLE (x int) LANGUAGE javascript RUNTIME deno AS $$
                 const delayedResponses = {
                    delays: [50, 10, 15], 
                    wait(delay) {
                      return new Promise((resolve) => {
                        setTimeout(resolve, delay);
                      });
                    },
                  
                    async *[Symbol.asyncIterator]() {
                      for (const delay of this.delays) {
                        await this.wait(delay);
                        yield delay;
                      }
                    },
                  };
                return delayedResponses;
$$   SYNC;
CREATE FUNCTION fetch_api() RETURNS TABLE ( data struct< id int, name string, gender string >) LANGUAGE javascript RUNTIME deno AS $$
    const response = await fetch('https://rickandmortyapi.com/api/character');
    const resp = await response.json();
    for (const r of resp.results) {
        yield r;
    }
$$ ASYNC GENERATOR;

I added support for setting the function type when creating a function

type function declaration
ASYNC A java script async function, to enable the usage of the await api async function <name> { <body> }
ASYNC GENERATOR A java script async generator function, to enable the usage of the await and yield api async function* <name> { <body> }
GENERATOR A java script generator function, to enable the usage of the yield api function* <name> { <body> }
SYNC A regular javascript function, if a promise is returned, will wait until is fulfilled to return the result function <name> { <body> }

If not defined will use normal for UDFs and generator for table functions;

I put the deno language support under a feature flag, which is currently enabled by default. (Plan to remove before this PR gets merged).

It will require arrow-udf/arrow-udf#7 to get merged

I'd also like to use one of the table functions as a source (to receive data from websockets or sse). But it's not currently supported. What do you think would be a good way to do it?

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@bakjos bakjos requested a review from a team as a code owner March 15, 2024 16:36
Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your amazing work! Could you share the background and your motivation of the work? like why do you need it, what's its advantage over current js UDF, what's not achievable now?

Users may feel strange that there are both js UDF and deno UDF, if they are not very familiar with js runtimes. So it's important to figure out whether we want both of them, and if so, make clear the difference for users.

@bakjos
Copy link
Contributor Author

bakjos commented Mar 15, 2024

@xxchan TLDR The main motivation is being able to use web apis (fetch, crypto) as part of the UDFS, the full conversation here https://risingwave-community.slack.com/archives/C02T3F7UYM6/p1709825596494219

@xxchan
Copy link
Member

xxchan commented Mar 15, 2024

Got it. Sorry that I missed the Slack discussion. For such large feature, it would still be nicer if we open an issue to have larger visibility and more discussion before the PR. ❤️

@bakjos bakjos force-pushed the bakjos/deno_udf branch 2 times, most recently from 2e01bf8 to 61175b3 Compare March 18, 2024 14:34
@bakjos bakjos requested a review from xxchan March 18, 2024 14:43
@bakjos bakjos force-pushed the bakjos/deno_udf branch 2 times, most recently from e55f8c0 to 8fc2c6a Compare March 20, 2024 18:47
@bakjos
Copy link
Contributor Author

bakjos commented Mar 20, 2024

Also do you think the sintax CREATE [TABLE|SOURCE ] .. AS FUNCTION <function_name> (param1, param2) could work?

@wangrunji0408
Copy link
Contributor

wangrunji0408 commented Mar 21, 2024

I added support for setting a parameter when creating the function to define the function_type:

According to postgres' documentation, the SET param TO value is to override a specific system parameter during the function call. Using this syntax to define properties of the function itself seems like a misuse.

For simplicity, is it possible to just assume async function for scalar functions and async function* for table functions? It seems possible because I see that all functions will be converted to promises in arrow-udf-js-deno. However, if explicit async configuration can not be avoided, I'd prefer to introduce a new ASYNC keyword.

src/cmd_all/Cargo.toml Outdated Show resolved Hide resolved
let schema = serde_json::from_str::<Value>(&schema).unwrap();
let ans = serde_json::from_str::<Value>(ans).unwrap();
assert_eq!(schema, ans);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems irrelevant to the UDF feature?

Copy link
Contributor Author

@bakjos bakjos Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was due an update in the dependencies, the order of the string changed, so I changed it to use json instead of just plain string, because the order should not matter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was because deno_core adds the feature preserve_order for serde_json, I patched the dependency to remove it, but I'm not sure if this is the right place to do so

risingwave/Cargo.toml

Lines 309 to 310 in cd2c0d4

# patch to remove preserve_order from serde_json
deno_core = { git = "https://github.com/bakjos/deno_core", rev = "70d1544" }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this patch. cc @xxchan any better idea on this feature interaction?

src/frontend/src/handler/create_function.rs Outdated Show resolved Hide resolved
src/frontend/src/handler/create_function.rs Show resolved Hide resolved
src/frontend/src/handler/create_function.rs Show resolved Hide resolved
src/sqlparser/src/ast/mod.rs Show resolved Hide resolved
@wangrunji0408
Copy link
Contributor

wangrunji0408 commented Mar 21, 2024

Please also add some end-to-end tests in e2e_test/udf/js_deno_udf.slt. You can use the system command in sqllogictest to start a mock server. For example:

system ok
python3 e2e_test/udf/test.py &

@bakjos
Copy link
Contributor Author

bakjos commented Mar 21, 2024

I added support for setting a parameter when creating the function to define the function_type:

According to postgres' documentation, the SET param TO value is to override a specific system parameter during the function call. Using this syntax to define properties of the function itself seems like a misuse.

For simplicity, is it possible to just assume async function for scalar functions and async function* for table functions? It seems possible because I see that all functions will be converted to promises in arrow-udf-js-deno. However, if explicit async configuration can not be avoided, I'd prefer to introduce a new ASYNC keyword.

I like the idea of introducing new keywords, the main issue is with AsyncIterators vs generators in the table function since it changes the signature of the function, so SYNC ASYNC GENERATOR and ASYNC GENERATOR are needed

@bakjos bakjos requested a review from wangrunji0408 March 23, 2024 00:52
@bakjos bakjos force-pushed the bakjos/deno_udf branch 3 times, most recently from f0d4399 to a0c2285 Compare March 23, 2024 11:43
@wangrunji0408 wangrunji0408 added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@wangrunji0408 wangrunji0408 added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@wangrunji0408 wangrunji0408 added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@wangrunji0408 wangrunji0408 added this pull request to the merge queue Apr 10, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 10, 2024
@wangrunji0408
Copy link
Contributor

Seems the deno e2e test is still flaky. 👀

Caused by:
    query failed: connection closed
    [SQL] select * FROM call_sse();
    at e2e_test/udf/deno_udf.slt:233

@bakjos
Copy link
Contributor Author

bakjos commented Apr 10, 2024

Seems the deno e2e test is still flaky. 👀

Caused by:
    query failed: connection closed
    [SQL] select * FROM call_sse();
    at e2e_test/udf/deno_udf.slt:233

@wangrunji0408 This started to happen after I switched from rust to python to the mock server, I changed the protocol from HTTP 1 to HTTP 1.1 to see if fix it

@bakjos
Copy link
Contributor Author

bakjos commented Apr 10, 2024

@wangrunji0408 Updating the protocol didn't work, so I made some extra changes over arrow-udf/arrow-udf#12 and changed the tests to use a single mock server for both sse and rest, and it seems to have fixed the issue. I ran the test 100 times and got no crash/error. Also, merged the latest changes to run the tests twice, and it was also successful both times

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Apr 10, 2024
Merged via the queue into risingwavelabs:main with commit a1b4b51 Apr 10, 2024
31 checks passed
@xiangjinwu
Copy link
Contributor

@bakjos
Copy link
Contributor Author

bakjos commented Apr 11, 2024

@wangrunji0408 I created this PR #16263

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants