Provide java udf server instead of libs. #10322

liurenjie1024 · 2023-06-14T01:52:42Z

Currently, users have to set up the UDF server themselves. Another approach is for us to provide a UDF server that loads the user-provided jar at startup. That way, users only need to focus on UDF development.
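To make the proposal concrete, here is a minimal sketch of what "load the user-provided jar at startup" could look like on the JVM. Everything in it is an assumption for illustration: `UdfRegistry`, `openJar`, `register`, and the choice of `java.util.function.Function` as the UDF interface are hypothetical and are not RisingWave APIs. Only `URLClassLoader` and the reflection calls are standard Java.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical registry held by a provided UDF server; not a RisingWave class. */
public class UdfRegistry {
    private final Map<String, Function<Object, Object>> functions = new ConcurrentHashMap<>();

    /** Opens the user's jar with a class loader whose parent is the server's own loader. */
    public static ClassLoader openJar(Path jarPath) {
        try {
            return new URLClassLoader(
                new URL[] { jarPath.toUri().toURL() },
                UdfRegistry.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad jar path: " + jarPath, e);
        }
    }

    /** Loads className from the loader, instantiates it, and registers it under name. */
    @SuppressWarnings("unchecked")
    public void register(String name, String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, true, loader);
            // Assumption: a UDF is any class implementing Function with a no-arg constructor.
            if (!Function.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " does not implement Function");
            }
            Object instance = cls.getDeclaredConstructor().newInstance();
            functions.put(name, (Function<Object, Object>) instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load UDF class " + className, e);
        }
    }

    /** Invokes a registered UDF on a single argument. */
    public Object call(String name, Object arg) {
        Function<Object, Object> f = functions.get(name);
        if (f == null) {
            throw new IllegalArgumentException("unknown UDF: " + name);
        }
        return f.apply(arg);
    }

    /** Stand-in for a class that would normally live in the user's jar. */
    public static class Upper implements Function<Object, Object> {
        @Override
        public Object apply(Object input) {
            return String.valueOf(input).toUpperCase(Locale.ROOT);
        }
    }
}
```

In a real server the loader would come from `openJar(...)` with a path supplied by the operator, and the user would only ship the jar containing classes like `Upper`. Scaling, failover, and isolation between jars are not addressed by this sketch.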

liurenjie1024 · 2023-06-14T01:57:41Z

cc @wangrunji0408 Feel free to comment.

xxchan · 2023-07-18T12:12:28Z

I'm also thinking about this when considering #9002.

This might be a larger problem for the cloud. If we allow arbitrary UDF servers, we need extensive defensive checks. If we host the servers and let users register functions, we can at least ensure the protocol is correct... (avoiding problems like #10828, #11022). But of course that might limit flexibility and increase the operational burden. 🤔️
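As a rough illustration of the kind of defensive check meant here, the sketch below validates a reply from an untrusted UDF server before the engine uses it. It is an assumption-laden example: `UdfResponseCheck` and `validate` are hypothetical names, plain Java lists stand in for whatever wire format the real protocol uses, and the two rules shown (row count and value type) are only examples, not a description of what the linked issues were about.

```java
import java.util.List;

/** Hypothetical guard applied to replies from an external UDF server. */
public class UdfResponseCheck {

    /**
     * Rejects a reply whose row count or value types do not match what was declared
     * when the function was registered. Null values are treated as SQL NULL and allowed.
     */
    public static void validate(int inputRows, Class<?> declaredReturnType, List<?> returned) {
        if (returned == null) {
            throw new IllegalStateException("UDF server returned no batch");
        }
        if (returned.size() != inputRows) {
            throw new IllegalStateException(
                "expected " + inputRows + " rows, got " + returned.size());
        }
        for (int i = 0; i < returned.size(); i++) {
            Object value = returned.get(i);
            if (value != null && !declaredReturnType.isInstance(value)) {
                throw new IllegalStateException(
                    "row " + i + ": expected " + declaredReturnType.getSimpleName()
                        + ", got " + value.getClass().getSimpleName());
            }
        }
    }
}
```

With a hosted server, checks like these could be enforced once inside the server itself rather than on every call from the engine.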

liurenjie1024 · 2023-07-19T02:04:00Z

I think providing a UDF server rather than only libs has many advantages:

1. Improved user experience. Users only need to focus on their business logic and upload jars to some file server, then use statements like create udf xxx at s3://xx/bb.jar
2. Easier management and observability. There are many things to consider when deploying a UDF server, for example auto scaling, observability, failover, etc. These in fact require a managed service.

xxchan · 2023-07-19T11:49:58Z

Btw, isn’t that Flink’s solution for Python UDF? i.e., the Python runtime process is fully managed by Flink.

I think there are different solutions and each has different advantages:

- Fully external (current): maximal flexibility; users can have any dependencies and can have their own middleware/gateway (to achieve scaling/observability/failover). Snowflake/Redshift support this (mainly deployed as Lambda functions, and they both use JSON as the protocol).
- Sidecar runtime process managed by RisingWave: the user submits a jar/py. I guess dependencies & debugging are not as good.
- Separate server but manually deployed (like connector node) (can be deployed both by user and us?): looks like a tradeoff between the above two. Does any other product support this? 🤔️

xxchan · 2023-07-19T11:53:23Z

> There are many things to consider when deploying udf server, for example auto scaling, observability, failover, etc. These in fact require managed service.

Agree. But maybe providing a server (like connector node) for users isn't enough to solve problems like scaling/failover. Maybe the solution should be to allow users to deploy UDF to Lambda, and/or have our managed UDF servers.


liurenjie1024 mentioned this issue Jun 14, 2023

Tracking: User-Defined Function(UDF) Support #7405

Closed

26 tasks

github-actions bot added this to the release-0.20 milestone Jun 14, 2023

fuyufjh removed this from the release-1.0 milestone Jul 18, 2023

xxchan mentioned this issue Aug 7, 2023

Discussion: Optionally integrate Java UDF coprocessor with connector node? #11487

Closed

xxchan closed this as not planned May 19, 2024
