Provide java udf server instead of libs. #10322
Comments
cc @wangrunji0408 Feel free to comment. |
I'm also thinking about this when considering #9002. This might be a larger problem for cloud. If we allow arbitrary UDF servers, we need extensive defensive checks. If we host the servers and let users register functions, we can at least ensure the protocol is correct... (avoid problems like #10828, #11022) But of course that might limit flexibility and increase the operational burden. 🤔️ |
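I think providing a udf server rather than only libs has many advantages:
1. Improve user experience. This way users only need to focus on their business logic and on uploading jars to some file server, then use statements like create udf xxx at s3://xx/bb.jar
2. Easier management and observability. There are many things to consider when deploying udf server, for example auto scaling, observability, failover, etc. These in fact require managed service.
|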
Btw, isn’t that Flink’s solution for Python UDF? i.e., the Python runtime
process is fully managed by Flink.
I think there are different solutions and each has different advantages:
- Fully external (current): maximal flexibility, users can have any dependencies, and can have their own middleware/gateway (to achieve scaling/observability/failover). Snowflake and Redshift support this (mainly deployed as Lambda functions, and they both use JSON as the protocol).
- Sidecar runtime process managed by RisingWave: User submits jar/py. I guess dependencies & debugging are not as good.
- Separate server but manually deployed (like connector node) (can be deployed both by user and us?): Looks like a tradeoff between the above two. Does any other product support this? 🤔️
|
There are many things to consider when deploying udf server, for example auto scaling, observability, failover, etc. These in fact require managed service.
Agree. But maybe providing a server (like connector node) for users isn't
enough to solve problems like scaling/failover. Maybe the solution should
be to allow users to deploy UDF to Lambda, and/or have our managed UDF
servers.
…On Wed, Jul 19, 2023 at 1:49 PM xxchan ***@***.***> wrote:
On Wed, 19 Jul 2023 at 04:04, Renjie Liu ***@***.***> wrote:
> I think providing a udf server rather than only libs has many advantages:
>
> 1. Improve user experience. This way users only need to focus on
> their business logic and on uploading jars to some file server, then use
> statements like create udf xxx at s3://xx/bb.jar
> 2. Easier management and observability. There are many things to
> consider when deploying udf server, for example auto scaling,
> observability, failover, etc. These in fact require managed service.
|
Currently users need to set up the udf server themselves. Another approach is that we provide a udf server to the user, and it loads the user-provided jar at startup. This way users only need to focus on udf development.
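To make the proposal concrete, here is a rough sketch of how such a provided server could pick up functions from a user jar at startup. It is only an illustration under stated assumptions: the `ScalarUdf` interface, the `UdfServerSketch` class, and the use of `ServiceLoader` for discovery are all hypothetical and are not the existing Java UDF library's API. The user jar is assumed to list its implementations in a `META-INF/services` provider file.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical contract a user jar would implement (an assumption for this
// sketch, not the actual UDF API of the project).
interface ScalarUdf {
    String name();
    Object eval(Object... args);
}

final class UdfServerSketch {
    // Build a class loader that can see the classes inside the user-provided jar.
    static ClassLoader loaderForJar(Path jar) throws MalformedURLException {
        URL[] urls = { jar.toUri().toURL() };
        return new URLClassLoader(urls, UdfServerSketch.class.getClassLoader());
    }

    // Index functions by name and reject duplicates, so a bad jar fails at startup.
    static Map<String, ScalarUdf> register(Iterable<ScalarUdf> udfs) {
        Map<String, ScalarUdf> registry = new LinkedHashMap<>();
        for (ScalarUdf udf : udfs) {
            if (registry.putIfAbsent(udf.name(), udf) != null) {
                throw new IllegalStateException("duplicate UDF name: " + udf.name());
            }
        }
        return registry;
    }

    // Discover every ScalarUdf provider that is visible through the given loader.
    static Map<String, ScalarUdf> loadUdfs(ClassLoader loader) {
        return register(ServiceLoader.load(ScalarUdf.class, loader));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: UdfServerSketch <path-to-user.jar>");
            return;
        }
        Map<String, ScalarUdf> udfs = loadUdfs(loaderForJar(Paths.get(args[0])));
        System.out.println("Loaded UDFs: " + udfs.keySet());
        // A real server would start serving function calls here.
    }
}
```

With this shape the user ships only the jar, and the server owner controls the process, the protocol, and the startup checks. Fetching the jar from a location such as s3://xx/bb.jar is left out of the sketch.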