feat(udf): support always_retry_on_network_error config for udf functions #15163
qq: What's the relationship with #15171?
This PR makes it configurable as an option. In the other PR, it is just hardcoded to always retry. The reason is that I want to introduce as little code as possible to the user's cluster to avoid breakage, and their UDFs must always retry.
Can @xxchan and @wangrunji0408 PTAL at this? I want to include this in 1.7: it's a breaking change, and users have to recreate all dependent MVs, so they should be able to upgrade their UDF functions ASAP.
You can review the code first; there's a separate failure due to some migration issues.
Generally LGTM. I'm not sure what's the best way, but I think offering this option doesn't hurt.
@@ -126,7 +128,12 @@ impl UserDefinedFunction {
UdfImpl::JavaScript(runtime) => runtime.call(&self.identifier, &input)?,
UdfImpl::External(client) => {
let disable_retry_count = self.disable_retry_count.load(Ordering::Relaxed);
let result = if disable_retry_count != 0 {
let result = if self.always_retry_on_network_error {
Should we put all disable_retry_count related stuff in the else branch?
Hmm no difference to me? Seems to add more nesting.
I think mixing them together can lead to confusion. Why do we still update disable_retry_count when always_retry_on_network_error is set?
If nesting doesn't look good, we can add something like call_with_ disable_retry_count ...
Ah I didn't get what you meant originally. Now I do. Updated it.
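For readers following the thread, here is a minimal sketch of the separation being discussed: the always-retry path never touches disable_retry_count, and all counter bookkeeping lives in its own helper. This is not the code in this PR. `Udf`, `CallError`, `INITIAL_RETRY_COUNT`, `MAX_RETRIES`, the closure-based `call_once` parameter, and the helper name `call_with_disable_retry_count` are all assumptions made for illustration, and the real call is async with backoff.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical error type standing in for the real UDF client error.
#[derive(Debug, PartialEq)]
enum CallError {
    Network,
    Other,
}

/// Assumed values, chosen only for this sketch.
const INITIAL_RETRY_COUNT: u8 = 16;
const MAX_RETRIES: usize = 3;

/// Hypothetical, simplified stand-in for `UserDefinedFunction`.
struct Udf {
    always_retry_on_network_error: bool,
    disable_retry_count: AtomicU8,
}

impl Udf {
    /// `call_once` stands in for a single request to the external UDF server.
    fn call<F>(&self, mut call_once: F) -> Result<i64, CallError>
    where
        F: FnMut() -> Result<i64, CallError>,
    {
        if self.always_retry_on_network_error {
            // Retry until the outcome is not a network error.
            // disable_retry_count is deliberately not read or written here.
            // A real implementation would sleep/back off between attempts.
            loop {
                match call_once() {
                    Err(CallError::Network) => continue,
                    other => return other,
                }
            }
        } else {
            self.call_with_disable_retry_count(call_once)
        }
    }

    /// All disable_retry_count bookkeeping is kept in this one place.
    fn call_with_disable_retry_count<F>(&self, mut call_once: F) -> Result<i64, CallError>
    where
        F: FnMut() -> Result<i64, CallError>,
    {
        let disabled = self.disable_retry_count.load(Ordering::Relaxed);
        let result = if disabled != 0 {
            // Retries are temporarily disabled: make a single attempt.
            call_once()
        } else {
            // Bounded retry on network errors only.
            let mut last = call_once();
            for _ in 0..MAX_RETRIES {
                if !matches!(last, Err(CallError::Network)) {
                    break;
                }
                last = call_once();
            }
            last
        };
        match &result {
            // A network error (re)arms the "skip retries" window.
            Err(CallError::Network) => {
                self.disable_retry_count.store(INITIAL_RETRY_COUNT, Ordering::Relaxed);
            }
            // A success while retries are disabled counts the window down.
            Ok(_) if disabled != 0 => {
                self.disable_retry_count.fetch_sub(1, Ordering::Relaxed);
            }
            _ => {}
        }
        result
    }
}
```

With this shape the question "why do we still update disable_retry_count when always_retry_on_network_error is set?" cannot come up, at the cost of one extra method.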
LGTM!
Co-authored-by: Runji Wang <[email protected]>
45e305a to 44bd236
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Closes #15137.
The main issue is that when a UDF call fails, the function call returns null instead.
If the function call updates a key, e.g. for join or group by, it is possible for the state to become inconsistent.
As such, we provide the option to define a UDF which will always retry on network errors, since we cannot tolerate the UDF failing non-deterministically in such cases.
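To make the inconsistency concrete, here is a toy model rather than RisingWave code: a group-by count keyed on a UDF result, where a failed call yields `None` (standing in for SQL NULL). The functions `udf`, `apply_change`, and `simulate` are hypothetical names used only for this illustration.

```rust
use std::collections::HashMap;

/// Toy UDF: doubles its input, or yields None (SQL NULL) when the call "fails".
fn udf(x: i64, call_fails: bool) -> Option<i64> {
    if call_fails {
        None
    } else {
        Some(x * 2)
    }
}

/// Applies an insert (+1) or a delete (-1) to the per-key row count.
fn apply_change(counts: &mut HashMap<Option<i64>, i64>, key: Option<i64>, delta: i64) {
    *counts.entry(key).or_insert(0) += delta;
}

/// Inserts the row x = 7 and then deletes it again.
/// The UDF call for the delete may fail and return NULL instead of the key.
fn simulate(delete_call_fails: bool) -> HashMap<Option<i64>, i64> {
    let mut counts = HashMap::new();
    apply_change(&mut counts, udf(7, false), 1);
    apply_change(&mut counts, udf(7, delete_call_fails), -1);
    // Drop groups whose count returned to zero, as a consistent state would.
    counts.retain(|_, count| *count != 0);
    counts
}
```

When both calls succeed, the insert and the delete cancel out and the state is empty. When the delete-side call fails, the delete lands on the NULL group: group 14 keeps a stale count of 1 and the NULL group holds -1, which is the kind of inconsistent state described above.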
Here are the changes:
- CreateFunctionWithOptions .option for each of its fields. This is to make it compatible to Display (otherwise no WITH options, or only some options set, we still display all CreateFunctionWithOptions. See tests for examples). This also preserves the AST structure.
- CreateFunctionWithOptions into parameters to Function, UserDefinedFunction definitions.
- always_retry_on_network_error.
- CreateFunctionWithOptions.
Checklist
- ./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.
Users can now do:
This means network errors will always be retried for function calls of that function.
Note that the entire stream graph will be blocked while the UDF server is offline, until it comes back online.