fix(expr): tolerate table function errors on streams #17156
Signed-off-by: Runji Wang <[email protected]>
statement ok
flush;

# Output 0 rows when the set-returning function returns an error.
There was a discussion on the behavior in this case: #12474 (comment). cc @fuyufjh
I'm not sure which option we should adopt.
The default behavior for a table function that is not called (e.g. with null input) is to return no rows, so I vote for this option for the sake of consistency.
I agree that these 2 cases should be consistent
This is Option 1 in the original issue, and I agree it is consistent with null input. In other words, the table function returns null on error (which is similar to {} but not {null}).
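To make the null / {} / {null} distinction concrete, here is a minimal Rust sketch that models unnest as a set-returning function. The IntArray alias and both functions are hypothetical illustrations, not RisingWave's API: a null input and {} both yield no rows, {null} yields one null row, and under Option 1 an erroring call is treated like a null input.

```rust
/// Hypothetical model of an array value: `None` is SQL `null`,
/// `Some(vec![])` is `{}`, and `Some(vec![None])` is `{null}`.
type IntArray = Option<Vec<Option<i32>>>;

/// Sketch of `unnest`: each element of the input array becomes one output row.
fn unnest(input: &IntArray) -> Vec<Option<i32>> {
    match input {
        // A null input means the function is not called: no rows.
        None => Vec::new(),
        // `{}` has no elements, so it also yields no rows;
        // `{null}` yields exactly one row containing null.
        Some(elems) => elems.clone(),
    }
}

/// Hypothetical "Option 1" behavior: an erroring call is treated like a
/// null input and yields no rows (the error itself is reported elsewhere).
fn unnest_or_empty_on_error(result: Result<IntArray, String>) -> Vec<Option<i32>> {
    match result {
        Ok(arr) => unnest(&arr),
        Err(_) => Vec::new(),
    }
}
```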
Related edge cases on the differences among null, {} and {null}:
- The array_agg function on zero input rows returns no rows with group by, but a null row with scalar agg. fix(optimizer): decorrelate SimpleAgg with array_agg/jsonb_agg/jsonb_object_agg #15590
- A scalar subquery returning zero rows results in a null row. An array-constructing subquery returning zero rows results in {}. fix(planner): array-construction subquery shall return {} rather than null #15593
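A hypothetical Rust sketch of the edge cases listed above, modeling SQL null as None and an array as Some(elements). The function names are illustrative only, and the sketch ignores details such as a scalar subquery returning more than one row.

```rust
use std::collections::BTreeMap;

/// Hypothetical model: SQL `null` is `None`, and an array value is
/// `Some(elements)`, so `{}` is `Some(vec![])`.
type ArrayVal = Option<Vec<Option<i32>>>;

/// `SELECT k, array_agg(v) FROM t GROUP BY k`: one output row per group,
/// so zero input rows produce zero output rows.
fn array_agg_grouped(rows: &[(i32, Option<i32>)]) -> Vec<(i32, ArrayVal)> {
    let mut groups: BTreeMap<i32, Vec<Option<i32>>> = BTreeMap::new();
    for &(k, v) in rows {
        groups.entry(k).or_default().push(v);
    }
    groups.into_iter().map(|(k, vs)| (k, Some(vs))).collect()
}

/// `SELECT array_agg(v) FROM t` (scalar agg): always exactly one row,
/// and over zero input rows that row is `null`, not `{}`.
fn array_agg_scalar(rows: &[Option<i32>]) -> Vec<ArrayVal> {
    if rows.is_empty() {
        vec![None]
    } else {
        vec![Some(rows.to_vec())]
    }
}

/// `(SELECT v FROM t ...)` as a scalar subquery: zero rows yield `null`.
/// (A real engine also rejects more than one row; that check is omitted.)
fn scalar_subquery(rows: &[Option<i32>]) -> Option<i32> {
    rows.first().copied().flatten()
}

/// `ARRAY(SELECT v FROM t ...)`: zero rows yield `{}`, not `null`.
fn array_subquery(rows: &[Option<i32>]) -> ArrayVal {
    Some(rows.to_vec())
}
```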
^ Just mentioning these cases; I'm not suggesting that any one value is always preferable to the other two.
LGTM!
Thanks. Cool idea 👍
What I don't fully understand is: where was the line that led to a panic before? 🤣
When an error occurs, an error column will be appended to the returned chunk (see the example in arrow-udf).
To clarify, arrow_udf::function and risingwave_expr::function don't have a strict relationship, but share similar ideas. Do I understand correctly?
I don't think there would be any panic. We would just enter a recovery loop where the expression error gets thrown out of the actor.
Yes. They share the same behavior behind the interface, so we don't need to write additional adapters for UDFs.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Partially fixes #11915.
Currently, when an error occurs in a table function call, the returned stream stops and the streaming actor exits. This PR makes streaming tolerant of table function errors. When an error occurs, an error column is appended to the returned chunk (see the example in arrow-udf). The batch side raises an error immediately when it sees this column, while the streaming side reports the error and keeps going. Rows that were already output before the error occurred are not reverted.
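A rough Rust sketch of the flow described above, under stated assumptions: the TableFnChunk layout (a row-index column, a value column, and an optional error column), the eval_series table function, and both consumers are hypothetical and only loosely modeled on the description. They are not the actual risingwave_expr or arrow-udf types.

```rust
/// Hypothetical chunk layout: each output row records the input row that
/// produced it and a value; the error column exists only if some call failed.
#[derive(Debug, Default)]
struct TableFnChunk {
    row_index: Vec<usize>,
    values: Vec<Option<i64>>,
    /// One entry per output row; `Some(msg)` marks an error row.
    errors: Option<Vec<Option<String>>>,
}

/// Hypothetical table function: yields 1..=n for each input n, but fails as
/// soon as a value would exceed `max`.
fn eval_series(inputs: &[i64], max: i64) -> TableFnChunk {
    let mut chunk = TableFnChunk::default();
    let mut errors: Vec<Option<String>> = Vec::new();
    for (i, &n) in inputs.iter().enumerate() {
        for v in 1..=n {
            chunk.row_index.push(i);
            if v > max {
                // Emit one error row for this input and stop evaluating it;
                // rows already emitted for this input are not taken back.
                chunk.values.push(None);
                errors.push(Some(format!("value {v} exceeds {max}")));
                break;
            }
            chunk.values.push(Some(v));
            errors.push(None);
        }
    }
    // Attach the error column only if at least one call failed.
    if errors.iter().any(|e| e.is_some()) {
        chunk.errors = Some(errors);
    }
    chunk
}

/// Batch side: fail immediately if the chunk contains any error row.
fn consume_batch(chunk: &TableFnChunk) -> Result<Vec<(usize, Option<i64>)>, String> {
    // Option<Vec<Option<String>>> -> first &String, if any.
    if let Some(msg) = chunk.errors.iter().flatten().flatten().next() {
        return Err(msg.clone());
    }
    Ok(chunk.row_index.iter().copied().zip(chunk.values.iter().copied()).collect())
}

/// Streaming side: report each error row, drop it, and keep all other rows.
fn consume_stream(chunk: &TableFnChunk, reported: &mut Vec<String>) -> Vec<(usize, Option<i64>)> {
    let mut out = Vec::new();
    for k in 0..chunk.values.len() {
        let err = chunk.errors.as_ref().and_then(|e| e[k].as_ref());
        match err {
            Some(msg) => reported.push(msg.clone()),
            None => out.push((chunk.row_index[k], chunk.values[k])),
        }
    }
    out
}
```

For inputs [2, 5] with max = 3, the batch consumer returns the error, while the streaming consumer keeps the five successful rows (including the three already produced for input 5 before it failed) and reports the error.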
Checklist
./risedev check (or alias, ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.