fix(udf): avoid panic on invalid output and fix decimal output scale lost #13828

Merged: 4 commits into main from wrj/udf-dont-panic, Dec 7, 2023

@wangrunji0408 wangrunji0408 commented Dec 6, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR fixes the following panic:

failed to convert UDF output to DataChunk: FromArrow("invalid decimal: \"-5.10043e-05\"")

First of all, as mentioned in #9002, compute nodes must not panic on errors caused by UDF servers. cc @KveinAxel

Next, since #11839, decimal values are stored as strings in the Arrow array. The panic message indicates that Python wrote a float in scientific notation, -5.10043e-05, which rust decimal could not parse. It turned out that the user defined a function returning NUMERIC, but the actual return value was a float rather than a Decimal. Even when the function returns a Decimal, its string form may still contain scientific notation and a very long run of digits.

>>> from decimal import Decimal
>>> str(Decimal(1e-10))
'1.0000000000000000364321973154977415791655470655996396089904010295867919921875E-10'

When RisingWave parses this string, the trailing digits and the exponent are truncated, which produces a wrong result.

dev=> select decimal();
            decimal             
--------------------------------
 1.0000000000000000364321973155
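The difference comes from the string form handed to the parser, not from the Decimal type itself. A minimal sketch of the string forms involved, using only the standard `decimal` module; the variable names are illustrative and not taken from the PR:

```python
from decimal import Decimal

# Building a Decimal from a float captures the float's exact binary value,
# so the string form is very long and carries an exponent.
from_float = Decimal(1e-10)
print(str(from_float))

# Building from a string keeps the intended value, but str() can still
# choose scientific notation.
from_text = Decimal("1e-10")
print(str(from_text))          # 1E-10

# The "f" format spec yields plain positional notation with no exponent.
print(format(from_text, "f"))  # 0.0000000001
```

A consumer that expects plain decimal notation can read the last form without any handling for exponents.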

So, this PR:

  1. checks the return value of Python functions that return DECIMAL and rejects values that are not of type Decimal. (This problem doesn't exist in Java because it is strongly typed.)
  2. formats decimals in Python and Java so that the output avoids scientific notation.
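The two steps above can be sketched on the Python side roughly as follows. This is an illustration under assumptions, not the code from this PR: `encode_decimal` is a hypothetical helper name, and passing `None` through as SQL NULL is also an assumption.

```python
from decimal import Decimal


def encode_decimal(value):
    """Hypothetical helper: turn a UDF return value into a plain decimal string."""
    if value is None:
        # Assumption: a missing value is passed through as SQL NULL.
        return None
    if not isinstance(value, Decimal):
        # Step 1: reject floats, ints, and strings instead of guessing a conversion.
        raise TypeError(f"expected Decimal, got {type(value).__name__}")
    # Step 2: the "f" format spec never emits scientific notation.
    return format(value, "f")


print(encode_decimal(Decimal("-5.10043e-05")))  # -0.0000510043

try:
    encode_decimal(-5.10043e-05)  # a float, as in the panic message
except TypeError as err:
    print(err)  # expected Decimal, got float
```

Rejecting non-Decimal values at the boundary reports the mistake to the UDF author directly, instead of letting an unparseable string reach the compute node.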

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional; recommended for new SQL features. See "Sqlsmith: Sql feature generation", #7934.)
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

- let data_chunk =
-     DataChunk::try_from(&output).expect("failed to convert UDF output to DataChunk");
+ let data_chunk = DataChunk::try_from(&output)?;
Member

What happens if we only change this line without changing udf sdk? 🤔

Contributor Author

That alone would result in an expression error (invalid decimal), or silently produce a wrong value (1e-10 -> 1.000...).

Signed-off-by: Runji Wang <[email protected]>
codecov bot commented Dec 6, 2023

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (59d56c3) 68.24% compared to head (7000acb) 68.23%.
Report is 8 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| src/expr/core/src/expr/expr_udf.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13828      +/-   ##
==========================================
- Coverage   68.24%   68.23%   -0.01%     
==========================================
  Files        1525     1525              
  Lines      262214   262212       -2     
==========================================
- Hits       178946   178920      -26     
- Misses      83268    83292      +24     
| Flag | Coverage Δ |
|---|---|
| rust | 68.23% <0.00%> (-0.01%) ⬇️ |

Member

@yufansong yufansong left a comment

LGTM

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Dec 7, 2023
Merged via the queue into main with commit fcea158 Dec 7, 2023
26 of 27 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/udf-dont-panic branch December 7, 2023 04:06
wangrunji0408 added a commit that referenced this pull request Dec 7, 2023
Labels
type/fix Bug fix