fix(udf): fix decimal values #11839

Merged
merged 14 commits into from
Dec 1, 2023


wangrunji0408
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR fixes #11823.

The problem is that Arrow has no data type equivalent to the unconstrained DECIMAL (numeric) type in RisingWave/PostgreSQL. Arrow's decimal types are fixed-point, while our decimal type is floating-point. Previously we used decimal(38,0) in Arrow to represent decimal values, which is clearly wrong because a scale of 0 accepts only integers.
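
To illustrate the mismatch, here is a minimal std-only sketch of what a fixed-point column has to do with each value. The helper `to_unscaled` is hypothetical (it is not RisingWave or arrow-rs code) and it ignores the precision limit; it only shows that at scale 0 a fractional or special value has no exact encoding.

```rust
/// Hypothetical helper: convert a decimal string into the unscaled integer
/// that a fixed-point `decimal(_, scale)` column would store.
/// Returns `None` when the value cannot be represented exactly at that scale.
fn to_unscaled(s: &str, scale: usize) -> Option<i128> {
    let (negative, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    // Special values such as "inf" or "nan" are rejected here: they are not digits.
    let all_digits = |p: &str| p.bytes().all(|b| b.is_ascii_digit());
    if int_part.is_empty() || !all_digits(int_part) || !all_digits(frac_part) {
        return None;
    }
    // Significant fractional digits beyond `scale` would be lost.
    let frac = frac_part.trim_end_matches('0');
    if frac.len() > scale {
        return None;
    }
    // Build the unscaled integer: integer digits, fraction, then zero padding.
    let mut unscaled = String::from(int_part);
    unscaled.push_str(frac);
    for _ in frac.len()..scale {
        unscaled.push('0');
    }
    let value: i128 = unscaled.parse().ok()?;
    Some(if negative { -value } else { value })
}
```

With scale 0, as in decimal(38,0), `to_unscaled("42", 0)` succeeds, while `to_unscaled("1.23", 0)` and `to_unscaled("-inf", 0)` both return `None`.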

This PR switches the Arrow type for decimal to "large-binary". Values are stored in the array as their string representations, like [b"1.23", b"0", b"-inf"]. This encoding may not be efficient, but it is the least error-prone. In theory, this design can also accept arbitrary-precision decimal values.
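
The string encoding can be sketched as follows. This is a simplified, std-only stand-in for the large-binary layout (one contiguous byte buffer plus 64-bit offsets), not the actual `arrow` crate type or the code in this PR; the names `LargeBinarySketch`, `encode_decimals`, and `decode_decimal` are assumptions made for illustration, and null handling (the validity bitmap) is omitted.

```rust
/// Simplified stand-in for an Arrow large-binary array:
/// value `i` occupies `data[offsets[i]..offsets[i + 1]]`.
struct LargeBinarySketch {
    offsets: Vec<i64>,
    data: Vec<u8>,
}

/// Store each decimal as the bytes of its string representation.
fn encode_decimals(values: &[&str]) -> LargeBinarySketch {
    let mut offsets = vec![0i64];
    let mut data = Vec::new();
    for v in values {
        data.extend_from_slice(v.as_bytes());
        offsets.push(data.len() as i64);
    }
    LargeBinarySketch { offsets, data }
}

/// Read value `i` back as a string; `None` if `i` is out of range
/// or the bytes are not valid UTF-8.
fn decode_decimal(array: &LargeBinarySketch, i: usize) -> Option<&str> {
    let start = *array.offsets.get(i)? as usize;
    let end = *array.offsets.get(i + 1)? as usize;
    std::str::from_utf8(&array.data[start..end]).ok()
}
```

Encoding `["1.23", "0", "-inf"]` this way yields the offsets `[0, 4, 5, 9]` over the bytes `1.230-inf`. The receiving side parses each string back into its own decimal type, which is why fractional digits and special values such as `-inf` survive the round trip.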

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Runji Wang <[email protected]>
@liurenjie1024
Contributor

This solution seems too hacky to me. The conversion to Arrow is used not only in UDFs but also when inserting into Iceberg/Parquet.

@wangrunji0408
Contributor Author

wangrunji0408 commented Aug 23, 2023

> This solution is too hacky to me. Converting to arrow not only used in udf, but also used in inserting to iceberg/parquet.

But Arrow does not have a type for floating-point decimals. It's impossible to store them in a fixed-point decimal array.

@liurenjie1024
Contributor

> > This solution is too hacky to me. Converting to arrow not only used in udf, but also used in inserting to iceberg/parquet.
>
> But arrow does not have a type for floating-point decimal. It's impossible to use a fixed type decimal array to store them.

Yeah, that's a problem. I think we need a discussion before deciding on a solution.

@github-actions
Contributor

This PR has been open for 60 days with no activity. Could you please update the status? Feel free to ping a reviewer if you are waiting for review.

Signed-off-by: Runji Wang <[email protected]>

codecov bot commented Dec 1, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (da79ff5) 68.15% compared to head (9964686) 68.13%.

| Files | Patch % | Lines |
|---|---|---|
| src/common/src/array/arrow.rs | 69.23% | 8 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11839      +/-   ##
==========================================
- Coverage   68.15%   68.13%   -0.03%     
==========================================
  Files        1524     1524              
  Lines      262331   262309      -22     
==========================================
- Hits       178793   178721      -72     
- Misses      83538    83588      +50     

| Flag | Coverage Δ |
|---|---|
| rust | 68.13% <69.23%> (-0.03%) ⬇️ |


Contributor

@liurenjie1024 liurenjie1024 left a comment

LGTM. It would be better to add some comments explaining it.

@wangrunji0408
Contributor Author

This change breaks decimal handling in Iceberg. That should be fixed in the future.

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Dec 1, 2023
Merged via the queue into main with commit 0bd10c4 Dec 1, 2023
26 of 27 checks passed
@wangrunji0408 wangrunji0408 deleted the wrj/fix-udf-decimal branch December 1, 2023 07:34
wangrunji0408 added a commit that referenced this pull request Dec 7, 2023
Development

Successfully merging this pull request may close these issues.

bug: Decimal processing is incorrect when using external.
2 participants