
Add compatibility documentation with respect to decimal overflow detection (#9864)

Signed-off-by: Jason Lowe <[email protected]>
jlowe authored Nov 27, 2023
1 parent d8b0a41 commit 15ac047
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions docs/compatibility.md
@@ -83,6 +83,21 @@ after Spark 3.1.0.
We do not disable operations that produce different results due to `-0.0` in the data because it is
considered to be a rare occurrence.

## Decimal Support

Apache Spark supports decimal values with a precision of up to 38 digits, which equates to 128 bits.
When processing the data, in most cases it is temporarily converted to Java's `BigDecimal` type,
which allows for effectively unlimited precision. Overflows are detected whenever the
`BigDecimal` value is converted back into the Spark decimal type.
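
The CPU behavior can be observed with a small experiment. The following is a minimal sketch, not
an authoritative test: it assumes a running `SparkSession` named `spark`, and the column name `v`
is illustrative. Summing ten copies of the largest `DECIMAL(38,0)` value overflows the result
type, and Spark flags the overflow when the `BigDecimal` sum is converted back.

```scala
import org.apache.spark.sql.functions.sum

// Ten copies of the largest DECIMAL(38,0) value; their sum cannot fit.
val df = spark.range(10).selectExpr(
  "CAST('99999999999999999999999999999999999999' AS DECIMAL(38,0)) AS v")

// With spark.sql.ansi.enabled=true this throws an ArithmeticException;
// with ANSI mode off (the Spark 3.x default) the overflowed sum is NULL.
df.agg(sum("v")).show()
```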

The RAPIDS Accelerator does not implement a GPU equivalent of `BigDecimal`, but it does implement
computation on 256-bit values so that overflows can be detected. The points at which overflows
are detected may differ between the CPU and GPU. Spark gives no guarantee that an overflow is
detected when an intermediate value could overflow the original decimal type during computation
but the final value does not (e.g., a sum over many large positive values followed by
many large negative values). Spark injects overflow detection at various points during aggregation,
and these points can vary depending on cluster shape and the number of shuffle partitions.
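
To illustrate the intermediate-overflow case, here is a hedged sketch (again assuming a
`SparkSession` named `spark`): the final sum is exactly 0 and fits comfortably in
`DECIMAL(38,0)`, but a partial sum over just the positive rows would overflow, so whether a
given run reports an overflow depends on where the detection points happen to fall.

```scala
import org.apache.spark.sql.functions.sum

val big = "99999999999999999999999999999999999999" // max DECIMAL(38,0)

// Two large positive values followed by two large negative ones.
val mixed = spark.sql(
  s"""SELECT CAST(v AS DECIMAL(38,0)) AS v
     |FROM VALUES ('$big'), ('$big'), ('-$big'), ('-$big') AS t(v)
     |""".stripMargin)

// Depending on how rows are partitioned and where overflow checks run,
// this may return 0, return NULL, or raise an error under ANSI mode.
mixed.agg(sum("v")).show()
```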

## Unicode

Spark delegates Unicode operations to the underlying JVM. Each version of Java complies with a
