Remove Nan/Inf values from simple_float_values.csv and enable overflow tests. Also update compatibility guide.

Signed-off-by: Andy Grove <[email protected]>
andygrove committed Jan 28, 2022
1 parent 334a2fc commit 240ac19
Showing 3 changed files with 6 additions and 12 deletions.
4 changes: 3 additions & 1 deletion docs/compatibility.md
@@ -377,7 +377,7 @@ date. Typically, one that overflowed.

### CSV Floating Point

-Any number that overflows will not be turned into a null value.
+Parsing floating-point values has the same limitations as [casting from string to float](#String-to-Float).

Also parsing of some values will not produce bit for bit identical results to what the CPU does.
They are within round-off errors except when they are close enough to overflow to Inf or -Inf which
@@ -473,6 +473,8 @@ The nested types(array, map and struct) are not supported yet in current version

### JSON Floating Point

+Parsing floating-point values has the same limitations as [casting from string to float](#String-to-Float).
+
The GPU JSON reader does not support `NaN` and `Inf` values with full compatibility with Spark.

The following are the only formats that are parsed consistently between CPU and GPU. Any other variation, including
4 changes: 2 additions & 2 deletions integration_tests/src/main/python/csv_test.py
@@ -229,8 +229,8 @@ def read_impl(spark):
pytest.param('nan_and_inf.csv', _float_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/125')),
pytest.param('floats_invalid.csv', _float_schema, {'header': 'true'}),
pytest.param('floats_invalid.csv', _double_schema, {'header': 'true'}),
-pytest.param('simple_float_values.csv', _float_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/126')),
-pytest.param('simple_float_values.csv', _double_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/126')),
+pytest.param('simple_float_values.csv', _float_schema, {'header': 'true'}),
+pytest.param('simple_float_values.csv', _double_schema, {'header': 'true'}),
pytest.param('simple_boolean_values.csv', _bool_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/2071')),
pytest.param('ints_with_whitespace.csv', _number_as_string_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/2069')),
pytest.param('ints_with_whitespace.csv', _byte_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/130'))
10 changes: 1 addition & 9 deletions integration_tests/src/test/resources/simple_float_values.csv
@@ -16,12 +16,4 @@ bad
1.7976931348623157E308
1.7976931348623157e+308
1.7976931348623158E308
-1.2e-234
-NAN
-nan
-NaN
-Inf
--Inf
-INF
--INF
-
+1.2e-234
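
For readers unfamiliar with the boundary values kept in `simple_float_values.csv`, here is a minimal sketch of how they behave under IEEE 754 double-precision parsing. It uses Python's built-in `float()` as a stand-in, which is an assumption: it is neither Spark's CPU parser nor the GPU parser, and the `1.8E308` input is a hypothetical value that does not appear in the test file.

```python
import math
import sys

# The three context lines of the hunk: values at or just above the largest finite double.
near_max = [
    "1.7976931348623157E308",
    "1.7976931348623157e+308",
    "1.7976931348623158E308",
]


def parse_double(text):
    """Parse text as a double with Python's float() (a stand-in, not Spark's parser)."""
    return float(text)


parsed = [parse_double(v) for v in near_max]

# All three round to the largest finite double rather than overflowing.
all_finite = all(math.isfinite(x) for x in parsed)
all_max = all(x == sys.float_info.max for x in parsed)

# Hypothetical input that is clearly past the limit: float() yields inf.
overflowed = parse_double("1.8E308")

# The last value left in the file is small but still a normal, nonzero double.
tiny = parse_double("1.2e-234")

print(all_finite, all_max, overflowed, tiny)
```

Whether the GPU reader rounds these inputs the same way is exactly what the re-enabled tests check; this sketch only shows the reference double-precision behavior.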
