Remove NaN/Inf values from simple_float_values.csv and enable overflow tests

Also update compatibility guide.

Signed-off-by: Andy Grove <[email protected]>
NVIDIA · Jan 28, 2022 · 240ac19
1 parent 334a2fc

Showing 3 changed files with 6 additions and 12 deletions.
diff --git a/docs/compatibility.md b/docs/compatibility.md
@@ -377,7 +377,7 @@ date. Typically, one that overflowed.
 
 ### CSV Floating Point
 
-Any number that overflows will not be turned into a null value.
+Parsing floating-point values has the same limitations as [casting from string to float](#String-to-Float).
 
 Also parsing of some values will not produce bit for bit identical results to what the CPU does.
 They are within round-off errors except when they are close enough to overflow to Inf or -Inf which
@@ -473,6 +473,8 @@ The nested types(array, map and struct) are not supported yet in current version
 
 ### JSON Floating Point
 
+Parsing floating-point values has the same limitations as [casting from string to float](#String-to-Float).
+
 The GPU JSON reader does not support `NaN` and `Inf` values with full compatibility with Spark.
 
 The following are the only formats that are parsed consistently between CPU and GPU. Any other variation, including 
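Both the CSV and JSON sections now defer to the string-to-float cast rules. The overflow part of those rules can be illustrated in plain Python, independently of Spark or the GPU: a decimal string too large for the target type becomes infinity rather than null, which matches Java's `Double.parseDouble`/`Float.parseFloat` returning `Infinity` on overflow. This is a minimal sketch, assuming 32-bit `FloatType` results can be approximated by rounding a 64-bit result with `struct`. `to_float32` and `FLOAT32_MAX` are hypothetical names, and the string `1.7976931348623159E308` is an extra illustrative value that is not in the test file. Rounding in two steps (decimal to binary64 to binary32) can differ from a direct decimal-to-binary32 parse in rare halfway cases.

```python
import math
import struct

# Largest finite IEEE 754 binary32 value (bit pattern 0x7F7FFFFF).
FLOAT32_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]


def to_float32(x):
    """Hypothetical helper: round a Python float (binary64) to binary32.

    struct.pack raises OverflowError when a finite input would round to an
    infinite float32, so map that case to a signed infinity, which is what
    Java's Float.parseFloat returns on overflow.
    """
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


for text in ["1.7976931348623157E308",   # from simple_float_values.csv
             "1.7976931348623158E308",   # from simple_float_values.csv
             "1.7976931348623159E308"]:  # illustrative, past the rounding boundary
    as_double = float(text)  # Python's float() returns inf on overflow
    print(text, as_double, to_float32(as_double))
```

Under these assumptions the first two strings round to the largest finite double, the third overflows to infinity as a double, and all three overflow to infinity when narrowed to a 32-bit float.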

diff --git a/integration_tests/src/main/python/csv_test.py b/integration_tests/src/main/python/csv_test.py
@@ -229,8 +229,8 @@ def read_impl(spark):
     pytest.param('nan_and_inf.csv', _float_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/125')),
     pytest.param('floats_invalid.csv', _float_schema, {'header': 'true'}),
     pytest.param('floats_invalid.csv', _double_schema, {'header': 'true'}),
-    pytest.param('simple_float_values.csv', _float_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/126')),
-    pytest.param('simple_float_values.csv', _double_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/126')),
+    pytest.param('simple_float_values.csv', _float_schema, {'header': 'true'}),
+    pytest.param('simple_float_values.csv', _double_schema, {'header': 'true'}),
     pytest.param('simple_boolean_values.csv', _bool_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/2071')),
     pytest.param('ints_with_whitespace.csv', _number_as_string_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/2069')),
     pytest.param('ints_with_whitespace.csv', _byte_schema, {'header': 'true'}, marks=pytest.mark.xfail(reason='https://github.com/NVIDIA/spark-rapids/issues/130'))

diff --git a/integration_tests/src/test/resources/simple_float_values.csv b/integration_tests/src/test/resources/simple_float_values.csv
@@ -16,12 +16,4 @@ bad
 1.7976931348623157E308
 1.7976931348623157e+308
 1.7976931348623158E308
-1.2e-234
-NAN
-nan
-NaN
-Inf
--Inf
-INF
--INF
-
+1.2e-234
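The deleted rows are alternative spellings of NaN and infinity. On the CPU, Spark's CSV reader compares each token against the `nanValue`, `positiveInf` and `negativeInf` options (defaults `NaN`, `Inf` and `-Inf`) before falling back to Java's case-sensitive number parsing, so spellings such as `nan` or `INF` fail to parse. The sketch below approximates that behaviour for a `DoubleType` column. `cpu_csv_double` is a hypothetical helper. The regular expression only approximates `Double.parseDouble` syntax (hex floats and `f`/`d` suffixes are ignored). `None` stands for the null a PERMISSIVE-mode read would produce.

```python
import math
import re

# Defaults of Spark's CSV options nanValue, positiveInf and negativeInf.
SPECIAL = {"NaN": math.nan, "Inf": math.inf, "-Inf": -math.inf}

# Approximation of Java's Double.parseDouble syntax: decimal numbers plus the
# case-sensitive spellings NaN and Infinity, each with an optional sign.
_JAVA_DOUBLE = re.compile(
    r"[+-]?(NaN|Infinity|(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?)")


def cpu_csv_double(token):
    """Hypothetical sketch of CPU CSV parsing for one DoubleType token."""
    if token == "":               # default nullValue is the empty string
        return None
    if token in SPECIAL:          # exact, case-sensitive option match
        return SPECIAL[token]
    s = token.strip()             # parseDouble trims surrounding whitespace (approx.)
    if not _JAVA_DOUBLE.fullmatch(s):
        return None               # parse failure becomes null
    return float(s)               # Python accepts every string the regex matches


removed = ["NAN", "nan", "NaN", "Inf", "-Inf", "INF", "-INF"]
for token in removed:
    print(repr(token), cpu_csv_double(token))
```

Under these assumptions only `NaN`, `Inf` and `-Inf` map to special values; the other four removed spellings become null.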