The current CUDF code parses types the way it wants to, not the way Spark wants them parsed.
We might be able to ask CUDF to read all of the types as strings and then parse them ourselves.
This will not fix everything, because looking at cast there are still a number of types
that we do not fully support when casting from a string.
But it would be a step in the right direction and should let us avoid most of the "enable this type for JSON" configs.
At a minimum we could reuse the cast-from-string configs that already exist.
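As a rough sketch of the idea at the plain Spark API level (not the plugin's GPU code path; the one-field schema and `data.json` path are hypothetical), "read everything as strings, then cast" looks like this:

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// The schema the user actually asked for (hypothetical example).
val desiredSchema = StructType(Seq(StructField("a", DoubleType)))

// Step 1: read every field as a string, so the JSON reader does no type inference.
val stringSchema = StructType(desiredSchema.fields.map(f =>
  StructField(f.name, StringType, f.nullable)))
val raw = spark.read.schema(stringSchema).json("data.json")

// Step 2: apply Spark's own cast-from-string semantics to reach the desired types.
val casted = raw.select(desiredSchema.fields.map(f =>
  col(f.name).cast(f.dataType).alias(f.name)): _*)
```

The plugin would do the equivalent on the GPU: ask CUDF for string columns and then push them through the same cast code that the existing cast-from-string configs already guard.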
This will not work 100% of the time, because Spark parses the input line by line and then casts each value to the desired result.
Take {"a": "100.0"} and {"a": 100.0}: if we asked for a double to be returned,
Spark would parse the first one as a string and then cast it to a double,
but for the second one it would parse it directly as a double and do no casting at the end.
In most cases this should not make a difference, but there can be very subtle differences
between Spark's casting and what the JSON parser does to read the data.
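One way to see the two code paths side by side (a hedged sketch, not plugin code; the column names and inputs are made up for illustration):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}
import spark.implicits._

val lines = Seq("""{"a": "100.0"}""", """{"a": 100.0}""").toDF("json")

// Path 1: the JSON parser produces the double directly.
val direct = lines.select(
  from_json(col("json"), new StructType().add("a", DoubleType))("a").alias("direct"))

// Path 2: parse the field as a string first, then apply Spark's cast.
val viaCast = lines.select(
  from_json(col("json"), new StructType().add("a", StringType))("a")
    .cast(DoubleType).alias("viaCast"))
```

Both paths agree on plain values like these, but diffing them over trickier inputs (quoted "NaN" or "Infinity", values with surrounding whitespace, and so on) is one way to find exactly where the JSON parser's number handling and Spark's string-to-double cast diverge.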