The current CUDF code parses types the way it wants to, not the way Spark wants them parsed.
We might be able to ask CUDF to read all of the types as strings and then parse them ourselves.
This will not fix everything, because looking at cast there are still a number of types
that we do not fully support when casting from a string.
But it would be a step in the right direction and should let us avoid most of the "enable this type for JSON" configs.
At a minimum we could reuse the cast-from-string configs that already exist.
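As a rough sketch of the idea at the plain Spark API level (not the plugin's GPU code path; the one-field schema and `data.json` path are hypothetical), "read everything as strings, then cast" looks like this:

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// The schema the user actually asked for (hypothetical example).
val desiredSchema = StructType(Seq(StructField("a", DoubleType)))

// Step 1: read every field as a string, so the JSON reader does no type inference.
val stringSchema = StructType(desiredSchema.fields.map(f =>
  StructField(f.name, StringType, f.nullable)))
val raw = spark.read.schema(stringSchema).json("data.json")

// Step 2: apply Spark's own cast-from-string semantics to reach the desired types.
val casted = raw.select(desiredSchema.fields.map(f =>
  col(f.name).cast(f.dataType).alias(f.name)): _*)
```

The plugin would do the equivalent on the GPU: ask CUDF for string columns and then push them through the same cast code that the existing cast-from-string configs already guard.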
This will not work 100% of the time, because Spark parses the input line by line and then casts each value to the desired result.
Take {"a": "100.0"} and {"a": 100.0}: if we asked for a double to be returned,
Spark would parse the first one as a string and then cast it to a double,
but for the second one it would parse it directly as a double and do no casting at the end.
In most cases this should not make a difference, but there can be very subtle differences
between Spark's casting and what the JSON parser does to read the data.
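One way to see the two code paths side by side (a hedged sketch, not plugin code; the column names and inputs are made up for illustration):

```scala
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}
import spark.implicits._

val lines = Seq("""{"a": "100.0"}""", """{"a": 100.0}""").toDF("json")

// Path 1: the JSON parser produces the double directly.
val direct = lines.select(
  from_json(col("json"), new StructType().add("a", DoubleType))("a").alias("direct"))

// Path 2: parse the field as a string first, then apply Spark's cast.
val viaCast = lines.select(
  from_json(col("json"), new StructType().add("a", StringType))("a")
    .cast(DoubleType).alias("viaCast"))
```

Both paths agree on plain values like these, but diffing them over trickier inputs (quoted "NaN" or "Infinity", values with surrounding whitespace, and so on) is one way to find exactly where the JSON parser's number handling and Spark's string-to-double cast diverge.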