JSON reader validation of values #15968
Merged
rapids-bot
merged 47 commits into
rapidsai:branch-24.10
from
karthikeyann:fea-json_spark_validation
Sep 11, 2024
Commits
bb991ef validation of tokens code (karthikeyann)
4e707cb fix pre-commit check failures (karthikeyann)
35a8268 Merge branch 'branch-24.08' into fea-json_spark_validation (karthikeyann)
cd6a30f Merge branch 'branch-24.08' into fea-json_spark_validation (karthikeyann)
0c2e4da Add Spark Compatible JSON validation (#10) (revans2)
6a38578 Merge branch 'branch-24.08' of github.com:rapidsai/cudf into fea-json… (karthikeyann)
0d6cb12 Merge branch 'branch-24.10' of github.com:rapidsai/cudf into fea-json… (karthikeyann)
dfa6b18 Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
e944937 style fixes (karthikeyann)
23072c0 Update json normalization to take device_buffer (karthikeyann)
a885340 fix char comparison error (karthikeyann)
3867c61 Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
ab1385d update char comparison (karthikeyann)
80c7c3a Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
f2e2b44 rename to tabulate_output_iterator.cuh (karthikeyann)
0963218 absorb counting_iterator to tabulate_output_iterator (karthikeyann)
be7402c update documentation (karthikeyann)
b114401 add na_values to validation (karthikeyann)
a1e9afc add strict validation to test (karthikeyann)
ec78ef9 rename tabulate_output_iterator namespace (karthikeyann)
a225ce0 remove comments and notes (karthikeyann)
7a2a451 Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
875a72b fix unsigned/signed issue with ARM systems (karthikeyann)
ef6f298 Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
be7f17e remove comments (karthikeyann)
fb62877 fix condition (karthikeyann)
e4f7d04 fix char issue with typecast (karthikeyann)
851fe3e Update cpp/include/cudf/io/json.hpp (karthikeyann)
35e4b89 Update cpp/include/cudf/io/json.hpp (karthikeyann)
3681823 address review comments (karthikeyann)
1d897f7 fix doc (karthikeyann)
e1435ce Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
6bf4d3f address review comments (karthikeyann)
e9ebb91 Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
e093d64 address review comments (karthikeyann)
00ef690 rename lambda name (karthikeyann)
86bbeab Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
cecb42f Apply suggestions from code review (karthikeyann)
9cd3098 Apply suggestions from code review (karthikeyann)
c816c73 update docs (karthikeyann)
53db703 Update cpp/include/cudf/io/json.hpp (ttnghia)
c3832b6 Update cpp/include/cudf/io/json.hpp (ttnghia)
070263e Update cpp/include/cudf/io/json.hpp (ttnghia)
fb0e85f fix strict_validation dependent options with if (karthikeyann)
e7fce07 Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
252c38b fix typo (karthikeyann)
5ab337b Merge branch 'branch-24.10' into fea-json_spark_validation (karthikeyann)
Please clarify what "strict" means here.
@revans2 would you prefer to keep the 3 booleans for strict validation as separate boolean options, or to group them in a single struct with 3 booleans? (If we later add more boolean options here, it may break backward compatibility.)
Strict to me means that it enforces the JSON specification https://www.json.org/json-en.html. We probably should update the comment to explain that.
As for how we enable and disable things, I am fine with whatever, so long as there is a clear way to put it into a mode where it conforms to the JSON spec and there are ways to enable/disable other individual checks. I am not super concerned about backwards compatibility on the C++ side, as Java is more forgiving in that respect.
There are a lot of options here that we could use. If we really want to tie these together we could have a separate builder that is specific for the extra validation steps. That way we can maintain backwards compatibility when/if we add in more validation options, and it conforms with the same pattern used to configure the readers.
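To make the "separate builder" idea concrete, here is a rough sketch of what such a validation-specific builder could look like. Everything in it is hypothetical: the class names `validation_options` and `validation_options_builder`, the three check names, and the function `make_relaxed_strict_options` are illustrative assumptions, not the cudf API and not necessarily what this PR ended up with.

```cpp
#include <cassert>

class validation_options_builder;

// Hypothetical bundle of extra validation checks (names are illustrative only).
class validation_options {
 public:
  static validation_options_builder builder();

  bool reject_leading_zeros() const { return _reject_leading_zeros; }
  bool reject_nonnumeric_numbers() const { return _reject_nonnumeric_numbers; }
  bool reject_unquoted_control_chars() const { return _reject_unquoted_control_chars; }

 private:
  friend class validation_options_builder;
  // All checks default to off, i.e. the lenient behaviour.
  bool _reject_leading_zeros{false};
  bool _reject_nonnumeric_numbers{false};
  bool _reject_unquoted_control_chars{false};
};

// Hypothetical builder; a new check becomes one more setter here,
// so existing call sites keep compiling unchanged.
class validation_options_builder {
 public:
  // One switch that turns every check on (or off) at once.
  validation_options_builder& strict(bool on)
  {
    _opts._reject_leading_zeros          = on;
    _opts._reject_nonnumeric_numbers     = on;
    _opts._reject_unquoted_control_chars = on;
    return *this;
  }

  // Individual toggles; call these after strict() to override a single check.
  validation_options_builder& reject_leading_zeros(bool on)
  {
    _opts._reject_leading_zeros = on;
    return *this;
  }
  validation_options_builder& reject_nonnumeric_numbers(bool on)
  {
    _opts._reject_nonnumeric_numbers = on;
    return *this;
  }
  validation_options_builder& reject_unquoted_control_chars(bool on)
  {
    _opts._reject_unquoted_control_chars = on;
    return *this;
  }

  validation_options build() const { return _opts; }

 private:
  validation_options _opts;
};

inline validation_options_builder validation_options::builder() { return {}; }

// Example: start from the all-checks-on mode, then relax a single check.
inline validation_options make_relaxed_strict_options()
{
  auto const opts =
    validation_options::builder().strict(true).reject_leading_zeros(false).build();
  assert(opts.reject_nonnumeric_numbers());
  return opts;
}
```

With this shape, `strict(true)` gives the single, clear way to enable every check, while the per-check setters cover enabling or disabling individual ones. Whether `strict` should reset the individual toggles or merely gate them is a design choice this sketch does not settle.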