Enable all json reader options in pylibcudf read_json #17563
base: branch-25.02
@@ -198,6 +193,7 @@ def read_json(
mixed_types_as_string=mixed_types_as_string,
prune_columns=prune_columns,
recovery_mode=c_on_bad_lines,
extra_parameters=kwargs,
Just an FYI: I'm updating this API to use JSON reader options classes to match the other IO functions (e.g., https://github.com/rapidsai/cudf/blob/branch-25.02/python/pylibcudf/pylibcudf/io/parquet.pyx#L309). So it will look like:
plc.io.json.read_json(
plc.io.json.JsonReaderOptions.builder(
plc.io.SourceInfo(file_paths_or_buffers)
)
.byte_range_size(...)
...
)
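For readers unfamiliar with the options-builder style referred to above, here is a minimal pure-Python sketch of the pattern. The class names `ReaderOptions` and `ReaderOptionsBuilder`, and the `lines` field, are hypothetical stand-ins chosen for illustration; this is not the pylibcudf implementation, only the general shape of a chained builder.

```python
from dataclasses import dataclass


@dataclass
class ReaderOptions:
    """Hypothetical options container; not the real pylibcudf class."""
    source: str
    byte_range_size: int = 0
    lines: bool = False


class ReaderOptionsBuilder:
    """Hypothetical chained builder: each setter returns self."""

    def __init__(self, source: str) -> None:
        self._options = ReaderOptions(source=source)

    def byte_range_size(self, size: int) -> "ReaderOptionsBuilder":
        self._options.byte_range_size = size
        return self

    def lines(self, enabled: bool) -> "ReaderOptionsBuilder":
        self._options.lines = enabled
        return self

    def build(self) -> ReaderOptions:
        # Hand back the fully configured options object.
        return self._options


options = (
    ReaderOptionsBuilder("data.json")
    .byte_range_size(1024)
    .lines(True)
    .build()
)
```

Compared with a free-form dict, each option here is a named, typed method, so a misspelled option fails immediately with an `AttributeError` rather than being silently ignored.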
These options will change because most of them are for Spark, and they will keep changing until all Spark JSON feature requests are completed. They are not intended for Python users; they are exposed for quicker testing.
/ok to test
/ok to test
I'm OK with the dict as a short-term experimental solution, but I wouldn't want to merge it without a plan for removing it and replacing it with an options struct, the way the other I/O APIs work. Will this be removed by 25.02? Can we open an issue and tag it as required for the release?
To clarify, I'm referring to |
Description
This PR exposes all JSON reader options in pylibcudf and enables them via kwargs in cudf.read_json. Since kwargs cannot be used in Cython, they are passed to Cython as a dict.
These options are intentionally hidden from the docs, as they are currently used mostly for testing feature requests from the Spark JSON reader. They are expected to change.
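As a rough illustration of the forwarding described above, the sketch below shows a public wrapper collecting undocumented options through `**kwargs` and handing them to an inner function as a plain dict. The inner function `_read_json_inner` and the option names in `_KNOWN_EXTRA_OPTIONS` are placeholders assumed for this example, not the actual cudf or pylibcudf code; only the parameter name `extra_parameters` comes from the diff.

```python
from typing import Any, Dict, Optional

# Placeholder option names for illustration only; not the real reader's set.
_KNOWN_EXTRA_OPTIONS = {"delimiter", "strict_validation"}


def _read_json_inner(
    source: str,
    lines: bool,
    extra_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Stand-in for the compiled layer: receives the extras as one dict."""
    options: Dict[str, Any] = {"source": source, "lines": lines}
    for name, value in (extra_parameters or {}).items():
        # Reject unknown names so typos do not pass silently.
        if name not in _KNOWN_EXTRA_OPTIONS:
            raise ValueError(f"Unknown JSON reader option: {name!r}")
        options[name] = value
    return options


def read_json(source: str, lines: bool = False, **kwargs: Any) -> Dict[str, Any]:
    """Public wrapper: forwards undocumented options as a dict."""
    return _read_json_inner(source, lines, extra_parameters=kwargs)


resolved = read_json("data.json", lines=True, delimiter="\n")
```

Validating the keys in the inner function is one way to keep a dict-based interface from hiding mistakes while the option set is still in flux.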