[BUG] Non-nullable bools in a nullable struct fails #11762

Closed
kuhushukla opened this issue Nov 25, 2024 · 3 comments · Fixed by #11781
Labels: bug (Something isn't working)

Comments

@kuhushukla
Collaborator

Describe the bug
While working on a test for issue #11736, I see failures with the stack trace below when booleans that have nullable=false are placed in structs that have nullable=true.

Caused by: ai.rapids.cudf.CudfException: CUDF failure at: /home/jenkins/agent/workspace/jenkins-spark-rapids-jni-release-4-cuda12/thirdparty/cudf/cpp/src/io/orc/writer_impl.cu:586: Mismatch in metadata prescribed nullability and input column. Metadata for input column with nulls cannot prescribe nullability = false
E                   	at ai.rapids.cudf.Table.writeORCChunk(Native Method)
E                   	at ai.rapids.cudf.Table.access$1300(Table.java:41)
E                   	at ai.rapids.cudf.Table$ORCTableWriter.write(Table.java:2038)
E                   	at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$encodeAndBufferToHost$1(ColumnarOutputWriter.scala:205)
E                   	at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$encodeAndBufferToHost$1$adapted(ColumnarOutputWriter.scala:196)
E                   	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
E                   	at com.nvidia.spark.rapids.ColumnarOutputWriter.encodeAndBufferToHost(ColumnarOutputWriter.scala:196)
E                   	at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$2(ColumnarOutputWriter.scala:180)
E                   	at com.nvidia.spark.rapids.ColumnarOutputWriter.$anonfun$bufferBatchAndClose$2$adapted(ColumnarOutputWriter.scala:179)
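
One plausible reading of the error message is that a row where the struct itself is null leaves a null in every child column, so a child declared nullable=false still ends up holding nulls once it sits inside a nullable struct. The pure-Python sketch below only illustrates that interaction. `Field`, `flatten_child`, and `check_nullability` are hypothetical names made up for this example; they are not the cuDF or spark-rapids implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    name: str
    nullable: bool  # declared nullability, as carried in the schema metadata


def flatten_child(struct_rows: List[Optional[dict]], field: Field) -> List[Optional[bool]]:
    """Extract one child column; a null struct row yields a null child value."""
    return [None if row is None else row[field.name] for row in struct_rows]


def check_nullability(column: List[Optional[bool]], field: Field) -> None:
    """Hypothetical stand-in for the writer's consistency check (an assumption)."""
    if not field.nullable and any(value is None for value in column):
        raise ValueError(
            f"column {field.name!r} contains nulls but metadata prescribes nullable=False"
        )


# The struct is nullable (second row is null); its child is declared non-nullable.
rows = [{"child0": True}, None, {"child0": False}]
child = Field("child0", nullable=False)
column = flatten_child(rows, child)

try:
    check_nullability(column, child)
    failed = False
except ValueError:
    failed = True
```

Here `column` carries a null that comes only from the null struct row, and the check rejects it even though no boolean value was ever generated as null.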

Steps/Code to reproduce bug
In orc_write_test.py, pass BooleanGen(nullable=False) to StructGen and run the tests.

orc_write_basic_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
        string_gen, BooleanGen(nullable=False), DateGen(start=date(1590, 1, 1)),
        TimestampGen(start=datetime(1970, 1, 1, tzinfo=timezone.utc)) ] + \
        decimal_gens
orc_write_basic_struct_gen = StructGen([['child'+str(ind), sub_gen] for ind, sub_gen in enumerate(orc_write_basic_gens)])

Run command:

TEST_PARALLEL=10 ./run_pyspark_from_build.sh  -k "orc_write_test"

Expected behavior
The write should not error. It should either fall back or, ideally, match Spark's behavior, with nullability reflected through the metadata field.
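
As a rough illustration of what reflecting nullability through the metadata could look like, the sketch below treats a child as nullable for the writer whenever it or any enclosing struct is nullable. This is only one possible approach under that assumption, and it is not necessarily what the eventual fix does. `SchemaNode` and `effective_nullability` are hypothetical names, not spark-rapids or cuDF APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaNode:
    name: str
    nullable: bool
    children: List["SchemaNode"] = field(default_factory=list)


def effective_nullability(node: SchemaNode, parent_nullable: bool = False) -> SchemaNode:
    """Return a copy in which a node is nullable if it or any ancestor is nullable."""
    nullable = node.nullable or parent_nullable
    return SchemaNode(
        node.name,
        nullable,
        [effective_nullability(child, nullable) for child in node.children],
    )


# A nullable struct with one non-nullable and one nullable child.
schema = SchemaNode("s", True, [SchemaNode("child0", False), SchemaNode("child1", True)])
writer_schema = effective_nullability(schema)
```

The original `schema` is left untouched, so the declared nullability is still available wherever it is needed; only the copy handed to the writer is relaxed.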

Environment details (please complete the following information)

  • Any; on-prem in this case
@kuhushukla added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Nov 25, 2024
@mattahrens changed the title from "[BUG] Nullable bools in a struct that is not-nullable fails" to "[BUG] Non-nullable bools in a nullable struct fails" on Nov 26, 2024
@mattahrens
Collaborator

Make sure we re-test this with the latest changes related to boolean writes.

@mattahrens removed the "? - Needs Triage" (Need team to review and classify) label on Nov 26, 2024
@revans2
Collaborator

revans2 commented Nov 26, 2024

I extended these tests to other types and it looks like it fails for all of them on ORC.

@revans2
Collaborator

revans2 commented Nov 26, 2024

This also fails for Parquet in the same way...
