
Fixed some of the failing parquet_tests [databricks] #11429

Merged: 4 commits into NVIDIA:branch-24.10 on Sep 10, 2024

Conversation

razajafri (Collaborator):

This PR contributes towards fixing #11024

@@ -35,15 +35,19 @@ def read_parquet_df(data_path):
def read_parquet_sql(data_path):
return lambda spark : spark.sql('select * from parquet.`{}`'.format(data_path))

datetimeRebaseModeInWriteKey = 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite' if is_before_spark_400() else 'spark.sql.parquet.datetimeRebaseModeInWrite'
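The version-gated key selection in the diff above can be sketched as follows (a minimal sketch; `pick_rebase_key` and the version-tuple argument are hypothetical stand-ins for the plugin's `is_before_spark_400()` helper):

```python
def pick_rebase_key(spark_version):
    """Return the datetime rebase write config key for a Spark version tuple.

    Per the diff above, the tests use the 'legacy'-prefixed key before
    Spark 4.0.0 and the non-legacy key from 4.0.0 on.
    """
    if spark_version < (4, 0, 0):
        return 'spark.sql.legacy.parquet.datetimeRebaseModeInWrite'
    return 'spark.sql.parquet.datetimeRebaseModeInWrite'
```

For example, `pick_rebase_key((3, 5, 0))` yields the legacy key, while `pick_rebase_key((4, 0, 0))` yields the non-legacy one.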
Collaborator:

All of the non-legacy versions of these configs appear to have been added in 3.0.0. Is there a reason we are not just switching over to using them instead?

@@ -47,7 +47,7 @@ case class GpuBatchScanExec(
@transient override lazy val batch: Batch = if (scan == null) null else scan.toBatch
// TODO: unify the equal/hashCode implementation for all data source v2 query plans.
override def equals(other: Any): Boolean = other match {
-    case other: GpuBatchScanExec =>
+    case other: BatchScanExec =>
Collaborator:

Why is this being changed? This is a GpuBatchScanExec. We don't want to be equal to non-GPU versions do we?

Collaborator (author):

Right. While debugging I wasn't sure what was causing the failure, and after looking at the 330 shim I changed this and didn't change it back before submitting this PR.

I am adding that change to this PR as well.
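The equality concern discussed above can be illustrated with a hedged Python sketch (`BatchScan` and `GpuBatchScan` are simplified stand-ins for the Spark classes, not the real implementations):

```python
class BatchScan:
    def __init__(self, scan):
        self.scan = scan

    def __eq__(self, other):
        # Matching on the base type lets a GPU scan compare equal to a CPU scan.
        return isinstance(other, BatchScan) and self.scan == other.scan


class GpuBatchScan(BatchScan):
    def __eq__(self, other):
        # Reverting the change restores a strict match on the GPU type only,
        # so a GPU scan never compares equal to a plain CPU scan.
        return isinstance(other, GpuBatchScan) and self.scan == other.scan
```

With the strict override, two GPU scans over the same underlying scan are equal, but a GPU scan and a CPU scan are not.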

@razajafri razajafri requested a review from revans2 September 6, 2024 16:06
@razajafri (Collaborator, author):

build

@razajafri (Collaborator, author):

The failure in CI seems unrelated:

 - extract mortgage data *** FAILED ***

[2024-09-06T17:53:06.482Z]   0 did not equal 10000 (MortgageSparkSuite.scala:65)

The test only reads a CSV, sorts it, and does a row count; it passes locally.

@razajafri (Collaborator, author):

build

@jlowe (Member) commented Sep 10, 2024:

> The failure in CI seems unrelated

This is tracked in #11436, test was temporarily disabled in #11451.

@razajafri (Collaborator, author):

We currently do not have sufficient g5.4xlarge capacity in the Availability Zone you requested (us-west-2c). Our system will be working on provisioning additional capacity. You can currently get g5.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-west-2a, us-west-2b.

@razajafri (Collaborator, author):

build

@razajafri razajafri merged commit 502f5a3 into NVIDIA:branch-24.10 Sep 10, 2024
45 checks passed
@razajafri razajafri deleted the SP-11024-fix-parquet-tests branch September 10, 2024 22:50
@sameerz sameerz added Spark 4.0+ Spark 4.0+ issues bug Something isn't working labels Sep 10, 2024