Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spark400 build due to LogicalRelation signature changes [databricks] #11695

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

jlowe
Copy link
Contributor

@jlowe jlowe commented Nov 5, 2024

Fixes #11684. SPARK-50144 added a new parameter to LogicalRelation, breaking old unapply usages. Added a new shim for gathering the locations for a logical plan which was the one place using LogicalRelation.unapply.

@jlowe jlowe added the build Related to CI / CD or cleanly building label Nov 5, 2024
@jlowe jlowe self-assigned this Nov 5, 2024
@jlowe
Copy link
Contributor Author

jlowe commented Nov 5, 2024

build

@mythrocks
Copy link
Collaborator

Hey, thanks for picking this up, @jlowe. I was about to raise a PR for this. I do like yours better.

@jlowe
Copy link
Contributor Author

jlowe commented Nov 5, 2024

build

Comment on lines -86 to +87
sparkSession.table(tableIdent).queryExecution.analyzed.collect {
case LogicalRelation(t: HadoopFsRelation, _, _, _) => t.location
}.head
LogicalPlanShims.getLocations(sparkSession.table(tableIdent).queryExecution.analyzed).head
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a nice Scala but I recommend to future-proof and avoid small shim burden:

    val fileIndex = catalogTable.map(_.identifier).map { tableIdent =>
      sparkSession.table(tableIdent).queryExecution.analyzed.collect {
        case lr: LogicalRelation if lr.relation.isInstanceOf[HadoopFsRelation] => 
          lr.relation.asInstanceOf[HadoopFsRelation].location
      }.head
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this at first, but I'm not sure this is "future-proof." I think we'd rather prefer to have the compiler and/or runtime inform us if a Spark platform changed this interface, since we may need to do more than just ignore the added parameters. In those cases Spark would have to change this corresponding code, and we would need to change it in a compatible way. Maybe that change is to ignore the added or removed parameters, maybe not.

@jlowe jlowe merged commit ad4233d into NVIDIA:branch-24.12 Nov 6, 2024
47 of 48 checks passed
@jlowe jlowe deleted the fix-spark400 branch November 6, 2024 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] 24.12 Precommit fails with wrong number of arguments in GpuDataSource
4 participants