Add spark350emr shim layer [EMR] #10463

Closed


mimaomao
Contributor

This PR adds a new shim layer, spark350emr, which supports running Spark RAPIDS on AWS EMR Spark 3.5.0.


Note: this PR is a new revision of the previous PR, rebased on branch-24.04. More details about testing can be found in that PR.

This PR targets to add a new shim layer spark350emr which supports
running Spark RAPIDS on AWS EMR Spark 3.5.0.

Signed-off-by: Maomao Min <[email protected]>
@gerashegalov
Collaborator

Somewhere in ./jenkins/version-def.sh or in .github/workflows/mvn-verify-check.yml, we need to exclude 350emr from the package-tests matrix on GitHub-hosted PR checks, because we presumably won't have access to the 3.5.0-amzn-0 dependencies.

Error:  Failed to execute goal on project rapids-4-spark-emr-bom: Could not resolve dependencies for project com.nvidia:rapids-4-spark-emr-bom:pom:24.04.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0, org.apache.spark:spark-hive_2.12:jar:3.5.0-amzn-0: org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0 was not found in https://repo1.maven.org/maven2 during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]

We will also need to produce a spark-rapids-private shim for 350emr.

cc @GaryShen2008 @sameerz
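
To illustrate the failure above: the unresolved artifacts all carry the `-amzn-0` qualifier, which suggests a vendor build that Maven Central does not host. The following is a minimal sketch of how such coordinates could be flagged, assuming that any qualifier other than `SNAPSHOT` after an `x.y.z` version marks a vendor-only build. The helper names `parse_coordinate` and `is_vendor_build` are hypothetical and are not part of the spark-rapids build scripts.

```python
import re

# Matches a plain x.y.z version with an optional qualifier, e.g. "3.5.0-amzn-0".
VERSION_RE = re.compile(r"^(\d+\.\d+\.\d+)(?:-(.+))?$")


def parse_coordinate(coord):
    """Split 'group:artifact:packaging:version' into its four parts."""
    group, artifact, packaging, version = coord.split(":")
    return {
        "group": group,
        "artifact": artifact,
        "packaging": packaging,
        "version": version,
    }


def is_vendor_build(version):
    """Return True when the version has a qualifier other than SNAPSHOT.

    Assumption: a qualifier such as 'amzn-0' marks a vendor build that is
    not published to Maven Central.
    """
    match = VERSION_RE.match(version)
    if match is None:
        return False
    qualifier = match.group(2)
    return qualifier is not None and qualifier != "SNAPSHOT"


# The two coordinates reported as unresolved in the error message above.
missing = [
    "org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0",
    "org.apache.spark:spark-hive_2.12:jar:3.5.0-amzn-0",
]
vendor_only = [c for c in missing if is_vendor_build(parse_coordinate(c)["version"])]
```

Under this assumption both missing artifacts end up in `vendor_only`, which is consistent with the suggestion to keep 350emr out of checks that resolve dependencies only from public repositories.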

@sameerz added the build (Related to CI / CD or cleanly building) and feature request (New feature or request) labels on Feb 23, 2024
@NvTimLiu
Collaborator

> Somewhere in ./jenkins/version-def.sh or in .github/workflows/mvn-verify-check.yml, we need to exclude 350emr from the package-tests matrix on GitHub-hosted PR checks, because we presumably won't have access to the 3.5.0-amzn-0 dependencies.
>
> Error:  Failed to execute goal on project rapids-4-spark-emr-bom: Could not resolve dependencies for project com.nvidia:rapids-4-spark-emr-bom:pom:24.04.0-SNAPSHOT: The following artifacts could not be resolved: org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0, org.apache.spark:spark-hive_2.12:jar:3.5.0-amzn-0: org.apache.spark:spark-sql_2.12:jar:3.5.0-amzn-0 was not found in https://repo1.maven.org/maven2 during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
>
> We will also need to produce a spark-rapids-private shim for 350emr.
>
> cc @GaryShen2008 @sameerz

EMR pre-merge and nightly build and test will work like what we've done for the Databricks runtime.

We'll need separate CI jobs running on EMR.
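
As a rough sketch of that split, the shim list could be partitioned so that vendor shims go to dedicated CI jobs while the rest stay on GitHub-hosted checks. The shim identifiers below are illustrative (only `350emr` comes from this PR), and the convention that a trailing `db` or `emr` marks a vendor shim is an assumption made for this example, as are the names `ci_group` and `partition_shims`.

```python
# Hypothetical shim identifiers; only "350emr" is taken from this PR.
ALL_SHIMS = ["330", "341db", "350", "350emr"]

# Assumption: a trailing "db" marks Databricks shims and "emr" marks EMR shims.
VENDOR_SUFFIXES = {"db": "databricks", "emr": "emr"}


def ci_group(shim):
    """Return the CI group a shim would be built and tested in."""
    for suffix, group in VENDOR_SUFFIXES.items():
        if shim.endswith(suffix):
            return group
    # Shims without a vendor suffix can resolve dependencies from public repos.
    return "github_hosted"


def partition_shims(shims):
    """Group shim identifiers by CI group, preserving input order."""
    groups = {}
    for shim in shims:
        groups.setdefault(ci_group(shim), []).append(shim)
    return groups


matrix = partition_shims(ALL_SHIMS)
```

With this grouping, the `github_hosted` entry would feed the PR check matrix, and the `emr` entry would feed a separate job that runs where the Amazon artifacts are available.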

@NvTimLiu changed the base branch from branch-24.04 to branch-24.06 on April 15, 2024, 03:31
@NvTimLiu
Collaborator

Retargeted to branch-24.06 for the next release, as we're currently running the v24.04 release. Please let me know if you have any concerns, thanks!

@sameerz
Collaborator

sameerz commented Jul 30, 2024

Closing until we can retarget to the latest branch.

@sameerz closed this on Jul 30, 2024
Labels
build (Related to CI / CD or cleanly building), feature request (New feature or request)
4 participants