Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run '--packages' only with default cuda11 jar #10279

Merged
merged 2 commits into from
Jan 30, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 11 additions & 4 deletions jenkins/spark-tests.sh
Original file line number Diff line number Diff line change
Expand Up @@ -304,10 +304,17 @@ if [[ $TEST_MODE == "DEFAULT" ]]; then
PYSP_TEST_spark_shuffle_manager=com.nvidia.spark.rapids.${SHUFFLE_SPARK_SHIM}.RapidsShuffleManager \
./run_pyspark_from_build.sh

SPARK_SHELL_SMOKE_TEST=1 \
PYSP_TEST_spark_jars_packages=com.nvidia:rapids-4-spark_${SCALA_BINARY_VER}:${PROJECT_VER} \
PYSP_TEST_spark_jars_repositories=${PROJECT_REPO} \
./run_pyspark_from_build.sh
# As '--packages' only works on the default cuda11 jar, it does not support classifiers
# refer to issue : https://issues.apache.org/jira/browse/SPARK-20075
# "$CLASSIFIER" == ''" is usally for the case running by developers,
# while "$CLASSIFIER" == "cuda11" is for the case running on CI.
# We expect to run packages test for both cases
if [[ "$CLASSIFIER" == "" || "$CLASSIFIER" == "cuda11" ]]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in a comment why OR-ing with "$CLASSIFIER" == "cuda11" is necessary (if it is)?

As you point out, we know that Spark does not support classifiers in Maven coordinates and it does not look used inside this if branch

Copy link
Collaborator Author

@NvTimLiu NvTimLiu Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov , good question.

cuda11 jar the the same as main jar, though we do not put cuda11 as part of the --packages param, "CLASSIFIER==cuda11" is the most common case(e.g.https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/jenkins/version-def.sh#L30-L31) to to run integration tests. If only filter it with "CLASSIFIER==''", we may skip packages test in all CI jobs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the default artifact without a classifier is the same as the cuda11-classified artifact.

IIUC,

"$CLASSIFIER" == ''" is for the case when the script is run by a developer. "$CLASSIFIER" == "cuda11" is for cases when run on CI. If so, please add a comment about this. It is strange to see a check for the cuda11 classifier next to the comment saying it's not supported.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, let me add comment for it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

SPARK_SHELL_SMOKE_TEST=1 \
PYSP_TEST_spark_jars_packages=com.nvidia:rapids-4-spark_${SCALA_BINARY_VER}:${PROJECT_VER} \
PYSP_TEST_spark_jars_repositories=${PROJECT_REPO} \
./run_pyspark_from_build.sh
if
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 'fi'


# ParquetCachedBatchSerializer cache_test
PYSP_TEST_spark_sql_cache_serializer=com.nvidia.spark.ParquetCachedBatchSerializer \
Expand Down
Loading