[BUG] 'spark-test.sh' integration tests FAILED on 'ps: command not found' in Rocky Docker environment #10154

Closed
NvTimLiu opened this issue Jan 5, 2024 · 0 comments
Labels
bug Something isn't working build Related to CI / CD or cleanly building


@NvTimLiu
Collaborator

NvTimLiu commented Jan 5, 2024

Describe the bug

'spark-test.sh' integration tests fail with 'ps: command not found' in Rocky Docker containers, because ps is not installed in some NVIDIA Rocky Docker images, e.g. nvidia/cuda:12.0.0-runtime-rockylinux9.

The ps command is required for starting and stopping Spark standalone clusters, e.g. https://github.com/apache/spark/blob/v3.3.2/bin/load-spark-env.sh#L68
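
To see why the load-spark-env.sh error is only a warning, here is a sketch paraphrasing what we understand the ps call near line 68 of the linked v3.3.2 script to do: it inspects the STAT column of the current shell for a "+" (foreground process group) to decide on a Beeline terminal option. This is not a verbatim copy of the Spark script, and the function name foreground_state is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (paraphrased, not verbatim) of the kind of ps check in Spark's
# bin/load-spark-env.sh: look at the STAT field of the current shell ($$)
# and treat a "+" as "running in the foreground process group".
# foreground_state is a hypothetical name used only for illustration.
foreground_state() {
  # If ps is missing, bash reports "command not found" on stderr, the
  # substitution expands to an empty string, and the match simply fails.
  if [[ $(ps -o stat= -p $$) =~ "+" ]]; then
    echo "foreground"
  else
    echo "background"
  fi
}
```

With ps missing, foreground_state just prints "background" and the calling script carries on, which matches the log: the failure that actually breaks cluster shutdown is the later one in spark-daemon.sh.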

Our integration tests support running in Rocky Docker containers (https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/jenkins/Dockerfile-blossom.integration.rocky), so we can fix the issue by installing the ps tools (the 'procps' package) in the Rocky Docker images.
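
A minimal preflight sketch for this fix, assuming a Rocky Linux 8/9 image where ps ships in the procps-ng package (installed by name as 'procps' in the commit for this issue), dnf is available, and the script runs as root. The helper names missing_tools and ensure_ps are hypothetical and not part of spark-rapids.

```shell
#!/usr/bin/env bash
# Preflight sketch: make sure ps exists before starting a Spark standalone
# cluster. Assumes Rocky Linux with dnf, root privileges, and ps provided
# by procps-ng. Function names are hypothetical.

# Print each named command that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Install procps-ng only when ps is missing.
ensure_ps() {
  if [ -z "$(missing_tools ps)" ]; then
    return 0  # ps already available; nothing to do
  fi
  echo "ps not found; installing procps-ng" >&2
  dnf install -y procps-ng && dnf clean all
}
```

Calling ensure_ps before bash jenkins/spark-test.sh covers an existing container; for the image itself, the equivalent is a dnf install of procps-ng in a RUN step of Dockerfile-blossom.integration.rocky.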

Failure logs

    + stop-worker.sh
    /home/jenkins/agent/workspace/jenkins-rapids_it-3.5.x-SNAPSHOT-dev-github-9/jars/spark-3.5.1-SNAPSHOT-bin-hadoop3.2/bin/load-spark-env.sh: line 68: ps: command not found
    /home/jenkins/agent/workspace/jenkins-rapids_it-3.5.x-SNAPSHOT-dev-github-9/jars/spark-3.5.1-SNAPSHOT-bin-hadoop3.2/bin/load-spark-env.sh: line 68: ps: command not found
    /home/jenkins/agent/workspace/jenkins-rapids_it-3.5.x-SNAPSHOT-dev-github-9/jars/spark-3.5.1-SNAPSHOT-bin-hadoop3.2/sbin/spark-daemon.sh: line 214: ps: command not found
    no org.apache.spark.deploy.worker.Worker to stop
    + stop-master.sh
    /home/jenkins/agent/workspace/jenkins-rapids_it-3.5.x-SNAPSHOT-dev-github-9/jars/spark-3.5.1-SNAPSHOT-bin-hadoop3.2/bin/load-spark-env.sh: line 68: ps: command not found
    /home/jenkins/agent/workspace/jenkins-rapids_it-3.5.x-SNAPSHOT-dev-github-9/jars/spark-3.5.1-SNAPSHOT-bin-hadoop3.2/sbin/spark-daemon.sh: line 214: ps: command not found
    no org.apache.spark.deploy.master.Master to stop
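
The "no ... to stop" lines show the real impact: the stop scripts conclude that nothing is running, so master and worker processes can be left alive. Below is a simplified sketch of the stop check as we read it in spark-daemon.sh. It is paraphrased, not verbatim; stop_check is a hypothetical name, and the actual kill of the process is omitted.

```shell
#!/usr/bin/env bash
# Simplified sketch of the "stop" branch in Spark's sbin/spark-daemon.sh
# (paraphrased). The real script reads a PID from a pid file and kills it
# only if ps reports a java process for that PID; the kill is omitted here.
# stop_check is a hypothetical name used only for illustration.
stop_check() {
  local command="$1" target_id="$2"
  # Without ps, the substitution is empty, the "java" match fails, and the
  # daemon is reported as not running even if it still is.
  if [[ $(ps -p "$target_id" -o comm=) =~ "java" ]]; then
    echo "stopping $command"
  else
    echo "no $command to stop"
  fi
}
```

Because the match only succeeds when ps prints a command name containing "java", an absent ps is indistinguishable from a daemon that has already exited.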

Steps/Code to reproduce bug
1. Run a Rocky 9 Docker container using the Dockerfile: https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/jenkins/Dockerfile-blossom.integration.rocky
2. Run the integration tests with https://github.com/NVIDIA/spark-rapids/blob/branch-24.02/jenkins/spark-test.sh, e.g.:

    docker build -f jenkins/Dockerfile-blossom.integration.rocky -t plugin-it:cuda12.0.0-rocky9 --build-arg CUDA_VER=12.0.0 --build-arg ROCKY_VER=9 ./
    docker run --runtime=nvidia -it plugin-it:cuda12.0.0-rocky9 bash
    bash jenkins/spark-test.sh

Expected behavior
Integration tests PASS in the Rocky 9 Docker container built above.


NvTimLiu added the bug (Something isn't working) and build (Related to CI / CD or cleanly building) labels Jan 5, 2024
NvTimLiu self-assigned this Jan 5, 2024
NvTimLiu added a commit to NvTimLiu/spark-rapids that referenced this issue Jan 5, 2024
To fix issue: NVIDIA#10154

Install 'procps' to fix 'ps: command not found' in NVIDIA Rocky 9 Docker containers when running integration tests with jenkins/spark-test.sh.

'procps' is required in Rocky Docker containers to run a Spark standalone cluster; see:

    https://github.com/apache/spark/blob/v3.3.2/bin/load-spark-env.sh#L68

Signed-off-by: Tim Liu <[email protected]>
jlowe pushed a commit that referenced this issue Jan 5, 2024