Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to pandas-tests to a dedicated workflow file and trigger it from branch.yaml #15516

Merged
merged 13 commits into from
Apr 12, 2024
18 changes: 18 additions & 0 deletions .github/workflows/build.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -108,3 +108,21 @@ jobs:
sha: ${{ inputs.sha }}
date: ${{ inputs.date }}
package-name: dask_cudf
trigger-pandas-tests:
if: inputs.build_type == 'nightly'
needs: wheel-build-cudf
runs-on: ubuntu-latest
steps:
- name: Checkout code repo
uses: actions/checkout@v4
with:
ref: ${{ inputs.sha }}
persist-credentials: false
- name: Trigger pandas-tests
env:
GH_TOKEN: ${{ github.token }}
run: |
gh workflow run pandas-tests.yaml \
-f branch=${{ inputs.branch }} \
-f sha=${{ inputs.sha }} \
-f date=${{ inputs.date }}
27 changes: 27 additions & 0 deletions .github/workflows/pandas-tests.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
name: Pandas Test Job

on:
workflow_dispatch:
inputs:
branch:
required: true
type: string
date:
required: true
type: string
sha:
required: true
type: string

jobs:
pandas-tests:
# run the Pandas unit tests
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.9" and .CUDA_VER == "12.2.2" ))
Copy link
Member

@ajschmidt8 ajschmidt8 Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoding matrix filter values is error prone since they can fail as our support matrix changes.

What's the intended matrix that this should run against? The latest CUDA version and earliest Python version?

Once I know that information, I can help you put together a more generalized matrix filter.

cc: @bdice for additional input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the baseline job is being run as nightly job and comparison job is being run as a pull-request job. The only matrix combination common to both is

- { ARCH: 'amd64', PY_VER: '3.9',  CUDA_VER: '12.2.2', LINUX_VER: 'ubuntu22.04', gpu: 'v100', driver: 'latest' }

https://github.com/rapidsai/shared-workflows/blob/13e008c746e30a830c4ebe83a6786862858dfcb8/.github/workflows/wheels-test.yaml#L75-L94`

Copy link
Contributor Author

@galipremsagar galipremsagar Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdice too pointed out this inconsistency before, but I haven't found a way to write a matrix filter that will fetch an identical job for both the jobs of pandas-tests in pr.yaml(pull-request build type) and branch.yaml (nightly build type)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using different job configurations is not an option here because that will create some noise(test failures) that's specific to certain configurations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of thing is why I've advocated for the nightly matrix to be a strict superset of the PR matrix.

rapidsai/shared-workflows#176 (comment)
rapidsai/build-planning#5

build_type: nightly
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
script: ci/cudf_pandas_scripts/pandas-tests/run.sh main
2 changes: 1 addition & 1 deletion .github/workflows/pr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -174,7 +174,7 @@ jobs:
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
matrix_filter: map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(max_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
matrix_filter: map(select(.ARCH == "amd64" and .PY_VER == "3.9" and .CUDA_VER == "12.2.2" ))
build_type: pull-request
script: ci/cudf_pandas_scripts/pandas-tests/run.sh pr
# Hide test failures because they exceed the GITHUB_STEP_SUMMARY output limit.
Expand Down
11 changes: 0 additions & 11 deletions .github/workflows/test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -125,14 +125,3 @@ jobs:
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
script: ci/cudf_pandas_scripts/run_tests.sh
pandas-tests:
# run the Pandas unit tests
secrets: inherit
uses: rapidsai/shared-workflows/.github/workflows/[email protected]
with:
matrix_filter: map(select(.ARCH == "amd64")) | group_by(.CUDA_VER|split(".")|map(tonumber)|.[0]) | map(min_by([(.PY_VER|split(".")|map(tonumber)), (.CUDA_VER|split(".")|map(tonumber))]))
build_type: nightly
branch: ${{ inputs.branch }}
date: ${{ inputs.date }}
sha: ${{ inputs.sha }}
script: ci/cudf_pandas_scripts/pandas-tests/run.sh main
4 changes: 3 additions & 1 deletion ci/cudf_pandas_scripts/pandas-tests/diff.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,16 @@

# Hard-coded needs to match the version deduced by rapids-upload-artifacts-dir
GH_JOB_NAME="pandas-tests-diff / build"
RAPIDS_FULL_VERSION=$(<./VERSION)
Copy link
Contributor Author

@galipremsagar galipremsagar Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raydouglass Instead of introducing a hard-coded version, I'm re-utilizing VERSION value that is more than enough for us here thus not needing to update the version script entry.

rapids-logger "Github job name: ${GH_JOB_NAME}"
rapids-logger "Rapids version: ${RAPIDS_FULL_VERSION}"

PY_VER="39"
MAIN_ARTIFACT=$(rapids-s3-path)cuda12_$(arch)_py${PY_VER}.main-results.json
PR_ARTIFACT=$(rapids-s3-path)cuda12_$(arch)_py${PY_VER}.pr-results.json

rapids-logger "Fetching latest available results from nightly"
aws s3api list-objects-v2 --bucket rapids-downloads --prefix "nightly/" --query "sort_by(Contents[?ends_with(Key, '_py${PY_VER}.main-results.json')], &LastModified)[::-1].[Key]" --output text > s3_output.txt
aws s3api list-objects-v2 --bucket rapids-downloads --prefix "nightly/" --query "sort_by(Contents[?ends_with(Key, '_py${PY_VER}.main-${RAPIDS_FULL_VERSION}-results.json')], &LastModified)[::-1].[Key]" --output text > s3_output.txt

read -r COMPARE_ENV < s3_output.txt
export COMPARE_ENV
Expand Down
11 changes: 6 additions & 5 deletions ci/cudf_pandas_scripts/pandas-tests/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@
set -euo pipefail

PANDAS_TESTS_BRANCH=${1}

rapids-logger "Running Pandas tests using $PANDAS_TESTS_BRANCH branch"
RAPIDS_FULL_VERSION=$(<./VERSION)
rapids-logger "Running Pandas tests using $PANDAS_TESTS_BRANCH branch and rapids-version $RAPIDS_FULL_VERSION"
rapids-logger "PR number: ${RAPIDS_REF_NAME:-"unknown"}"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
Expand All @@ -27,9 +27,10 @@ bash python/cudf/cudf/pandas/scripts/run-pandas-tests.sh \
--dist worksteal \
--report-log=${PANDAS_TESTS_BRANCH}.json 2>&1

SUMMARY_FILE_NAME=${PANDAS_TESTS_BRANCH}-24.06-results.json
galipremsagar marked this conversation as resolved.
Show resolved Hide resolved
galipremsagar marked this conversation as resolved.
Show resolved Hide resolved
# summarize the results and save them to artifacts:
python python/cudf/cudf/pandas/scripts/summarize-test-results.py --output json pandas-testing/${PANDAS_TESTS_BRANCH}.json > pandas-testing/${PANDAS_TESTS_BRANCH}-results.json
python python/cudf/cudf/pandas/scripts/summarize-test-results.py --output json pandas-testing/${PANDAS_TESTS_BRANCH}.json > pandas-testing/${SUMMARY_FILE_NAME}
RAPIDS_ARTIFACTS_DIR=${RAPIDS_ARTIFACTS_DIR:-"${PWD}/artifacts"}
mkdir -p "${RAPIDS_ARTIFACTS_DIR}"
mv pandas-testing/${PANDAS_TESTS_BRANCH}-results.json ${RAPIDS_ARTIFACTS_DIR}/
rapids-upload-to-s3 ${RAPIDS_ARTIFACTS_DIR}/${PANDAS_TESTS_BRANCH}-results.json "${RAPIDS_ARTIFACTS_DIR}"
mv pandas-testing/${SUMMARY_FILE_NAME} ${RAPIDS_ARTIFACTS_DIR}/
rapids-upload-to-s3 ${RAPIDS_ARTIFACTS_DIR}/${SUMMARY_FILE_NAME} "${RAPIDS_ARTIFACTS_DIR}"
Loading