Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate checksum of internal dependencies in the GH cache key [skip ci] #11791

Merged

Conversation

YanxuanLiu
Copy link
Collaborator

@YanxuanLiu YanxuanLiu commented Nov 28, 2024

Fixes #11748

Replace date with sha1 of latest snapshot of jni & private in cache key name. If failed to get timestamp, use date instead.

Test build: https://github.com/YanxuanLiu/spark-rapids/actions/runs/12061667486/job/33634063951?pr=6

@YanxuanLiu YanxuanLiu requested a review from pxLi November 28, 2024 03:03
@YanxuanLiu YanxuanLiu changed the title Fix bug cache dependencies Fix bug: cache dependencies Nov 28, 2024
Comment on lines 56 to 64
jniVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-jni.version -DforceStdout)
privateVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout)
jniTimestamp=$(curl -s -H "Accept: application/json" \
"https://oss.sonatype.org/service/local/artifact/maven/resolve?r=snapshots&g=com.nvidia&a=spark-rapids-jni&v=${jniVer}&c=&e=jar&wt=json" \
| jq .data.snapshotTimeStamp) || $(date +'%Y-%m-%d')
privateTimestamp=$(curl -s -H "Accept: application/json" \
"https://oss.sonatype.org/service/local/artifact/maven/resolve?r=snapshots&g=com.nvidia&a=rapids-4-spark-private_2.12&v=${privateVer}&c=&e=jar&wt=json" \
| jq .data.snapshotTimeStamp) || $(date +'%Y-%m-%d')
cacheKey="${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}-${{ github.event.pull_request.base.ref }}-${jniTimestamp}-${privateTimestamp}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit of repetition inside this stance and with the scala2.13 copy on top of it. It is also harder to read if a lot of shell code included in YAML. We might be able to avoid both of these issues by having a dedicated .sh file with a function that can be called for spark-rapids-jni, rapids-4-spark_private for 2.12 and 2.13 Scala

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gerashegalov
gerashegalov previously approved these changes Nov 28, 2024
Comment on lines 170 to 171
jniTimestamp=$(. .github/workflows/mvn-verify-check/get-artifact-timestamp.sh spark-rapids-jni)
privateTimestamp=$(. .github/workflows/mvn-verify-check/get-artifact-timestamp.sh rapids-4-spark-private_2.13)
Copy link
Collaborator

@gerashegalov gerashegalov Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already looks good.

Optional nits:

  • we could return a key reflecting both artifacts in a single invocation
  • it is more accurate to use the sha1 field of the response than the timestamp because in the end it is about whether the artifact changed or not and the timestamp is a secondary indicator.

So the single invocation of the script could simply return a concatenation of spark-rapdis-jni and private sha1 field. If we have to limit the number of characters used in the key the script can md5sum the concatenation for the final output of the script.

Copy link
Collaborator Author

@YanxuanLiu YanxuanLiu Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks gera.

I've modified the script to return md5 hash value of concatenation of jni and private sha1.

Latest test passed: https://github.com/YanxuanLiu/spark-rapids/actions/runs/12078734123/job/33683559303?pr=6

@YanxuanLiu YanxuanLiu marked this pull request as ready for review November 29, 2024 04:17
@YanxuanLiu YanxuanLiu changed the title Fix bug: cache dependencies Fix bug: cache dependencies [skip ci] Nov 29, 2024
@YanxuanLiu
Copy link
Collaborator Author

build

@gerashegalov gerashegalov changed the title Fix bug: cache dependencies [skip ci] Incorporate checksum of internal dependencies in the GH cache key [skip ci] Nov 30, 2024
Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing nits

@pxLi
Copy link
Collaborator

pxLi commented Dec 2, 2024

build

@YanxuanLiu YanxuanLiu merged commit bd14dbf into NVIDIA:branch-25.02 Dec 2, 2024
49 checks passed
@sameerz sameerz added the build Related to CI / CD or cleanly building label Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Invalidate GH action dependency cache when internal nightly dependencies are updated
4 participants