-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorporate checksum of internal dependencies in the GH cache key [skip ci] #11791
Incorporate checksum of internal dependencies in the GH cache key [skip ci] #11791
Conversation
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
jniVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-jni.version -DforceStdout) | ||
privateVer=$(mvn help:evaluate -q -pl dist -Dexpression=spark-rapids-private.version -DforceStdout) | ||
jniTimestamp=$(curl -s -H "Accept: application/json" \ | ||
"https://oss.sonatype.org/service/local/artifact/maven/resolve?r=snapshots&g=com.nvidia&a=spark-rapids-jni&v=${jniVer}&c=&e=jar&wt=json" \ | ||
| jq .data.snapshotTimeStamp) || $(date +'%Y-%m-%d') | ||
privateTimestamp=$(curl -s -H "Accept: application/json" \ | ||
"https://oss.sonatype.org/service/local/artifact/maven/resolve?r=snapshots&g=com.nvidia&a=rapids-4-spark-private_2.12&v=${privateVer}&c=&e=jar&wt=json" \ | ||
| jq .data.snapshotTimeStamp) || $(date +'%Y-%m-%d') | ||
cacheKey="${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}-${{ github.event.pull_request.base.ref }}-${jniTimestamp}-${privateTimestamp}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is quite a bit of repetition inside this stance and with the scala2.13 copy on top of it. It is also harder to read if a lot of shell code included in YAML. We might be able to avoid both of these issues by having a dedicated .sh file with a function that can be called for spark-rapids-jni, rapids-4-spark_private for 2.12 and 2.13 Scala
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a script to get artifacts timestamp.
Test passed: https://github.com/YanxuanLiu/spark-rapids/actions/runs/12063808370/job/33639585513?pr=6
Signed-off-by: YanxuanLiu <[email protected]>
jniTimestamp=$(. .github/workflows/mvn-verify-check/get-artifact-timestamp.sh spark-rapids-jni) | ||
privateTimestamp=$(. .github/workflows/mvn-verify-check/get-artifact-timestamp.sh rapids-4-spark-private_2.13) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This already looks good.
Optional nits:
- we could return a key reflecting both artifacts in a single invocation
- it is more accurate to use the
sha1
field of the response than the timestamp because in the end it is about whether the artifact changed or not and the timestamp is a secondary indicator.
So the single invocation of the script could simply return a concatenation of spark-rapdis-jni and private sha1 field. If we have to limit the number of characters used in the key the script can md5sum the concatenation for the final output of the script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks gera.
I've modified the script to return md5 hash value of concatenation of jni and private sha1.
Latest test passed: https://github.com/YanxuanLiu/spark-rapids/actions/runs/12078734123/job/33683559303?pr=6
Signed-off-by: YanxuanLiu <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing nits
build |
Fixes #11748
Replace date with sha1 of latest snapshot of jni & private in cache key name. If failed to get timestamp, use date instead.
Test build: https://github.com/YanxuanLiu/spark-rapids/actions/runs/12061667486/job/33634063951?pr=6