-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify mvn verify cache key [skip ci] #12237
base: branch-25.04
Are you sure you want to change the base?
Conversation
Signed-off-by: Yanxuan Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
Help reviewers understand the need for this change. Did we hit some error on gh? Can you file an issue with the description and reference it in your PR ? |
@@ -54,7 +54,8 @@ jobs: | |||
run: | | |||
set -x | |||
depsSHA1=$(. .github/workflows/mvn-verify-check/get-deps-sha1.sh 2.12) | |||
cacheKey="${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}-${{ github.event.pull_request.base.ref }}-${depsSHA1}" | |||
hashfile=$(echo ${{ hashFiles('**/pom.xml') }} | cut -c1-8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of truncating hashFile why not feed all the inputs to md5sum as in https://github.com/NVIDIA/spark-rapids/pull/12237/files#diff-4e6d49cc27e982e1413522f21efeb5632004b1b911cbbdfc8a53af561ee4ad92R53 and then truncate if it is really necessary to truncate it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, now we just use the first 8 chars of md5sum of raw hash value & deps sha1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Yanxuan Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
It is to prevent the cache key from being too long in the future which may be truncated by github leading to a conflict. |
Where is GH key truncation documented? Or is it something you have seen in the action logs and could provide a link? Provide a rationale for why manual key truncation is better than key truncation done by GH |
Signed-off-by: Yanxuan Liu <[email protected]>
There is no actual issue occurring. This is a proactive preventative measure, similar to how Kubernetes would truncate excessively long labels.
It provides deterministic behavior where we control the key truncation ourselves, rather than encountering unexpected failures similar to k8s pod naming issues I mentioned above. In k8s, pod label segments must be 63 characters or less, but the platform silently accepts longer names and truncates them without error notification. This hidden truncation can cause different workloads to be inadvertently scheduled to the same instance. Similarly, with our cache_key implementation, we're implementing controlled truncation to ensure we have predictable, deterministic short keys that avoid these hidden collision problems. |
As GH official document says, it allows maximum 512 characters for cache key, or action will fail. |
Currently the cache key ID contains hash value of all pom files and dependences sha1 md5 value. Intercept first 8 chars to avoid make it too long.