Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify mvn verify cache key [skip ci] #12237

Open
wants to merge 7 commits into
base: branch-25.04
Choose a base branch
from

Conversation

YanxuanLiu
Copy link
Collaborator

Currently the cache key ID contains hash value of all pom files and dependences sha1 md5 value. Intercept first 8 chars to avoid make it too long.

Signed-off-by: Yanxuan Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
@YanxuanLiu YanxuanLiu requested a review from a team as a code owner February 26, 2025 16:11
@YanxuanLiu
Copy link
Collaborator Author

Signed-off-by: Yanxuan Liu <[email protected]>
@gerashegalov
Copy link
Collaborator

Help reviewers understand the need for this change. Did we hit some error on gh? Can you file an issue with the description and reference it in your PR ?

@@ -54,7 +54,8 @@ jobs:
run: |
set -x
depsSHA1=$(. .github/workflows/mvn-verify-check/get-deps-sha1.sh 2.12)
cacheKey="${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}-${{ github.event.pull_request.base.ref }}-${depsSHA1}"
hashfile=$(echo ${{ hashFiles('**/pom.xml') }} | cut -c1-8)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of truncating hashFile why not feed all the inputs to md5sum as in https://github.com/NVIDIA/spark-rapids/pull/12237/files#diff-4e6d49cc27e982e1413522f21efeb5632004b1b911cbbdfc8a53af561ee4ad92R53 and then truncate if it is really necessary to truncate it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, now we just use the first 8 chars of md5sum of raw hash value & deps sha1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gerashegalov gerashegalov changed the title Simplify mvn verify cache key Simplify mvn verify cache key [skip ci] Feb 26, 2025
@gerashegalov gerashegalov added the build Related to CI / CD or cleanly building label Feb 26, 2025
Signed-off-by: Yanxuan Liu <[email protected]>
Signed-off-by: Yanxuan Liu <[email protected]>
@YanxuanLiu
Copy link
Collaborator Author

Help reviewers understand the need for this change. Did we hit some error on gh? Can you file an issue with the description and reference it in your PR ?

It is to prevent the cache key from being too long in the future which may be truncated by github leading to a conflict.

@gerashegalov
Copy link
Collaborator

Help reviewers understand the need for this change. Did we hit some error on gh? Can you file an issue with the description and reference it in your PR ?

It is to prevent the cache key from being too long in the future which may be truncated by github leading to a conflict.

Where is GH key truncation documented? Or is it something you have seen in the action logs and could provide a link?

Provide a rationale for why manual key truncation is better than key truncation done by GH

Signed-off-by: Yanxuan Liu <[email protected]>
@pxLi
Copy link
Member

pxLi commented Feb 27, 2025

Where is GH key truncation documented? Or is it something you have seen in the action logs and could provide a link?

There is no actual issue occurring. This is a proactive preventative measure, similar to how Kubernetes would truncate excessively long labels.

Provide a rationale for why manual key truncation is better than key truncation done by GH

It provides deterministic behavior where we control the key truncation ourselves, rather than encountering unexpected failures similar to k8s pod naming issues I mentioned above. In k8s, pod label segments must be 63 characters or less, but the platform silently accepts longer names and truncates them without error notification. This hidden truncation can cause different workloads to be inadvertently scheduled to the same instance. Similarly, with our cache_key implementation, we're implementing controlled truncation to ensure we have predictable, deterministic short keys that avoid these hidden collision problems.

@YanxuanLiu
Copy link
Collaborator Author

Help reviewers understand the need for this change. Did we hit some error on gh? Can you file an issue with the description and reference it in your PR ?

It is to prevent the cache key from being too long in the future which may be truncated by github leading to a conflict.

Where is GH key truncation documented? Or is it something you have seen in the action logs and could provide a link?

Provide a rationale for why manual key truncation is better than key truncation done by GH

As GH official document says, it allows maximum 512 characters for cache key, or action will fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Related to CI / CD or cleanly building
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants