ENH: Fix caching-related warnings in GHA build-test-publish CI #475

Open
wants to merge 1 commit into
base: master

Contributor

@jhlegarreta jhlegarreta commented Dec 20, 2024

Fix caching-related warnings in GHA build-test-publish CI:

  • Save the apt-get cache in a directory other than /var/lib/apt to avoid permission issues.
  • Make the AFNI cache key name be specific to the CI matrix configuration to avoid clashes across cache names. Use the root part as the restore key name so that any cache found can be restored, as the AFNI version being installed is the same across configurations.
  • Check if the AFNI cache exists before trying to install it in GitHub Actions build-test-publish CI workflow.

Fixes:

Failed to save: "/usr/bin/tar" failed with error: The process '/usr/bin/tar' failed with exit code 2

The full log shows:

2024-12-19T13:35:45.2830631Z
 [command]/usr/bin/tar --posix -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/sdcflows/sdcflows --files-from manifest.txt --use-compress-program zstdmt
2024-12-19T13:35:46.5536670Z
 Failed to save: Unable to reserve cache with key afni-v1, another job may be creating this cache. More details: Cache already exists. Scope: refs/heads/master, Key: afni-v1, Version: d04022ae09f8f21b8c0f9f00e4a784b6e510fe6a47d30aa3b0853a42885b92cb
2024-12-19T13:35:46.5924639Z
 Post job cleanup.
2024-12-19T13:35:46.7348688Z
 [command]/usr/bin/tar --posix -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/sdcflows/sdcflows --files-from manifest.txt --use-compress-program zstdmt
2024-12-19T13:35:46.8326360Z
 /usr/bin/tar: ../../../../../var/lib/apt/lists/lock: Cannot open: Permission denied
2024-12-19T13:35:47.1807249Z
 /usr/bin/tar: ../../../../../var/lib/apt/lists/partial: Cannot open: Permission denied
2024-12-19T13:35:47.2842971Z
 /usr/bin/tar: Exiting with failure status due to previous errors
2024-12-19T13:35:47.2851756Z
 ##[warning]Failed to save: "/usr/bin/tar" failed with error: The process '/usr/bin/tar' failed with exit code 2

raised, for example, in:
https://github.com/nipreps/sdcflows/actions/runs/12413644206

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.85%. Comparing base (50c053d) to head (7cc0732).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #475   +/-   ##
=======================================
  Coverage   83.85%   83.85%           
=======================================
  Files          30       30           
  Lines        2819     2819           
  Branches      365      365           
=======================================
  Hits         2364     2364           
  Misses        384      384           
  Partials       71       71           

@jhlegarreta jhlegarreta force-pushed the FixBuildTestPublishGHATarWarning branch 4 times, most recently from e6a245b to 04618e8 Compare December 20, 2024 22:41
@jhlegarreta jhlegarreta changed the title ENH: Check if cache exists before trying to install AFNI in GHA CI ENH: Fix caching-related warnings in GHA build-test-publish CI Dec 20, 2024
@jhlegarreta
Contributor Author

Warnings are gone:
https://github.com/nipreps/sdcflows/actions/runs/12439499846

@effigies The "3.9, latest, slow" and "3.12, latest, veryslow" jobs seem to fail often within the scope of this PR, e.g.
https://github.com/nipreps/sdcflows/actions/runs/12438935967

I thought the problem would be solved by naming the AFNI cache keys with distinct names:
https://github.com/nipreps/sdcflows/actions/runs/12439231605

But it does not seem to be the case:
https://github.com/nipreps/sdcflows/actions/runs/12439499846/job/34733630867

Any clue?

@effigies effigies force-pushed the FixBuildTestPublishGHATarWarning branch from 04618e8 to 7cc0732 Compare January 22, 2025 02:18
Comment on lines +105 to +114
       - name: Restore cache for AFNI
         id: cache-afni
         uses: actions/cache@v4
         with:
           path: /opt/afni
-          key: afni-v1
+          key: afni-v1-${{ matrix.python-version }}-${{ matrix.dependencies }}-${{ matrix.marks }}
           restore-keys: |
-            afni-v1
+            afni-v1-
       - name: Install AFNI
         if: steps.cache-afni.outputs.cache-hit != 'true'
Member

We set up this cache to not depend on the specific job, as its contents should not vary. Not saving multiple caches is a feature, not a bug.

Contributor Author

@jhlegarreta jhlegarreta Jan 22, 2025

My memory is vague now, but looking at the commit message maybe GHA was showing some warning about name clashes (e.g. if two processes were trying to write to the same file)?

Member

When there's a cache miss, each job will populate the directory, and the first to finish will successfully save the cache. That's fine and expected. The alternative would be to create a separate job to ensure the cache exists, and insert it into the workflow ahead of these.
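
A minimal sketch of that alternative, for illustration only. The job names `warm-afni-cache` and `test` are hypothetical, the install command is a placeholder for whatever the workflow currently runs, and the matrix of the real test job is omitted. It relies on `needs:` to order the jobs and on the `actions/cache/restore` sub-action so that the downstream jobs only read the cache and never try to save it:

```yaml
jobs:
  # Hypothetical job: runs once, ahead of the test matrix, and is the only writer of the cache.
  warm-afni-cache:
    runs-on: ubuntu-latest
    steps:
      - name: Restore cache for AFNI
        id: cache-afni
        uses: actions/cache@v4
        with:
          path: /opt/afni
          key: afni-v1
      - name: Install AFNI
        if: steps.cache-afni.outputs.cache-hit != 'true'
        run: |
          # Placeholder: the existing AFNI install commands would go here.
          echo "install AFNI into /opt/afni"

  # Hypothetical name for the existing matrix job.
  test:
    needs: warm-afni-cache
    runs-on: ubuntu-latest
    steps:
      - name: Restore cache for AFNI
        uses: actions/cache/restore@v4
        with:
          path: /opt/afni
          key: afni-v1
```

The trade-off is an extra job on the critical path of every run, in exchange for a single writer of the `afni-v1` cache.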

@@ -88,7 +88,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/cache@v4
         with:
-          path: /var/lib/apt
+          path: ${{ runner.temp }}/cache-linux
Member

Warning: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.
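
The warning appears because nothing in the job creates or fills `${{ runner.temp }}/cache-linux`. One way this might be addressed is sketched below, under stated assumptions: the cache key `apt-v1` and the package name `some-package` are placeholders that do not come from this PR, and the approach relies on apt's `Dir::Cache::Archives` option to redirect downloaded `.deb` files into the cached directory. It has not been tested against this workflow.

```yaml
# Sketch only; the cache key and the package name are placeholders.
- uses: actions/cache@v4
  with:
    path: ${{ runner.temp }}/cache-linux
    key: apt-v1
- name: Install system packages
  run: |
    # The cache action only saves paths that exist, so create the directory first.
    mkdir -p "${RUNNER_TEMP}/cache-linux/partial"
    sudo apt-get update
    # Assumption: redirecting apt's archive directory keeps the downloaded .deb files in the cached path.
    sudo apt-get install -y -o Dir::Cache::Archives="${RUNNER_TEMP}/cache-linux" some-package
    # apt runs as root; give the files back to the runner user so tar can read them when saving the cache.
    sudo chown -R "$(id -u):$(id -g)" "${RUNNER_TEMP}/cache-linux"
```

The final `chown` is there to avoid the same `Permission denied` failure that `tar` reported for the root-owned files under `/var/lib/apt`.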

Member

Maybe we should just drop the APT cache, since it hasn't worked, and it would only save us up to 15s in 8m jobs.

Contributor Author

I'm fine if we want to drop this; I do not have a strong opinion. Let me know if you firmly believe we should go ahead and remove it, or whether you want other maintainers to weigh in.

Member

It's clearly not serving its purpose, so it's safe to remove. If you or @oesteban want to take another stab at reducing this time, you're welcome to, but if you're just motivated to reduce warnings, the simple path forward is to remove the cache.
