
[MNT] CI add pip dependency caching #1352

Draft · chrisholder wants to merge 17 commits into main

Conversation

@chrisholder (Contributor) commented Mar 27, 2024

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR adds further caching to the CI, allowing all dependencies and built wheels to be cached and restored locally, greatly speeding up the CI.

One of the main bottlenecks in the CI is installing dependencies and building wheels. Currently, every time the CI runs, dependencies are downloaded from pip and built from scratch. In the best case this is fine, and installing and building wheels takes around 5 minutes or less. However, due to external factors such as network issues or a bad CI node, this install-and-build step can sometimes take upwards of 20 minutes. In practice, almost every CI run has at least one runner that spends 15+ minutes installing and building dependencies, and since the CI is only as fast as its slowest runner, this is a bottleneck.

This PR caches the dependencies and the built wheels each time a PR is merged to main (if pyproject.toml has been updated). The CI can then restore the dependencies and wheels on all branches and runs, skipping the download and build phase. This reduces install time to around 3 minutes in the best case and around 7 minutes in the worst case (due to a bad runner). This greatly speeds up the CI, as the slowest runner should be much faster.
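
For illustration, a minimal sketch of the kind of restore step this relies on (the action version, cache path and key here are assumptions, not the exact workflow added in this PR):

      # Illustrative sketch only, not the PR's actual step
      # Restore pip's download/wheel cache; the key changes whenever pyproject.toml does
      - name: Restore pip cache
        uses: actions/cache/restore@v4
        with:
          path: ~/.cache/pip  # Linux default pip cache location; differs on Windows/macOS
          key: pip-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('pyproject.toml') }}

On a cache hit, pip resolves packages from the local cache instead of downloading and building them, which is where the time saving comes from.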

Does your contribution introduce a new dependency? If yes, which one?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access
  • (OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

@aeon-actions-bot added the distances, maintenance and testing labels Mar 27, 2024
@aeon-actions-bot (Contributor) commented

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ maintenance ].
I have added the following labels to this PR based on the changes made: [ distances, testing ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

@chrisholder (Contributor, Author) commented

One key change worth documenting is that the cache script contains this step:

      - name: Install CPU version of pytorch on linux
        uses: nick-fields/retry@v3
        if: steps.cache.outputs.cache-hit != 'true' && runner.os == 'Linux'
        with:
          timeout_minutes: 30
          max_attempts: 3
          command: python -m pip install torch --index-url https://download.pytorch.org/whl/cpu

I found the default pip install of torch to be multiple GBs in size because it installs all the GPU dependencies (which we don't use in the CI). I instead opted to install only the CPU build for the cache, which is around 200 MB in size. This allowed us to stay under the 10 GB cache limit.

@TonyBagnall (Contributor) commented

Is this sped up?
[image]

@chrisholder (Contributor, Author) commented Mar 27, 2024

The caching doesn't exist until this is merged onto main (as the caches need to be created on main).

This is a run on my fork using the pip cache:
[image]

@chrisholder (Contributor, Author) commented Mar 27, 2024

This is the usage on this branch, which doesn't have any pip caching:
[image]

Overall, the caching saves about an hour across all tests.

@MatthewMiddlehurst removed the distances label Mar 28, 2024
@MatthewMiddlehurst (Member) left a comment

> This PR caches the dependencies and the wheels built each time a PR is merged to main (if pyproject.toml is updated).

The majority of dependencies (even more so for patch versions) will install new versions without any update to pyproject.toml. Our main indication that there is a breakage on fresh installations is the general CI failing due to an incompatible dependency update.

This change could potentially delay that response. Even if we catch a breakage in the periodic tests and make an update, there is the potential for a mismatch between the dependency versions used in those tests and the ones used in PR tests. Someone could probably pick at security issues around being behind on versions, but I'm not going to do that.

tl;dr: I think we should update the cache more frequently 🙂.
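
One possible way to do this (a sketch under assumptions, not part of this PR) is to fold a date stamp into the cache key so the cache is rebuilt on a regular schedule even when pyproject.toml has not changed:

      # Hypothetical: include the current year-week in the key so the cache is refreshed weekly
      - name: Compute cache week stamp
        id: week
        shell: bash
        run: echo "stamp=$(date +'%Y-%U')" >> "$GITHUB_OUTPUT"

      - name: Restore pip cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ runner.os }}-${{ steps.week.outputs.stamp }}-${{ hashFiles('pyproject.toml') }}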

Comment on lines -47 to -52
      # GNU tar on windows runs much faster than the default BSD tar
      - name: Use GNU tar instead BSD tar if Windows
        if: ${{ inputs.runner_os == 'Windows' }}
        shell: cmd
        run: echo C:\Program Files\Git\usr\bin>>"%GITHUB_PATH%"

Member:

Any reason why? Is this the only action which does this?

Contributor (Author):

I have removed this completely now, since it doesn't make a difference now that we're using the Windows D: drive (which seems to have fixed all the cache-restoring issues).
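
For reference, a hypothetical sketch of what pointing pip's cache at the D: drive can look like (not necessarily the exact step used in this PR):

      # Hypothetical: keep pip's cache on the roomier D: drive on Windows runners
      - name: Use D drive for pip cache on Windows
        if: runner.os == 'Windows'
        shell: pwsh
        run: |
          New-Item -ItemType Directory -Force -Path D:\pip-cache | Out-Null
          Add-Content -Path $env:GITHUB_ENV -Value "PIP_CACHE_DIR=D:\pip-cache"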

.github/workflows/pr_examples.yml (outdated comment, resolved)
Comment on lines +107 to +112
      - name: Set optimal environment variables for pip cache and restore
        uses: ./.github/actions/pip_cache
        with:
          runner_os: ${{ runner.os }}
          python_version: ${{ matrix.python-version }}
          restore_cache: "true"
Member:

Any reason for these two jobs only?

.github/actions/pip_cache/action.yml (outdated comment, resolved)
Comment on lines 45 to 51
      - name: Install CPU version of pytorch on linux
        uses: nick-fields/retry@v3
        if: steps.cache.outputs.cache-hit != 'true' && runner.os == 'Linux'
        with:
          timeout_minutes: 30
          max_attempts: 3
          command: python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
Member:

Why only Linux? Are there no benefits for other OSes?

Contributor (Author):

By default, pytorch on macOS and Windows only installs the CPU version unless you specify otherwise. On Linux, on the other hand, it installs everything (the GPU version). This is why the Linux cache sizes were over 2 GB, whereas Windows and macOS were around 600 MB.

.github/workflows/update_pip_cache.yml (outdated comment, resolved)
Comment on lines 1 to 5
name: Update pip cache on merge to main
on:
  push:
    branches:
      - main
Member:

Maybe set it so this can be run manually.

Contributor (Author):

I made it so it runs at the same time as the periodic tests. The reason they are separate actions is that I don't want to use the CPU version of pytorch in the periodic tests, just in case something breaks because of using the CPU build rather than the regular install (they should be identical).
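
A minimal sketch of what the resulting trigger block might look like (the cron expression is an assumption standing in for whenever the periodic tests actually run):

on:
  push:
    branches:
      - main
  schedule:
    - cron: "0 1 * * *"  # assumed time; should match the periodic test schedule
  workflow_dispatch:     # also allows manual runs, as suggested above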

@MatthewMiddlehurst (Member) commented Mar 28, 2024

Even with the comments and questions in the review, I really like the idea of this and hope it works out. Thanks for the work.

@chrisholder marked this pull request as draft April 4, 2024 12:10