
[MNT] CI add pip dependency caching #1352

Draft · chrisholder wants to merge 17 commits into main

Conversation

@chrisholder (Contributor) commented Mar 27, 2024

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR adds further caching to the CI, allowing all dependencies and built wheels to be cached and restored locally, greatly speeding up the CI.

One of the main bottlenecks in the CI is installing dependencies and building wheels. Currently, every time the CI runs, dependencies are downloaded from pip and built from scratch. In the best case this is fine, and installing and building wheels takes around 5 minutes or less. However, due to external factors such as network issues or a bad CI node, this install-and-build step can sometimes take upwards of 20 minutes. In practice, almost every CI run has at least one runner that spends 15+ minutes installing and building dependencies, and since the CI is only as fast as its slowest runner, this is a bottleneck.

This PR caches the dependencies and the built wheels each time a PR is merged to main (if pyproject.toml has been updated). The CI can then restore the dependencies and wheels on all branches and runs, skipping the download and build phase. This reduces install time to around 3 minutes in the best case and around 7 minutes in the worst case (due to a bad runner). This greatly speeds up the CI, as the slowest runner should be much faster.
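
For illustration, a minimal sketch of the kind of restore step this relies on (the action version, cache path and key here are assumptions, not the exact workflow added in this PR):

      # Illustrative sketch only, not the PR's actual step
      # Restore pip's download/wheel cache; the key changes whenever pyproject.toml does
      - name: Restore pip cache
        uses: actions/cache/restore@v4
        with:
          path: ~/.cache/pip  # Linux default pip cache location; differs on Windows/macOS
          key: pip-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('pyproject.toml') }}

On a cache hit, pip resolves packages from the local cache instead of downloading and building them, which is where the time saving comes from.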

Does your contribution introduce a new dependency? If yes, which one?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors. Alternatively, you can use the @all-contributors bot to do this for you.
  • The PR title starts with either [ENH], [MNT], [DOC], [BUG], [REF], [DEP] or [GOV] indicating whether the PR topic is related to enhancement, maintenance, documentation, bugs, refactoring, deprecation or governance.
For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access
  • (OPTIONAL) I've updated aeon's CODEOWNERS to receive notifications about future changes to these files.

@aeon-actions-bot added the distances, maintenance and testing labels Mar 27, 2024
@aeon-actions-bot (Contributor) commented

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ maintenance ].
I have added the following labels to this PR based on the changes made: [ distances, testing ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

@chrisholder (Contributor, Author) commented

One key change worth documenting is that the cache script contains this step:

      - name: Install CPU version of pytorch on linux
        uses: nick-fields/retry@v3
        if: steps.cache.outputs.cache-hit != 'true' && runner.os == 'Linux'
        with:
          timeout_minutes: 30
          max_attempts: 3
          command: python -m pip install torch --index-url https://download.pytorch.org/whl/cpu

I found the default pip install of torch to be multiple GBs in size because it installs all the GPU dependencies (which we don't use in the CI). I instead opted to install only the CPU build for the cache, which is around 200 MB in size. This allowed us to stay under the 10 GB cache limit.

@TonyBagnall (Contributor) commented

Is this sped up?
[image]

@chrisholder (Contributor, Author) commented Mar 27, 2024

The caching doesn't exist until this is merged onto main (as the caches need to be created on main).

This is a run on my fork using the pip cache:
[image]

@chrisholder (Contributor, Author) commented Mar 27, 2024

This is the usage on this branch, which doesn't have any pip caching:
[image]

Overall, the caching saves about an hour across all tests.

@MatthewMiddlehurst removed the distances label Mar 28, 2024
@MatthewMiddlehurst (Member) left a comment

> This PR caches the dependencies and the wheels built each time a PR is merged to main (if pyproject.toml is updated).

The majority of dependencies (even more so for patch versions) will install new versions without any update to pyproject.toml. Our main indication that there is a breakage on fresh installations is the general CI failing due to an incompatible dependency update.

This change could potentially delay that response. Even if we catch a breakage in the periodic tests and make an update, there is the potential for a mismatch between the dependency versions used in those tests and the ones used in PR tests. Someone could probably pick at security issues around being behind on versions, but I'm not going to do that.

tl;dr: I think we should update the cache more frequently 🙂.
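
One possible way to do this (a sketch under assumptions, not part of this PR) is to fold a date stamp into the cache key so the cache is rebuilt on a regular schedule even when pyproject.toml has not changed:

      # Hypothetical: include the current year-week in the key so the cache is refreshed weekly
      - name: Compute cache week stamp
        id: week
        shell: bash
        run: echo "stamp=$(date +'%Y-%U')" >> "$GITHUB_OUTPUT"

      - name: Restore pip cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip-${{ runner.os }}-${{ steps.week.outputs.stamp }}-${{ hashFiles('pyproject.toml') }}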

Comment on lines -47 to -52
      # GNU tar on windows runs much faster than the default BSD tar
      - name: Use GNU tar instead BSD tar if Windows
        if: ${{ inputs.runner_os == 'Windows' }}
        shell: cmd
        run: echo C:\Program Files\Git\usr\bin>>"%GITHUB_PATH%"

Member:

Any reason why? Is this the only action which does this?

Contributor (Author):

I have removed this completely now, since it doesn't make a difference now that we're using the Windows D: drive (which seems to have fixed all the cache-restoring issues).
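
For reference, a hypothetical sketch of what pointing pip's cache at the D: drive can look like (not necessarily the exact step used in this PR):

      # Hypothetical: keep pip's cache on the roomier D: drive on Windows runners
      - name: Use D drive for pip cache on Windows
        if: runner.os == 'Windows'
        shell: pwsh
        run: |
          New-Item -ItemType Directory -Force -Path D:\pip-cache | Out-Null
          Add-Content -Path $env:GITHUB_ENV -Value "PIP_CACHE_DIR=D:\pip-cache"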

.github/workflows/pr_examples.yml (outdated comment, resolved)
Comment on lines +107 to +112
      - name: Set optimal environment variables for pip cache and restore
        uses: ./.github/actions/pip_cache
        with:
          runner_os: ${{ runner.os }}
          python_version: ${{ matrix.python-version }}
          restore_cache: "true"
Member:

Any reason for these two jobs only?

.github/actions/pip_cache/action.yml (outdated comment, resolved)
Comment on lines 45 to 51
      - name: Install CPU version of pytorch on linux
        uses: nick-fields/retry@v3
        if: steps.cache.outputs.cache-hit != 'true' && runner.os == 'Linux'
        with:
          timeout_minutes: 30
          max_attempts: 3
          command: python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
Member:

Why only Linux? Are there no benefits for other OSes?

Contributor (Author):

By default, pytorch on macOS and Windows only installs the CPU version unless you specify otherwise. On Linux, on the other hand, it installs everything (the GPU version). This is why the Linux cache sizes were over 2 GB, whereas Windows and macOS were around 600 MB.

.github/workflows/update_pip_cache.yml (outdated comment, resolved)
Comment on lines 1 to 5
name: Update pip cache on merge to main
on:
  push:
    branches:
      - main
Member:

Maybe set it so this can be run manually.

Contributor (Author):

I made it so it runs at the same time as the periodic tests. The reason they are separate actions is that I don't want to use the CPU version of pytorch in the periodic tests, just in case something breaks because of using the CPU build rather than the regular install (they should be identical).
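
A minimal sketch of what the resulting trigger block might look like (the cron expression is an assumption standing in for whenever the periodic tests actually run):

on:
  push:
    branches:
      - main
  schedule:
    - cron: "0 1 * * *"  # assumed time; should match the periodic test schedule
  workflow_dispatch:     # also allows manual runs, as suggested above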

@MatthewMiddlehurst (Member) commented Mar 28, 2024

Even with the comments and questions in the review, I really like the idea of this and hope it works out. Thanks for the work.

@chrisholder marked this pull request as draft April 4, 2024 12:10