Skip to content

Commit

Permalink
Improve cache handling (#1679)
Browse files Browse the repository at this point in the history
Refactors cache handling to dedicated functions under `lib/cache.sh`
(rather than being scattered around the buildpack), and makes the
following improvements:
- Ensures the cache is now also discards the cache when the package
  manager (or its version) changes.
- Improves the build log output shown when restoring or discarding the
  cache. For example, if the cache was invalidated all reasons are now
  shown.
- Stops performing unnecessary cache file copies when the cache is due
  to be invalidated. This required moving the cache restoration step to
  after the `bin/pre_compile` hook runs.
- Fixes cache restoration in the case where an app's `requirements.txt`
  was formerly a symlink.
- Adds buildpack metrics for the status of the cache and duration of
  cache restoration/saving.

Fixes #1673.
Fixes #1674.
Fixes #1675.
Fixes #1676.
Fixes #1677.
Fixes #1678.
Prep for #796.
Unblocks upgrading pip (since #1674 prevents pypa/pip#12950).
GUS-W-16811131.
  • Loading branch information
edmorley authored Oct 31, 2024
1 parent dba9b86 commit f00f258
Show file tree
Hide file tree
Showing 18 changed files with 322 additions and 168 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,11 @@

## [Unreleased]

- Fixed cache handling so that it now also discards the cache when the package manager (or its version) changes. ([#1679](https://github.com/heroku/heroku-buildpack-python/pull/1679))
- Improved the build log output shown when restoring or discarding the cache. For example, if the cache was invalidated all reasons are now shown. ([#1679](https://github.com/heroku/heroku-buildpack-python/pull/1679))
- Stopped performing unnecessary cache file copies when the cache is due to be invalidated. This required moving the cache restoration step to after the `bin/pre_compile` hook runs. ([#1679](https://github.com/heroku/heroku-buildpack-python/pull/1679))
- Fixed cache restoration in the case where an app's `requirements.txt` was formerly a symlink. ([#1679](https://github.com/heroku/heroku-buildpack-python/pull/1679))
- Added buildpack metrics for the status of the cache and duration of cache restoration/saving. ([#1679](https://github.com/heroku/heroku-buildpack-python/pull/1679))

## [v262] - 2024-10-25

Expand Down
63 changes: 6 additions & 57 deletions bin/compile
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ ENV_DIR="${3}"
BUILDPACK_DIR=$(cd "$(dirname "$(dirname "${BASH_SOURCE[0]}")")" && pwd)

source "${BUILDPACK_DIR}/bin/utils"
source "${BUILDPACK_DIR}/lib/cache.sh"
source "${BUILDPACK_DIR}/lib/hooks.sh"
source "${BUILDPACK_DIR}/lib/metadata.sh"
source "${BUILDPACK_DIR}/lib/output.sh"
Expand Down Expand Up @@ -93,56 +94,14 @@ export PIP_NO_PYTHON_VERSION_WARNING=1

cd "$BUILD_DIR"

# The Cache
# ---------

# The workflow for the Python Buildpack's cache is as follows:
#
# - `~/.heroku/{known-paths}` are copied from the cache into the slug.
# - The build is executed, modifying `~/.heroku/{known-paths}`.
# - Once the build is complete, `~/.heroku/{known-paths}` is copied back into the cache.

mkdir -p "$CACHE_DIR/.heroku"

# Restore old artifacts from the cache.
mkdir -p .heroku
# The Python installation.
cp -R "$CACHE_DIR/.heroku/python" .heroku/ &>/dev/null || true
# A plain text file which contains the current stack being used (used for cache busting).
cp -R "$CACHE_DIR/.heroku/python-stack" .heroku/ &>/dev/null || true
# A plain text file which contains the current python version being used (used for cache busting).
cp -R "$CACHE_DIR/.heroku/python-version" .heroku/ &>/dev/null || true
# A plain text file which contains the current sqlite3 version being used (used for cache busting).
cp -R "$CACHE_DIR/.heroku/python-sqlite3-version" .heroku/ &>/dev/null || true
# "editable" installations of code repositories, via pip or pipenv.
if [[ -d "$CACHE_DIR/.heroku/src" ]]; then
cp -R "$CACHE_DIR/.heroku/src" .heroku/ &>/dev/null || true
fi

# Runs a `bin/pre_compile` script if found in the app source, allowing build customisation.
hooks::run_hook "pre_compile"

# TODO: Clear the cache if this isn't a valid version, as part of the cache refactor.
# (Currently the version is instead validated in `read_requested_python_version()`)
if [[ -f "$CACHE_DIR/.heroku/python-version" ]]; then
cached_python_version="$(cat "${CACHE_DIR}/.heroku/python-version")"
# `python-X.Y.Z` -> `X.Y`
cached_python_version="${cached_python_version#python-}"
else
cached_python_version=
fi

# We didn't always record the stack version.
if [[ -f "$CACHE_DIR/.heroku/python-stack" ]]; then
CACHED_PYTHON_STACK=$(cat "$CACHE_DIR/.heroku/python-stack")
else
# shellcheck disable=SC2154 # TODO: Env var is referenced but not assigned.
CACHED_PYTHON_STACK=$STACK
fi

package_manager="$(package_manager::determine_package_manager "${BUILD_DIR}")"
meta_set "package_manager" "${package_manager}"

cached_python_version="$(cache::cached_python_version "${CACHE_DIR}")"

# We use the Bash 4.3+ `nameref` feature to pass back multiple values from this function
# without having to hardcode globals. See: https://stackoverflow.com/a/38997681
python_version::read_requested_python_version "${BUILD_DIR}" "${package_manager}" "${cached_python_version}" requested_python_version python_version_origin
Expand Down Expand Up @@ -170,6 +129,8 @@ python_major_version="${python_full_version%.*}"
meta_set "python_version" "${python_full_version}"
meta_set "python_version_major" "${python_major_version}"

cache::restore "${BUILD_DIR}" "${CACHE_DIR}" "${STACK:?}" "${cached_python_version}" "${python_full_version}" "${package_manager}"

# The directory for the .profile.d scripts.
mkdir -p "$(dirname "$PROFILE_PATH")"
# The directory for editable VCS dependencies.
Expand Down Expand Up @@ -300,18 +261,6 @@ cp "${BUILDPACK_DIR}/vendor/python.gunicorn.sh" "$GUNICORN_PROFILE_PATH"
# Runs a `bin/post_compile` script if found in the app source, allowing build customisation.
hooks::run_hook "post_compile"

# Store new artifacts in the cache.
rm -rf "$CACHE_DIR/.heroku/python"
rm -rf "$CACHE_DIR/.heroku/python-version"
rm -rf "$CACHE_DIR/.heroku/python-stack"
rm -rf "$CACHE_DIR/.heroku/src"

mkdir -p "$CACHE_DIR/.heroku"
cp -R .heroku/python "$CACHE_DIR/.heroku/"
cp -R .heroku/python-version "$CACHE_DIR/.heroku/"
cp -R .heroku/python-stack "$CACHE_DIR/.heroku/" &>/dev/null || true
if [[ -d .heroku/src ]]; then
cp -R .heroku/src "$CACHE_DIR/.heroku/" &>/dev/null || true
fi
cache::save "${BUILD_DIR}" "${CACHE_DIR}" "${STACK}" "${python_full_version}" "${package_manager}"

meta_time "total_duration" "${compile_start_time}"
3 changes: 3 additions & 0 deletions bin/report
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ kv_pair_string() {
}

STRING_FIELDS=(
cache_status
django_collectstatic
failure_reason
nltk_downloader
Expand All @@ -76,6 +77,8 @@ STRING_FIELDS=(

# We don't want to quote numeric or boolean fields.
ALL_OTHER_FIELDS=(
cache_restore_duration
cache_save_duration
dependencies_install_duration
django_collectstatic_duration
nltk_downloader_duration
Expand Down
89 changes: 22 additions & 67 deletions bin/steps/python
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
#!/usr/bin/env bash
# shellcheck disable=SC2154 # TODO: Env var is referenced but not assigned.
# shellcheck disable=SC2250 # TODO: Use braces around variable references even when not strictly required.

set -euo pipefail

Expand All @@ -27,6 +26,28 @@ if ! curl --output /dev/null --silent --head --fail --retry 3 --retry-connrefuse
exit 1
fi

if [[ -f "${BUILD_DIR}/.heroku/python/bin/python" ]]; then
output::step "Using cached install of Python ${python_full_version}"
else
output::step "Installing Python ${python_full_version}"
mkdir -p "${BUILD_DIR}/.heroku/python"

if ! curl --silent --show-error --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}" | tar --zstd --extract --directory "${BUILD_DIR}/.heroku/python"; then
# The Python version was confirmed to exist previously, so any failure here is due to
# a networking issue or archive/buildpack bug rather than the runtime not existing.
output::error <<-EOF
Error: Failed to download/install Python ${python_full_version}.
In some cases, this happens due to an unstable network connection.
Please try again and to see if the error resolves itself.
EOF
meta_set "failure_reason" "python-download"
exit 1
fi

hash -r
fi

function warn_if_patch_update_available() {
local requested_full_version="${1}"
local requested_major_version="${2}"
Expand Down Expand Up @@ -69,69 +90,3 @@ if [[ "${python_major_version}" == "3.8" ]]; then
fi

warn_if_patch_update_available "${python_full_version}" "${python_major_version}"

if [[ "$STACK" != "$CACHED_PYTHON_STACK" ]]; then
output::step "Stack has changed from $CACHED_PYTHON_STACK to $STACK, clearing cache"
rm -rf .heroku/python-stack .heroku/python-version .heroku/python .heroku/vendor .heroku/python .heroku/python-sqlite3-version
fi

# TODO: Clean this up as part of the cache refactor.
if [[ -f .heroku/python-version ]]; then
if [[ "${cached_python_version}" != "${python_full_version}" ]]; then
output::step "Python version has changed from ${cached_python_version} to ${python_full_version}, clearing cache"
rm -rf .heroku/python
else
SKIP_INSTALL=1
fi
fi

# If using pip, check if we should reinstall python dependencies given that requirements.txt
# is non-deterministic (not all packages pinned, doesn't handle uninstalls etc). We don't need
# to do this when using Pipenv, since it has a lockfile and syncs the packages for us.
if [[ -f "${BUILD_DIR}/requirements.txt" ]]; then
if [[ ! -f "$CACHE_DIR/.heroku/requirements.txt" ]]; then
# This is a the first build of an app (or the build cache was cleared). Since there
# are no cached packages, we only need to store the requirements file for next time.
cp -R "$BUILD_DIR/requirements.txt" "$CACHE_DIR/.heroku/requirements.txt"
else
# IF there IS a cached directory, check for differences with the new one
if ! diff "$BUILD_DIR/requirements.txt" "$CACHE_DIR/.heroku/requirements.txt" &>/dev/null; then
output::step "Requirements file has been changed, clearing cached dependencies"
# if there are any differences, clear the Python cache
# Installing Python over again does not take noticably more time
cp -R "$BUILD_DIR/requirements.txt" "$CACHE_DIR/.heroku/requirements.txt"
rm -rf .heroku/python
unset SKIP_INSTALL
else
output::step "No change in requirements detected, installing from cache"
fi
fi
fi

if [[ "${SKIP_INSTALL:-0}" == "1" ]]; then
output::step "Using cached install of Python ${python_full_version}"
else
output::step "Installing Python ${python_full_version}"

# Prepare destination directory.
mkdir -p .heroku/python

if ! curl --silent --show-error --fail --retry 3 --retry-connrefused --connect-timeout 10 "${PYTHON_URL}" | tar --zstd --extract --directory .heroku/python; then
# The Python version was confirmed to exist previously, so any failure here is due to
# a networking issue or archive/buildpack bug rather than the runtime not existing.
output::error <<-EOF
Error: Failed to download/install Python ${python_full_version}.
In some cases, this happens due to an unstable network connection.
Please try again and to see if the error resolves itself.
EOF
meta_set "failure_reason" "python-download"
exit 1
fi

# Record for future reference.
echo "python-${python_full_version}" >.heroku/python-version
echo "$STACK" >.heroku/python-stack

hash -r
fi
3 changes: 0 additions & 3 deletions bin/steps/sqlite3
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,4 @@ buildpack_sqlite3_install() {
echo "Sqlite3 failed to install."
# mcount "failure.python.sqlite3"
fi

# shellcheck disable=SC2154 # TODO: Env var is referenced but not assigned.
mkdir -p "$CACHE_DIR/.heroku/"
}
Loading

0 comments on commit f00f258

Please sign in to comment.