Parallelize bytecode compilation ✨
Bytecode compilation is slow. It's often one of the biggest contributors
to the install step's sluggishness. For better or worse, we can't really
enable --no-compile by default as it has the potential to render certain
workflows permanently slower in a subtle way.[^1]

To improve the situation, bytecode compilation can be parallelized
across a pool of processes (or sub-interpreters on Python 3.14). I've
observed a 1.1x to 3x improvement in install step times locally.[^2]

This patch has been written to be relatively comprehensible, but for
posterity, these are the high-level implementation notes:

- We can't use compileall.compile_dir() because it spins up a new worker
  pool on every invocation. If it's used as a "drop-in" replacement for
  compileall.compile_file(), then the pool creation overhead will be
  paid for every package installed. This is bad and kills most of the
  gains. Redesigning the installation logic to compile everything at the
  end was rejected for being too invasive (a key goal was to avoid
  affecting the package installation order).

- A bytecode compiler is created right before package installation
  starts and reused for all packages. Depending on platform and
  workload, either a serial (in-process) compiler or parallel compiler
  will be used. They both have the same interface, accepting a batch of
  Python filepaths to compile.

- This patch was designed to be as low-risk as reasonably possible. pip
  does not currently contain any parallelized code, so introducing any
  form of parallelization poses a nontrivial risk. To minimize this
  risk, the only code parallelized is the bytecode compilation code
  itself (~10 LOC). In addition, the package install order is unaffected
  and pip will fall back to serial compilation if parallelization is
  unsupported.
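The shared batch interface from the notes above can be sketched as follows. The class and helper names are illustrative, not pip's actual implementation; the key point is that the parallel compiler's pool outlives individual packages:

```python
import compileall
from concurrent.futures import ProcessPoolExecutor
from typing import Iterable, List


def _compile_one(path: str) -> bool:
    # quiet=2 suppresses both normal output and error messages;
    # compile_file() returns a truthy value on success.
    return bool(compileall.compile_file(path, force=True, quiet=2))


class SerialCompiler:
    """Compile each file in-process; no start-up overhead."""

    def __call__(self, paths: Iterable[str]) -> List[bool]:
        return [_compile_one(p) for p in paths]


class ParallelCompiler:
    """Fan compilation out over a long-lived worker pool.

    The pool is created once, before the first package is installed,
    and reused for every subsequent batch -- unlike
    compileall.compile_dir(), which spins up a fresh pool per call.
    """

    def __init__(self, workers: int) -> None:
        self._pool = ProcessPoolExecutor(max_workers=workers)

    def __call__(self, paths: Iterable[str]) -> List[bool]:
        return list(self._pool.map(_compile_one, paths))

    def shutdown(self) -> None:
        self._pool.shutdown()
```

Because both classes accept a batch of filepaths and return per-file results, the install loop can hold a single compiler reference and never care which flavor it got.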

The criteria for parallelization are:

1. There are at least 2 CPUs available. The process CPU count is used
   if available, otherwise the system CPU count. If there is only one
   CPU, serial compilation will always be used because even a parallel
   compiler with one worker will add extra overhead.

2. The maximum number of workers is at least 2. This is controlled by
   the --install-jobs option.[^3] It defaults to "auto", which uses the
   process/system CPU count.[^4]

3. There is "enough" code for parallelization to be "worth it". This
   criterion exists so pip won't waste (say) 100ms on spinning up a
   parallel compiler when compiling serially would only take 20ms.[^5]
   The limit is set to 1 MB of Python code. This is admittedly rather
   crude, but it has worked well enough in testing on a variety of
   systems.
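Put together, the three criteria amount to a decision function along these lines (a sketch only; the names, the exact fallback chain, and the constants are taken from this description rather than from pip's source):

```python
import os

MAX_WORKERS = 8  # hard cap mentioned in footnote 4
CODE_SIZE_THRESHOLD = 1_000_000  # roughly 1 MB of Python source


def available_cpus() -> int:
    """Prefer the process CPU count, falling back to the system count."""
    try:
        # Number of CPUs this process may run on (not on all platforms).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1


def should_parallelize(max_workers: int, python_code_size: int) -> bool:
    # Criterion 1: with a single CPU, a parallel compiler only adds overhead.
    if available_cpus() < 2:
        return False
    # Criterion 2: --install-jobs must permit at least two workers.
    if min(max_workers, MAX_WORKERS) < 2:
        return False
    # Criterion 3: enough code that pool start-up costs are amortized.
    return python_code_size > CODE_SIZE_THRESHOLD
```

If any check fails, the serial (in-process) compiler is used instead, so the worst case is simply the status quo.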

[^1]: Basically, if the Python files are installed to a read-only
      directory, then importing those files will be permanently slower
      as the .pyc files will never be cached. This is quite subtle,
      enough so that we can't really expect newbies to recognise and
      know how to address this (there is the PYTHONPYCACHEPREFIX
      envvar, but if you're advanced enough to use it, then you are
      also advanced enough to know when to use uv or pip's
      --no-compile).

[^2]: The 1.1x was measured on a painfully slow dual-core, HDD-equipped
      Windows machine installing just setuptools. The 3x was observed on
      my main 8-core Ryzen 5800HS Windows machine while installing pip's
      own test dependencies.

[^3]: Yes, this is probably not the best name, but adding an option for
      just bytecode compilation seems silly. Anyway, this will give us
      room if we ever parallelize more parts of the install step.

[^4]: Up to a hard-coded limit of 8 to avoid resource exhaustion. This
      number was chosen arbitrarily, but is definitely high enough to
      net a major improvement.

[^5]: This is important because I don't want to slow down tiny installs
      (e.g., pip install six ... or our own test suite). Creating a new
      process is prohibitively expensive on Windows (and to a lesser
      degree on macOS) for various reasons, so parallelization can't
      simply be used all of the time.
ichard26 committed Feb 27, 2025
1 parent 331400c commit 15ce2bf
Showing 8 changed files with 439 additions and 31 deletions.
4 changes: 4 additions & 0 deletions news/13247.feature.rst
@@ -0,0 +1,4 @@
Bytecode compilation is parallelized to significantly speed up installation of
large/many packages. By default, the number of workers matches the available CPUs
(up to a hard-coded limit), but can be adjusted using the ``--install-jobs``
option. To disable parallelization, pass ``--install-jobs 1``.
33 changes: 33 additions & 0 deletions src/pip/_internal/cli/cmdoptions.py
@@ -1070,6 +1070,39 @@ def check_list_path_option(options: Values) -> None:
)


def _handle_jobs(
option: Option, opt_str: str, value: str, parser: OptionParser
) -> None:
if value == "auto":
setattr(parser.values, option.dest, "auto")
return

try:
if (count := int(value)) > 0:
setattr(parser.values, option.dest, count)
return
except ValueError:
pass

msg = "should be a positive integer or 'auto'"
raise_option_error(parser, option=option, msg=msg)


install_jobs: Callable[..., Option] = partial(
Option,
"--install-jobs",
dest="install_jobs",
default="auto",
type=str,
action="callback",
callback=_handle_jobs,
help=(
"Maximum number of workers to use while installing packages. "
"To disable parallelization, pass 1. (default: %default)"
),
)


##########
# groups #
##########
7 changes: 7 additions & 0 deletions src/pip/_internal/commands/install.py
@@ -270,6 +270,8 @@ def add_options(self) -> None:
),
)

self.cmd_opts.add_option(cmdoptions.install_jobs())

@with_cleanup
def run(self, options: Values, args: List[str]) -> int:
if options.use_user_site and options.target_dir is not None:
@@ -416,6 +418,10 @@ def run(self, options: Values, args: List[str]) -> int:
# we're not modifying it.
modifying_pip = pip_req.satisfied_by is None
protect_pip_from_modification_on_windows(modifying_pip=modifying_pip)
if modifying_pip:
# Parallelization will re-import pip when starting new workers
# during installation which is unsafe if pip is being modified.
options.install_jobs = 1

reqs_to_build = [
r
@@ -465,6 +471,7 @@ def run(self, options: Values, args: List[str]) -> int:
use_user_site=options.use_user_site,
pycompile=options.compile,
progress_bar=options.progress_bar,
workers=options.install_jobs,
)

lib_locations = get_lib_location_guesses(
38 changes: 12 additions & 26 deletions src/pip/_internal/operations/install/wheel.py
@@ -1,19 +1,15 @@
"""Support for installing and building the "wheel" binary package format."""

import collections
import compileall
import contextlib
import csv
import importlib
import logging
import os.path
import re
import shutil
import sys
import warnings
from base64 import urlsafe_b64encode
from email.message import Message
from io import StringIO
from itertools import chain, filterfalse, starmap
from typing import (
IO,
@@ -51,6 +47,7 @@
from pip._internal.models.scheme import SCHEME_KEYS, Scheme
from pip._internal.utils.filesystem import adjacent_tmp_file, replace
from pip._internal.utils.misc import ensure_dir, hash_file, partition
from pip._internal.utils.pyc_compile import BytecodeCompiler
from pip._internal.utils.unpacking import (
current_umask,
is_within_directory,
@@ -417,12 +414,12 @@ def make(
return super().make(specification, options)


def _install_wheel( # noqa: C901, PLR0915 function is too long
def _install_wheel( # noqa: C901 function is too long
name: str,
wheel_zip: ZipFile,
wheel_path: str,
scheme: Scheme,
pycompile: bool = True,
pycompiler: Optional[BytecodeCompiler],
warn_script_location: bool = True,
direct_url: Optional[DirectUrl] = None,
requested: bool = False,
@@ -601,25 +598,14 @@ def pyc_source_file_paths() -> Generator[str, None, None]:
continue
yield full_installed_path

def pyc_output_path(path: str) -> str:
"""Return the path the pyc file would have been written to."""
return importlib.util.cache_from_source(path)

# Compile all of the pyc files for the installed files
if pycompile:
with contextlib.redirect_stdout(StringIO()) as stdout:
with warnings.catch_warnings():
warnings.filterwarnings("ignore")
for path in pyc_source_file_paths():
success = compileall.compile_file(path, force=True, quiet=True)
if success:
pyc_path = pyc_output_path(path)
assert os.path.exists(pyc_path)
pyc_record_path = cast(
"RecordPath", pyc_path.replace(os.path.sep, "/")
)
record_installed(pyc_record_path, pyc_path)
logger.debug(stdout.getvalue())
if pycompiler is not None:
for module in pycompiler(pyc_source_file_paths()):
if module.is_success:
pyc_record_path = module.pyc_path.replace(os.path.sep, "/")
record_installed(RecordPath(pyc_record_path), module.pyc_path)
if output := module.compile_output:
logger.debug(output)

maker = PipScriptMaker(None, scheme.scripts)

@@ -718,7 +704,7 @@ def install_wheel(
wheel_path: str,
scheme: Scheme,
req_description: str,
pycompile: bool = True,
pycompiler: Optional[BytecodeCompiler] = None,
warn_script_location: bool = True,
direct_url: Optional[DirectUrl] = None,
requested: bool = False,
@@ -730,7 +716,7 @@
wheel_zip=z,
wheel_path=wheel_path,
scheme=scheme,
pycompile=pycompile,
pycompiler=pycompiler,
warn_script_location=warn_script_location,
direct_url=direct_url,
requested=requested,
41 changes: 38 additions & 3 deletions src/pip/_internal/req/__init__.py
@@ -1,10 +1,14 @@
import collections
import logging
from contextlib import nullcontext
from dataclasses import dataclass
from typing import Generator, List, Optional, Sequence, Tuple
from functools import partial
from typing import Generator, Iterable, List, Optional, Sequence, Tuple
from zipfile import ZipFile

from pip._internal.cli.progress_bars import get_install_progress_renderer
from pip._internal.utils.logging import indent_log
from pip._internal.utils.pyc_compile import WorkerSetting, create_bytecode_compiler

from .req_file import parse_requirements
from .req_install import InstallRequirement
@@ -33,6 +37,28 @@ def _validate_requirements(
yield req.name, req


def _does_python_size_surpass_threshold(
requirements: Iterable[InstallRequirement], threshold: int
) -> bool:
"""Inspect wheels to check whether there is enough .py code to
enable bytecode parallelization.
"""
py_size = 0
for req in requirements:
if not req.local_file_path or not req.is_wheel:
# No wheel to inspect as this is a legacy editable.
continue

with ZipFile(req.local_file_path, allowZip64=True) as wheel_file:
for entry in wheel_file.infolist():
if entry.filename.endswith(".py"):
py_size += entry.file_size
if py_size > threshold:
return True

return False


def install_given_reqs(
requirements: List[InstallRequirement],
global_options: Sequence[str],
@@ -43,6 +69,7 @@ def install_given_reqs(
use_user_site: bool,
pycompile: bool,
progress_bar: str,
workers: WorkerSetting,
) -> List[InstallationResult]:
"""
Install everything in the given list.
@@ -68,7 +95,15 @@
)
items = renderer(items)

with indent_log():
if pycompile:
code_size_check = partial(
_does_python_size_surpass_threshold, to_install.values()
)
pycompiler = create_bytecode_compiler(workers, code_size_check)
else:
pycompiler = None

with indent_log(), pycompiler or nullcontext():
for requirement in items:
req_name = requirement.name
assert req_name is not None
@@ -87,7 +122,7 @@
prefix=prefix,
warn_script_location=warn_script_location,
use_user_site=use_user_site,
pycompile=pycompile,
pycompiler=pycompiler,
)
except Exception:
# if install did not succeed, rollback previous uninstall
5 changes: 3 additions & 2 deletions src/pip/_internal/req/req_install.py
@@ -53,6 +53,7 @@
redact_auth_from_url,
)
from pip._internal.utils.packaging import get_requirement
from pip._internal.utils.pyc_compile import BytecodeCompiler
from pip._internal.utils.subprocess import runner_with_spinner_message
from pip._internal.utils.temp_dir import TempDirectory, tempdir_kinds
from pip._internal.utils.unpacking import unpack_file
@@ -812,7 +813,7 @@ def install(
prefix: Optional[str] = None,
warn_script_location: bool = True,
use_user_site: bool = False,
pycompile: bool = True,
pycompiler: Optional[BytecodeCompiler] = None,
) -> None:
assert self.req is not None
scheme = get_scheme(
@@ -869,7 +870,7 @@ def install(
self.local_file_path,
scheme=scheme,
req_description=str(self.req),
pycompile=pycompile,
pycompiler=pycompiler,
warn_script_location=warn_script_location,
direct_url=self.download_info if self.is_direct else None,
requested=self.user_supplied,