| 1 | +==========================================================================================
| 2 | +Drawbacks of installing source distributions (``sdist``) and how to improve predictability
| 3 | +==========================================================================================
| 4 | + |
| 5 | +The ``sdist`` format was one of the first packaging formats to be created by the |
| 6 | +Python community (predating the advent of ``wheel``). Although still very |
| 7 | +useful today to distribute and share Python libraries and applications, |
| 8 | +``sdist``\s are notoriously difficult to work with in circumstances that |
| 9 | +require high build reproducibility and tolerance to disruptions. |
| 10 | + |
| 11 | +This guide reviews the concept of ``sdist``, highlights its potential uses |
| 12 | +and drawbacks, and explores practices to improve build reproducibility
| 13 | +when relying on ``sdist``\s. |
| 14 | + |
| 15 | + |
| 16 | +What is an ``sdist``? |
| 17 | +===================== |
| 18 | + |
| 19 | +You can read more about the ``sdist`` format and its ``wheel`` counterpart |
| 20 | +in :doc:`/discussions/package-formats`, but for the sake of this document |
| 21 | +an ``sdist`` can be considered a simple ``.tar.gz`` archive that contains |
| 22 | +all the files necessary to build a Python project that later will be installed |
| 23 | +in the end-user's environment. |
| 24 | + |
| 25 | +The most defining characteristic of the ``sdist`` format is its |
| 26 | +platform-independence, as the distributions do not include binary executable files. |
| 27 | +This format is very flexible and, although usually composed of a simple copy
| 28 | +of the source code files with some extra metadata files added, it can also include |
| 29 | +platform-independent code automatically generated during the build |
| 30 | +phase [#examples]_. |
| 31 | + |
| 32 | + |
| 33 | +When is an ``sdist`` useful? |
| 34 | +============================
| 35 | + |
| 36 | +Sometimes it can be tricky to distribute Python packages that contain binary |
| 37 | +extensions, especially when they are built for platforms that do not define a |
| 38 | +cross-version stable ABI_. |
| 39 | +Moreover, package indexes like PyPI_ may limit their support to a handful of
| 40 | +well-known platforms.
| 41 | +Finally, for certain edge cases, the build process may require machine-specific
| 42 | +parameters.
| 43 | + |
| 44 | +In this context, distributing code via ``sdist``\s becomes a valuable fallback. |
| 45 | +It allows users on other platforms to access the source code
| 46 | +and attempt to compile the extensions locally.
| 47 | + |
| 48 | + |
| 49 | +What are the drawbacks of an ``sdist``? |
| 50 | +======================================= |
| 51 | + |
| 52 | +Despite their usefulness, working with ``sdist``\s can be challenging. One |
| 53 | +major difficulty is reconstructing a compatible build environment in which the |
| 54 | +``sdist`` can be processed into a ``wheel``, especially when it comes to build |
| 55 | +dependencies. |
| 56 | + |
| 57 | +While :pep:`518` introduced a standard for declaring build dependencies |
| 58 | +distributed as Python packages (e.g. via PyPI), many projects also rely on |
| 59 | +non-Python dependencies, such as compilers and binary system-level libraries, |
| 60 | +that are not declared in any standard metadata. These dependencies can vary
| 61 | +significantly across systems, and their installation is often neither automated
| 62 | +nor documented, i.e., they are simply assumed to be present.
| 63 | + |
| 64 | +Another issue is *tooling drift*: even if a project was originally buildable |
| 65 | +from its ``sdist``, changes in the build dependencies (e.g., updates, |
| 66 | +deprecations and security fixes) can break compatibility over time [#pinning]_. |
| 67 | +This is a natural tendency of software systems and is especially true for older
| 68 | +projects. |
| 69 | + |
| 70 | +Therefore, mission-critical systems and environments that cannot afford |
| 71 | +unforeseen/unintended interruptions should not rely on ``sdist``\s. |
| 72 | +If your project or product requires high reliability and minimal disruption, |
| 73 | +you should adapt your workflow to increase resiliency and reproducibility or |
| 74 | +disallow ``sdist``\s altogether.
| 75 | + |
| 76 | + |
| 77 | +How to improve reproducibility in your workflow and avoid ``sdist`` drawbacks? |
| 78 | +============================================================================== |
| 79 | + |
| 80 | +The first step is to determine whether your workflow relies, directly or
| 81 | +indirectly, on ``sdist``\s, and then to prevent them from being built on
| 82 | +demand.
| 83 | + |
| 84 | +Installers like ``pip`` or ``uv`` have options that help with this. |
| 85 | +For example, you can set the environment variable |PIP_ONLY_BINARY|_ to
| 86 | +the value ``:all:`` to prevent ``sdist``\s from being installed
| 87 | +(see the corresponding `uv alternative`_). |
| 88 | +When this setting is enabled, any installation that fails will indicate which |
| 89 | +packages are not available as ``wheel``\s, helping you pinpoint installations |
| 90 | +relying on ``sdist``\s. |
| 91 | + |
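The audit step above could be scripted roughly as follows. This is only a
minimal sketch, assuming ``pip`` is available in the current interpreter; the
helper names ``binary_only_env`` and ``install_wheels_only`` are hypothetical
and not part of ``pip``.

```python
import os
import subprocess
import sys


def binary_only_env(base_env=None):
    """Return a copy of the environment that tells pip to refuse sdists."""
    env = dict(os.environ if base_env is None else base_env)
    # Same effect as passing ``--only-binary :all:`` on the command line.
    env["PIP_ONLY_BINARY"] = ":all:"
    return env


def install_wheels_only(requirements_file):
    """Run ``pip install -r`` so that only wheels are accepted.

    Returns the completed process. A non-zero return code, together with
    the captured stderr, points at requirements that have no usable wheel.
    """
    command = [sys.executable, "-m", "pip", "install", "-r", requirements_file]
    return subprocess.run(
        command,
        env=binary_only_env(),
        capture_output=True,
        text=True,
        check=False,  # inspect the failure instead of raising
    )
```

Calling ``install_wheels_only("requirements.txt")`` in a throwaway virtual
environment and reading ``result.stderr`` on failure is one way to list the
packages that would otherwise be built from an ``sdist``.
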
| 92 | +Once these packages are identified, the next step is to build them in |
| 93 | +a controlled environment. |
| 94 | +You can use ``pip``\'s |PIP_CONSTRAINT|_ environment variable or
| 95 | +``uv``\'s |build-constraint|_ CLI option to enforce specific versions of
| 96 | +Python packages [#build-isolation]_. |
| 97 | + |
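As an illustration of pinning Python packages through ``PIP_CONSTRAINT``,
the sketch below renders a constraints file and prepares an environment that
points ``pip`` at it. The helper names and the file name
``build-constraints.txt`` are hypothetical, and any versions you pass in are
your own choice. How constraints apply to build dependencies can differ
between ``pip`` versions, so check the documentation for the version you use.

```python
import os


def format_constraints(pins):
    """Render ``name==version`` lines, one per pinned package, sorted by name."""
    return "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))


def constrained_env(constraints_path, base_env=None):
    """Return a copy of the environment with PIP_CONSTRAINT pointing at the file.

    The path is made absolute so it stays valid if a later step changes
    the working directory.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_CONSTRAINT"] = os.path.abspath(constraints_path)
    return env
```

A controlled build would write the output of ``format_constraints`` to a file
kept under version control, then pass ``constrained_env(...)`` as the ``env``
argument of the ``subprocess.run`` call that invokes ``pip``.
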
| 98 | +To further improve the consistency of OS-level tools and libraries, |
| 99 | +you can leverage your CI/CD provider's configuration method, for example |
| 100 | +`GitHub Workflows`_, `Bitbucket Pipelines`_, `GitLab CI/CD`_, Jenkins_, |
| 101 | +CircleCI_ or Semaphore_. |
| 102 | + |
| 103 | +Alternatively, you can use containers (e.g. docker_, nerdctl_ or podman_), |
| 104 | +immutable operating system distributions or package managers (e.g. `NixOS/Nix`_) |
| 105 | +or configuration management tools (e.g. Ansible_, chef_ or puppet_) |
| 106 | +to implement `Infrastructure as Code`_ (IaC) and ensure build environments |
| 107 | +are reproducible and version-controlled. |
| 108 | + |
| 109 | +Consider caching the resulting ``wheel``\s |
| 110 | +locally via |wheelhouse directories|_ or hosting them in |
| 111 | +*private package indexes* (such as devpi_). |
| 112 | +This allows you to serve pre-built distributions internally, |
| 113 | +which reduces reliance on external sources, improves build stability, |
| 114 | +and often results in faster workflows as a welcome side effect. |
| 115 | + |
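The wheelhouse approach can be sketched as two ``pip`` invocations: one that
fills a local directory with wheels, and one that installs only from that
directory. The function names and the default directory name ``wheelhouse``
are hypothetical. The flags used (``--wheel-dir``, ``--no-index``,
``--find-links``) are existing ``pip`` options.

```python
import sys


def build_wheelhouse_command(requirements_file, wheelhouse="wheelhouse"):
    """Command that collects (building if needed) a wheel for every requirement."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--wheel-dir", str(wheelhouse),
        "-r", str(requirements_file),
    ]


def offline_install_command(requirements_file, wheelhouse="wheelhouse"):
    """Command that installs from the local wheelhouse only, never from an index."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",
        "--find-links", str(wheelhouse),
        "-r", str(requirements_file),
    ]


def non_wheel_files(filenames):
    """Return the sorted names that are not wheels, e.g. stray sdists."""
    return sorted(name for name in filenames if not name.endswith(".whl"))
```

The first command would typically run in the controlled build environment,
and the second in the environments that must not compile anything.
``non_wheel_files(os.listdir("wheelhouse"))`` can serve as a quick check that
the directory holds wheels only.
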
| 116 | +Finally, it's important to regularly audit your pinned or cached (build) |
| 117 | +dependencies for known security vulnerabilities and critical bug fixes, and to
| 118 | +update them accordingly.
| 119 | +This can be done through an **out-of-band** workflow, such as a scheduled job
| 120 | +or a monthly CI/CD pipeline, that does not interfere with your
| 121 | +mission-critical or low-tolerance environments. This approach ensures that your |
| 122 | +systems remain secure and up to date without compromising the stability of your |
| 123 | +primary workflows. |
| 124 | + |
| 125 | + |
| 126 | +.. rubric:: Footnotes |
| 127 | + |
| 128 | +.. [#examples] |
| 129 | + Examples of platform-independent generated code in ``sdist``\s include |
| 130 | + ``.pyx`` files transpiled into ``.c`` and Python code created from |
| 131 | + ``.proto``, JSON schema or grammar files, etc. |
| 132 | +
| 133 | +.. [#pinning] |
| 134 | + Although developers can try to minimize the impact of tooling drift by |
| 135 | + locking the version of build dependencies, this approach also has |
| 136 | + its own drawbacks. In fact, it is very common in the Python community to |
| 137 | + avoid specifying version caps. For a deeper discussion on this topic, see: |
| 138 | + https://iscinumpy.dev/post/bound-version-constraints/ and |
| 139 | + https://hynek.me/articles/semver-will-not-save-you/. |
| 140 | +
| 141 | +.. [#build-isolation] |
| 142 | + When a virtual environment with hand-picked versions of build
| 143 | + dependencies is crafted (either manually or via tools supporting one of the |
| 144 | + :doc:`/specifications/pylock-toml` or :external+pip:doc:`reference/requirements-file-format`), |
| 145 | + it is also possible to use features like |no-isolation|_, |
| 146 | + |no-build-isolation|_ or the `equivalent uv settings`_ to ensure packages |
| 147 | + are built against the currently active virtual environment. |
| 148 | +
| 149 | +
| 150 | +.. _ABI: https://en.wikipedia.org/wiki/Application_binary_interface |
| 151 | +.. _PyPI: https://pypi.org |
| 152 | +.. |PIP_ONLY_BINARY| replace:: ``PIP_ONLY_BINARY`` |
| 153 | +.. _PIP_ONLY_BINARY: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-only-binary |
| 154 | +.. _uv alternative: https://docs.astral.sh/uv/reference/settings/#pip_only-binary |
| 155 | +.. |PIP_CONSTRAINT| replace:: ``PIP_CONSTRAINT`` |
| 156 | +.. _PIP_CONSTRAINT: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-c |
| 157 | +.. |build-constraint| replace:: ``--build-constraint`` |
| 158 | +.. _build-constraint: https://docs.astral.sh/uv/concepts/projects/build/#build-constraints |
| 159 | +.. _GitHub Workflows: https://docs.github.com/en/actions/writing-workflows |
| 160 | +.. _Bitbucket Pipelines: https://www.atlassian.com/software/bitbucket/features/pipelines |
| 161 | +.. _GitLab CI/CD: https://docs.gitlab.com/ci/ |
| 162 | +.. _Jenkins: https://www.jenkins.io/doc/ |
| 163 | +.. _CircleCI: https://circleci.com |
| 164 | +.. _Semaphore: https://semaphore.io |
| 165 | +.. _docker: https://www.docker.com |
| 166 | +.. _nerdctl: https://github.com/containerd/nerdctl |
| 167 | +.. _podman: https://podman.io |
| 168 | +.. _NixOS/Nix: https://nixos.org |
| 169 | +.. _Ansible: https://docs.ansible.com |
| 170 | +.. _chef: https://docs.chef.io |
| 171 | +.. _puppet: https://www.puppet.com/docs/index.html |
| 172 | +.. _Infrastructure as Code: https://en.wikipedia.org/wiki/Infrastructure_as_code |
| 173 | +.. |wheelhouse directories| replace:: *"wheelhouse" directories* |
| 174 | +.. _wheelhouse directories: https://pip.pypa.io/en/stable/cli/pip_wheel/#examples |
| 175 | +.. _devpi: https://doc.devpi.net/ |
| 176 | +.. |no-isolation| replace:: ``--no-isolation`` |
| 177 | +.. _no-isolation: https://build.pypa.io/en/stable/#python--m-build---no-isolation |
| 178 | +.. |no-build-isolation| replace:: ``--no-build-isolation`` |
| 179 | +.. _no-build-isolation: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-no-build-isolation |
| 180 | +.. _equivalent uv settings: https://docs.astral.sh/uv/concepts/projects/config/#build-isolation |