Commit 0b6d205

committed
Add guide about sdist predictability
1 parent b5c0c67 commit 0b6d205

2 files changed

+181
-0
lines changed

source/guides/index.rst

Lines changed: 1 addition & 0 deletions
@@ -13,3 +13,4 @@ introduction to packaging, see :doc:`/tutorials/index`.
1313
section-hosting
1414
tool-recommendations
1515
analyzing-pypi-package-downloads
16+
sdist-drawbacks-and-predictability
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
1+
==========================================================================================
2+
Drawbacks of installing source distributions (``sdist``) and how to improve predictability
3+
==========================================================================================
4+
5+
The ``sdist`` format was one of the first packaging formats to be created by the
6+
Python community (predating the advent of ``wheel``). Although still very
7+
useful today to distribute and share Python libraries and applications,
8+
``sdist``\s are notoriously difficult to work with in circumstances that
9+
require high build reproducibility and tolerance to disruptions.
10+
11+
This guide reviews the concept of ``sdist``, highlights its potential uses
12+
and drawbacks, and explores potential practices to improve build reproducibility
13+
when relying on ``sdist``\s.
14+
15+
16+
What is an ``sdist``?
17+
=====================
18+
19+
You can read more about the ``sdist`` format and its ``wheel`` counterpart
20+
in :doc:`/discussions/package-formats`, but for the sake of this document
21+
an ``sdist`` can be considered a simple ``.tar.gz`` archive that contains
22+
all the files necessary to build a Python project that later will be installed
23+
in the end-user's environment.
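
To make the "simple ``.tar.gz`` archive" description concrete, the following
minimal sketch lists the files stored in an ``sdist`` using only the standard
library. The project name ``example_pkg`` and its contents are hypothetical and
are generated in memory purely for illustration; a real ``sdist`` downloaded
from a package index could be inspected the same way.

```python
import io
import tarfile


def list_sdist_files(sdist_bytes: bytes) -> list[str]:
    """Return the sorted names of the regular files stored in a .tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(sdist_bytes), mode="r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def make_example_sdist() -> bytes:
    """Build a tiny in-memory archive mimicking an sdist layout (hypothetical project)."""
    files = {
        "example_pkg-1.0/PKG-INFO": b"Metadata-Version: 2.1\nName: example-pkg\nVersion: 1.0\n",
        "example_pkg-1.0/pyproject.toml": b"[build-system]\nrequires = ['setuptools']\n",
        "example_pkg-1.0/src/example_pkg/__init__.py": b"",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_sdist_files(make_example_sdist()):
        print(name)
```

Note that the archive holds only source and metadata files; nothing in it is
compiled for a particular platform.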
24+
25+
The most defining characteristic of the ``sdist`` format is its
26+
platform-independence, as the distributions do not include binary executable files.
27+
This format is very flexible and, although usually composed of a simple copy
28+
of the source code files with some extra metadata files added, it can also include
29+
platform-independent code automatically generated during the build
30+
phase [#examples]_.
31+
32+
33+
When is an ``sdist`` useful?
34+
============================
35+
36+
Sometimes it can be tricky to distribute Python packages that contain binary
37+
extensions, especially when they are built for platforms that do not define a
38+
cross-version stable ABI_.
39+
Moreover, package indexes like PyPI_ may limit their support to a handful of
40+
well-known platforms.
41+
Finally, for certain edge cases, the build process may require machine-specific
42+
parameters.
43+
44+
In this context, distributing code via ``sdist``\s becomes a valuable fallback.
45+
It allows users on other platforms to access the source code
46+
and attempt to recompile the extensions locally.
47+
48+
49+
What are the drawbacks of an ``sdist``?
50+
=======================================
51+
52+
Despite their usefulness, working with ``sdist``\s can be challenging. One
53+
major difficulty is reconstructing a compatible build environment in which the
54+
``sdist`` can be processed into a ``wheel``, especially when it comes to build
55+
dependencies.
56+
57+
While :pep:`518` introduced a standard for declaring build dependencies
58+
distributed as Python packages (e.g. via PyPI), many projects also rely on
59+
non-Python dependencies, such as compilers and binary system-level libraries,
60+
that are not declared in standard metadata. These dependencies can vary
61+
significantly across systems, and their installation is often neither automated
62+
nor documented, i.e., they are simply assumed to be present.
63+
64+
Another issue is *tooling drift*: even if a project was originally buildable
65+
from its ``sdist``, changes in the build dependencies (e.g., updates,
66+
deprecations and security fixes) can break compatibility over time [#pinning]_.
67+
This is a natural tendency of software systems and especially true for older
68+
projects.
69+
70+
Therefore, mission-critical systems and environments that cannot afford
71+
unforeseen/unintended interruptions should not rely on ``sdist``\s.
72+
If your project or product requires high reliability and minimal disruption,
73+
you should adapt your workflow to increase resiliency and reproducibility or
74+
disallow ``sdist``\s altogether.
75+
76+
77+
How to improve reproducibility in your workflow and avoid ``sdist`` drawbacks?
78+
==============================================================================
79+
80+
The first step to improve your workflow is to determine whether it relies,
81+
directly or indirectly, on ``sdist``\s, and then to prevent them from being
82+
compiled on demand.
83+
84+
Installers like ``pip`` or ``uv`` have options that help with this.
85+
For example, you can set the environment variable |PIP_ONLY_BINARY|_ with
86+
the value ``:all:``, to prevent ``sdist``\s from being installed
87+
(see the corresponding `uv alternative`_).
88+
When this setting is enabled, any installation that fails will indicate which
89+
packages are not available as ``wheel``\s, helping you pinpoint installations
90+
relying on ``sdist``\s.
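
As a rough illustration of this check, the sketch below asks ``pip`` to resolve
a requirements file with ``PIP_ONLY_BINARY`` set to ``:all:`` and reports
whether resolution succeeded. It assumes a ``pip`` version recent enough to
support ``pip install --dry-run``; the helper names and the
``requirements.txt`` file name are hypothetical. Packages already installed in
the active environment may be treated as satisfied, so this is best run in a
fresh virtual environment.

```python
import os
import subprocess
import sys


def only_binary_env(base_env=None):
    """Return a copy of the environment that tells pip to refuse sdists."""
    env = dict(os.environ if base_env is None else base_env)
    env["PIP_ONLY_BINARY"] = ":all:"
    return env


def pip_dry_run_command(requirements_file):
    """Build a pip command that resolves requirements without installing them."""
    return [sys.executable, "-m", "pip", "install", "--dry-run", "-r", requirements_file]


def check_wheels_only(requirements_file):
    """Return (ok, stderr); ok is False when pip could not resolve using wheels only."""
    result = subprocess.run(
        pip_dry_run_command(requirements_file),
        env=only_binary_env(),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    ok, errors = check_wheels_only("requirements.txt")  # hypothetical file name
    if not ok:
        # pip's error output names the requirement it could not satisfy
        print(errors, file=sys.stderr)
        sys.exit(1)
```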
91+
92+
Once these packages are identified, the next step is to build them in
93+
a controlled environment.
94+
You can use ``pip``\'s |PIP_CONSTRAINT|_ environment variable or
95+
``uv``\'s |build-constraint|_ CLI option to enforce specific versions of
96+
Python packages [#build-isolation]_.
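
A minimal sketch of such a controlled build, assuming ``pip`` is the installer:
it builds a wheel for a single requirement with ``pip wheel`` while
``PIP_CONSTRAINT`` points at a constraints file. The file name
``build-constraints.txt``, the ``wheelhouse`` directory and the requirement
``example-pkg==1.0`` are hypothetical. How constraints are applied to build
dependencies can differ between ``pip`` versions, so check the documentation of
the version you use.

```python
import os
import subprocess
import sys
from pathlib import Path


def constrained_env(constraints_file, base_env=None):
    """Return a copy of the environment with PIP_CONSTRAINT set to an absolute path."""
    env = dict(os.environ if base_env is None else base_env)
    # An absolute path is used so the setting does not depend on the
    # working directory of any subprocess that pip starts.
    env["PIP_CONSTRAINT"] = str(Path(constraints_file).resolve())
    return env


def build_wheel_command(requirement, wheel_dir):
    """Build a pip command that produces a wheel for one requirement, without its dependencies."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-deps",
        "--wheel-dir", str(wheel_dir),
        requirement,
    ]


if __name__ == "__main__":
    subprocess.run(
        build_wheel_command("example-pkg==1.0", "wheelhouse"),  # hypothetical names
        env=constrained_env("build-constraints.txt"),
        check=True,
    )
```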
97+
98+
To further improve the consistency of OS-level tools and libraries,
99+
you can leverage your CI/CD provider's configuration method, for example
100+
`GitHub Workflows`_, `Bitbucket Pipelines`_, `GitLab CI/CD`_, Jenkins_,
101+
CircleCI_ or Semaphore_.
102+
103+
Alternatively, you can use containers (e.g. docker_, nerdctl_ or podman_),
104+
immutable operating system distributions or package managers (e.g. `NixOS/Nix`_)
105+
or configuration management tools (e.g. Ansible_, chef_ or puppet_)
106+
to implement `Infrastructure as Code`_ (IaC) and ensure build environments
107+
are reproducible and version-controlled.
108+
109+
Consider caching the resulting ``wheel``\s
110+
locally via |wheelhouse directories|_ or hosting them in
111+
*private package indexes* (such as devpi_).
112+
This allows you to serve pre-built distributions internally,
113+
which reduces reliance on external sources, improves build stability,
114+
and often results in faster workflows as a welcome side effect.
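
For the local "wheelhouse" variant, the sketch below installs from a directory
of pre-built wheels only, using ``pip``\'s ``--no-index`` and ``--find-links``
options so that no external index is contacted. The ``wheelhouse`` directory
and ``requirements.txt`` file are hypothetical names, and the helper function
is illustrative rather than part of any tool.

```python
import subprocess
import sys


def install_from_wheelhouse_command(wheel_dir, requirements_file):
    """Build a pip command that installs only from a local directory of wheels."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                    # do not contact any package index
        "--find-links", str(wheel_dir),  # look for distributions in this directory
        "-r", str(requirements_file),
    ]


if __name__ == "__main__":
    subprocess.run(
        install_from_wheelhouse_command("wheelhouse", "requirements.txt"),
        check=True,
    )
```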
115+
116+
Finally, it's important to regularly audit your pinned or cached (build)
117+
dependencies for known security vulnerabilities and critical bug fixes, and to
118+
update them accordingly.
119+
This can be done through an **out-of-band** workflow, such as a scheduled job
120+
or a monthly CI/CD pipeline, that does not interfere with your
121+
mission-critical or low-tolerance environments. This approach ensures that your
122+
systems remain secure and up to date without compromising the stability of your
123+
primary workflows.
124+
125+
126+
.. rubric:: Footnotes
127+
128+
.. [#examples]
129+
Examples of platform-independent generated code in ``sdist``\s include
130+
``.pyx`` files transpiled into ``.c`` and Python code created from
131+
``.proto``, JSON schema or grammar files, etc.
132+
133+
.. [#pinning]
134+
Although developers can try to minimize the impact of tooling drift by
135+
locking the version of build dependencies, this approach also has
136+
its own drawbacks. In fact, it is very common in the Python community to
137+
avoid specifying version caps. For a deeper discussion on this topic, see:
138+
https://iscinumpy.dev/post/bound-version-constraints/ and
139+
https://hynek.me/articles/semver-will-not-save-you/.
140+
141+
.. [#build-isolation]
142+
When a virtual environment with hand-picked versions of build
143+
dependencies is crafted (either manually or via tools supporting one of the
144+
:doc:`/specifications/pylock-toml` or :external+pip:doc:`reference/requirements-file-format`),
145+
it is also possible to use features like |no-isolation|_,
146+
|no-build-isolation|_ or the `equivalent uv settings`_ to ensure packages
147+
are built against the currently active virtual environment.
148+
149+
150+
.. _ABI: https://en.wikipedia.org/wiki/Application_binary_interface
151+
.. _PyPI: https://pypi.org
152+
.. |PIP_ONLY_BINARY| replace:: ``PIP_ONLY_BINARY``
153+
.. _PIP_ONLY_BINARY: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-only-binary
154+
.. _uv alternative: https://docs.astral.sh/uv/reference/settings/#pip_only-binary
155+
.. |PIP_CONSTRAINT| replace:: ``PIP_CONSTRAINT``
156+
.. _PIP_CONSTRAINT: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-c
157+
.. |build-constraint| replace:: ``--build-constraint``
158+
.. _build-constraint: https://docs.astral.sh/uv/concepts/projects/build/#build-constraints
159+
.. _GitHub Workflows: https://docs.github.com/en/actions/writing-workflows
160+
.. _Bitbucket Pipelines: https://www.atlassian.com/software/bitbucket/features/pipelines
161+
.. _GitLab CI/CD: https://docs.gitlab.com/ci/
162+
.. _Jenkins: https://www.jenkins.io/doc/
163+
.. _CircleCI: https://circleci.com
164+
.. _Semaphore: https://semaphore.io
165+
.. _docker: https://www.docker.com
166+
.. _nerdctl: https://github.com/containerd/nerdctl
167+
.. _podman: https://podman.io
168+
.. _NixOS/Nix: https://nixos.org
169+
.. _Ansible: https://docs.ansible.com
170+
.. _chef: https://docs.chef.io
171+
.. _puppet: https://www.puppet.com/docs/index.html
172+
.. _Infrastructure as Code: https://en.wikipedia.org/wiki/Infrastructure_as_code
173+
.. |wheelhouse directories| replace:: *"wheelhouse" directories*
174+
.. _wheelhouse directories: https://pip.pypa.io/en/stable/cli/pip_wheel/#examples
175+
.. _devpi: https://doc.devpi.net/
176+
.. |no-isolation| replace:: ``--no-isolation``
177+
.. _no-isolation: https://build.pypa.io/en/stable/#python--m-build---no-isolation
178+
.. |no-build-isolation| replace:: ``--no-build-isolation``
179+
.. _no-build-isolation: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-no-build-isolation
180+
.. _equivalent uv settings: https://docs.astral.sh/uv/concepts/projects/config/#build-isolation
