Update "install_requires vs. requirements files" discussion #1427

Open
wants to merge 11 commits into base: main
111 changes: 62 additions & 49 deletions source/discussions/install-requires-vs-requirements.rst
@@ -1,89 +1,102 @@
.. _`install_requires vs requirements files`:

======================================
install_requires vs requirements files
======================================
============================================
Metadata dependencies vs. requirements files
============================================

There are two main places where you will find lists of "needed packages to
install", perhaps with version constraints, like ``requests`` or
``requests==2.31.0``. These are metadata dependencies, typically in a
:file:`pyproject.toml` (or :file:`setup.py`) file, and requirements files, often called
:file:`requirements.txt`. This page breaks down the differences.

install_requires
----------------
Metadata dependencies
=====================

``install_requires`` is a :ref:`setuptools` :file:`setup.py` keyword that
should be used to specify what a project **minimally** needs to run correctly.
When the project is installed by :ref:`pip`, this is the specification that is
used to install its dependencies.
Packages can declare dependencies, i.e. other packages that they need to
function. The standard method to do so is to set the :ref:`dependencies key
<writing-pyproject-toml-dependencies>` in the ``[project]`` section of a
:file:`pyproject.toml` file -- although other :term:`build backends <build backend>`
may use different methods. There can also be groups of optional dependencies,
also called "extras", which are typically specified in the
``optional-dependencies`` key of the ``[project]`` table. Both dependencies and
extras are ultimately written by the build backend to the package's distribution
metadata. On this page, we'll refer to these as "metadata dependencies".

For example, if the project requires A and B, your ``install_requires`` would be
like so:
When installing a package, installers like :ref:`pip` will automatically resolve
the metadata dependencies and install them. They should be used for packages that the project
**minimally** needs to run correctly.

::
For example, suppose the project requires A and B. When using the ``[project]``
table to declare metadata, the :file:`pyproject.toml` would be like so:

install_requires=[
'A',
'B'
]
.. code-block:: toml

[project]
dependencies = ["A", "B"]

Additionally, it's best practice to indicate any known lower or upper bounds.

For example, it may be known that your project requires at least v1 of 'A', and
v2 of 'B', so it would be like so:
v2 of 'B'.

::
.. code-block:: toml

install_requires=[
'A>=1',
'B>=2'
]
[project]
dependencies = [
"A >= 1",
"B >= 2"
]

It may also be known that project 'A' introduced a change in its v2
that breaks the compatibility of your project with v2 of 'A' and later,
so it makes sense to not allow v2:

::
.. code-block:: toml

install_requires=[
'A>=1,<2',
'B>=2'
]
[project]
dependencies = [
"A >= 1, < 2",
"B >= 2"
]

It is not considered best practice to use ``install_requires`` to pin
dependencies to specific versions, or to specify sub-dependencies
(i.e. dependencies of your dependencies). This is overly-restrictive, and
It is not considered best practice to use metadata dependencies to pin
Member

Honestly, it seems weird to read of this as "metadata dependencies". Abstract dependencies can be in requirements files, concrete pins can be in constraint files.

Contributor Author

I'm not sure I understand your comment, would you mind expanding please?

Member

Well, you introduce a concept of metadata requirements (which I feel weird about by itself for some reason), and later on the document suggests that the requirements are the opposite because they can contain pip options. The reality is that the requirements aren't the opposite. They can have the same properties and be abstract.

I argue that more places need to be updated to reduce such confusion. What you call "metadata" deps refers to the location of specification. Just like "requirements". But that's about it. "requirements" can have both the same property of being abstract and the same semantics, except for the tool-specific additions, while the package metadata is standardized.

Perhaps I should've started this thread in a different place in the diff, though.

Member

I agree with @webknjaz - "Metadata dependencies" aren't a thing. "Dependencies" is the correct term, and remains the correct term regardless of where they are declared.

Requirements files hold requirements - because they are pip-specific, some of the subtleties of what a "requirement" is in this context are pip-dependent, but basically you can consider them as concrete requirements used to build an environment or application. There's a subtle but important difference between "dependencies" and concrete requirements, but this text isn't making it very clear.

Constraints are a whole different thing, very much pip specific. They add limits to specifiers (concrete requirements or abstract dependencies) when resolving them, but they aren't dependencies themselves. And if that last sentence seems hard to follow, that's because we don't have well-defined terms for everything we're discussing here 🙁

Contributor Author

> I agree with @webknjaz - "Metadata dependencies" aren't a thing. "Dependencies" is the correct term, and remains the correct term regardless of where they are declared.

In the abstract, I agree. But I don't want to use just “dependencies” to explain the distinction “dependencies vs. requirements files”, because you can often hear that “the dependencies to do XXX are in this requirements file”, so a more precise term is needed to help Joe User understand that this isn't the same meaning of “dependency”. “Metadata dependencies” is the best I could come up with (I also considered “package dependencies”). Do you have another suggestion?

Member

Not really. The terms really are confused. We don't have well-defined terminology here, and getting consensus on terminology isn't something we should be doing in a discussion on a single PR.

We had this debate before, a long time ago. The term "package" is horribly overloaded, so we coined the terms "distribution package" and "install package". No-one uses them. They use "package", "project", and probably other terms as well, and work out what is meant from context. It sucks, but that's human communication for you 😉

And to make things worse here, requirement files aren't standardised. So how can we have standard terminology for something that's fundamentally tool-specific? Someone using hatch or PDM to manage their "dependencies for running the script that deploys to the internal test server" may be using something that's not a requirement file, but performs the same logical function. But their tool may not (for example) support using local filenames as requirements, unlike requirement files.

The existing text seems to come from a background of requirement-files-as-lockfiles. Which is certainly one valid use case, but by no means the only one.

dependencies to specific versions, or to specify transitive dependencies
(i.e. dependencies of your dependencies). This is overly restrictive, and
prevents the user from gaining the benefit of dependency upgrades.

Lastly, it's important to understand that ``install_requires`` is a listing of
"Abstract" requirements, i.e just names and version restrictions that don't
determine where the dependencies will be fulfilled from (i.e. from what
index or source). The where (i.e. how they are to be made "Concrete") is to
be determined at install time using :ref:`pip` options. [1]_
Lastly, it's important to understand that metadata dependencies are "abstract"
requirements, i.e. just names and version restrictions, but don't determine
where the dependencies will be fulfilled from (from what package index or
source). The where (i.e. how they are to be made "concrete") is to be determined
at install time, e.g. using :ref:`pip` options. [1]_


Requirements files
------------------
==================

:ref:`Requirements Files <pip:Requirements Files>` described most simply, are
:ref:`Requirements Files <pip:Requirements Files>`, described most simply, are
just a list of :ref:`pip:pip install` arguments placed into a file.

Whereas ``install_requires`` defines the dependencies for a single project,
:ref:`Requirements Files <pip:Requirements Files>` are often used to define
the requirements for a complete Python environment.
Whereas metadata dependencies define the dependencies for a single
project, requirements files are often used to define the requirements
for a complete Python environment.
Member

I know this is just re-wording the original text, but frankly I don't like it.

> metadata dependencies define the dependencies for a single project

Well, obviously. It's project metadata. This doesn't seem to be saying anything useful. And it's not a useful contrast with requirement files (IMO).

> requirements files are often used to define the requirements for a complete Python environment

Requirement files can be used for many things. They are a mechanism (and a pip-specific one, at that) and don't have any particular "intended use". People use requirement files to define the concrete dependencies of a standalone application, pinning exact versions to offer some level of reproducibility. That's arguably "the dependencies for the (application) project", but in a very different sense than the dependencies in pyproject.toml (if the application even has a pyproject.toml).
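
To make the mechanism a little more tangible, here is a minimal sketch that splits the text of a requirements file into pip option lines and requirement lines, and flags which requirements look pinned. The function names, the example file, and the `example.invalid` index URL are assumptions made up for illustration. This is not pip's parser, and it ignores line continuations, inline comments, environment markers, and nested `-r` includes.

```python
def split_requirements(text):
    """Split requirements-file text into pip options and requirement lines.

    Deliberately simplified: no line continuations, inline comments,
    environment markers, or nested -r files.
    """
    options, requirements = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.startswith("-"):
            options.append(line)  # e.g. --index-url, --find-links
        else:
            requirements.append(line)
    return options, requirements


def is_pinned(requirement):
    """Rough heuristic: treat a requirement containing '==' as pinned."""
    return "==" in requirement


# Hypothetical requirements file mixing an option, a pin, and a range.
EXAMPLE = """\
# hypothetical requirements file
--index-url https://example.invalid/simple
requests==2.31.0
A>=1,<2
"""

options, requirements = split_requirements(EXAMPLE)
print(options)
print([(req, is_pinned(req)) for req in requirements])
```

Whether such a file acts as a lockfile or as a loose list of top-level requests depends on how much of it is pinned, which is the point being made above: the file format itself does not decide that.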

What's really being discussed here is the "abstract requirements vs concrete requirements" distinction, which is about concepts, not about how those concepts are implemented. The implementation is (and should be!) irrelevant, as long as the concepts are used correctly. But the implementation is important, because it's tricky to map concrete/abstract requirements onto the features available - whether we restrict ourselves to solely standards-defined features, or allow for tool-specific ones (which may be pip's requirement files, or whatever features hatch/pdm/poetry provide for "adding requirements to a project").

Things are made even harder because of the mess that is projects which are in fact applications, but which are built as if they were libraries (with entry point metadata providing a command line executable). That method of application deployment mixes up abstract and concrete requirements in a way that cannot be disentangled by a general document like this is intended to be.

I don't have a good answer here. In a very real sense, this is way too complex to be appropriate for an introductory document like this is supposed to be. But conversely, new users need some guidance on how to set up their dependencies, if for no other reason than many of the examples available to them are terrible, and won't teach them good practices 🙁

Contributor Author

> What's really being discussed here is the "abstract requirements vs concrete requirements" distinction, which is about concepts, not about how those concepts are implemented.

Ok, I understand better where you're coming from now.

We have packages which can define metadata dependencies which are always abstract, and we have requirements files which define dependencies which can be either abstract or concrete. You're framing the document as “what are abstract vs concrete dependencies”. On the other hand, I am framing it as “what are metadata dependencies vs requirements files” (as in the title).

Yes, abstract vs concrete dependencies is important here, and we need to explain the link, but realistically, the question Joe Random Beginner is going to ask themselves is “I saw package names in pyproject.toml and also in requirements.txt, what's the difference?” and that's what they're going to Google for. There could be a separate document to explain abstract vs concrete dependencies or a new section, but it's not what I'm trying to explain here (and it's not what the page before this PR is explaining either).

The text you quoted said “are often used”, not “are used”, so it's correct, even though it doesn't give details. Those details IMHO don't belong here.

Member

> requirements files which define dependencies which can be either abstract or concrete

Nope. I don't agree. Requirement files define requirements. They may not be "dependencies" at all, in that there may be nothing (except in a very abstract sense) that actually depends on them. In pip, a "requirement" is a very real thing. Unfortunately, it's not something that's captured in any standard, so you can't talk about (pip's) requirements in a tool-agnostic context (which this is, at least in the sense that it's not the pip documentation).

> realistically, the question Joe Random Beginner is going to ask themselves is “I saw package names in pyproject.toml and also in requirements.txt, what's the difference?”

Understood. But the real answer is "well, it's complicated..."

Do we want to give an over-simplified, and often wrong, answer here, and risk reinforcing an incorrect understanding? Personally, I don't think we do, but the original authors of this page thought we did. Maybe you prefer to just paper over the cracks and do (in effect) a search and replace on the idea install_requires, replacing it with "dependencies as specified in pyproject.toml". I don't think that's a worthwhile thing to spend time on, and I don't have any insight into how to make it "correct" [1].

> The text you quoted said “are often used”, not “are used”, so it's correct, even though it doesn't give details. Those details IMHO don't belong here.

I have no stomach for nitpicking over this. I don't think [2] it's right to tie requirement files to the idea of being "the definition of an environment" like this. Take that opinion or leave it. It's not as if your PR added that text anyway, so you can legitimately say you don't intend to change the sense of the text, just the specific fact that install_requires is out of date.

Footnotes

  1. Because to my mind, it simply isn't correct no matter how you word it.

  2. As a pip maintainer, someone who is fundamentally responsible for defining what requirement files are for.

Contributor Author

> It's not as if your PR added that text anyway, so you can legitimately say you don't intend to change the sense of the text, just the specific fact that install_requires is out of date.

🤷 It was not my intent to do more than that, but reviewers seemed to be against merging without more improvements.

Honestly, I'm having trouble understanding everything you write, probably because I lack context on pip's implementation and the terms used in it. For example, I'm not sure I completely get the distinction you make between dependencies and requirements. (I was using them in the same sense.)

If your problem is with excessive framing of requirements files as “defining environments”, then I'm happy to change that. But fundamentally, yes, I'm really just trying to get rid of setuptools-specific language and trying to do whatever it takes to get this merged.

I'm not as pessimistic as you on the possibility of getting the page to a useful state, though — and, importantly, I don't think we can just delete it without adding a redirect from it to some explanation somewhere (so it might as well be here).

Contributor Author

Also, this is a discussion, not a specification — the level we're trying to take people from is “I am writing a package for PyPI, where should I write my dependencies, pyproject.toml or requirements.txt”. There's a tradeoff we have to make with “little white lies” that are needed to keep the text understandable for laymen even if it occasionally hurts the eyes of a pip maintainer.

Member

> Honestly, I'm having trouble understanding everything you write, probably because I lack context on pip's implementation and the terms used in it.

To a large extent, that's my fault - because we don't have well-defined meanings for the various terms we're using, it's easy to misunderstand each other. I didn't have time to clearly define and explain the concepts I was using so I went with what I thought was "common usage". Clearly it's not (demonstrating nicely the problem here...).

Let me try to clarify. This is simply my view on what needs to be part of everyone's "shared understanding". I don't want to get sucked into the question of "how do we teach this", as long as whatever we do present doesn't lead to a different understanding, or reinforce misconceptions that are commonly held in the community.

For me, the distinction between "concrete requirements" and "abstract requirements" is crucial, and should be a core part of any vocabulary we use when talking about dependencies/requirements/etc. The caremad article linked from this page remains a good description of the concepts - I don't know if any better description has been published since.

Building on that, the next important concept I see is the idea of what pip calls "top level requirements" - the things that the user asks an installer to actually install into an environment. These can be concrete or abstract; the key point here is that the user is explicitly asking for them. Pip's requirement files are simply a way of bundling together a set of such top-level requirements for reusability. Whether you fully pin the contents of a requirement file, or whether you view what's in such a file as concrete or abstract requirements depends entirely on what you plan on using the requirement file for. Note that other tools can have different ideas of what counts as a "top level requirement", or how to enable reusability. One such idea is that of "adding a requirement to a project", which IMO isn't very well-defined, but is nevertheless a common thing for people to want to do [1]. I can't comment on how the idea of "adding a requirement to a project" fits into this, as I don't work on a tool that supports such an idea, and as a user I've never really understood what tools that do offer such a concept actually mean by it...

At the next level, we have package dependencies, which are stored in the package metadata, and typically defined in the pyproject.toml file in the project source tree. These are abstract requirements, and say what else must be present for that package to function correctly. An installer will add these requirements to the top-level requirements requested by the user, as part of the process of dependency resolution, which turns an install request into an actionable list of packages and versions to install.
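
As a rough illustration of the step described above, the sketch below starts from the user's top-level requirements and keeps adding each package's declared dependencies until nothing new turns up. The `HYPOTHETICAL_METADATA` table and the `collect_install_set` function are invented for this example. A real installer such as pip also has to reconcile version specifiers, extras, and environment markers, none of which are modelled here.

```python
from collections import deque

# Hypothetical package metadata: each name maps to the names it depends on.
# Version specifiers, extras, and markers are intentionally left out.
HYPOTHETICAL_METADATA = {
    "app": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": [],
    "D": [],
}


def collect_install_set(top_level, metadata):
    """Return every package name reachable from the top-level requirements."""
    to_install = []
    seen = set()
    queue = deque(top_level)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already scheduled via another dependency path
        seen.add(name)
        to_install.append(name)
        # Add this package's declared dependencies to the work queue.
        queue.extend(metadata.get(name, []))
    return to_install


print(collect_install_set(["app"], HYPOTHETICAL_METADATA))
# prints ['app', 'A', 'B', 'C', 'D']
```

The user only asked for `app`; everything else in the result comes from package metadata, which is the distinction between top-level requirements and package dependencies.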

And that's it, IMO. With these concepts, a user stands a reasonable chance of making good decisions on how to express their own project's requirements and dependencies, and of understanding how other projects do so.

A note on terminology: In the above, I'm using the specific term "requirement" (which isn't standardised) in the way pip uses it - to mean "something that can be installed". Typically this is what is referred to in the standards as a "dependency specifier", but it's not always a dependency, and individual tools may allow extra options (pip allows local paths, for example). Yes, this is confusing. But I didn't decide on the terminology, and it's IMO too late to expect people to change at this point. So we live with it (and from a teaching point of view, making new users aware that the world isn't entirely perfect is a good learning experience!)

I hope this is a little clearer. As I said, it's just my view on how these ideas hang together. You're free to do whatever you want with it - I don't want to get sucked into editorial discussions on "how do we express this in the packaging guide" because I'll only end up burning out if I do.

Footnotes

  1. Step one should be "clarify the purpose of the requirement" 🙂


Whereas ``install_requires`` requirements are minimal, requirements files
Whereas metadata dependencies are minimal, requirements files
often contain an exhaustive listing of pinned versions for the purpose of
achieving :ref:`repeatable installations <pip:Repeatability>` of a complete
environment.

Whereas ``install_requires`` requirements are "Abstract", i.e. not associated
with any particular index, requirements files often contain pip
options like ``--index-url`` or ``--find-links`` to make requirements
"Concrete", i.e. associated with a particular index or directory of
packages. [1]_
Whereas metadata dependencies are "abstract", i.e. not associated with any
particular index, requirements files often contain pip options like
``--index-url`` or ``--find-links`` to make requirements "concrete", i.e.
associated with a particular index or directory of packages. [1]_

Whereas ``install_requires`` metadata is automatically analyzed by pip during an
Whereas metadata dependencies are automatically analyzed by pip during an
install, requirements files are not, and are only used when a user specifically
installs them using ``python -m pip install -r``.
installs them using :samp:`python -m pip install -r {requirement_file.txt}`.

----

.. [1] For more on "Abstract" vs "Concrete" requirements, see
.. [1] For more on "abstract" vs "concrete" requirements, see
https://caremad.io/posts/2013/07/setup-vs-requirement/.
2 changes: 0 additions & 2 deletions source/guides/distributing-packages-using-setuptools.rst
@@ -211,8 +211,6 @@ package, set ``py_modules`` to a list of the names of the modules (minus the
minimally needs to run. When the project is installed by :ref:`pip`, this is the
specification that is used to install its dependencies.

For more on using "install_requires" see :ref:`install_requires vs Requirements files`.



.. _`Package Data`:
2 changes: 2 additions & 0 deletions source/guides/writing-pyproject-toml.rst
@@ -169,6 +169,8 @@ details.
Dependencies and requirements
=============================

.. _writing-pyproject-toml-dependencies:

``dependencies``/``optional-dependencies``
------------------------------------------
