Complete the major update/rewrite of the PEP
warsaw committed Sep 25, 2024
1 parent b7863f2 commit 0ae84ac
Showing 1 changed file with 128 additions and 37 deletions.
165 changes: 128 additions & 37 deletions peps/pep-0694.rst
@@ -1,6 +1,6 @@
PEP: 694
Title: Upload 2.0 API for Python Package Repositories
Author: Donald Stufft <[email protected]>
Author: Donald Stufft <[email protected]>, Barry Warsaw <[email protected]>
Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879
Status: Draft
Type: Standards Track
@@ -159,7 +159,7 @@ Endpoints
Create an Upload Session
~~~~~~~~~~~~~~~~~~~~~~~~

To create a new upload session, you can send a ``POST`` request to ``/``
To create a new upload session, submit a ``POST`` request to ``/``
(i.e. the root URL), with a payload that looks like:

.. code-block:: json
@@ -187,7 +187,7 @@ The request includes the following top-level keys:
The version of the project that this session is attempting to add files to.

``nonce`` (**optional**)
An additional client-side string input to the `"session token" <session-token>`_
An additional client-side string input to the :ref:`"session token" <session-token>`
algorithm. Details are provided below, but if this key is omitted, it is equivalent
to passing the empty string.

@@ -206,9 +206,12 @@ The successful response includes the following JSON content:
},
"urls": {
"upload": "...",
"draft": "...",
"publish": "..."
"stage": "...",
"publish": "...",
"status": "...",
"cancel": "..."
},
"preview-token": "<token-string>",
"valid-for": 604800,
"status": "pending",
"files": {},
@@ -218,13 +221,19 @@ The successful response includes the following JSON content:
}
Besides the ``meta`` key, which has the same format as the POST JSON, the
Besides the ``meta`` key, which has the same format as the request JSON, the
success response has the following keys:

``urls``
A dictionary mapping :ref:`"identifiers" <url-identifiers>` to related
URLs to this session, the details of which are provided below.

``preview-token``
If the index supports :ref:`previewing staged releases <staged-preview>`, this key
will contain the unique :ref:`"preview token" <session-token>` that can be provided to
installer clients in order to preview the staged release before it's published. If
the index does *not* support stage previewing, this key **MUST** be omitted.

``valid-for``
An integer representing how long, in seconds, until the server itself will
expire this session (and thus all of the URLs contained in it). The
@@ -240,7 +249,7 @@ success response has the following keys:
``files``
A mapping containing the filenames that have been uploaded to this
session, to a mapping containing details about each :ref:`file referenced
in this session <session-files>`>
in this session <session-files>`.

``notices``
An optional key that points to an array of human-readable informational
@@ -257,9 +266,10 @@ For the ``urls`` key in the success JSON, the following subkeys are valid:
<file-uploads>` for each file that will be part of this upload session.

``stage``
The endpoint where these files are :ref:`available to be accessed
<staged-access>` prior to publishing the session. This can be used to
download and verify the not-yet-public files.
The endpoint where this staged release can be :ref:`previewed <staged-preview>` prior
to publishing the session. This can be used to download and verify the not-yet-public
files. If the index does not support previewing staged releases, this key **MUST** be
omitted.

``publish``
The endpoint which triggers :ref:`publishing this session <publish-session>`.
Expand All @@ -285,7 +295,7 @@ in this session to a sub-mapping with the following keys:
The *absolute* URL that the client should use to reference this specific file. This
URL is used to retrieve, replace or delete the referenced file. If a ``nonce`` was
provided, the URL **MUST** be obfuscated with a non-guessable token as described in
the `session token <session-token>`_ section.
the :ref:`session token <session-token>` section.

``notices``
An optional key with similar format and semantics as the ``notices``
@@ -296,6 +306,12 @@ session for that pair is already ``pending``, then the upload server **MUST**
return the already existing session JSON status, along with the ``200 OK``
status code rather than creating a new, empty session.

If a session is created for a project which has no previous releases, then the index
**MAY** reserve the project name; however, it **MUST NOT** be possible to navigate to that
project using the "regular" (i.e. :ref:`unstaged <staged-preview>`) access protocols,
*until* the stage is published. If this first-release stage gets canceled, then the index
**SHOULD** delete the project record, as if it were never uploaded.
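
As a rough client-side illustration of the session response described above, the following sketch pulls out the fields a client would typically need. The helper name ``summarize_session`` and the example URLs are hypothetical, and the sketch assumes that ``upload`` and ``publish`` are always present while ``stage`` and ``preview-token`` may be omitted by indexes that do not support previews.

```python
from datetime import datetime, timedelta, timezone


def summarize_session(session: dict, now: datetime) -> dict:
    """Collect the commonly needed fields of a session response (hypothetical helper)."""
    urls = session["urls"]
    return {
        # Assumed to always be present in a successful response.
        "upload": urls["upload"],
        "publish": urls["publish"],
        # Omitted by indexes that do not support previewing staged releases.
        "stage": urls.get("stage"),
        "preview_token": session.get("preview-token"),
        # "valid-for" is a number of seconds; turn it into an absolute expiry time.
        "expires_at": now + timedelta(seconds=session["valid-for"]),
        "status": session["status"],
    }


# Hypothetical response from an index without preview support.
example = {
    "urls": {
        "upload": "https://upload.example/session/upload",
        "publish": "https://upload.example/session/publish",
        "status": "https://upload.example/session/status",
        "cancel": "https://upload.example/session/cancel",
    },
    "valid-for": 604800,
    "status": "pending",
    "files": {},
}

summary = summarize_session(example, datetime(2024, 9, 25, tzinfo=timezone.utc))
print(summary["expires_at"])  # seven days after the given "now"
```

Using ``dict.get`` for the optional keys lets the same code handle indexes with and without preview support.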


.. _file-uploads:

@@ -378,11 +394,11 @@ as that requires fewer requests and typically has better performance.
However for particularly large files, uploading within a single request may result
in timeouts, so larger files may need to be uploaded in multiple chunks.

In either case, the client **MUST** generate a unique token (or nonce) for each upload for
a file, and **MUST** include that token in each request in the ``Upload-Token``
header. The ``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:``
on either side. Clients **SHOULD** use at least 32 bytes of cryptographically secure
data. For example, the following algorithm can be used:
In either case, the client **MUST** generate a unique token for each upload for a file,
and **MUST** include that token in each request in the ``Upload-Token`` header. The
``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` on either
side. Clients **SHOULD** use at least 32 bytes of cryptographically secure data. For
example, the following algorithm can be used:

.. code-block:: python
@@ -397,10 +413,10 @@ completely. In that case, they **MAY** omit the ``Upload-Token``, and the file m
successfully uploaded in a single HTTP request. If the non-chunked upload fails, the
entire file must be resent in another single HTTP request.

To upload the file in a single chunk, a client sends a ``POST`` request to the URL from
the session response for that filename. The client **MUST** include a ``Content-Length``
header that is equal to the size of the file in bytes, and this **MUST** match the size
given in the original session creation.
To upload the file in a single chunk, a client sends a ``POST`` request to the
``Location`` header URL from the session response for that filename. The client **MUST**
include a ``Content-Length`` header that is equal to the size of the file in bytes, and
this **MUST** match the size given in the original session creation.

As an example, if uploading a 100,000 byte file, you would send headers like::

Expand All @@ -422,6 +438,8 @@ header **MUST** be set to ``0``.
For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's
headers would be:

.. code-block:: email

    Content-Length: 1000
    Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:
    Upload-Offset: 0
@@ -430,6 +448,8 @@ headers would be:
The second chunk, representing bytes 1000 through 1999, would include the following
headers:

.. code-block:: email

    Content-Length: 1000
    Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:
    Upload-Offset: 1000
@@ -445,8 +465,8 @@ header, except for the final chunk, which **MUST** be a ``201 Created``, and as
non-chunked uploads, the body has no content.

With both chunked and non-chunked uploads, once completed successfully, the file
**MUST NOT** be publicly visible in the repository, but merely staged until the upload session has
completed.
**MUST NOT** be publicly visible in the repository, but merely staged until the upload session is
:ref:`completed <publish-session>`.
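
The chunk headers illustrated above can be generated mechanically. The sketch below is only an illustration under stated assumptions: the helper names ``new_upload_token`` and ``chunk_headers`` are hypothetical, and the value ``1`` used for ``Upload-Incomplete`` on non-final chunks is an assumption, since this excerpt only states that the final chunk omits that header.

```python
import base64
import secrets


def new_upload_token() -> str:
    """Return 32 cryptographically secure bytes, base64-encoded and wrapped in colons."""
    raw = secrets.token_bytes(32)
    return ":" + base64.b64encode(raw).decode("ascii") + ":"


def chunk_headers(file_size: int, chunk_size: int, token: str):
    """Yield the request headers for each chunk, in upload order."""
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        headers = {
            "Content-Length": str(length),
            "Upload-Token": token,  # the same token for every chunk of this upload
            "Upload-Offset": str(offset),
        }
        if offset + length < file_size:
            # Assumption: non-final chunks carry this header with the value "1".
            headers["Upload-Incomplete"] = "1"
        yield headers
        offset += length


token = new_upload_token()
headers = list(chunk_headers(100_000, 1000, token))
print(len(headers), headers[1]["Upload-Offset"])
```

Because chunks must be sent in order, a generator that walks the offsets sequentially matches the protocol's constraints.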

The following constraints are placed on uploads regardless of whether they are
single chunk or multiple chunks:
@@ -460,7 +480,7 @@ single chunk or multiple chunks:
means that a client **MAY NOT** upload chunks out of order.
- Once an upload has started with a specific token, you may not use another token
for that file without deleting the in-progress upload.
- Once a file has uploaded successfully, you may initiate another upload for
- Once a file upload has completed successfully, you may initiate another upload for
that file, and doing so will replace that file. This is possible until the entire
session is completed, at which point no further file uploads (either creating or
replacing a session file) are accepted.
@@ -513,9 +533,9 @@ To replace a session file, the file upload **MUST** have been previously complet
deleted. It is not possible to replace a session file if the upload for that file is
incomplete. Clients have two options to replace an incomplete upload:

- `Cancel the in-progress upload <cancel-an-upload>`_ by issuing a ``DELETE`` of that
- :ref:`Cancel the in-progress upload <cancel-an-upload>` by issuing a ``DELETE`` of that
specific file. After this, the new file upload can be initiated.
- `Complete the in-progress upload <complete-the-upload>`_ by uploading a zero-length
- :ref:`Complete the in-progress upload <complete-the-upload>` by uploading a zero-length
chunk omitting the ``Upload-Incomplete`` header. This effectively truncates and
completes the in-progress upload, after which point the new upload can commence.

@@ -545,17 +565,16 @@ as before. The server then marks the session as canceled, **MAY** purge any data
uploaded as part of that session, and future attempts to access that session URL or any of
the file upload URLs **MAY** return a ``404 Not Found``.

To prevent a lot of dangling sessions, servers may also choose to cancel a
session on their own accord. It is recommended that servers expunge their
sessions after no less than a week, but each server may choose their own
schedule.
To prevent dangling sessions, servers may also choose to cancel timed-out sessions of
their own accord. It is recommended that servers expunge their sessions after no less than
a week, but each server may choose its own schedule.

.. _publish-session:

Session Completion
~~~~~~~~~~~~~~~~~~

To complete a session, and publish the files that have been included in it,
To complete a session and publish the files that have been included in it,
a client **MUST** send a ``POST`` request to the ``publish`` URL in the
session status payload.

@@ -569,12 +588,84 @@ In either case, the server should include a ``Location`` header pointing
back to the session status URL, and if the server returned a ``202 Accepted``,
the client may poll that URL to watch for the status to change.

.. _session-errors:
It is an error to publish a session that has no staged files. In this case, a
``400 Bad Request`` is returned and the session is canceled, just as if an
explicit :ref:`session cancellation <session-cancellation>` had been issued.

Session Previewing
~~~~~~~~~~~~~~~~~~
.. _session-token:

Session Token
~~~~~~~~~~~~~

When initiating the staged uploads, clients can provide a ``nonce``, essentially a string
with arbitrary content. The ``nonce`` is optional, and if omitted, is equivalent to
providing an empty string.

In order to support previewing of staged uploads, the package ``name`` and ``version``,
along with this ``nonce``, are used as input into a hashing algorithm to produce a unique
"session token". This session token is valid for the life of the session (i.e., until it
is completed, either by cancellation or publishing), and can be provided to installer
clients such as ``pip`` to gain access to the staged releases.

The use of the ``nonce`` allows clients to decide whether they want to obscure the
visibility of their staged releases or not, and there can be good reasons for either
choice.

The `SHA256 algorithm <https://docs.python.org/3/library/hashlib.html#hashlib.sha256>`_ is
used to turn these inputs into a unique token, in the order ``name``, ``version``,
``nonce``, using the following Python code as an example:

XXX TBD - talk about token
.. code-block:: python

    from hashlib import sha256

    def gentoken(name: bytes, version: bytes, nonce: bytes = b''):
        # Hash the inputs in the fixed order name, version, nonce.
        h = sha256()
        h.update(name)
        h.update(version)
        h.update(nonce)
        return h.hexdigest()

It should be evident that if no ``nonce`` is provided in the session initiation request,
then the preview token is easily guessable from the package name and version number alone.
Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if they
want to allow previewing by anybody, even those who were never given the preview token. By providing a
non-empty ``nonce``, clients can opt for security through obscurity, but this does not
protect staged files behind any kind of authentication.
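
To make the effect of the ``nonce`` concrete, the following sketch compares the token derived without a ``nonce`` to one derived with a random ``nonce``. The project name and version are hypothetical, and using ``secrets.token_hex`` to produce the ``nonce`` is just one possible choice, not something the specification requires.

```python
import secrets
from hashlib import sha256


def gentoken(name: bytes, version: bytes, nonce: bytes = b"") -> str:
    """Same construction as described above: SHA256 over name, version, nonce."""
    h = sha256()
    h.update(name)
    h.update(version)
    h.update(nonce)
    return h.hexdigest()


name, version = b"example-project", b"1.0"  # hypothetical inputs

# Without a nonce, anyone who knows the name and version can recompute the token.
guessable = gentoken(name, version)
same_as_empty = guessable == gentoken(name, version, b"")

# With a random nonce, the token cannot be derived from public information alone.
nonce = secrets.token_hex(16).encode("ascii")
obscured = gentoken(name, version, nonce)

print(same_as_empty, obscured != guessable)
```

The obscured token is only as secret as the ``nonce`` itself, which is why this remains obscurity rather than authentication.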

.. _staged-preview:

Stage Previews
~~~~~~~~~~~~~~

The ability to preview staged releases before they are published is an important feature,
enabling an additional level of last-mile testing before the release is available to the
public. Indexes **MAY** provide this functionality in one or both of the following ways.

* Through the URL provided in the ``stage`` subkey of the :ref:`URL
identifiers <url-identifiers>` returned when the session is created. The
``stage`` URL can be passed to installers such as ``pip`` by setting the
`--extra-index-url
<https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-extra-index-url>`_
flag to this value. Multiple stages can even be previewed by repeating this
flag with multiple values.

* By passing the ``Stage-Token`` header to the `Simple Repository API
<https://packaging.python.org/en/latest/specifications/simple-repository-api/>`_
requests or the :pep:`691` JSON-based Simple API, with the value from the
``preview-token`` subkey of the JSON response to the session creation
request. Multiple ``Stage-Token`` headers are allowed. It is recommended
that installers add a ``--staged <token>`` or similarly named option to set
the ``Stage-Token`` header at the command line.

In both cases, the index will return views that expose the staged releases to the
installer tool, making them available to download and install into a virtual environment
built for that last-mile testing. The former option allows for existing installers to
preview staged releases with no changes, although perhaps in a less user-friendly way.
The latter option can be a better user experience, but the details of this are left to
installer tool maintainers to decide.
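
The two preview options above might look as follows from the client side. This is a sketch under assumptions: the helper names, the example URLs, and the token value are hypothetical; the ``Accept`` value is the :pep:`691` JSON media type; and no request is actually sent.

```python
import urllib.request


def pip_preview_command(stage_url: str, requirement: str) -> list[str]:
    """Option 1: pass the session's ``stage`` URL to pip as an extra index."""
    return ["python", "-m", "pip", "install",
            "--extra-index-url", stage_url, requirement]


def simple_api_preview_request(index_url: str, project: str,
                               preview_token: str) -> urllib.request.Request:
    """Option 2: build a Simple API request carrying the ``Stage-Token`` header."""
    return urllib.request.Request(
        f"{index_url.rstrip('/')}/{project}/",
        headers={
            "Accept": "application/vnd.pypi.simple.v1+json",
            "Stage-Token": preview_token,
        },
    )


# Hypothetical values for illustration only.
cmd = pip_preview_command("https://index.example/stage/abc123/", "example-project==1.0")
req = simple_api_preview_request("https://index.example/simple/", "example-project", "abc123")
print(" ".join(cmd))
print(req.full_url)
```

The first helper works with installers as they exist today, while the second shows what an installer would need to send if it grew a ``--staged <token>`` style option.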

.. _session-errors:

Errors
------
@@ -722,7 +813,7 @@ Multipart Uploads vs tus
------------------------

This PEP currently bases the actual uploading of files on an internet draft
from tus.io that supports resumable file uploads.
from ``tus.io`` that supports resumable file uploads.

That protocol requires a few things:

@@ -746,7 +837,7 @@ The other benefit is that even if you do want to support resumption, you can
still just ``POST`` the file, and unless you *need* to resume the upload,
that's all you have to do.

Another, possibly theoretical, benefit is that for hashing the uploaded files,
Another, possibly theoretical benefit is that for hashing the uploaded files,
the serial chunks requirement means that the server can maintain hashing state
between requests, update it for each request, then write that file back to
storage. Unfortunately this isn't actually possible to do with Python's hashlib,
@@ -807,7 +898,7 @@ It does have its own downsides:
- See above about whether this is actually a downside in practice, or
if it's just in theory.

I lean towards the tus style resumable uploads as I think they're simpler
I lean towards the ``tus`` style resumable uploads as I think they're simpler
to use and to implement, and the main downside is that we possibly leave
some multi-threaded performance on the table, which I think that I'm
personally fine with?