Complete the major update/rewrite of the PEP
Showing 1 changed file with 128 additions and 37 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
PEP: 694 | ||
Title: Upload 2.0 API for Python Package Repositories | ||
Author: Donald Stufft <[email protected]> | ||
Author: Donald Stufft <[email protected]>, Barry Warsaw <[email protected]> | ||
Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879 | ||
Status: Draft | ||
Type: Standards Track | ||
|
@@ -159,7 +159,7 @@ Endpoints | |
Create an Upload Session | ||
~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To create a new upload session, you can send a ``POST`` request to ``/`` | ||
To create a new upload session, submit a ``POST`` request to ``/`` | ||
(i.e. the root URL), with a payload that looks like: | ||
|
||
.. code-block:: json | ||
|
@@ -187,7 +187,7 @@ The request includes the following top-level keys: | |
The version of the project that this session is attempting to add files to. | ||
|
||
``nonce`` (**optional**) | ||
An additional client-side string input to the `"session token" <session-token>`_ | ||
An additional client-side string input to the :ref:`"session token" <session-token>` | ||
algorithm. Details are provided below, but if this key is omitted, it is equivalent | ||
to passing the empty string. | ||
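As a rough sketch of the request described above, the following Python builds a session-creation payload. The helper name ``build_session_request`` is hypothetical, and because the contents of ``meta`` are not shown in this part of the diff, they are supplied by the caller rather than guessed here. The only behavior illustrated is that an omitted ``nonce`` is treated the same as the empty string.

```python
import json


def build_session_request(meta: dict, name: str, version: str, nonce: str = "") -> str:
    """Serialize a session-creation payload (hypothetical helper, not part of the PEP)."""
    payload = {
        "meta": meta,        # format is defined elsewhere in the PEP; passed through as-is
        "name": name,        # project name
        "version": version,  # version this session adds files to
    }
    # Omitting the key is equivalent to passing the empty string,
    # so only include it when the caller actually supplied a value.
    if nonce:
        payload["nonce"] = nonce
    return json.dumps(payload)
```

The resulting string would be sent as the body of the ``POST`` request to the root URL.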
|
||
|
@@ -206,9 +206,12 @@ The successful response includes the following JSON content: | |
}, | ||
"urls": { | ||
"upload": "...", | ||
"draft": "...", | ||
"publish": "..." | ||
"stage": "...", | ||
"publish": "...", | ||
"status": "...", | ||
"cancel": "..." | ||
}, | ||
"preview-token": "<token-string>", | ||
"valid-for": 604800, | ||
"status": "pending", | ||
"files": {}, | ||
|
@@ -218,13 +221,19 @@ The successful response includes the following JSON content: | |
} | ||
Besides the ``meta`` key, which has the same format as the POST JSON, the | ||
Besides the ``meta`` key, which has the same format as the request JSON, the | ||
success response has the following keys: | ||
|
||
``urls`` | ||
A dictionary mapping :ref:`"identifiers" <url-identifiers>` to related | ||
URLs to this session, the details of which are provided below. | ||
|
||
``preview-token`` | ||
If the index supports :ref:`previewing staged releases <staged-preview>`, this key | ||
will contain the unique :ref:`"preview token" <session-token>` that can be provided to | ||
installer clients in order to preview the staged release before it's published. If | ||
the index does *not* support stage previewing, this key **MUST** be omitted. | ||
|
||
``valid-for`` | ||
An integer representing how long, in seconds, until the server itself will | ||
expire this session (and thus all of the URLs contained in it). The | ||
|
@@ -240,7 +249,7 @@ success response has the following keys: | |
``files`` | ||
A mapping containing the filenames that have been uploaded to this | ||
session, to a mapping containing details about each :ref:`file referenced | ||
in this session <session-files>`> | ||
in this session <session-files>`. | ||
|
||
``notices`` | ||
An optional key that points to an array of human-readable informational | ||
|
@@ -257,9 +266,10 @@ For the ``urls`` key in the success JSON, the following subkeys are valid: | |
<file-uploads>` for each file that will be part of this upload session. | ||
|
||
``stage`` | ||
The endpoint where these files are :ref:`available to be accessed | ||
<staged-access>` prior to publishing the session. This can be used to | ||
download and verify the not-yet-public files. | ||
The endpoint where this staged release can be :ref:`previewed <staged-preview>` prior | ||
to publishing the session. This can be used to download and verify the not-yet-public | ||
files. If the index does not support previewing staged releases, this key **MUST** be | ||
omitted. | ||
|
||
``publish`` | ||
The endpoint which triggers :ref:`publishing this session <publish-session>`. | ||
|
@@ -285,7 +295,7 @@ in this session to a sub-mapping with the following keys: | |
The *absolute* URL that the client should use to reference this specific file. This | ||
URL is used to retrieve, replace or delete the referenced file. If a ``nonce`` was | ||
provided, the URL **MUST** be obfuscated with a non-guessable token as described in | ||
the `session token <session-token>`_ section. | ||
the :ref:`session token <session-token>` section. | ||
|
||
``notices`` | ||
An optional key with similar format and semantics as the ``notices`` | ||
|
@@ -296,6 +306,12 @@ session for that pair is already ``pending``, then the upload server **MUST** | |
return the already existing session JSON status, along with the ``200 OK`` | |
status code rather than creating a new, empty session. | ||
|
||
If a session is created for a project which has no previous releases, then the index | ||
**MAY** reserve the project name; however, it **MUST NOT** be possible to navigate to that | |
project using the "regular" (i.e. :ref:`unstaged <staged-preview>`) access protocols, | ||
*until* the stage is published. If this first-release stage gets canceled, then the index | ||
**SHOULD** delete the project record, as if it were never uploaded. | ||
|
||
|
||
.. _file-uploads: | ||
|
||
|
@@ -378,11 +394,11 @@ as that requires fewer requests and typically has better performance. | |
However for particularly large files, uploading within a single request may result | ||
in timeouts, so larger files may need to be uploaded in multiple chunks. | ||
|
||
In either case, the client **MUST** generate a unique token (or nonce) for each upload for | ||
a file, and **MUST** include that token in each request in the ``Upload-Token`` | ||
header. The ``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` | ||
on either side. Clients **SHOULD** use at least 32 bytes of cryptographically secure | ||
data. For example, the following algorithm can be used: | ||
In either case, the client **MUST** generate a unique token for each upload for a file, | ||
and **MUST** include that token in each request in the ``Upload-Token`` header. The | ||
``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` on either | ||
side. Clients **SHOULD** use at least 32 bytes of cryptographically secure data. For | ||
example, the following algorithm can be used: | ||
|
||
.. code-block:: python | ||
|
@@ -397,10 +413,10 @@ completely. In that case, they **MAY** omit the ``Upload-Token``, and the file m | |
successfully uploaded in a single HTTP request. If the non-chunked upload fails, the | ||
entire file must be resent in another single HTTP request. | ||
|
||
To upload the file in a single chunk, a client sends a ``POST`` request to the URL from | ||
the session response for that filename. The client **MUST** include a ``Content-Length`` | ||
header that is equal to the size of the file in bytes, and this **MUST** match the size | ||
given in the original session creation. | ||
To upload the file in a single chunk, a client sends a ``POST`` request to the | ||
``Location`` header URL from the session response for that filename. The client **MUST** | ||
include a ``Content-Length`` header that is equal to the size of the file in bytes, and | ||
this **MUST** match the size given in the original session creation. | ||
|
||
As an example, if uploading a 100,000 byte file, you would send headers like:: | ||
|
||
|
@@ -422,6 +438,8 @@ header **MUST** be set to ``0``. | |
For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's | |
headers would be: | ||
|
||
.. code-block:: email | |
    Content-Length: 1000 | |
    Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: | |
    Upload-Offset: 0 | |
|
@@ -430,6 +448,8 @@ headers would be: | |
And the second chunk, representing bytes 1000 through 1999, would include the following | |
headers: | ||
|
||
.. code-block:: email | |
    Content-Length: 1000 | |
    Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: | |
    Upload-Offset: 1000 | |
|
@@ -445,8 +465,8 @@ header, except for the final chunk, which **MUST** be a ``201 Created``, and as | |
non-chunked uploads, the body has no content. | |
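The token format and the per-chunk headers shown in the examples above can be sketched in Python as follows. This is a minimal illustration, not a reference client: the function names ``make_upload_token`` and ``chunk_headers`` are hypothetical, and only the three headers that appear in the examples (``Content-Length``, ``Upload-Token``, ``Upload-Offset``) are produced. Any other headers the protocol requires are out of scope here.

```python
import base64
import secrets
from typing import Dict, Iterator, Tuple


def make_upload_token(num_bytes: int = 32) -> str:
    """Base64-encode cryptographically secure random bytes, wrapped in colons."""
    raw = secrets.token_bytes(num_bytes)
    return ":" + base64.b64encode(raw).decode("ascii") + ":"


def chunk_headers(
    file_size: int, chunk_size: int, token: str
) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (offset, headers) for each chunk, strictly in order."""
    offset = 0
    while offset < file_size:
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield offset, {
            "Content-Length": str(length),
            "Upload-Token": token,          # same token for every chunk of this upload
            "Upload-Offset": str(offset),   # byte position where this chunk starts
        }
        offset += length
```

For the 100,000 byte file and 1000 byte chunks used in the text, this generator yields 100 header sets, the second of which carries ``Upload-Offset: 1000``.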
|
||
With both chunked and non-chunked uploads, once completed successfully, the file **MUST NOT** | |
be publicly visible in the repository, but merely staged until the upload session has | |
completed. | |
be publicly visible in the repository, but merely staged until the upload session is | |
:ref:`completed <publish-session>`. | |
|
||
The following constraints are placed on uploads regardless of whether they are | ||
single chunk or multiple chunks: | ||
|
@@ -460,7 +480,7 @@ single chunk or multiple chunks: | |
means that a client **MAY NOT** upload chunks out of order. | ||
- Once an upload has started with a specific token, you may not use another token | ||
for that file without deleting the in-progress upload. | ||
- Once a file has uploaded successfully, you may initiate another upload for | ||
- Once a file upload has completed successfully, you may initiate another upload for | ||
that file, and doing so will replace that file. This is possible until the entire | ||
session is completed, at which point no further file uploads (either creating or | ||
replacing a session file) are accepted. | |
|
@@ -513,9 +533,9 @@ To replace a session file, the file upload **MUST** have been previously complet | |
deleted. It is not possible to replace a session file if the upload for that file is | ||
incomplete. Clients have two options to replace an incomplete upload: | ||
|
||
- `Cancel the in-progress upload <cancel-an-upload>`_ by issuing a ``DELETE`` of that | ||
- :ref:`Cancel the in-progress upload <cancel-an-upload>` by issuing a ``DELETE`` of that | ||
specific file. After this, the new file upload can be initiated. | ||
- `Complete the in-progress upload <complete-the-upload>`_ by uploading a zero-length | ||
- :ref:`Complete the in-progress upload <complete-the-upload>` by uploading a zero-length | ||
chunk omitting the ``Upload-Incomplete`` header. This effectively truncates and | ||
completes the in-progress upload, after which point the new upload can commence. | ||
|
||
|
@@ -545,17 +565,16 @@ as before. The server then marks the session as canceled, **MAY** purge any data | |
uploaded as part of that session, and future attempts to access that session URL or any of | ||
the file upload URLs **MAY** return a ``404 Not Found``. | ||
|
||
To prevent a lot of dangling sessions, servers may also choose to cancel a | ||
session on their own accord. It is recommended that servers expunge their | ||
sessions after no less than a week, but each server may choose their own | ||
schedule. | ||
To prevent dangling sessions, servers may also choose to cancel timed-out sessions of | |
their own accord. It is recommended that servers expunge their sessions after no less than | ||
a week, but each server may choose their own schedule. | ||
|
||
.. _publish-session: | ||
|
||
Session Completion | ||
~~~~~~~~~~~~~~~~~~ | ||
|
||
To complete a session, and publish the files that have been included in it, | ||
To complete a session and publish the files that have been included in it, | ||
a client **MUST** send a ``POST`` request to the ``publish`` URL in the | ||
session status payload. | ||
|
||
|
@@ -569,12 +588,84 @@ In either case, the server should include a ``Location`` header pointing | |
back to the session status url, and if the server returned a ``202 Accepted``, | ||
the client may poll that URL to watch for the status to change. | ||
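The polling behavior described above might be sketched as follows. The HTTP layer is deliberately left out: ``fetch_status`` is a hypothetical caller-supplied function that requests the session status URL and returns the value of the ``status`` key. The interval and retry limit are arbitrary illustrative defaults, not values taken from the PEP.

```python
import time
from typing import Callable


def wait_for_status_change(
    fetch_status: Callable[[], str],
    initial: str = "pending",
    interval: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Poll until the session status differs from `initial`, then return it."""
    for attempt in range(max_polls):
        status = fetch_status()
        if status != initial:
            return status
        # Do not sleep after the last attempt; just fall through and fail.
        if attempt < max_polls - 1:
            time.sleep(interval)
    raise TimeoutError(f"status was still {initial!r} after {max_polls} polls")
```

Injecting the fetch function keeps the sketch independent of any particular HTTP client and makes it easy to exercise with a fake.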
|
||
.. _session-errors: | ||
It is an error to publish a session that has no staged files. In this case, a | ||
``400 Bad Request`` is returned and the session is canceled, just as if an | |
explicit :ref:`session cancellation <session-cancellation>` was issued. | ||
|
||
Session Previewing | ||
~~~~~~~~~~~~~~~~~~ | ||
.. _session-token: | ||
|
||
Session Token | ||
~~~~~~~~~~~~~ | ||
|
||
When initiating the staged uploads, clients can provide a ``nonce``, essentially a string | ||
with arbitrary content. The ``nonce`` is optional, and if omitted, is equivalent to | ||
providing an empty string. | ||
|
||
In order to support previewing of staged uploads, the package ``name`` and ``version``, | ||
along with this ``nonce`` are used as input into a hashing algorithm to produce a unique | ||
"session token". This session token is valid for the life of the session (i.e., until it | ||
is completed, either by cancellation or publishing), and can be provided to installer | ||
clients such as ``pip`` to gain access to the staged releases. | ||
|
||
The use of the ``nonce`` allows clients to decide whether they want to obscure the | ||
visibility of their staged releases or not, and there can be good reasons for either | ||
choice. | ||
|
||
The `SHA256 algorithm <https://docs.python.org/3/library/hashlib.html#hashlib.sha256>`_ is | ||
used to turn these inputs into a unique token, in the order ``name``, ``version``, | ||
``nonce``, using the following Python code as an example: | ||
|
||
XXX TBD - talk about token | ||
.. code-block:: python | |
    from hashlib import sha256 | |
    def gentoken(name: bytes, version: bytes, nonce: bytes = b''): | |
        h = sha256() | |
        h.update(name) | |
        h.update(version) | |
        h.update(nonce) | |
        return h.hexdigest() | |
It should be evident that if no ``nonce`` is provided in the session initiation request, | ||
then the preview token is easily guessable from the package name and version number alone. | ||
Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if they | |
want to allow previewing by anybody, even without being given the preview token. By providing a | |
non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not | ||
protect staged files behind any kind of authentication. | ||
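To make the guessability point concrete, the short sketch below repeats the ``gentoken`` function so that it runs on its own, then compares the token an outsider could compute from the name and version alone against one derived with a nonce. The project name, version, nonce value, and the ``guess_token`` helper are all made up for illustration.

```python
from hashlib import sha256


def gentoken(name: bytes, version: bytes, nonce: bytes = b"") -> str:
    """Hash name, version, and nonce (in that order) into a hex token."""
    h = sha256()
    h.update(name)
    h.update(version)
    h.update(nonce)
    return h.hexdigest()


def guess_token(name: str, version: str) -> str:
    """What anyone can compute knowing only the public name and version."""
    return gentoken(name.encode(), version.encode())


# No nonce: the token is fully determined by public information.
open_token = gentoken(b"example", b"1.0")

# With a nonce: the same guess no longer matches.
obscured_token = gentoken(b"example", b"1.0", b"some-secret-nonce")

print(guess_token("example", "1.0") == open_token)      # guess succeeds
print(guess_token("example", "1.0") == obscured_token)  # guess fails
```

The nonce only makes the token hard to guess; as the text notes, anyone who is given the token can still reach the staged files.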
|
||
.. _staged-preview: | ||
|
||
Stage Previews | ||
~~~~~~~~~~~~~~ | ||
|
||
The ability to preview staged releases before they are published is an important feature, | ||
enabling an additional level of last-mile testing before the release is available to the | ||
public. Indexes **MAY** provide this functionality in one or both of the following ways. | ||
|
||
* Through the URL provided in the ``stage`` subkey of the :ref:`URL | ||
identifiers <url-identifiers>` returned when the session is created. The | ||
``stage`` URL can be passed to installers such as ``pip`` by setting the | ||
`--extra-index-url | ||
<https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-extra-index-url>`_ | ||
flag to this value. Multiple stages can even be previewed by repeating this | ||
flag with multiple values. | ||
|
||
* By passing the ``Stage-Token`` header to the `Simple Repository API | ||
<https://packaging.python.org/en/latest/specifications/simple-repository-api/>`_ | ||
requests or the :pep:`691` JSON-based Simple API, with the value from the | ||
``preview-token`` subkey of the JSON response to the session creation | ||
request. Multiple ``Stage-Token`` headers are allowed. It is recommended | ||
that installers add a ``--staged <token>`` or similarly named option to set | ||
the ``Stage-Token`` header at the command line. | ||
|
||
In both cases, the index will return views that expose the staged releases to the | ||
installer tool, making them available to download and install into a virtual environment | ||
built for that last-mile testing. The former option allows for existing installers to | ||
preview staged releases with no changes, although perhaps in a less user-friendly way. | ||
The latter option can be a better user experience, but the details of this are left to | ||
installer tool maintainers to decide. | ||
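The two preview routes can be sketched from the client side as follows. Both helpers are hypothetical: the first only assembles a ``pip`` command line using the existing ``--extra-index-url`` flag, and the second only constructs (without sending) a request to a Simple API URL carrying the ``Stage-Token`` header. The ``Accept`` value is the :pep:`691` JSON content type; the URLs are whatever the index returned and are not defined here.

```python
from typing import List
from urllib.request import Request


def pip_preview_command(stage_url: str, requirement: str) -> List[str]:
    """Build a pip invocation that adds the stage URL as an extra index."""
    return [
        "python", "-m", "pip", "install",
        "--extra-index-url", stage_url,
        requirement,
    ]


def staged_index_request(index_url: str, preview_token: str) -> Request:
    """Build a Simple API request that asks the index to expose a staged release."""
    return Request(
        index_url,
        headers={
            "Stage-Token": preview_token,                     # value of ``preview-token``
            "Accept": "application/vnd.pypi.simple.v1+json",  # PEP 691 JSON response
        },
    )
```

The first route works with installers as they exist today, while the second depends on installer support for sending the header.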
|
||
.. _session-errors: | ||
|
||
Errors | ||
------ | ||
|
@@ -722,7 +813,7 @@ Multipart Uploads vs tus | |
------------------------ | ||
|
||
This PEP currently bases the actual uploading of files on an internet draft | ||
from tus.io that supports resumable file uploads. | ||
from ``tus.io`` that supports resumable file uploads. | ||
|
||
That protocol requires a few things: | ||
|
||
|
@@ -746,7 +837,7 @@ The other benefit is that even if you do want to support resumption, you can | |
still just ``POST`` the file, and unless you *need* to resume the download, | ||
that's all you have to do. | ||
|
||
Another, possibly theoretical, benefit is that for hashing the uploaded files, | ||
Another, possibly theoretical benefit is that for hashing the uploaded files, | ||
the serial chunks requirement means that the server can maintain hashing state | ||
between requests, update it for each request, then write that file back to | ||
storage. Unfortunately this isn't actually possible to do with Python's hashlib, | ||
|
@@ -807,7 +898,7 @@ It does have its own downsides: | |
- See above about whether this is actually a downside in practice, or | ||
if it's just in theory. | ||
|
||
I lean towards the tus style resumable uploads as I think they're simpler | ||
I lean towards the ``tus`` style resumable uploads as I think they're simpler | ||
to use and to implement, and the main downside is that we possibly leave | ||
some multi-threaded performance on the table, which I think that I'm | ||
personally fine with? | ||
|