diff --git a/peps/pep-0694.rst b/peps/pep-0694.rst index 30e3a32b8bd..d48402dbfb3 100644 --- a/peps/pep-0694.rst +++ b/peps/pep-0694.rst @@ -1,6 +1,6 @@ PEP: 694 Title: Upload 2.0 API for Python Package Repositories -Author: Donald Stufft +Author: Donald Stufft , Barry Warsaw Discussions-To: https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879 Status: Draft Type: Standards Track @@ -38,26 +38,27 @@ Beyond the above, there are a number of major issues with the current API: not go last, possibly some hard to build packages are attempting to be built from source. -- It has very limited support for communicating back to the user, with no support - for multiple errors, warnings, deprecations, etc. It is limited entirely to the - HTTP status code and reason phrase, of which the reason phrase has been - deprecated since HTTP/2 (:rfc:`RFC 7540 <7540#section-8.1.2.4>`). +- It has very limited support for communicating back to the user, with no + support for multiple errors, warnings, deprecations, etc. It is limited + entirely to the HTTP status code and reason phrase, of which the reason + phrase has been deprecated since HTTP/2 (:rfc:`RFC 7540 + <7540#section-8.1.2.4>`). -- The metadata for a release/file is submitted alongside the file, however this - metadata is famously unreliable, and most installers instead choose to download - the entire file and read that in part due to that unreliability. +- The metadata for a release/file is submitted alongside the file, however + this metadata is famously unreliable, and most installers instead choose to + download the entire file and read that in part due to that unreliability. - There is no mechanism for allowing a repository to do any sort of sanity checks before bandwidth starts getting expended on an upload, whereas a lot of the cases of invalid metadata or incorrect permissions could be checked prior to upload. 
-- It has no support for "staging" a draft release prior to publishing it to the +- It has no support for "staging" a release prior to publishing it to the repository. - It has no support for creating new projects, without uploading a file. -This PEP proposes a new API for uploads, and deprecates the existing non standard +This PEP proposes a new API for uploads, and deprecates the existing legacy API. @@ -122,10 +123,11 @@ roughly two things: - This is actually fine if used as a pre-check, but it should be validated against the actual ``METADATA`` or similar files within the distribution. -- It supports a single request, using nothing but form data, that either succeeds +- It supports only a single request, using nothing but form data, that either succeeds or fails, and everything is done and contained within that single request. -We then propose a multi-request workflow, that essentially boils down to: +To address these issues, we propose a multi-request workflow, which at a high +level involves these steps: 1. Initiate an upload session. 2. Upload the file(s) as part of the upload session. @@ -136,6 +138,11 @@ All URLs described here will be relative to the root endpoint, which may be located anywhere within the url structure of a domain. So it could be at ``https://upload.example.com/``, or ``https://example.com/upload/``. +Specifically for PyPI, we propose the root URL to be +``https://upload.pypi.org/2.0``. This root URL will be considered provisional +while the feature is being tested, and will be blessed as permanent after +sufficient testing with live projects. + Versioning ---------- @@ -152,8 +159,8 @@ Endpoints Create an Upload Session ~~~~~~~~~~~~~~~~~~~~~~~~ -To create a new upload session, you can send a ``POST`` request to ``/``, -with a payload that looks like: +To create a new upload session, submit a ``POST`` request to ``/`` +(i.e. the root URL), with a payload that looks like: .. 
code-block:: json @@ -162,23 +169,34 @@ with a payload that looks like: "api-version": "2.0" }, "name": "foo", - "version": "1.0" + "version": "1.0", + "nonce": "" } -This currently has three keys, ``meta``, ``name``, and ``version``. +The request includes the following top-level keys: + +``meta`` (**required**) + Describes information about the payload itself. Currently, the only + defined subkey is ``api-version`` the value of which must be the string ``"2.0"``. + +``name`` (**required**) + The name of the project that this session is attempting to add files to. -The ``meta`` key is included in all payloads, and it describes information about the -payload itself. +``version`` (**required**) + The version of the project that this session is attempting to add files to. -The ``name`` key is the name of the project that this session is attempting to -add files to. +``nonce`` (**optional**) + An additional client-side string input to the :ref:`"session token" ` + algorithm. Details are provided below, but if this key is omitted, it is equivalent + to passing the empty string. -The ``version`` key is the version of the project that this session is attepmting to -add files to. -If creating the session was successful, then the server must return a response -that looks like: +Upon successful session creation, the server returns a ``201 Created`` +response. If an error occurs, the appropriate ``4xx`` code will be returned, +as described in the :ref:`session-errors` section. + +The successful response includes the following JSON content: .. code-block:: json @@ -188,9 +206,12 @@ that looks like: }, "urls": { "upload": "...", - "draft": "...", - "publish": "..." + "stage": "...", + "publish": "...", + "status": "...", + "cancel": "..." 
}, + "preview-token": "", "valid-for": 604800, "status": "pending", "files": {}, @@ -200,74 +221,112 @@ that looks like: } -Besides the ``meta`` key, this response has five keys, ``urls``, ``valid-for``, -``status``, ``files``, and ``notices``. +Besides the ``meta`` key, which has the same format as the request JSON, the +success response has the following keys: + +``urls`` + A dictionary mapping :ref:`"identifiers" ` to related + URLs to this session, the details of which are provided below. + +``preview-token`` + If the index supports :ref:`previewing staged releases `, this key + will contain the unique :ref:`"preview token" ` that can be provided to + installer clients in order to preview the staged release before it's published. If + the index does *not* support stage previewing, this key **MUST** be omitted. -The ``urls`` key is a dictionary mapping identifiers to related URLs to this -session. +``valid-for`` + An integer representing how long, in seconds, until the server itself will + expire this session (and thus all of the URLs contained in it). The + session **SHOULD** live at least this much longer unless the client itself + has canceled the session. Servers **MAY** choose to *increase* this time, + but should never *decrease* it, except naturally through the passage of time. -The ``valid-for`` key is an integer representing how long, in seconds, until the -server itself will expire this session (and thus all of the URLs contained in it). -The session **SHOULD** live at least this much longer unless the client itself -has canceled the session. Servers **MAY** choose to *increase* this time, but should -never *decrease* it, except naturally through the passage of time. +``status`` + A string that contains one of ``pending``, ``published``, ``error``, or + ``canceled``, this string represents the overall :ref:`status of the + session `. 
-The ``status`` key is a string that contains one of ``pending``, ``published``, -``errored``, or ``canceled``, this string represents the overall status of -the session. +``files`` + A mapping containing the filenames that have been uploaded to this + session, to a mapping containing details about each :ref:`file referenced + in this session `. -The ``files`` key is a mapping containing the filenames that have been uploaded -to this session, to a mapping containing details about each file. +``notices`` + An optional key that points to an array of human-readable informational + notices that the server wishes to communicate to the end user. These + notices are specific to the overall session, not to any particular file in + the session. -The ``notices`` key is an optional key that points to an array of notices that -the server wishes to communicate to the end user that are not specific to any -one file. +.. _url-identifiers: -For each filename in ``files`` the mapping has three keys, ``status``, ``url``, -and ``notices``. +For the ``urls`` key in the success JSON, the following subkeys are valid: -The ``status`` key is the same as the top level ``status`` key, except that it -indicates the status of a specific file. +``upload`` + The upload endpoint for this session to initiate :ref:`file uploads + ` for each file that will be part of this upload session. -The ``url`` key is the *absolute* URL that the client should upload that specific -file to (or use to delete that file). +``stage`` + The endpoint where this staged release can be :ref:`previewed ` prior + to publishing the session. This can be used to download and verify the not-yet-public + files. If the index does not support previewing staged releases, this key **MUST** be + omitted. -The ``notices`` key is an optional key, that is an array of notices that the server -wishes to communicate to the end user that are specific to this file. 
+``publish`` + The endpoint which triggers :ref:`publishing this session `. -The required response code to a successful creation of the session is a -``201 Created`` response and it **MUST** include a ``Location`` header that is the -URL for this session, which may be used to check its status or cancel it. +``status`` + The endpoint that can be used to query the :ref:`current status + ` of this session. -For the ``urls`` key, there are currently three keys that may appear: +``cancel`` + The endpoint that can be used to :ref:`cancel the session `. -The ``upload`` key, which is the upload endpoint for this session to initiate -a file upload. +.. _session-files: -The ``draft`` key, which is the repository URL that these files are available at -prior to publishing. +The ``files`` key contains a mapping from the names of the files participating +in this session to a sub-mapping with the following keys: -The ``publish`` key, which is the endpoint to trigger publishing the session. +``status`` + A string with the same values and semantics as the same-named + :ref:`session status key `, except that it indicates the + status of the specific referenced file. +``url`` + The *absolute* URL that the client should use to reference this specific file. This + URL is used to retrieve, replace or delete the referenced file. If a ``nonce`` was + provided, the URL **MUST** be obfuscated with a non-guessable token as described in + the :ref:`session token ` section. -In addition to the above, if a second session is created for the same name+version -pair, then the upload server **MUST** return the already existing session rather -than creating a new, empty one. +``notices`` + An optional key with similar format and semantics as the ``notices`` + session key, except that these notices are specific to the referenced file. 
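The ``files`` and ``notices`` keys described above could be consumed by a client roughly as follows. This is only a sketch: the helper names ``summarize_files`` and ``collect_notices``, the example URLs, and the assumption that each notice is a plain string are illustrative and are not defined by this PEP.

```python
def summarize_files(session: dict) -> dict[str, list[str]]:
    """Group the filenames in a session payload by their per-file status."""
    by_status: dict[str, list[str]] = {}
    for filename, details in session.get("files", {}).items():
        by_status.setdefault(details["status"], []).append(filename)
    return by_status


def collect_notices(session: dict) -> list[str]:
    """Gather session-level notices, followed by per-file notices."""
    # ``notices`` is optional at both levels, so default to an empty list.
    notices = list(session.get("notices", []))
    for filename, details in session.get("files", {}).items():
        for notice in details.get("notices", []):
            notices.append(f"{filename}: {notice}")
    return notices


# Hypothetical payload, shaped like the session response described above.
example = {
    "meta": {"api-version": "2.0"},
    "status": "pending",
    "valid-for": 604800,
    "files": {
        "foo-1.0.tar.gz": {
            "status": "pending",
            "url": "https://upload.example.com/f/1",
            "notices": ["example notice"],
        },
        "foo-1.0-py3-none-any.whl": {
            "status": "pending",
            "url": "https://upload.example.com/f/2",
        },
    },
    "notices": ["session-level notice"],
}
```

Treating both ``notices`` keys as optional mirrors the wording above; a client that indexes them directly would fail on responses that omit them.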
+If a second session is created for the same name-version pair while an upload +session for that pair is already ``pending``, then the upload server **MUST** +return the already existing session JSON status, along with the ``200 OK`` +status code rather than creating a new, empty session. + +If a session is created for a project which has no previous releases, then the index +**MAY** reserve the project name; however, it **MUST NOT** be possible to navigate to that +project using the "regular" (i.e. :ref:`unstaged `) access protocols, +*until* the stage is published. If this first-release stage gets canceled, then the index +**SHOULD** delete the project record, as if it were never uploaded. + + +.. _file-uploads: Upload Each File ~~~~~~~~~~~~~~~~ -Once you have initiated an upload session for one or more files, then you have -to actually upload each of those files. +Once an upload session has been created, the response provides the URL you can +use to upload files into that session. There is no predetermined endpoint for +uploading files into the session; the upload URL is given to the client by the +server in the session creation response JSON. Clients **MUST NOT** assume +there is any commonality to those URLs from one session to the next. -There is no set endpoint for actually uploading the file, that is given to the -client by the server as part of the creation of the upload session, and clients -**MUST NOT** assume that there is any commonality to what those URLs look like from -one session to the next. - -To initiate a file upload, a client sends a ``POST`` request to the upload URL -in the session, with a request body that looks like: +To initiate a file upload, a client sends a ``POST`` request to the URL given +in the ``upload`` subkey of the ``urls`` key in the session creation response. +The request body has the following format: .. 
code-block:: json @@ -282,28 +341,35 @@ in the session, with a request body that looks like: } -Besides the standard ``meta`` key, this currently has 4 keys: +Besides the standard ``meta`` key, the request JSON has the following +additional keys: + +``filename`` + The name of the file being uploaded. + +``size`` + The size in bytes of the file that is being uploaded. -- ``filename``: The filename of the file being uploaded. -- ``size``: The size, in bytes, of the file that is being uploaded. -- ``hashes``: A mapping of hash names to hex encoded digests, each of these digests - are the digests of that file, when hashed by the hash identified in the name. +``hashes`` + A mapping of hash names to hex-encoded digests. Each of these digests are + the checksums of the file being uploaded when hashed by the algorithm + identified in the name. - By default, any hash algorithm available via `hashlib - `_ (specifically any that can - be passed to ``hashlib.new()`` and do not require additional parameters) can - be used as a key for the hashes dictionary. At least one secure algorithm from - ``hashlib.algorithms_guaranteed`` **MUST** always be included. At the time - of this PEP, ``sha256`` specifically is recommended. + By default, any hash algorithm available in `hashlib + `_ can be used as a key + for the hashes dictionary [#fn1]_. At least one secure algorithm from + ``hashlib.algorithms_guaranteed`` **MUST** always be included. At the time + of this PEP, ``sha256`` is specifically recommended. - Multiple hashes may be passed at a time, but all hashes must be valid for the - file. -- ``metadata``: An optional key that is a string containing the file's - `core metadata `_. + Multiple hashes may be passed at a time, but all hashes provided **MUST** + be valid for the file. 
-Servers **MAY** use the data provided in this response to do some sanity checking -prior to allowing the file to be uploaded, which may include but is not limited -to: +``metadata`` + An optional key with a string value containing the file's `core metadata + `_. + +Servers **MAY** use the data provided in this request to do some sanity checking prior to +allowing the file to be uploaded, which may include but is not limited to: - Checking if the ``filename`` already exists. - Checking if the ``size`` would invalidate some quota. @@ -313,8 +379,8 @@ If the server determines that the client should attempt the upload, it will retu a ``201 Created`` response, with an empty body, and a ``Location`` header pointing to the URL that the file itself should be uploaded to. -At this point, the status of the session should show the filename, with the above url -included in it. +At this point, the status of the session should show the filename, with the above location +URL included in it. Upload Data @@ -328,11 +394,11 @@ as that requires fewer requests and typically has better performance. However for particularly large files, uploading within a single request may result in timeouts, so larger files may need to be uploaded in multiple chunks. -In either case, the client must generate a unique token (or nonce) for each upload -attempt for a file, and **MUST** include that token in each request in the ``Upload-Token`` -header. The ``Upload-Token`` is a binary blob encoded using base64 surrounded by -a ``:`` on either side. Clients **SHOULD** use at least 32 bytes of cryptographically -random data. You can generate it using the following: +In either case, the client **MUST** generate a unique token for each upload for a file, +and **MUST** include that token in each request in the ``Upload-Token`` header. The +``Upload-Token`` is a binary blob encoded using base64 surrounded by a ``:`` on either +side. Clients **SHOULD** use at least 32 bytes of cryptographically secure data. 
For +example, the following algorithm can be used: .. code-block:: python @@ -341,47 +407,66 @@ random data. You can generate it using the following: header = ":" + base64.b64encode(secrets.token_bytes(32)).decode() + ":" -The one time that it is permissible to omit the ``Upload-Token`` from an upload -request is when a client wishes to opt out of the resumable or chunked file upload -feature completely. In that case, they **MAY** omit the ``Upload-Token``, and the -file must be successfully uploaded in a single HTTP request, and if it fails, the +The one time that it is permissible to omit the ``Upload-Token`` from an upload request is +when a client wishes to opt out of the resumable or chunked file upload feature +completely. In that case, they **MAY** omit the ``Upload-Token``, and the file must be +successfully uploaded in a single HTTP request. If the non-chunked upload fails, the entire file must be resent in another single HTTP request. -To upload in a single chunk, a client sends a ``POST`` request to the URL from the -session response for that filename. The client **MUST** include a ``Content-Length`` -header that is equal to the size of the file in bytes, and this **MUST** match the -size given in the original session creation. +To upload the file in a single chunk, a client sends a ``POST`` request to the +``Location`` header URL from the session response for that filename. The client **MUST** +include a ``Content-Length`` header that is equal to the size of the file in bytes, and +this **MUST** match the size given in the original session creation. As an example, if uploading a 100,000 byte file, you would send headers like:: Content-Length: 100000 Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: -If the upload completes successfully, the server **MUST** respond with a -``201 Created`` status. At this point this file **MUST** not be present in the -repository, but merely staged until the upload session has completed. 
+If the upload completes successfully, the server **MUST** respond with a ``201 Created`` +status. The response body has no content. + +To upload the file in multiple chunks, a client sends multiple ``POST`` requests to the +same URL as before, one for each chunk. -To upload in multiple chunks, a client sends multiple ``POST`` requests to the same -URL as before, one for each chunk. +For chunked uploads, the ``Content-Length`` is equal to the size, in bytes, of the chunk +that they are sending. The client **MUST** include an ``Upload-Offset`` header which +indicates a byte offset that the content included in this request starts at and an +``Upload-Incomplete`` header set to ``1``. For the first chunk, the ``Upload-Offset`` +header **MUST** be set to ``0``. -This time however, the ``Content-Length`` is equal to the size, in bytes, of the -chunk that they are sending. In addition, the client **MUST** include a -``Upload-Offset`` header which indicates a byte offset that the content included -in this request starts at and a ``Upload-Incomplete`` header set to ``1``. +For example, if uploading a 100,000 byte file in 1000 byte chunks, the first chunk's +headers would be: -As an example, if uploading a 100,000 byte file in 1000 byte chunks, and this chunk -represents bytes 1001 through 2000, you would send headers like:: +.. code-block:: email Content-Length: 1000 Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: - Upload-Offset: 1001 + Upload-Offset: 0 Upload-Incomplete: 1 -However, the **final** chunk of data omits the ``Upload-Incomplete`` header, since -at that point the upload is no longer incomplete. +The second chunk, representing bytes 1000 through 1999, would include the following +headers: + +.. code-block:: email + + Content-Length: 1000 + Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=: + Upload-Offset: 1000 + Upload-Incomplete: 1 + +.. 
_complete-the-upload: + +The final chunk of data **MUST** omit the ``Upload-Incomplete`` header, since at that +point the upload is complete. For each successful chunk, the server **MUST** respond with a ``202 Accepted`` -header, except for the final chunk, which **MUST** be a ``201 Created``. +header, except for the final chunk, which **MUST** be a ``201 Created``, and as with +non-chunked uploads, the body has no content. + +With both chunked and non-chunked uploads, once completed successfully, the file **MUST NOT** +be publicly visible in the repository, but merely staged until the upload session is +:ref:`completed `. The following constraints are placed on uploads regardless of whether they are single chunk or multiple chunks: @@ -391,18 +476,21 @@ single chunk or multiple chunks: **MAY** terminate any ongoing ``POST`` request that utilizes the same ``Upload-Token``. - If the offset provided in ``Upload-Offset`` is not ``0`` or the next chunk - in an incomplete upload, then the server **MUST** respond with a 409 Conflict. + in an incomplete upload, then the server **MUST** respond with a ``409 Conflict``. This + means that a client **MUST NOT** upload chunks out of order. - Once an upload has started with a specific token, you may not use another token - for that file without deleting the in progress upload. -- Once a file has uploaded successfully, you may initiate another upload for - that file, and doing so will replace that file. + for that file without deleting the in-progress upload. +- Once a file upload has completed successfully, you may initiate another upload for + that file, and doing so will replace that file. This is possible until the entire + session is completed, at which point no further file uploads (either creating or + replacing a session file) are accepted. 
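The chunked-upload headers and ordering constraints above can be sketched as a small planning helper. This is a minimal illustration, not part of the proposed API: the function names ``new_upload_token`` and ``chunk_headers`` are hypothetical, and including ``Upload-Offset`` on the final chunk is an assumption, since the text only states that the final chunk omits ``Upload-Incomplete``.

```python
import base64
import secrets


def new_upload_token() -> str:
    """Return an Upload-Token value: base64 of 32 random bytes, wrapped in colons."""
    return ":" + base64.b64encode(secrets.token_bytes(32)).decode() + ":"


def chunk_headers(file_size: int, chunk_size: int, token: str) -> list[dict[str, str]]:
    """Build the headers for each chunk of one upload, in the order they must be sent."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    plan = []
    for offset in range(0, file_size, chunk_size):
        length = min(chunk_size, file_size - offset)
        headers = {
            "Content-Length": str(length),
            "Upload-Token": token,  # the same token for every chunk of this upload
            "Upload-Offset": str(offset),
        }
        # Every chunk except the final one is marked as incomplete.
        if offset + length < file_size:
            headers["Upload-Incomplete"] = "1"
        plan.append(headers)
    return plan
```

Calling ``chunk_headers(100_000, 1000, new_upload_token())`` yields 100 header sets with offsets 0, 1000, 2000 and so on; sending them in list order satisfies the rule that chunks are not uploaded out of order.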
Resume Upload +++++++++++++ To resume an upload, you first have to know how much of the data the server has -already received, regardless of if you were originally uploading the file as +already received, regardless of whether you were originally uploading the file as a single chunk, or in multiple chunks. To get the status of an individual upload, a client can make a ``HEAD`` request @@ -417,8 +505,9 @@ Once the client has retrieved the offset that they need to start from, they can upload the rest of the file as described above, either in a single request containing all of the remaining data or in multiple chunks. +.. _cancel-an-upload: -Canceling an In Progress Upload +Canceling an In-Progress Upload +++++++++++++++++++++++++++++++ If a client wishes to cancel an upload of a specific file, for instance because @@ -429,14 +518,29 @@ file in the first place. A successful cancellation request **MUST** response with a ``204 No Content``. -Delete an uploaded File -+++++++++++++++++++++++ +Delete a Partial or Fully Uploaded File ++++++++++++++++++++++++++++++++++++++++ Already uploaded files may be deleted by issuing a ``DELETE`` request to the file upload URL without the ``Upload-Token``. A successful deletion request **MUST** response with a ``204 No Content``. +Replacing a Partially or Fully Uploaded File +++++++++++++++++++++++++++++++++++++++++++++ + +To replace a session file, the file upload **MUST** have been previously completed or +deleted. It is not possible to replace a session file if the upload for that file is +incomplete. Clients have two options to replace an incomplete upload: + +- :ref:`Cancel the in-progress upload ` by issuing a ``DELETE`` of that + specific file. After this, the new file upload can be initiated. +- :ref:`Complete the in-progress upload ` by uploading a zero-length + chunk omitting the ``Upload-Incomplete`` header. This effectively truncates and + completes the in-progress upload, after which point the new upload can commence. 
+ + +.. _session-status: Session Status ~~~~~~~~~~~~~~ @@ -451,26 +555,27 @@ they got when they initially created the upload session, except with any changes to ``status``, ``valid-for``, or updated ``files`` reflected. +.. _session-cancellation: + Session Cancellation ~~~~~~~~~~~~~~~~~~~~ -To cancel an upload session, a client issues a ``DELETE`` request to the -same session URL as before. At which point the server marks the session as -canceled, **MAY** purge any data that was uploaded as part of that session, -and future attempts to access that session URL or any of the file upload URLs -**MAY** return a ``404 Not Found``. +To cancel an upload session, a client issues a ``DELETE`` request to the same session URL +as before. The server then marks the session as canceled, **MAY** purge any data that was +uploaded as part of that session, and future attempts to access that session URL or any of +the file upload URLs **MAY** return a ``404 Not Found``. -To prevent a lot of dangling sessions, servers may also choose to cancel a -session on their own accord. It is recommended that servers expunge their -sessions after no less than a week, but each server may choose their own -schedule. +To prevent dangling sessions, servers may also choose to cancel timed-out sessions on +their own accord. It is recommended that servers expunge their sessions after no less than +a week, but each server may choose their own schedule. +.. _publish-session: Session Completion ~~~~~~~~~~~~~~~~~~ -To complete a session, and publish the files that have been included in it, -a client **MUST** send a ``POST`` request to the ``publish`` url in the +To complete a session and publish the files that have been included in it, +a client **MUST** send a ``POST`` request to the ``publish`` URL in the session status payload. 
If the server is able to immediately complete the session, it may do so @@ -483,11 +588,89 @@ In either case, the server should include a ``Location`` header pointing back to the session status url, and if the server returned a ``202 Accepted``, the client may poll that URL to watch for the status to change. +It is an error to publish a session that has no staged files. In this case, a +``400 Bad Request`` is returned and the session is canceled, just as if an +explicit :ref:`session cancellation ` was issued. + +.. _session-token: + +Session Token +~~~~~~~~~~~~~ + +When initiating the staged uploads, clients can provide a ``nonce``, essentially a string +with arbitrary content. The ``nonce`` is optional, and if omitted, is equivalent to +providing an empty string. + +In order to support previewing of staged uploads, the package ``name`` and ``version``, +along with this ``nonce``, are used as input into a hashing algorithm to produce a unique +"session token". This session token is valid for the life of the session (i.e., until it +is completed, either by cancellation or publishing), and can be provided to installer +clients such as ``pip`` to gain access to the staged releases. + +The use of the ``nonce`` allows clients to decide whether they want to obscure the +visibility of their staged releases or not, and there can be good reasons for either +choice. + +The `SHA256 algorithm `_ is +used to turn these inputs into a unique token, in the order ``name``, ``version``, +``nonce``, using the following Python code as an example: + +.. code-block:: python + + from hashlib import sha256 + + def gentoken(name: bytes, version: bytes, nonce: bytes = b''): + h = sha256() + h.update(name) + h.update(version) + h.update(nonce) + return h.hexdigest() + +It should be evident that if no ``nonce`` is provided in the session initiation request, +then the preview token is easily guessable from the package name and version number alone. 
+Clients can elect to omit the ``nonce`` (or set it to the empty string themselves) if they +want to allow previewing from anybody without access to the preview token. By providing a +non-empty ``nonce``, clients can elect for security-through-obscurity, but this does not +protect staged files behind any kind of authentication. + +.. _staged-preview: + +Stage Previews +~~~~~~~~~~~~~~ + +The ability to preview staged releases before they are published is an important feature, +enabling an additional level of last-mile testing before the release is available to the +public. Indexes **MAY** provide this functionality in one or both of the following ways. + +* Through the URL provided in the ``stage`` subkey of the :ref:`URL + identifiers ` returned when the session is created. The + ``stage`` URL can be passed to installers such as ``pip`` by setting the + `--extra-index-url + `_ + flag to this value. Multiple stages can even be previewed by repeating this + flag with multiple values. + +* By passing the ``Stage-Token`` header to the `Simple Repository API + `_ + requests or the :pep:`691` JSON-based Simple API, with the value from the + ``preview-token`` subkey of the JSON response to the session creation + request. Multiple ``Stage-Token`` headers are allowed. It is recommended + that installers add a ``--staged `` or similarly named option to set + the ``Stage-Token`` header at the command line. + +In both cases, the index will return views that expose the staged releases to the +installer tool, making them available to download and install into a virtual environment +built for that last-mile testing. The former option allows for existing installers to +preview staged releases with no changes, although perhaps in a less user-friendly way. +The latter option can be a better user experience, but the details of this are left to +installer tool maintainers to decide. + +.. 
_session-errors: Errors ------ -All Error responses that contain a body will have a body that looks like: +All error responses that contain content will have a body that looks like: .. code-block:: json @@ -504,22 +687,22 @@ All Error responses that contain a body will have a body that looks like: ] } -Besides the standard ``meta`` key, this has two top level keys, ``message`` -and ``errors``. +Besides the standard ``meta`` key, this has the following top-level keys: -The ``message`` key is a singular message that encapsulates all errors that -may have happened on this request. +``message`` + A singular message that encapsulates all errors that may have happened on this + request. -The ``errors`` key is an array of specific errors, each of which contains -a ``source`` key, which is a string that indicates what the source of the -error is, and a ``message`` key for that specific error. +``errors`` + An array of specific errors, each of which contains a ``source`` key, which is a + string that indicates what the source of the error is, and a ``message`` key for that + specific error. The ``message`` and ``source`` strings do not have any specific meaning, and -are intended for human interpretation to figure out what the underlying issue -was. +are intended for human interpretation to aid in diagnosing the underlying issue. -Content-Types +Content Types ------------- Like :pep:`691`, this PEP proposes that all requests and responses from the @@ -542,7 +725,7 @@ Unlike :pep:`691`, this PEP does not change the existing ``1.0`` API in any way, so servers will be required to host the new API described in this PEP at a different endpoint than the existing upload API. -Which means that for the new 2.0 API, the content types would be: +Thus, for the new 2.0 API, the content type would be: - **JSON:** ``application/vnd.pypi.upload.v2+json`` @@ -553,15 +736,15 @@ that clients be explicit about what versions they support. 
These content types **DO NOT** apply to the file uploads themselves, only to the other API requests/responses in the upload API. The files themselves should use -the ``application/octet-stream`` content-type. +the ``application/octet-stream`` content type. Version + Format Selection -------------------------- -Again similar to :pep:`691`, this PEP standardizes on using server-driven +Again, similar to :pep:`691`, this PEP standardizes on using server-driven content negotiation to allow clients to request different versions or -serialization formats, which includes the ``format`` url parameter. +serialization formats, which includes the ``format`` URL parameter. Since this PEP expects the existing legacy ``1.0`` upload API to exist at a different endpoint, and it currently only provides for JSON serialization, this @@ -630,7 +813,7 @@ Multipart Uploads vs tus ------------------------ This PEP currently bases the actual uploading of files on an internet draft -from tus.io that supports resumable file uploads. +from ``tus.io`` that supports resumable file uploads. That protocol requires a few things: @@ -654,7 +837,7 @@ The other benefit is that even if you do want to support resumption, you can still just ``POST`` the file, and unless you *need* to resume the download, that's all you have to do. -Another, possibly theoretical, benefit is that for hashing the uploaded files, +Another, possibly theoretical benefit is that for hashing the uploaded files, the serial chunks requirement means that the server can maintain hashing state between requests, update it for each request, then write that file back to storage. Unfortunately this isn't actually possible to do with Python's hashlib, @@ -715,7 +898,7 @@ It does have its own downsides: - See above about whether this is actually a downside in practice, or if it's just in theory. 
-I lean towards the tus style resumable uploads as I think they're simpler +I lean towards the ``tus`` style resumable uploads as I think they're simpler to use and to implement, and the main downside is that we possibly leave some multi-threaded performance on the table, which I think that I'm personally fine with? @@ -725,6 +908,13 @@ you don't have to try and do any sort of protection against parallel uploads, since they're just supported. That alone might erase most of the server side implementation simplification. +Footnotes +========= +.. [#fn1] Specifically any hash algorithm name that `can be passed to + `_ + ``hashlib.new()`` which does not require additional parameters. + + Copyright =========