From dd84771563ed80e264787515641657e845326a5c Mon Sep 17 00:00:00 2001 From: hannahhoward Date: Mon, 17 Apr 2023 14:29:16 +0200 Subject: [PATCH 01/18] feat(http-gateways): add trustless parameters Add specification for queries for verifiable CAR requests --- src/http-gateways/path-gateway.md | 29 +++++++++++++++++++++++++- src/http-gateways/trustless-gateway.md | 20 ++++++++++++------ 2 files changed, 42 insertions(+), 7 deletions(-) diff --git a/src/http-gateways/path-gateway.md b/src/http-gateways/path-gateway.md index 6fbbea06b..77ba99116 100644 --- a/src/http-gateways/path-gateway.md +++ b/src/http-gateways/path-gateway.md @@ -205,6 +205,30 @@ This is a URL-friendly alternative to sending an [`Accept`](#accept-request-head - `format=json` → `Accept: application/json` - `format=cbor` → `Accept: application/cbor` +## Query Parameters for CAR Requests + +The following query parameters are only available for requests made with either a `format=car` query parameter or an `Accept: application/vnd.ipld.car` request header. These parameters modify shape of the IPLD graph returned within the car file + +### `car-scope` (request query parameter) + +Optional, `car-scope=(block|file|all)` with default value 'all', describes the shape of the dag fetched the terminus of the specified path whose blocks are included in the returned CAR file after the blocks required to traverse path segments. + +`block` - Only the root block at the end of the path is returned After blocks required to verify the specified path segments. + +`file` - For queries that traverse UnixFS data, `file` roughly means return blocks needed to verify the end of the path as a filesystem entity. In other words, all the blocks needed 'cat' a UnixFS file at the end of the specified path, or to 'ls' a UnixFS directory at the end of the specified path. 
For all queries that do not reference non-UnixFS data, `file` is equivalent to `block` + +`all` - Transmit the entire contiguous DAG that begins at the end of the path query, after blocks required to verify path segments + +### `bytes` (request query parameter) + +Optional, `bytes=x:y` with default value `0:*`. When the entity at the end of the end of the specified path can be intepreted as a contingous array of bytes (such as a UnixFS file), returns only the blocks required to verify the specified byte range of said entity. Put another way, the `bytes` parameters can serve as a trustless form of an HTTP range request. If the entity at the end of the path cannot be interpreted as a continguous array of bytes (such as a CBOR/JSON map), this parameter has no effect. Allowed values for `x` and `y` are positive integers where y >= x, which limit the return blocks to needed to satify the range [x, y]. In addition the following additional values are permitted: + +- `*` can be substituted for end-of-file + - `?bytes=0:*` is the entire file (i.e. to fulfill HTTP Range Request `x-` requests) +- Negative numbers can be used for referring to bytes from the end of a file + - `?bytes=-1024:*` is the last 1024 bytes of a file (i.e. to fulfill HTTP Range Request `-y` requests) + - It is also permissible (unlike with HTTP Range Requests) to ask for the range of 500 bytes from the beginning of the file to 1000 bytes from the end by `?bytes=499:-1000` + + +### Security + +This IPIP allows clients to narrow down the amount of data returned as a CAR, +and introduces a need for defensive programming when the feature set of the +remote gateway is unknown. + +To avoid denial of service, and resource starvation, clients should probe if +the gateway supports features described in this IPIP before requesting data, to +avoid fetching big DAGs when only a small subset of blocks is expected. + +Following the robustness principle, invalid, duplicate or unexpected blocks +should be discarded. 
+ +### Alternatives + +Below are alternate designs that were considered, but decided to be out of scope for this IPIP. + +#### Arbitrary IPLD Selectors + +Passing arbitrary selectors was rejected due to the implementation complexity, +risks, and weak value proposition, as [discussed during IPFS Thing 2022](https://github.com/ipfs/specs/issues/348#issuecomment-1326869509) + +#### Additional "Web" Scope + +A request for +`/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&car-scope=file` +returns all blocks required for enumeration of the big HAMT `/wiki` directory, +and then an additional request for `index.html` needs to be issued. + +Website hosting use case could be made more efficient if gateway returned a CAR +with `index.html` instead of all blocks for directory enumeration. The server +already did the work: it knows the entity is a directory, already parsed it, it +knows it has child entity named `index.html`, and everyone would pay a lower cost due +to lower number of blocks being returned in a single round-trip, instead of two. + +Rhea/Saturn projects requested this to be out of scope for now, but this "web" +entity scope could be added in the future, as a follow-up optimiziation IPIP. + +#### Requesting specific DAG depth + +Blindly requesting specific DAG depth did not translate to any type of +requests web gateways like `ipfs.io` or one in Brave browser have to handle. + +It is impossible to know if some entity on a sub-path is a file or a directory, +without sending a probe for the root block, which introduces one round-trip overhead +per entity. + +This problem is not present in the case of `car-scope=file`, which shifts the +decision to the server, and allows for fetching unknown UnixFS entity with a +single request. + +## Test fixtures + + + +### Testing pathing + +The main utility of this scope is saving round-trips when retrieving a specific +entity as a member of a bigger DAG. 
+ +To test, request a small file that fits in a single block from a sub-path. The +returned CAR MUST include both the block with the `file` data and blocks +necessary for traversing from the root CID to the terminating element (all +parents, root CID and a subdirectory below it). + +Fixtures: + +:::example + +- TODO(gateway-conformance): `/ipfs/dag-pb-cid/parent/file?format=car` (UnixFS file in a subdirectory) + +- TODO(gateway-conformance): `/ipfs/dag-pb-cid/hamt-parent1/file?format=car` (UnixFS file on a path within HAMT-sharded parent directory) + +- TODO(gateway-conformance): `/ipfs/dag-cbor-cid/file?format=car` (UnixFS file on a path with DAG-CBOR root CID) + +::: + +### Testing `car-scope=block` + +The main utility of this scope is resolving content paths. This means a CAR +response with blocks related to path traversal, and the root block of the +terminating entity. + +To test real world use, request UnixFS `file` or a `directory` from a sub-path. +The returned CAR MUST include blocks required for path traversal and ONLY the +root block of the terminating entity. + +Fixtures: + +:::example + +- TODO(gateway-conformance): `/ipfs/cid/parent/directory?format=car&car-scope=block` (UnixFS directory on a path) + +- TODO(gateway-conformance): `/ipfs/cid/parent1/parent2/file?format=car&car-scope=block` (UnixFS file on a path) + +::: + +### Testing `car-scope=file` + +The main utility of this scope is retrieving all blocks related to a meaningful +IPLD entity. Currently, the most popular entity types are: + +- UnixFS `file` + (blocks for all chunks with file data) + +- UnixFS `directory` + (blocks for the directory node, allowing its enumeration; + no root blocks for any of the child entities). 
+ +- `raw` / `dag-cbor` + (block with raw data or DAG-CBOR document, potentially linking to other CIDs) + +Fixtures: + +:::example + +- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&car-scope=file` + - Request a `chunked-dag-pb-file` (UnixFS file encoded with `dag-pb` with + more than one chunk). Returned blocks MUST be enough to deserialize the file. + +- TODO(gateway-conformance): `/ipfs/cid/dag-cbor-with-link?format=car&car-scope=file` + - Request a `dag-cbor-with-link` (DAG-CBOR document with CBOR Tag 42 pointing + at a third-party CID). The response MUST include the terminating entity (DAG-CBOR) + and MUST NOT include the CID from the Tag 42 (IPLD Link). + +- TODO(gateway-conformance): `/ipfs/cid/flat-directory/file?format=car&car-scope=file` + - Request UnixFS `flat-directory`. The response MUST include the minimal set of + blocks required for enumeration of directory contents, and no blocks that + belong to child entities. + +- TODO(gateway-conformance): `/ipfs/cid/hamt-directory/file?format=car&car-scope=file` + - Request UnixFS `hamt-directory`. The response MUST include the minimal set of + blocks required for enumeration of directory contents, and no blocks that + belong to child entities. + +::: + +### Testing `car-scope=all` + +This is the implicit default used when `car-scope` is not present, +and explicitly used in the context of proxy gateway supporting :cite[ipip-0288]. + +Fixtures: + +:::example + +- TODO(gateway-conformance): `/ipfs/cid-of-a-directory?format=car&car-scope=all` + - Request a CID of UnixFS `directory` which contains two files. The response MUST + contain all blocks that can be accessed by recursively traversing all IPLD + Links from the root CID. + +- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&car-scope=all` + - Request a CID of UnixFS `file` encoded with `dag-pb` codec and more than + one chunk. The response MUST contain blocks for all `file` chunks. 
+ +::: + +### Testing `bytes=from:to` + +This type of CAR response is used for facilitating HTTP Range Requests and +byte seek within bigger entities. + +:::warning + +Properly testing this type of response requires synthetic DAG that is only +partially retrievable. This ensures systems that perform internal caching +won't pass the test due to the entire DAG being cached. + +::: + +Use of the below fixture is highly recommended: + +:::example + +- TODO(gateway-conformance): `/ipfs/dag-pb-file?format=car&bytes=40000000000-40000000002` + + - Request a byte range from the middle of a big UnixFS `file`. The response MUST + contain only the minimal set of blocks necessary for fullfilling the range + request. + +::: + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From 1fc89151ebd1c30de099af799beef6f66f8d1134 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 12 May 2023 00:33:09 +0200 Subject: [PATCH 05/18] =?UTF-8?q?ipip-0402:=20car-scope=3Dfile=20=E2=86=92?= =?UTF-8?q?=20dag-scope=3Dentity=20&=20bytes=20=E2=86=92=20entity-bytes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This change incorporates feedback from Adin, Rod and Juan: - bytes: https://github.com/ipfs/specs/pull/402#pullrequestreview-1398599103 - car-scope: https://github.com/ipfs/specs/pull/402#discussion_r1181360902 I really hope these names will be good enough, but I am running on artisan, recycled electrons so can do this all day :-) --- src/http-gateways/path-gateway.md | 30 ++---------- src/http-gateways/trustless-gateway.md | 59 +++++++++++++++++++++- src/ipips/ipip-0402.md | 68 ++++++++++++++++---------- 3 files changed, 103 insertions(+), 54 deletions(-) diff --git a/src/http-gateways/path-gateway.md b/src/http-gateways/path-gateway.md index 676240fc7..621f493eb 100644 --- a/src/http-gateways/path-gateway.md +++ b/src/http-gateways/path-gateway.md @@ -214,35 +214,13 @@ These are 
the equivalents: - `format=cbor` → `Accept: application/cbor` - `format=ipns-record` → `Accept: application/vnd.ipfs.ipns-record` -## Query Parameters for CAR Requests +### `dag-scope` (request query parameter) -The following query parameters are only available for requests made with either a `format=car` query parameter or an `Accept: application/vnd.ipld.car` request header. These parameters modify shape of the IPLD graph returned within the car file. +Only used on CAR requests, same as [dag-scope](/http-gateways/trustless-gateway/#dag-scope-request-query-parameter) from :cite[trustless-gateway] -### `car-scope` (request query parameter) +### `entity-bytes` (request query parameter) -Optional, `car-scope=(block|file|all)` with default value 'all', describes the shape of the dag fetched the terminus of the specified path whose blocks are included in the returned CAR file after the blocks required to traverse path segments. - -`block` - Only the root block at the end of the path is returned After blocks required to verify the specified path segments. - -`file` - For queries that traverse UnixFS data, `file` roughly means return blocks needed to verify the end of the path as a filesystem entity. In other words, all the blocks needed to 'cat' a UnixFS file at the end of the specified path, or to 'ls' a UnixFS directory at the end of the specified path. For all queries that do not reference non-UnixFS data, `file` is equivalent to `block` - -`all` - Transmit the entire contiguous DAG that begins at the end of the path query, after blocks required to verify path segments - -### `bytes` (request query parameter) - -Optional, `bytes=x:y` with default value `0:*`. When the entity at the end of the specified path can be intepreted as a contingous array of bytes (such as a UnixFS file), returns only the blocks required to verify the specified byte range of said entity. Put another way, the `bytes` parameters can serve as a trustless form of an HTTP range request. 
If the entity at the end of the path cannot be interpreted as a continguous array of bytes (such as a CBOR/JSON map), this parameter has no effect. Allowed values for `x` and `y` are positive integers where y >= x, which limit the return blocks to needed to satify the range [x, y]. In addition the following additional values are permitted: - -- `*` can be substituted for end-of-file - - `?bytes=0:*` is the entire file (i.e. to fulfill HTTP Range Request `x-` requests) -- Negative numbers can be used for referring to bytes from the end of a file - - `?bytes=-1024:*` is the last 1024 bytes of a file (i.e. to fulfill HTTP Range Request `-y` requests) - - It is also permissible (unlike with HTTP Range Requests) to ask for the range of 500 bytes from the beginning of the file to 1000 bytes from the end by `?bytes=499:-1000` - - +Only used on CAR requests, same as [entity-bytes](/http-gateways/trustless-gateway/#entity-bytes-request-query-parameter) from :cite[trustless-gateway] # HTTP Response diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index ae8ad3b92..122a0427e 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -59,7 +59,7 @@ Same as GET, but does not return any payload. Same as in :cite[path-gateway], but with limited number of supported response types. -## HTTP Request Headers +## Request Headers ### `Accept` (request header) @@ -74,6 +74,63 @@ Below response types MUST to be supported: - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) – disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) – requests a verifiable :cite[ipns-record] (multicodec `0x0300`). 
+## Request Query Parameters + +### :dfn[dag-scope] (request query parameter) + +Optional, `dag-scope=(block|entity|all)` with default value `all`, only available for CAR requests. + +Describes the shape of the DAG fetched the terminus of the specified path whose blocks +are included in the returned CAR file after the blocks required to traverse +path segments. + +- `block` - Only the root block at the end of the path is returned after blocks + required to verify the specified path segments. + +- `entity` - For queries that traverse UnixFS data, `entity` roughly means return + blocks needed to verify the terminating element of the requested content path. + For UnixFS, all the blocks needed to read an entire UnixFS file, or enumerate a UnixFS directory. + For all queries that reference non-UnixFS data, `entity` is equivalent to `block` + +- `all` - Transmit the entire contiguous DAG that begins at the end of the path + query, after blocks required to verify path segments + +When present, returned `Etag` must include unique prefix based on the passed scope type. + +### :dfn[entity-bytes] (request query parameter) + +Optional, `entity-bytes=from:to` with the default value `0:*`, only available for CAR requests. +Serves as a trustless form of an HTTP Range Request. + +When the terminating entity at the end of the specified content path can be +interpreted as a continuous array of bytes (such as a UnixFS file), returns +only the minimal set of blocks required to verify the specified byte range of +said entity. + +Allowed values for `from` and `to` are positive integers where `to` >= `from`, which +limit the return blocks to needed to satisfy the range `[from,to]`: + +- `from` value gives the byte-offset of the first byte in a range. +- `to` value gives the byte-offset of the last byte in the range; that is, +the byte positions specified are inclusive. Byte offsets start at zero. 
+ +If the entity at the end of the path cannot be interpreted as a continuous +array of bytes (such as a DAG-CBOR/JSON map, or UnixFS directory), this +parameter has no effect. + +The following additional values are supported: + +- `*` can be substituted for end-of-file + - `entity-bytes=0:*` is the entire file (a verifiable version of HTTP request for `Range: 0-`) +- Negative numbers can be used for referring to bytes from the end of a file + - `entity-bytes=-1024:*` is the last 1024 bytes of a file + (verifiable version of HTTP request for `Range: -1024`) + - It is also permissible (unlike with HTTP Range Requests) to ask for the + range of 500 bytes from the beginning of the file to 1000 bytes from the + end: `entity-bytes=499:-1000` + +When present, returned `Etag` must include unique prefix based on the passed range. + # HTTP Response Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway]. diff --git a/src/ipips/ipip-0402.md b/src/ipips/ipip-0402.md index ac4f69724..451777f9b 100644 --- a/src/ipips/ipip-0402.md +++ b/src/ipips/ipip-0402.md @@ -5,6 +5,10 @@ ipip: proposal editors: - name: Hannah Howard github: hannahhoward + - name: Adin Schmahmann + github: aschmahmann + - name: Rod Vagg + github: rvagg - name: Marcin Rataj github: lidel url: https://lidel.org/ @@ -39,11 +43,15 @@ Save round-trips, allow more efficient resume and parallel downloads. 
The solution is to allow the :cite[trustless-gateway] to support partial responses by: + - allowing for requesting sub-paths within a DAG, and getting blocks necessary for traversing all path segments for end-to-end verification -- opt-in `car-scope` parameter that allows for narrowing down returned blocks - to a `block`, `file` (aka logical IPLD entity), or `all` (default) -- opt-in `bytes` parameter that allows for returning only a subset of blocks + +- opt-in `dag-scope` parameter that allows for narrowing down returned blocks + to a `block`, `entity` (a logical IPLD entity, such as a file, directory, + CBOR document), or `all` (default) + +- opt-in `entity-bytes` parameter that allows for returning only a subset of blocks within a logical IPLD entity Details are in :cite[trustless-gateway]. @@ -66,14 +74,15 @@ Terse rationale for each feature: - The ability to narrow down CAR response based on logical scope or specific byte range within an entity comes directly from the types of requests existing path gateways need to handle. 
- - `car-scope=block` allows for resolving content paths to the final CID, and + - `dag-scope=block` allows for resolving content paths to the final CID, and learn its type (unixfs file/directory, or a custom codec) - - `car-scope=file` covers the majority of website hosting needs (returning a - file, or enumerating directory contents) - - `car-scope=all` returns all blocks in a DAG: was the existing behavior and + - `dag-scope=entity` covers the majority of website hosting needs (returning a + file, enumerating directory contents, or any other IPLD entity) + - `dag-scope=all` returns all blocks in a DAG: was the existing behavior and remains the implicit default - - `bytes=from:to` enables efficient, verifiable analog to HTTP Range Requests + - `entity-bytes=from:to` enables efficient, verifiable analog to HTTP Range Requests (resuming downloads or seeking within bigger files, such as videos) + - `from` and `to` match the behavior of HTTP Range Requests. ### User benefit @@ -121,7 +130,7 @@ introduce additional blocks required for verifying. As long the client was written in a trustless manner, and follows ring and was discarding unexpected blocks, this will be a backward-compatible change. -#### CAR format with `bytes` and `car-scope` parameters +#### CAR format with `entity-bytes` and `dag-scope` parameters These parameters are opt-in, which means no breaking changes. @@ -159,7 +168,7 @@ risks, and weak value proposition, as [discussed during IPFS Thing 2022](https:/ #### Additional "Web" Scope A request for -`/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&car-scope=file` +`/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&dag-scope=entity` returns all blocks required for enumeration of the big HAMT `/wiki` directory, and then an additional request for `index.html` needs to be issued. 
@@ -181,7 +190,7 @@ It is impossible to know if some entity on a sub-path is a file or a directory, without sending a probe for the root block, which introduces one round-trip overhead per entity. -This problem is not present in the case of `car-scope=file`, which shifts the +This problem is not present in the case of `dag-scope=entity`, which shifts the decision to the server, and allows for fetching unknown UnixFS entity with a single request. @@ -197,7 +206,7 @@ The main utility of this scope is saving round-trips when retrieving a specific entity as a member of a bigger DAG. To test, request a small file that fits in a single block from a sub-path. The -returned CAR MUST include both the block with the `file` data and blocks +returned CAR MUST include both the block with the file data and all blocks necessary for traversing from the root CID to the terminating element (all parents, root CID and a subdirectory below it). @@ -213,7 +222,7 @@ Fixtures: ::: -### Testing `car-scope=block` +### Testing `dag-scope=block` The main utility of this scope is resolving content paths. This means a CAR response with blocks related to path traversal, and the root block of the @@ -227,13 +236,13 @@ Fixtures: :::example -- TODO(gateway-conformance): `/ipfs/cid/parent/directory?format=car&car-scope=block` (UnixFS directory on a path) +- TODO(gateway-conformance): `/ipfs/cid/parent/directory?format=car&dag-scope=block` (UnixFS directory on a path) -- TODO(gateway-conformance): `/ipfs/cid/parent1/parent2/file?format=car&car-scope=block` (UnixFS file on a path) +- TODO(gateway-conformance): `/ipfs/cid/parent1/parent2/file?format=car&dag-scope=block` (UnixFS file on a path) ::: -### Testing `car-scope=file` +### Testing `dag-scope=entity` The main utility of this scope is retrieving all blocks related to a meaningful IPLD entity. 
Currently, the most popular entity types are: @@ -252,48 +261,48 @@ Fixtures: :::example -- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&car-scope=file` +- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&dag-scope=entity` - Request a `chunked-dag-pb-file` (UnixFS file encoded with `dag-pb` with more than one chunk). Returned blocks MUST be enough to deserialize the file. -- TODO(gateway-conformance): `/ipfs/cid/dag-cbor-with-link?format=car&car-scope=file` +- TODO(gateway-conformance): `/ipfs/cid/dag-cbor-with-link?format=car&dag-scope=entity` - Request a `dag-cbor-with-link` (DAG-CBOR document with CBOR Tag 42 pointing at a third-party CID). The response MUST include the terminating entity (DAG-CBOR) and MUST NOT include the CID from the Tag 42 (IPLD Link). -- TODO(gateway-conformance): `/ipfs/cid/flat-directory/file?format=car&car-scope=file` +- TODO(gateway-conformance): `/ipfs/cid/flat-directory/file?format=car&dag-scope=entity` - Request UnixFS `flat-directory`. The response MUST include the minimal set of blocks required for enumeration of directory contents, and no blocks that belong to child entities. -- TODO(gateway-conformance): `/ipfs/cid/hamt-directory/file?format=car&car-scope=file` +- TODO(gateway-conformance): `/ipfs/cid/hamt-directory/file?format=car&dag-scope=entity` - Request UnixFS `hamt-directory`. The response MUST include the minimal set of blocks required for enumeration of directory contents, and no blocks that belong to child entities. ::: -### Testing `car-scope=all` +### Testing `dag-scope=all` -This is the implicit default used when `car-scope` is not present, +This is the implicit default used when `dag-scope` is not present, and explicitly used in the context of proxy gateway supporting :cite[ipip-0288]. 
Fixtures: :::example -- TODO(gateway-conformance): `/ipfs/cid-of-a-directory?format=car&car-scope=all` +- TODO(gateway-conformance): `/ipfs/cid-of-a-directory?format=car&dag-scope=all` - Request a CID of UnixFS `directory` which contains two files. The response MUST contain all blocks that can be accessed by recursively traversing all IPLD Links from the root CID. -- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&car-scope=all` +- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&dag-scope=all` - Request a CID of UnixFS `file` encoded with `dag-pb` codec and more than one chunk. The response MUST contain blocks for all `file` chunks. ::: -### Testing `bytes=from:to` +### Testing `entity-bytes=from:to` This type of CAR response is used for facilitating HTTP Range Requests and byte seek within bigger entities. @@ -302,7 +311,7 @@ byte seek within bigger entities. Properly testing this type of response requires synthetic DAG that is only partially retrievable. This ensures systems that perform internal caching -won't pass the test due to the entire DAG being cached. +won't pass the test due to the entire DAG being precached, or fetched in full. ::: @@ -310,12 +319,17 @@ Use of the below fixture is highly recommended: :::example -- TODO(gateway-conformance): `/ipfs/dag-pb-file?format=car&bytes=40000000000-40000000002` +- TODO(gateway-conformance): `/ipfs/dag-pb-file?format=car&entity-bytes=40000000000-40000000002` - Request a byte range from the middle of a big UnixFS `file`. The response MUST contain only the minimal set of blocks necessary for fullfilling the range request. +- TODO(gateway-conformance): `/ipfs/10-bytes-cid?format=car&entity-bytes=4:-2` + + - Request a byte range from the middle of a small file, to -2 bytes from the end. 
+ - (TODO confirm we want keep this -- added since it was explicitly stated as a supported thing in path-gateway.md) + ::: ### Copyright From 98191b5fec841d6ed67975efc457744373901b66 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 12 May 2023 00:50:54 +0200 Subject: [PATCH 06/18] ipip-402: remove mentions of ordered CARs for now We have no spec for signaling this, we may add it as opt-in later https://github.com/ipfs/specs/pull/402#discussion_r1174223244 https://github.com/ipfs/specs/pull/402#issuecomment-1544453059 --- src/http-gateways/path-gateway.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/http-gateways/path-gateway.md b/src/http-gateways/path-gateway.md index 621f493eb..efb6f5884 100644 --- a/src/http-gateways/path-gateway.md +++ b/src/http-gateways/path-gateway.md @@ -578,10 +578,11 @@ The following response types require an explicit opt-in, can only be requested w - Raw Block (`?format=raw`) - Opaque bytes, see [application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw). - CAR (`?format=car`) - - A CAR file or a stream that contains all blocks required to trustless verify the expressed query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car). 
- - Must contain, as the very first block, the block that hashes to the content root CID - - Must contain, immediately following the root block, all blocks encountered while traversing the expressed path in the order they were traversed - - Must contain, immediately following traversed path blocks, appropriate blocks in depth first traversal order required to verify the query expressed at the terminus of the path in [query parameters](#query-parameters-for-car-requests) + - A CAR file or a stream that contains all blocks required to trustlessly verify the requested content path query, see [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) and :cite[trustless-gateway]. + - **Note:** by default, block order in CAR response is not deterministic, + blocks can be returned in different order, depending on implementation + choices (traversal, speed at which blocks arrive from the network, etc). + An opt-in ordered CAR responses MAY be introduced in a future IPIP. - TAR (`?format=tar`) - Deserialized UnixFS files and directories as a TAR file or a stream, see :cite[ipip-0288]. 
- IPNS Record From a6bfb2cd69b5ff76434ce27d67ff697ec4dbf68d Mon Sep 17 00:00:00 2001 From: Henrique Dias Date: Wed, 17 May 2023 09:50:34 +0200 Subject: [PATCH 07/18] docs: small editorial fixes --- src/http-gateways/path-gateway.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/http-gateways/path-gateway.md b/src/http-gateways/path-gateway.md index efb6f5884..0a7814ba5 100644 --- a/src/http-gateways/path-gateway.md +++ b/src/http-gateways/path-gateway.md @@ -21,6 +21,7 @@ editors: url: https://hacdias.com/ xref: - url + - trustless-gateway tags: ['httpGateways', 'lowLevelHttpGateways'] order: 0 --- @@ -216,11 +217,11 @@ These are the equivalents: ### `dag-scope` (request query parameter) -Only used on CAR requests, same as [dag-scope](/http-gateways/trustless-gateway/#dag-scope-request-query-parameter) from :cite[trustless-gateway] +Only used on CAR requests, same as :ref[dag-scope] from :cite[trustless-gateway]. ### `entity-bytes` (request query parameter) -Only used on CAR requests, same as [entity-bytes](/http-gateways/trustless-gateway/#entity-bytes-request-query-parameter) from :cite[trustless-gateway] +Only used on CAR requests, same as :ref[entity-bytes] from :cite[trustless-gateway]. 
# HTTP Response From bfb5d3cf1d51225f0748c32e672915169d9eae4f Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 6 Jun 2023 17:27:51 +0200 Subject: [PATCH 08/18] ipip-402: wip clarify car roots expectations Problem identified in https://github.com/ipfs/gateway-conformance/pull/56#discussion_r1219591806 --- src/http-gateways/trustless-gateway.md | 70 ++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 122a0427e..c2a2737c4 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -147,3 +147,73 @@ For example: `Content-Type: application/vnd.ipld.car; version=1` ### `Content-Disposition` (response header) MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser. + +## HTTP Response Payload + +### Block Response + +An opaque bytes matching the requested block CID +([application/vnd.ipld.raw](https://www.iana.org/assignments/media-types/application/vnd.ipld.raw)). + +The Body hash MUST match the Multihash from the requested CID. + +### CAR Response + +A CAR stream +([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) +for the requested content type, path and optional `dag-scope` and `entity-bytes` URL parameters. + +:::note + +By default, block order in CAR response is not deterministic, blocks can +be returned in different order, depending on implementation choices (traversal, +speed at which blocks arrive from the network, etc). An opt-in ordered CAR +responses MAY be introduced in a future, see [IPIP-412](https://github.com/ipfs/specs/pull/412). 
+ +::: + +#### CAR version + +Value returned in `CarV1Header.version` struct MUST match the `version` +parameter returned in `Content-Type` header + +#### CAR roots + +:::issue + +TODO: we need to specify expectations about what should be returned in +[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header). + +##### Option A: always empty + +If the response uses version 1 or 2 of the CAR spec, the +[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct +MUST be empty. + +##### Option B: only CID of the terminating element + +If the response uses version 1 or 2 of the CAR spec, the +[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct +MUST contain CID of the terminating entity. + +##### Option C: only CIDs of fully returned DAGs + +If the response uses version 1 or 2 of the CAR spec, the +[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct +MUST be either empty, or only contain CIDs of complete DAGs present in the response. + +CIDs from partial DAGs, such as parent nodes on the path, or terminating +element returned with `dag-scope=block`, or UnixFS directory returned with +`dag-scope=entity` MUST never be returned in the `CarV1Header.roots` list, as +they may cause overfetching on systems that perform recursive pinning of DAGs +listed in `CarV1Header.roots`. + +##### Option D: CIDs for all logical path segments (same as X-Ipfs-Roots) + +If the response uses version 1 or 2 of the CAR spec, the +[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct +MUST contain all the logical roots related to the requested content path. + +The CIDs here MUST be the same as ones in `X-Ipfs-Roots` header. 
+ +::: From 0639715a6f895c710a682faf00532a21dcb3beff Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 6 Jun 2023 19:04:44 +0200 Subject: [PATCH 09/18] ipip-402: clarify CarV1Header.roots Rationale: https://github.com/ipfs/specs/pull/402#discussion_r1219974718 --- src/http-gateways/trustless-gateway.md | 47 ++++---------------------- 1 file changed, 7 insertions(+), 40 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index c2a2737c4..fd31c9668 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -135,7 +135,7 @@ When present, returned `Etag` must include unique prefix based on the passed ran Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway]. -## HTTP Response Headers +## Response Headers ### `Content-Type` (response header) @@ -148,7 +148,7 @@ For example: `Content-Type: application/vnd.ipld.car; version=1` MUST be returned and set to `attachment` to ensure requested bytes are not rendered by a web browser. -## HTTP Response Payload +## Response Payload ### Block Response @@ -174,46 +174,13 @@ responses MAY be introduced in a future, see [IPIP-412](https://github.com/ipfs/ #### CAR version -Value returned in `CarV1Header.version` struct MUST match the `version` -parameter returned in `Content-Type` header +Value returned in +[`CarV1Header.version`](https://ipld.io/specs/transport/car/carv1/#header) +field MUST match the `version` parameter returned in `Content-Type` header. #### CAR roots -:::issue - -TODO: we need to specify expectations about what should be returned in -[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header). - -##### Option A: always empty - -If the response uses version 1 or 2 of the CAR spec, the -[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct -MUST be empty. 
- -##### Option B: only CID of the terminating element - If the response uses version 1 or 2 of the CAR spec, the -[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct -MUST contain CID of the terminating entity. +[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field +MUST contain the CID of the terminating entity. -##### Option C: only CIDs of fully returned DAGs - -If the response uses version 1 or 2 of the CAR spec, the -[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct -MUST be either empty, or only contain CIDs of complete DAGs present in the response. - -CIDs from partial DAGs, such as parent nodes on the path, or terminating -element returned with `dag-scope=block`, or UnixFS directory returned with -`dag-scope=entity` MUST never be returned in the `CarV1Header.roots` list, as -they may cause overfetching on systems that perform recursive pinning of DAGs -listed in `CarV1Header.roots`. - -##### Option D: CIDs for all logical path segments (same as X-Ipfs-Roots) - -If the response uses version 1 or 2 of the CAR spec, the -[`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) struct -MUST contain all the logical roots related to the requested content path. - -The CIDs here MUST be the same as ones in `X-Ipfs-Roots` header. 
- -::: From 56786ff25ed31c027b07f38b6750df00421cdad7 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 7 Jun 2023 14:24:43 +0200 Subject: [PATCH 10/18] ipip-402: relax CAR roots requirement https://github.com/ipfs/specs/pull/402#discussion_r1220738340 --- src/http-gateways/trustless-gateway.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index fd31c9668..aa759aec4 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -182,5 +182,7 @@ field MUST match the `version` parameter returned in `Content-Type` header. If the response uses version 1 or 2 of the CAR spec, the [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field -MUST contain the CID of the terminating entity. +MAY contain the CID of the terminus of the content path. +If implementation prefers to avoid buffering blocks, and return them as soon as +possible, the field MAY be left empty. From 61d108887d9f1564d9d651e44bb990c176eb0c30 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 20 Jun 2023 21:47:07 +0200 Subject: [PATCH 11/18] ipip-404: descope car roots and determinism Discussions in https://github.com/ipfs/specs/pull/402 illustrated deeper problem with CAR determinism, and we made a decision to remove its aspects from IPIP-402. Ref. 
https://github.com/ipfs/specs/pull/402#issuecomment-1598000900 --- src/http-gateways/trustless-gateway.md | 56 ++++++++++++++++++-------- src/ipips/ipip-0402.md | 39 ++++++++++++++++++ 2 files changed, 78 insertions(+), 17 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index aa759aec4..29ca1bb65 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -4,7 +4,7 @@ description: > Trustless Gateways are a minimal subset of Path Gateways that allow light IPFS clients to retrieve data behind a CID and verify its integrity without delegating any trust to the gateway itself. -date: 2023-04-17 +date: 2023-06-20 maturity: reliable editors: - name: Marcin Rataj @@ -159,18 +159,9 @@ The Body hash MUST match the Multihash from the requested CID. ### CAR Response -A CAR stream -([application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) -for the requested content type, path and optional `dag-scope` and `entity-bytes` URL parameters. - -:::note - -By default, block order in CAR response is not deterministic, blocks can -be returned in different order, depending on implementation choices (traversal, -speed at which blocks arrive from the network, etc). An opt-in ordered CAR -responses MAY be introduced in a future, see [IPIP-412](https://github.com/ipfs/specs/pull/412). - -::: +A CAR stream for the requested +[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) +content type, path and optional `dag-scope` and `entity-bytes` URL parameters. #### CAR version @@ -180,9 +171,40 @@ field MUST match the `version` parameter returned in `Content-Type` header. #### CAR roots -If the response uses version 1 or 2 of the CAR spec, the +The behavior associated with the [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field -MAY contain the CID of the terminus of the content path. 
+is not currently specified. + +Clients MAY ignore it. + +:::issue + +As of 2023-06-20, the behavior of the `roots` CAR field remains an [unresolved item within the CARv1 specification](https://web.archive.org/web/20230328013837/https://ipld.io/specs/transport/car/carv1/#unresolved-items). + +::: + +#### CAR determinism + +The default CAR header and block order in a CAR response is not specified and is non-deterministic. + +Clients MUST NOT assume that CAR responses are deterministic (byte-for-byte identical) across different gateways. + +Clients MUST NOT assume that CAR includes CIDs and their blocks in the same order across different gateways. + +:::issue + +In controlled environments, clients MAY choose to rely on undocumented CAR determinism, +subject to the agreement of the following conditions between the client and the +gateway: +- CAR version +- content of [`CarV1Header.roots`](https://ipld.io/specs/transport/car/carv1/#header) field +- order of blocks +- status of duplicate blocks + +In the future, there may be an introduction of a convention to indicate aspects +of determinism in CAR responses. Please refer to +[IPIP-412](https://github.com/ipfs/specs/pull/412) for potential developments +in this area. + +::: -If implementation prefers to avoid buffering blocks, and return them as soon as -possible, the field MAY be left empty. diff --git a/src/ipips/ipip-0402.md b/src/ipips/ipip-0402.md index 451777f9b..4a83ab1ba 100644 --- a/src/ipips/ipip-0402.md +++ b/src/ipips/ipip-0402.md @@ -143,6 +143,45 @@ mention feature detection via OPTIONS -- a separate IPIP? OR suggest executing test request and client-side detection the first time a gateway is used. 
--> +#### CAR roots and determinism + +As of 2023-06-20, the behavior of the `roots` CAR field remains an [unresolved item within the CARv1 specification](https://web.archive.org/web/20230328013837/https://ipld.io/specs/transport/car/carv1/#unresolved-items): + +> Regarding the roots property of the Header block: +> +> - The current Go implementation assumes at least one CID when creating a CAR +> - The current Go implementation requires at least one CID when reading a CAR +> - The current JavaScript implementation allows for zero or more roots +> - Current usage of the CAR format in Filecoin requires exactly one CID +> +> [..] +> +> It is unresolved how the roots array should be constrained. It is recommended +> that only a single root CID be used in this version of the CAR format. +> +> A work-around for use-cases where the inclusion of a root CID is difficult +> but needing to be safely within the "at least one" recommendation is to use +> an empty CID: `\x01\x55\x00\x00` (zero-length "identity" multihash with "raw" +> codec). Since current implementations for this version of the CAR +> specification don't check for the existence of root CIDs +> (see [Root CID block existence](https://web.archive.org/web/20230328013837/https://ipld.io/specs/transport/car/carv1/#root-cid-block-existence)), +> this will be safe as far as CAR implementations are concerned. However, there +> is no guarantee that applications that use CAR files will correctly consume +> (ignore) this empty root CID. + +Due to the inconsistent and non-deterministic nature of CAR implementations, +the gateway specification faces limitations in providing specific +recommendations. Nevertheless, it is crucial for implementations to refrain +from making implicit assumptions based on the legacy behavior of individual CAR +implementations. + +Due to this, gateway specification changes introduced in this IPIP clarify that: +- The CAR `roots` behavior is out of scope and flags that clients MAY ignore it. 
+- CAR determinism is not present by default, responses may differ across + requests and gateways. +- Opt-in determinism is possible, but standarized signaling mechanism does not + exist until we have IPIP-412 or similar. + ### Security This IPIP allows clients to narrow down the amount of data returned as a CAR, From e1af6e4713e19eb999a0d50a3c3feee79915a2b6 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 4 Jul 2023 01:26:15 +0200 Subject: [PATCH 12/18] ipip-402: clarify error scenarios https://github.com/ipfs/specs/pull/402#pullrequestreview-1498000878 --- src/http-gateways/trustless-gateway.md | 57 ++++++++++++++++++-------- 1 file changed, 41 insertions(+), 16 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 29ca1bb65..ffafdb7c3 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -99,24 +99,29 @@ When present, returned `Etag` must include unique prefix based on the passed sco ### :dfn[entity-bytes] (request query parameter) -Optional, `entity-bytes=from:to` with the default value `0:*`, only available for CAR requests. -Serves as a trustless form of an HTTP Range Request. +The optional `entity-bytes=from:to` parameter is available only for CAR +requests. -When the terminating entity at the end of the specified content path can be -interpreted as a continuous array of bytes (such as a UnixFS file), returns -only the minimal set of blocks required to verify the specified byte range of -said entity. +It implies `dag-scope=entity` and serves as a trustless equivalent of an HTTP +Range Request. -Allowed values for `from` and `to` are positive integers where `to` >= `from`, which -limit the return blocks to needed to satisfy the range `[from,to]`: +When the terminating entity at the end of the specified content path: -- `from` value gives the byte-offset of the first byte in a range. 
-- `to` value gives the byte-offset of the last byte in the range; that is, -the byte positions specified are inclusive. Byte offsets start at zero. +- can be interpreted as a continuous array of bytes (such as a UnixFS file), a + Gateway MUST return only the minimal set of blocks necessary to verify the + specified byte range of that entity. + +- cannot be interpreted as a continuous array of bytes (such as a DAG-CBOR/JSON + map or UnixFS directory), the parameter MUST be ignored, and the request is + equivalent to `dag-scope=entity`. -If the entity at the end of the path cannot be interpreted as a continuous -array of bytes (such as a DAG-CBOR/JSON map, or UnixFS directory), this -parameter has no effect. +Allowed values for `from` and `to` follow a subset of section 14.1.2 from +:cite[rfc9110], where they are defined as offset integers that limit the +returned blocks to only those necessary to satisfy the range `[from,to]`: + +- `from` value gives the byte-offset of the first byte in a range. +- `to` value gives the byte-offset of the last byte in the range; + that is, the byte positions specified are inclusive. The following additional values are supported: @@ -129,7 +134,28 @@ The following additional values are supported: range of 500 bytes from the beginning of the file to 1000 bytes from the end: `entity-bytes=499:-1000` -When present, returned `Etag` must include unique prefix based on the passed range. +A Gateway MUST augment the returned `Etag` based on the passed `entity-bytes`. + +A Gateway SHOULD return an HTTP 400 Bad Request error when the requested range +cannot be parsed as valid offset positions. 
+ +In more nuanced error scenarios, a Gateway MUST return a valid CAR response +that includes enough blocks for the client to understand why the requested +`entity-bytes` was incorrect or why only a part of the requested byte range was +returned: + +- If the requested `entity-bytes` resolves to a range that partially falls + outside of the entity's byte range, the response MUST include the subset of + blocks within the entity's bytes. + - This allows clients to request valid ranges of the entity without needing + to know its total size beforehand, and it does not require the Gateway to + buffer the entire entity before returning the response. + +- If the requested `entity-bytes` resolves to a zero-length range or falls + fully outside of the entity's bytes, the response is equivalent to + `dag-scope=block`. + - This allows client to produce a meaningful error (e.g, in case of UnixFS, + leverage `Data.blocksizes` information present in the root `dag-pb` block). # HTTP Response @@ -207,4 +233,3 @@ of determinism in CAR responses. Please refer to in this area. ::: - From 4ebbb9642439abf2b97ebc91e53c79f9d979c7b4 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 5 Jul 2023 01:38:56 +0200 Subject: [PATCH 13/18] ipip-402: final test fixtures section This backports CIDs and CARs from gateway conformance to the IPIP-402, and provides some basic hints on how each fixture should be used. --- src/ipips/ipip-0402.md | 127 +++++++++++++++++++++++------------------ 1 file changed, 73 insertions(+), 54 deletions(-) diff --git a/src/ipips/ipip-0402.md b/src/ipips/ipip-0402.md index 4a83ab1ba..9176ae8a9 100644 --- a/src/ipips/ipip-0402.md +++ b/src/ipips/ipip-0402.md @@ -138,11 +138,6 @@ Gateways ignore unknown URL parameters. A client sending them to a gateway that does not implement this IPIP will get all blocks for the requested DAG. 
- - #### CAR roots and determinism As of 2023-06-20, the behavior of the `roots` CAR field remains an [unresolved item within the CARv1 specification](https://web.archive.org/web/20230328013837/https://ipld.io/specs/transport/car/carv1/#unresolved-items): @@ -235,9 +230,14 @@ single request. ## Test fixtures - +Relevant tests were added to +[gateway-conformance](https://github.com/ipfs/gateway-conformance) test suite +in [#56](https://github.com/ipfs/gateway-conformance/pull/56) and +[#85](https://github.com/ipfs/gateway-conformance/issues/85). +Detailed list of compliance checks for `dag-scope` and `entity-bytes` can be found in +[`v0.2.0/trustless_gateway_car_test.go`](https://github.com/ipfs/gateway-conformance/blob/v0.2.0/tests/trustless_gateway_car_test.go) or later. + +Below are CIDs, CARs, and short summary of each fixture. ### Testing pathing @@ -249,15 +249,24 @@ returned CAR MUST include both the block with the file data and all blocks necessary for traversing from the root CID to the terminating element (all parents, root CID and a subdirectory below it). 
-Fixtures: - :::example -- TODO(gateway-conformance): `/ipfs/dag-pb-cid/parent/file?format=car` (UnixFS file in a subdirectory) +Sample fixtures: -- TODO(gateway-conformance): `/ipfs/dag-pb-cid/hamt-parent1/file?format=car` (UnixFS file on a path within HAMT-sharded parent directory) +- `bafybeietjm63oynimmv5yyqay33nui4y4wx6u3peezwetxgiwvfmelutzu` + from [`subdir-with-two-single-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/subdir-with-two-single-block-files.car) + for testing `/ipfs/dag-pb-cid/subdir/ascii.txt?format=car` + (UnixFS file in a subdirectory) -- TODO(gateway-conformance): `/ipfs/dag-cbor-cid/file?format=car` (UnixFS file on a path with DAG-CBOR root CID) +- `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` + from [`single-layer-hamt-with-multi-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/single-layer-hamt-with-multi-block-files.car) + for testing `/ipfs/dag-pb-hamt-cid/686.txt?format=car` + (UnixFS file on a path within HAMT-sharded parent directory) + +- `bafybeia264q44a3kmfc2otctzu4egp2k235o3t7mslz2yjraymp4nv6asi` + from [`dir-with-dag-cbor-with-links.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/dir-with-dag-cbor-with-links.car) + for testing `/ipfs/dag-cbor-cid/document?format=car` + (UnixFS file on a path with DAG-CBOR root CID) ::: @@ -271,13 +280,23 @@ To test real world use, request UnixFS `file` or a `directory` from a sub-path. The returned CAR MUST include blocks required for path traversal and ONLY the root block of the terminating entity. 
-Fixtures: - :::example -- TODO(gateway-conformance): `/ipfs/cid/parent/directory?format=car&dag-scope=block` (UnixFS directory on a path) +Sample fixtures: + +- `bafybeietjm63oynimmv5yyqay33nui4y4wx6u3peezwetxgiwvfmelutzu` + from [`subdir-with-two-single-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/subdir-with-two-single-block-files.car) + for testing + - `/ipfs/dag-pb-cid/subdir/ascii.txt?format=car&dag-scope=block` (UnixFS file in a subdirectory) + - `/ipfs/dag-pb-cid?format=car&dag-scope=block` (UnixFS directory) -- TODO(gateway-conformance): `/ipfs/cid/parent1/parent2/file?format=car&dag-scope=block` (UnixFS file on a path) +- `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` + from [`single-layer-hamt-with-multi-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/single-layer-hamt-with-multi-block-files.car) + for testing: + - `/ipfs/dag-pb-hamt-cid/1.txt?format=car&dag-scope=block` + (UnixFS multi-block file on a path within HAMT-sharded parent directory) + - `/ipfs/dag-pb-hamt-cid?format=car&dag-scope=block` + (UnixFS HAMT-sharded directory) ::: @@ -296,48 +315,45 @@ IPLD entity. Currently, the most popular entity types are: - `raw` / `dag-cbor` (block with raw data or DAG-CBOR document, potentially linking to other CIDs) -Fixtures: - :::example -- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&dag-scope=entity` - - Request a `chunked-dag-pb-file` (UnixFS file encoded with `dag-pb` with - more than one chunk). Returned blocks MUST be enough to deserialize the file. +Sample fixtures: -- TODO(gateway-conformance): `/ipfs/cid/dag-cbor-with-link?format=car&dag-scope=entity` - - Request a `dag-cbor-with-link` (DAG-CBOR document with CBOR Tag 42 pointing - at a third-party CID). The response MUST include the terminating entity (DAG-CBOR) - and MUST NOT include the CID from the Tag 42 (IPLD Link). 
+- `bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu` + from [`subdir-with-mixed-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/subdir-with-mixed-block-files.car) + for testing: + - `/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity` (UnixFS multi-block file in a subdirectory) + - `/ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity` (UnixFS directory) -- TODO(gateway-conformance): `/ipfs/cid/flat-directory/file?format=car&dag-scope=entity` - - Request UnixFS `flat-directory`. The response MUST include the minimal set of - blocks required for enumeration of directory contents, and no blocks that - belong to child entities. +- `bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i` + from [`single-layer-hamt-with-multi-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/single-layer-hamt-with-multi-block-files.car) + for testing: + - `/ipfs/dag-pb-hamt-cid/1.txt?format=car&dag-scope=entity` + (UnixFS multi-block file on a path within HAMT-sharded parent directory, returned blocks MUST be enough to deserialize the file) + - `/ipfs/dag-pb-hamt-cid?format=car&dag-scope=entity` + (UnixFS HAMT-sharded directory, response MUST include the minimal set of blocks required for enumeration of directory contents, and no blocks that belong to child entities) -- TODO(gateway-conformance): `/ipfs/cid/hamt-directory/file?format=car&dag-scope=entity` - - Request UnixFS `hamt-directory`. The response MUST include the minimal set of - blocks required for enumeration of directory contents, and no blocks that - belong to child entities. 
+- `bafybeia264q44a3kmfc2otctzu4egp2k235o3t7mslz2yjraymp4nv6asi` + from [`dir-with-dag-cbor-with-links.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/dir-with-dag-cbor-with-links.car) + for testing `/ipfs/dag-cbor-cid/document?format=car&dag-scope=entity` + (DAG-CBOR document with IPLD Links must return all necessary blocks to verify the path, the document itself, but not the content behind any of the child entity IPLD Links) ::: ### Testing `dag-scope=all` This is the implicit default used when `dag-scope` is not present, -and explicitly used in the context of proxy gateway supporting :cite[ipip-0288]. - -Fixtures: +and used in the context of deserialized UnixFS TAR responses from :cite[ipip-0288]. :::example -- TODO(gateway-conformance): `/ipfs/cid-of-a-directory?format=car&dag-scope=all` - - Request a CID of UnixFS `directory` which contains two files. The response MUST - contain all blocks that can be accessed by recursively traversing all IPLD - Links from the root CID. +Sample fixtures: -- TODO(gateway-conformance): `/ipfs/cid/chunked-dag-pb-file?format=car&dag-scope=all` - - Request a CID of UnixFS `file` encoded with `dag-pb` codec and more than - one chunk. The response MUST contain blocks for all `file` chunks. 
+- `bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu` + from [`subdir-with-mixed-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/subdir-with-mixed-block-files.car) + for testing: + - `/ipfs/dag-pb-cid/subdir?format=car&dag-scope=all` (path parents and the entire UnixFS subdirectory, returned recursively) + - `/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=all` (path parents and the entire UnixFS multi-block file) ::: @@ -358,16 +374,19 @@ Use of the below fixture is highly recommended: :::example -- TODO(gateway-conformance): `/ipfs/dag-pb-file?format=car&entity-bytes=40000000000-40000000002` - - - Request a byte range from the middle of a big UnixFS `file`. The response MUST - contain only the minimal set of blocks necessary for fullfilling the range - request. - -- TODO(gateway-conformance): `/ipfs/10-bytes-cid?format=car&entity-bytes=4:-2` - - - Request a byte range from the middle of a small file, to -2 bytes from the end. 
- - (TODO confirm we want keep this -- added since it was explicitly stated as a supported thing in path-gateway.md) +- `bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu` + from [`subdir-with-mixed-block-files.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/subdir-with-mixed-block-files.car) + for testing: + - `/ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity&entity-bytes=0:*` (path blocks and all the blocks for the multi-block UnixFS file) + - `multiblock.txt?format=car&dag-scope=entity&entity-bytes=512:1023` (path blocks and all the blocks for the the range request within multi-block UnixFS file) + - `multiblock.txt?format=car&dag-scope=entity&entity-bytes=512:-256` (path blocks and all the blocks for the the range request within multi-block UnixFS file) + - `/ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity&entity-bytes=0:*` (path blocks and all the blocks to enumerate UnixFS directory) + +- `QmYhmPjhFjYFyaoiuNzYv8WGavpSRDwdHWe5B4M5du5Rtk` + from [`file-3k-and-3-blocks-missing-block.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/file-3k-and-3-blocks-missing-block.car) + for testing: + - `/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=0:1000` (only the blocks needed to fullfill the request, MUST succeed despite the fact that a block after the range is not retrievable) + - `/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=2200:*` (only the blocks needed to fullfill the request, MUST succeed despite the fact that a block before the range is not retrievable) ::: From e3c36f32018e9750039e489d9b920d8ad7db984a Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 5 Jul 2023 23:59:30 +0200 Subject: [PATCH 14/18] chore: fix typos --- src/ipips/ipip-0402.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/ipips/ipip-0402.md b/src/ipips/ipip-0402.md index 9176ae8a9..7de1f054b 100644 --- 
a/src/ipips/ipip-0402.md +++ b/src/ipips/ipip-0402.md @@ -88,7 +88,7 @@ Terse rationale for each feature: - Trustless HTTP Clients will be able to fetch a CAR with a file, byte range, or a directory enumeration using a way lower number of HTTP requests, which - will translate to improved resouce utilization, longer battery time on + will translate to improved resource utilization, longer battery time on mobile, and lower latency due to lower number of round trips. - CAR files downloaded from HTTP Gateways will always be end-to-end verifiable. @@ -104,7 +104,7 @@ Terse rationale for each feature: - [HTTP retrieval in Boost](https://boost.filecoin.io/http-retrieval) - [bifrost-gateway](https://github.com/ipfs/bifrost-gateway) -- Trustless Gateway is solidifed as the ecosystem wide standard. +- Trustless Gateway is solidified as the ecosystem wide standard. - IPIP tests added to [gateway-conformance](https://github.com/ipfs/gateway-conformance) test @@ -113,7 +113,7 @@ Terse rationale for each feature: - End users are empowered with primitives and tools that reduce retrieval cost, encourage self-hosting, or make validation of conformance claims of - free or comercial gateways possible. + free or commercial gateways possible. ### Compatibility @@ -174,7 +174,7 @@ Due to this, gateway specification changes introduced in this IPIP clarify that: - The CAR `roots` behavior is out of scope and flags that clients MAY ignore it. - CAR determinism is not present by default, responses may differ across requests and gateways. -- Opt-in determinism is possible, but standarized signaling mechanism does not +- Opt-in determinism is possible, but standardized signaling mechanism does not exist until we have IPIP-412 or similar. ### Security @@ -213,7 +213,7 @@ knows it has child entity named `index.html`, and everyone would pay a lower cos to lower number of blocks being returned in a single round-trip, instead of two. 
Rhea/Saturn projects requested this to be out of scope for now, but this "web" -entity scope could be added in the future, as a follow-up optimiziation IPIP. +entity scope could be added in the future, as a follow-up optimization IPIP. #### Requesting specific DAG depth @@ -234,7 +234,7 @@ Relevant tests were added to [gateway-conformance](https://github.com/ipfs/gateway-conformance) test suite in [#56](https://github.com/ipfs/gateway-conformance/pull/56) and [#85](https://github.com/ipfs/gateway-conformance/issues/85). -Detailed list of compliance checks for `dag-scope` and `entity-bytes` can be found in +A detailed list of compliance checks for `dag-scope` and `entity-bytes` can be found in [`v0.2.0/trustless_gateway_car_test.go`](https://github.com/ipfs/gateway-conformance/blob/v0.2.0/tests/trustless_gateway_car_test.go) or later. Below are CIDs, CARs, and short summary of each fixture. From 0953cb696efcb720a86f45bf6e2d5b528c034275 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 12 Jul 2023 15:54:33 +0200 Subject: [PATCH 15/18] ipip-402: clarify entity-bytes for sharded unixfs file https://github.com/ipfs/specs/pull/402#discussion_r1201514331 --- src/http-gateways/trustless-gateway.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 21e8b9e5e..1bbe96606 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -112,6 +112,10 @@ When the terminating entity at the end of the specified content path: Gateway MUST return only the minimal set of blocks necessary to verify the specified byte range of that entity. + - When dealing with a sharded UnixFS file (`dag-pb`, `0x70`) and a non-zero + `from` value, the UnixFS values `filesize` and `blocksizes` determine the + corresponding starting block for a given `from` offset. 
+ - cannot be interpreted as a continuous array of bytes (such as a DAG-CBOR/JSON map or UnixFS directory), the parameter MUST be ignored, and the request is equivalent to `dag-scope=entity`. From ffb15508de8932bdfb6a4174f10778f3577f1825 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Wed, 12 Jul 2023 16:23:22 +0200 Subject: [PATCH 16/18] ipip-402: clarify status code in errors when streaming car https://github.com/ipfs/specs/pull/402#pullrequestreview-1522069540 --- src/http-gateways/trustless-gateway.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 1bbe96606..bfd8fad12 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -162,6 +162,14 @@ returned: - This allows client to produce a meaningful error (e.g, in case of UnixFS, leverage `Data.blocksizes` information present in the root `dag-pb` block). +- In streaming scenarios, if a Gateway is capable of returning the root block + but lacks prior knowledge of the final component of the requested content + path being invalid or absent in the DAG, a Gateway SHOULD respond with HTTP 200. + - This behavior is a consequence of HTTP streaming limitations: blocks are + not buffered, by the time a related parent block is being parsed and + returned to the client, the HTTP status code has already been sent to the + client. + # HTTP Response Below MUST be implemented **in addition** to "HTTP Response" of :cite[path-gateway]. 
From 584098d8d0d6f293e0203645105365da7d9af624 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Fri, 21 Jul 2023 17:48:35 +0200 Subject: [PATCH 17/18] ipip-402: clarify unixfs offset --- src/http-gateways/trustless-gateway.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index bfd8fad12..4a8632898 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -113,7 +113,7 @@ When the terminating entity at the end of the specified content path: specified byte range of that entity. - When dealing with a sharded UnixFS file (`dag-pb`, `0x70`) and a non-zero - `from` value, the UnixFS values `filesize` and `blocksizes` determine the + `from` value, the UnixFS data and `blocksizes` determine the corresponding starting block for a given `from` offset. - cannot be interpreted as a continuous array of bytes (such as a DAG-CBOR/JSON From 917efb95b3e2b765b40a211ee70c14d7d2f50447 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 27 Jul 2023 16:09:08 +0200 Subject: [PATCH 18/18] chore: fix typo --- src/ipips/ipip-0402.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/ipips/ipip-0402.md b/src/ipips/ipip-0402.md index 7de1f054b..6164c02bb 100644 --- a/src/ipips/ipip-0402.md +++ b/src/ipips/ipip-0402.md @@ -385,8 +385,8 @@ Use of the below fixture is highly recommended: - `QmYhmPjhFjYFyaoiuNzYv8WGavpSRDwdHWe5B4M5du5Rtk` from [`file-3k-and-3-blocks-missing-block.car`](https://github.com/ipfs/gateway-conformance/raw/v0.2.0/fixtures/trustless_gateway_car/file-3k-and-3-blocks-missing-block.car) for testing: - - `/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=0:1000` (only the blocks needed to fullfill the request, MUST succeed despite the fact that a block after the range is not retrievable) - - `/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=2200:*` (only the blocks needed to fullfill the 
request, MUST succeed despite the fact that a block before the range is not retrievable) + - `/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=0:1000` (only the blocks needed to fulfill the request, MUST succeed despite the fact that a block after the range is not retrievable) + - `/ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=2200:*` (only the blocks needed to fulfill the request, MUST succeed despite the fact that a block before the range is not retrievable) :::
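
The two `entity-bytes` fixture requests above can be read as a mapping from a byte range to the leaf blocks that overlap it. Below is a minimal Python sketch of that mapping, assuming `from` and `to` are inclusive byte offsets and `*` means end of file. `blocks_for_range` is a hypothetical helper, not part of any gateway implementation, and the three 1024-byte leaves are an assumed layout chosen for illustration, not values read from the fixture CAR.

```python
from typing import List, Optional


def blocks_for_range(blocksizes: List[int], start: int, end: Optional[int]) -> List[int]:
    """Return indexes of leaf blocks overlapping the inclusive range [start, end].

    `blocksizes` mirrors the per-child sizes stored in a UnixFS file node.
    `end=None` stands for `*` (end of file). Negative offsets are not handled
    in this sketch.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("unsupported or invalid range")

    needed = []
    offset = 0  # byte offset of the first byte of the current leaf
    for index, size in enumerate(blocksizes):
        first, last = offset, offset + size - 1
        # A leaf is needed when it is non-empty and overlaps the range.
        if size > 0 and last >= start and (end is None or first <= end):
            needed.append(index)
        offset += size
    return needed


# Assumed layout: a 3 KiB file split into three 1024-byte leaves.
sizes = [1024, 1024, 1024]
print(blocks_for_range(sizes, 0, 1000))     # entity-bytes=0:1000
print(blocks_for_range(sizes, 2200, None))  # entity-bytes=2200:*
```

Under that assumed layout, each request touches a single leaf, so a leaf outside the range can be missing without affecting the response. A real client or gateway would also need the root `dag-pb` block and any intermediate nodes to verify the path to those leaves.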