feat: IPFS retrieval client #243

Merged
merged 25 commits into from
Jun 13, 2023

Conversation

bajtos
Member

@bajtos bajtos commented Jun 7, 2023

See space-meridian/roadmap#19

Discussion points:

  • Are we happy with fetch('ipfs://bafycid') API? I believe this API is provided by Brave when the IPFS integration is enabled.
  • Do we want to hide any of the Lassie response headers from Station Modules making the retrieval requests?
    "server-timing": "fetch;dur=0.215625;finished=3242985750,indexer;dur=0.000041;candidates-found=1693332958;candidates-f"... 168 more characters,
    "x-content-type-options": "nosniff",
    "x-ipfs-path": "/ipfs/bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
    "x-trace-id": "efaa08c0-56b1-4f30-8201-651983900233"
    

Example use

const response = await fetch(
  "ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni"
);
assert(response.ok);
const data = await response.arrayBuffer();
// data contains binary data in the CAR format

Response object (body is a CAR stream):

Response {
  body: ReadableStream { locked: true },
  bodyUsed: true,
  headers: Headers {
    "accept-ranges": "none",
    "cache-control": "public, max-age=29030400, immutable",
    "content-disposition": 'attachment; filename="bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni.car"',
    "content-length": "167",
    "content-type": "application/vnd.ipld.car; version=1",
    date: "Wed, 07 Jun 2023 15:07:13 GMT",
    etag: '"bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni.car.2j3mos0ne09ql"',
    "server-timing": "fetch;dur=0.215625;finished=3242985750,indexer;dur=0.000041;candidates-found=1693332958;candidates-f"... 168 more characters,
    "x-content-type-options": "nosniff",
    "x-ipfs-path": "/ipfs/bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
    "x-trace-id": "efaa08c0-56b1-4f30-8201-651983900233"
  },
  ok: true,
  redirected: false,
  status: 200,
  statusText: "OK",
  url: "ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni"
}
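
A module relying on this default may want to confirm that the body really is a CAR stream before parsing it. The following is a minimal sketch based on the `content-type` header shown above; `isCarResponse` is a hypothetical helper, not part of the Zinnia API, and the sample response is built locally rather than retrieved.

```javascript
// Hypothetical helper (an assumption for illustration, not a Zinnia API):
// returns true when the response declares a CAR payload.
const CAR_MIME_TYPE = "application/vnd.ipld.car";

function isCarResponse(response) {
  const contentType = response.headers.get("content-type") ?? "";
  // Ignore media-type parameters such as "; version=1".
  const mimeType = contentType.split(";")[0].trim().toLowerCase();
  return mimeType === CAR_MIME_TYPE;
}

// Stand-in response constructed locally, so no retrieval is performed here.
const sampleResponse = new Response(new Uint8Array(0), {
  headers: { "content-type": "application/vnd.ipld.car; version=1" },
});
console.log(isCarResponse(sampleResponse));
```

The sketch assumes a runtime that provides the standard `Response` and `Headers` globals.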

TODO

  • Run Lassie in background
  • API for module builders + tests
    • fetch('ipfs://bafycid')
    • fetch(new URL('ipfs://bafycid'))
    • fetch(new Request(...))
  • docs for module builders
  • Apple Silicon (ARM64) support

Out of scope

Signed-off-by: Miroslav Bajtoš <[email protected]>
@bajtos bajtos marked this pull request as ready for review June 8, 2023 15:31
@bajtos
Member Author

bajtos commented Jun 8, 2023

> Are we happy with fetch('ipfs://bafycid') API? I believe this API is provided by Brave when the IPFS integration is enabled.

I did a test in Brave:

  1. Enable IPFS integration in settings

  2. Load ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni (otherwise ipfs:// requests are blocked by the built-in CORS rules)

  3. Open Dev Tools > Console

  4. Run the following code:

    await (await fetch(
      "ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni"
    )).text()
  5. We get back the content of the file (not CAR):

    My most famous drawing, and one of the first I did for the site
    
    [Image: Screenshot 2023-06-08 at 17 30 41]
  6. Append ?format=car to the URL to get back the CAR file

    await (await fetch(
      "ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni?format=car"
    )).arrayBuffer()
    [Image: Screenshot 2023-06-08 at 17 35 06]

My conclusion is that IPFS Gateways and Brave's built-in IPFS node (presumably based on Kubo) have different default behaviour from Lassie:

  • Lassie defaults to CAR
  • Kubo & GW default to autodetect the file's content type and return the file's content (decoded from CAR)

If we feel this can be confusing to Zinnia users, then I am proposing to enforce modules to explicitly include either ?format=car in the query string or set Accept: application/vnd.ipld.car request header when fetching ipfs:// URLs. This allows us to implement GW-like behaviour without breaking existing code.

Having said that, considering the early alpha status of Zinnia, I think we can afford to introduce breaking changes, and it's better to invest our time in things adding more value.
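
For reference, the explicit opt-in proposed above could be checked roughly as follows. This is only a sketch of the idea, not what this PR implements; `requestsCarFormat` and `assertExplicitCarFormat` are hypothetical names, and the Accept parsing is deliberately simplistic.

```javascript
// Hypothetical policy check (an assumption, not the behaviour implemented in this PR).
const CAR_ACCEPT_TYPE = "application/vnd.ipld.car";

function requestsCarFormat(request) {
  // Option 1: `?format=car` in the query string.
  if (new URL(request.url).searchParams.get("format") === "car") return true;

  // Option 2: an Accept header that lists the CAR media type.
  const accept = request.headers.get("accept") ?? "";
  return accept
    .split(",")
    .map((item) => item.split(";")[0].trim().toLowerCase())
    .includes(CAR_ACCEPT_TYPE);
}

function assertExplicitCarFormat(request) {
  if (!requestsCarFormat(request)) {
    throw new TypeError(
      `ipfs:// requests must ask for CAR explicitly: ${request.url}`,
    );
  }
}
```

With such a check in place, a later switch to gateway-like defaults would not change the meaning of existing module code, which is the point of the proposal.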

@juliangruber WDYT?

@bajtos bajtos requested a review from juliangruber June 8, 2023 15:51
@bajtos
Member Author

bajtos commented Jun 8, 2023

Reminder for myself: after we land this PR, I need to update our Windows release builds to include golassie.dll in the zip archives.

@@ -21,6 +21,8 @@ deno_fetch = "0.129.0"
deno_url = "0.105.0"
deno_web = "0.136.0"
deno_webidl = "0.105.0"
lassie = "0.3.0"

Neat. I didn't know there is a Rust client!

Member Author

It's a thin Rust wrapper embedding the original Go Lassie. I started the project three weeks ago :)

https://github.com/filecoin-station/rusty-lassie

Member

@juliangruber juliangruber left a comment

> Are we happy with fetch('ipfs://bafycid') API? I believe this API is provided by Brave when the IPFS integration is enabled.

It's intuitive and already established, I'd also say let's keep it until we require a different strategy (which could be added in addition to the always nice fetch() API)

> Do we want to hide any of the Lassie response headers from Station Modules making the retrieval requests?

I'm not sure whether keeping or removing has advantages. Let's see what happens?
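
Should hiding ever turn out to be desirable, it could look roughly like the sketch below. The deny-list contents and the `withoutHiddenHeaders` name are assumptions for illustration only; nothing in this PR filters headers.

```javascript
// Hypothetical deny-list: which headers (if any) to hide is an open question.
const HIDDEN_HEADERS = ["server-timing", "x-trace-id"];

function withoutHiddenHeaders(headers, hidden = HIDDEN_HEADERS) {
  // Copy first, so the original Headers object stays untouched.
  const visible = new Headers(headers);
  for (const name of hidden) visible.delete(name);
  return visible;
}

// Sample values taken from the response shown in the PR description.
const lassieHeaders = new Headers({
  "content-type": "application/vnd.ipld.car; version=1",
  "x-ipfs-path": "/ipfs/bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni",
  "x-trace-id": "efaa08c0-56b1-4f30-8201-651983900233",
});
const visibleHeaders = withoutHiddenHeaders(lassieHeaders);
```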

cli/main.rs Outdated

let lassie_daemon = Arc::new(
    lassie::Daemon::start(lassie::DaemonConfig {
        temp_dir: None, // TODO: Should we use something like ~/.cache/zinnia/lassie?
Member

Station Core will pass $CACHE_ROOT:

https://github.com/filecoin-station/core/blob/7f07e5203c71366fa9bde713b0d28dee9ea0c51d/lib/zinnia.js#L48-L55.

Can we make Zinnia use that if it is set?

Member Author

I am already using CACHE_ROOT in zinniad, see here:

https://github.com/filecoin-station/zinnia/pull/243/files#diff-c74dac62db9c80f1be22978c93249f7b304e05ddb38131e7969efa13effaeb1eR45

The file cli/main.rs implements zinnia, the CLI people use locally when building Station modules. Let's explore together what a good developer experience would look like.

IMO:

  • We should not force zinnia users to always provide CACHE_ROOT. We don't ask them for FIL_WALLET_ADDRESS either. This way, users can type zinnia run main.js, and all works out of the box.
  • I guess allowing CLI users to control the cache root can be helpful. I am not sure, though, whether an env var provides good ergonomics. Would a project-specific config file be easier to use?
  • How important is this? Can we leave the current solution and open a follow-up GH issue to discuss what a good (and easy-to-implement) solution would look like?
  • Note: if we tell Lassie to use a specific temp dir that's not automatically cleaned by the operating system, we will need to clean any leftover files ourselves.

Member Author

> Note: if we tell Lassie to use a specific temp dir that's not automatically cleaned by the operating system, we will need to clean any leftover files ourselves.

I'll be implementing this cleanup in zinnia as part of #245

Member

I wasn't suggesting that CACHE_ROOT must always be passed. I thought that if it's not passed, Lassie picks its own temp dir; if it is passed (as in Station Core), Lassie uses that, just to keep all of the files together.

The primary use case for changing lassie's temp dir to me is not CLI usage but inside Station.

> Note: if we tell Lassie to use a specific temp dir that's not automatically cleaned by the operating system, we will need to clean any leftover files ourselves.

That's a great point! What does Lassie use by default rn? If it uses an OS cleaned up dir, I'd suggest to leave it at that and add a comment summarizing our discussion here
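
The fallback discussed in this thread (use CACHE_ROOT when it is set, otherwise let Lassie pick its own temp dir) can be sketched as below. Zinnia's real code is Rust; this JavaScript version only illustrates the decision logic. The `pickLassieTempDir` name, the `lassie` subdirectory, and POSIX-style path separators are all assumptions.

```javascript
// Illustration only: Zinnia's real code is Rust.
// `env` is a plain object of environment variables; the "lassie" subdirectory
// name is an assumption, not something decided in this thread.
function pickLassieTempDir(env) {
  const cacheRoot = env.CACHE_ROOT;
  // Unset or empty: return null so that Lassie falls back to the system temp dir.
  if (!cacheRoot) return null;
  // Strip trailing slashes before appending the subdirectory (POSIX-style paths).
  return `${cacheRoot.replace(/\/+$/, "")}/lassie`;
}
```

A `null` result corresponds to `temp_dir: None` in the Rust snippet above, which leaves cleanup to the operating system.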

Member Author

> That's a great point! What does Lassie use by default rn? If it uses an OS cleaned up dir, I'd suggest to leave it at that and add a comment summarizing our discussion here

Quoting from Lassie comments:
https://github.com/filecoin-project/lassie/blob/afc2ee5a4bc6f5e22ef2cc69396cc9b25f57b854/pkg/lassie/lassie.go#L199-L201

// WithTempDir allows you to specify a custom temp directory for bitswap
// retrievals, used for a temporary block store for the preloader. The default
// is the system temp directory.

I think that should be good enough for now, even if we may end up leaving some temporary files behind when zinnia exits unexpectedly.

runtime/js/fetch.js Outdated

export function fetch(resource, options) {
  let request = new Request(resource, options);
  // Fortunately, Request#url is a string, not an instance of URL class
Member

What is fortunate about that?

Member Author

Great question! (The original answer got lost in the git history.)

The fetch API accepts a wide range of types for the "resource" argument. Quoting from https://developer.mozilla.org/en-US/docs/Web/API/fetch#parameters:

resource

This defines the resource that you wish to fetch. This can either be:

  • A string or any other object with a stringifier — including a URL object — that provides the URL of the resource you want to fetch.
  • A Request object.

If Request#url preserved the original value, I would need to figure out how our custom fetch wrapper could detect an object with a stringifier and call that stringifier to obtain the resource URL as a string.

What's fortunate: the conversion from "resource in one of the many supported formats" to "resource URL as a string" is already handled by the Request constructor.

Can you suggest how to improve my code comment to make this matter easier to understand for future readers?

Member

Ah, gotcha!

What do you think about

// Fortunately Request#url is always a string, no matter what was used to construct it

Member Author

I decided to write a longer comment, see 7db4ee8

  // The `resource` arg can be a string or any other object with a stringifier — including a URL
  // object — that provides the URL of the resource you want to fetch; or a Request object.
  // See https://developer.mozilla.org/en-US/docs/Web/API/fetch#parameters
  // Fortunately, Request's constructor handles the conversions, and Request#url is always a string.
  // See https://developer.mozilla.org/en-US/docs/Web/API/Request/url
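
The normalisation described in that comment can be checked with a short sketch. `resourceToUrlString` is a hypothetical helper written for this illustration and is not the wrapper from runtime/js/fetch.js; it assumes a runtime with the standard `Request` and `URL` globals. Constructing a `Request` does not perform any retrieval.

```javascript
// Hypothetical helper: normalise any `resource` accepted by fetch() into a
// URL string by delegating the conversion to the Request constructor.
function resourceToUrlString(resource, options) {
  return new Request(resource, options).url;
}

const ipfsUrl =
  "ipfs://bafybeib36krhffuh3cupjml4re2wfxldredkir5wti3dttulyemre7xkni";

// The three call forms listed in the PR description's TODO.
const fromString = resourceToUrlString(ipfsUrl);
const fromUrlObject = resourceToUrlString(new URL(ipfsUrl));
const fromRequest = resourceToUrlString(new Request(ipfsUrl));

// Each value is a plain string, so a scheme check is a simple prefix test.
const allIpfs = [fromString, fromUrlObject, fromRequest].every((url) =>
  url.startsWith("ipfs://"),
);
console.log(allIpfs);
```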

@juliangruber
Member

> If we feel this can be confusing to Zinnia users, then I am proposing to enforce modules to explicitly include either ?format=car in the query string or set Accept: application/vnd.ipld.car request header when fetching ipfs:// URLs. This allows us to implement GW-like behaviour without breaking existing code.
>
> Having said that, considering the early alpha status of Zinnia, I think we can afford to introduce breaking changes, and it's better to invest our time in things adding more value.

I couldn't agree more!

bajtos and others added 6 commits June 12, 2023 12:03
Signed-off-by: Miroslav Bajtoš <[email protected]>
@bajtos bajtos requested a review from juliangruber June 13, 2023 11:55
@bajtos bajtos merged commit 5b0a12e into main Jun 13, 2023
@bajtos bajtos deleted the feat-lassie branch June 13, 2023 12:29
@bajtos bajtos mentioned this pull request Aug 10, 2023