Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: indexer support for cbor encoding #139

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

NikolasHaimerl
Copy link
Contributor

Add support for DAG-CBOR encoded entries

This PR adds support for DAG-CBOR encoded entries (CIDs starting with "bafy", codec 0x71) in addition to the existing DAG-JSON format. This enables the piece-indexer to process advertisements from providers like Curio that use DAG-CBOR encoding.

The implementation:

  • Detects entry format from CID codec (297 for DAG-JSON, 113 for DAG-CBOR)
  • Handles both encoded formats appropriately
  • Includes comprehensive tests for both formats

Closes #119

Implementation Plan: #119 (comment)
Parent Issue: #132

@NikolasHaimerl NikolasHaimerl marked this pull request as ready for review March 3, 2025 12:04
Copy link
Member

@bajtos bajtos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!

} catch (err) {
debug('Error processing entries: %s', err)
return {
error: /** @type {const} */('ENTRIES_NOT_RETRIEVABLE'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a new error code for this error case. We were able to retrieve the entries, but we are not able to extract the payload CID from the retrieved chunk. The code ENTRIES_NOT_RETRIEVABLE does not describe such situation correctly.

const codec = parsedCid.code

switch (codec) {
case 297: // DAG-JSON: https://github.com/multiformats/multicodec/blob/master/table.csv#L113
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a permalink. If a new entry is added to table.csv, the line L113 may no longer point to DAG-JSON entry. You can create a permalink by pressing y - GitHub web UI will change the URL to point to the commit SHA you are viewing.

I think it's not necessary to link to specific lines in the multicodec table. I would simply add a link to https://github.com/multiformats/multicodec/blob/master/table.csv before the switch statement.

// List of codecs:
// https://github.com/multiformats/multicodec/blob/master/table.csv
switch (codec)  {
  case 297: // DAG-JSON
  case 113: // DAG-CBOR
}

I'll let you decide which approach you prefer. Please update line 364 below in sync with this line.

*/
export function processEntries (entriesCid, entriesChunk) {
if (!entriesChunk.Entries || !entriesChunk.Entries.length) {
throw new Error('No entries found in DAG-CBOR response')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it's only DAG-CBOR response that can have no entries? Cannot it happen with DAG-JSON response too?

Suggested change
throw new Error('No entries found in DAG-CBOR response')
throw new Error(`No entries found in the response for ${entriesCid}`)

// Verify the '/' property is an object with 'bytes' property
// In DAG-JSON, CIDs are represented as objects with a '/' property that contains 'bytes'
if (!entry['/'] || typeof entry['/'] !== 'object' || !('bytes' in entry['/'])) {
throw new Error('DAG-JSON entry\'s "/" property must be a CID object with a bytes property')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, IPNI entries are multihashes, not CIDs. (That's why we have to convert them to CIDs by adding a Raw codec.) In that case, the error message is not correct.

Suggested change
throw new Error('DAG-JSON entry\'s "/" property must be a CID object with a bytes property')
throw new Error('DAG-JSON entry\'s "/" property must be an object with a "bytes" property')

}

const entryHash = entry['/'].bytes
entryBytes = Buffer.from(String(entryHash), 'base64')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it necessary to convert entryHash to a String? I believe it is already a string.

The original version was using this conversion:

Buffer.from(entryHash, 'base64')

What am I missing?

If you want to better handle non-string bytes values, then we should reject them.

if (typeof entryHash !== 'string') {
  throw new Error('DAG-JSON entry\'s ["/"]["bytes"] property must be a string')
}

That should satisfy the TypeScript checker to allow you to call Buffer.from(entryHash).

Comment on lines +670 to +671
// Restore the original fetch function after each test
globalThis.fetch = originalFetch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not mock the global fetch function and instead inject it to fetchCid function. It's a pattern we already use in other repositories.

The trick is to define the fetch parameter as optional and defaulting to the global fetch function.

export async function fetchCid (providerBaseUrl, cid, { fetchTimeout, fetch } = { globalThis.fetch }) {
  // ...
}

// in tests
const fetch = async () => {
  ok: true,
  status: 200,
  json: () => Promise.resolve(jsonResponse),
  arrayBuffer: () => { throw new Error('Should not call arrayBuffer for JSON') }
  })
})
const result = await fetchCid('http://example.com', dagJsonCid, { fetch })
// ... 

Also notice that the following line can be simplified:

let fn
fn = () => { return Promise.resolve(/* result */) }
// is the same as
fn = () => Promise.resolve(/* result */)
// can be also written as
fn = async () => /* result */

})

it('uses DAG-CBOR codec (0x71) to parse response as CBOR', async () => {
const cborData = cbor.encode(jsonResponse)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unexpected - jsonResponse encoded using cbor encoding. I was expecting to see cbor.encode(cborResponse).

After reading the rest of the test, I see that it does not matter what is the structure of the response body. Let's rename jsonResponse to something like someResponse or testResponse to make it more clear this object is not related to whether we use JSON or CBOR encoding.

const unknownCodecCid = 'bafkreigrnnl64xuevvkhknbhrcqzbdvvmqnchp7ae2a4ulninsjoc5svoq'
const parsedCid = CID.parse(unknownCodecCid)
assert.strictEqual(parsedCid.code, 85)
const errorMessage = 'To parse non base32, base36 or base58btc encoded CID multibase decoder must be provided'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this error message. The CID in unknownCodecCid is encoded using base32 as you can verify here: https://cid.ipfs.tech/#bafkreigrnnl64xuevvkhknbhrcqzbdvvmqnchp7ae2a4ulninsjoc5svoq

I am expecting an error message telling us that we encountered an unsupported codec 85.

Perhaps the test is triggering a different error path than you thought?

Comment on lines +754 to +760
// @ts-ignore
const result = await pRetry(
() =>
(
fetchCid(curioProviderUrl, dagCborCid)
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to disable type-checking for this statement?

Let's try to find a way how to tell TypeScript what we are trying to accomplish.

Comment on lines +767 to +773
// @ts-ignore
const entriesChunk = await pRetry(
() =>
(
fetchCid(curioProviderUrl, entriesCid)
)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support DAG-CBOR encoded entries
2 participants