
(Potentially) only announce Manifest Cid on the DHT #227

Open
dryajov opened this issue Aug 23, 2022 · 5 comments

@dryajov
Contributor

dryajov commented Aug 23, 2022

Right now, we announce every single hash/CID of a dataset on the DHT. This has significant overhead, which could potentially be avoided by limiting announcements to the Manifest CID only. Doing so would require a way to identify and index the Manifests in the repo.

listBlocks in the BlockStore would take an additional argument: the multicodec of the blocks we want to retrieve. The datastore has to allow querying based on the multicodec.
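
A minimal sketch of what a codec-filtered listBlocks could look like. The types below are simplified, hypothetical stand-ins (an in-memory table instead of the real repo, an enum instead of the real multicodec type), so this only illustrates the shape of the call, not the actual nim-codex API:

```nim
import std/[tables, algorithm]

type
  # Hypothetical stand-in for a multicodec value.
  Codec = enum
    codecRaw, codecManifest

  # Hypothetical, simplified CID: just a codec plus a digest string.
  Cid = object
    codec: Codec
    digest: string

  # Hypothetical in-memory store, keyed by digest.
  BlockStore = object
    blocks: Table[string, Cid]

proc putBlock(store: var BlockStore, cid: Cid) =
  store.blocks[cid.digest] = cid

proc listBlocks(store: BlockStore, codec: Codec): seq[string] =
  ## Returns the digests of all stored blocks whose CID carries `codec`,
  ## sorted so the result does not depend on table iteration order.
  for digest, cid in store.blocks:
    if cid.codec == codec:
      result.add digest
  result.sort()

var store = BlockStore()
store.putBlock Cid(codec: codecRaw, digest: "b1")
store.putBlock Cid(codec: codecRaw, digest: "b2")
store.putBlock Cid(codec: codecManifest, digest: "m1")

# Only the manifest would need to be announced on the DHT.
echo store.listBlocks(codecManifest)
```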

@Bulat-Ziganshin
Contributor

Bulat-Ziganshin commented Aug 23, 2022

Maybe classify them by type rather than by codec? Note that putBlock would have to provide this info to the store; we can either introduce putBlockEx or add an optional parameter to putBlock.
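
The optional-parameter variant could be sketched roughly as follows. BlockType, the Store layout and the string CIDs are all hypothetical names chosen for illustration; existing callers keep working because the new parameter defaults to a plain data block:

```nim
import std/[tables, algorithm]

type
  # Hypothetical classification passed along with each block.
  BlockType = enum
    btData, btManifest

  # Hypothetical store that keeps manifests apart from data blocks.
  Store = object
    data: Table[string, seq[byte]]
    manifests: Table[string, seq[byte]]

proc putBlock(store: var Store, cid: string, bytes: seq[byte],
              blockType = btData) =
  ## Stores `bytes` under `cid`; callers that omit `blockType`
  ## get the old behaviour (a plain data block).
  case blockType
  of btData: store.data[cid] = bytes
  of btManifest: store.manifests[cid] = bytes

proc manifestCids(store: Store): seq[string] =
  ## CIDs that would be announced if only manifests are advertised.
  for cid in store.manifests.keys:
    result.add cid
  result.sort()

var store = Store()
store.putBlock("cid-1", @[1'u8, 2])                 # data block (default)
store.putBlock("cid-m", @[9'u8], btManifest)        # manifest
echo store.manifestCids()
```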

Namespaces in Nim-DataStore are the solution that was meant for this use case from the start.

@dryajov
Contributor Author

dryajov commented Aug 23, 2022

The multicodec is the type; we will eventually add a manifest multicodec.

@michaelsbradleyjr
Contributor

michaelsbradleyjr commented Aug 24, 2022

> Namespaces in Nim-DataStore are the solution that was meant for this use case from the start.

Yes, exactly. In the PR description for codex-storage/nim-datastore#19, you can see I gave an example of how key structure might be used to distinguish manifests from blocks, and possibly associate blocks with a particular manifest:

# `block` is a reserved word in Nim, so the loop variable is named `kv` here
for kv in ds.query(Query.init("manifest:abcd1234/block:*")):
  let
    # key will look like "manifest:abcd1234/block:[cid]"
    (key, data) = await kv

  ...

How we actually do it should take into account the overhead associated with substrings manifest: and block: being in the key of every manifest and block, respectively, i.e. in every row in the database of nim-codex's SQLiteDatastore instance.

It would be even more overhead if every block-key had substrings manifest:, the manifest CID, /, and block:. It's probably not worth it.

We could shorten to m:[cid] for manifests and b:[cid] for blocks.
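
To make the trade-off concrete, here is a rough sketch of the shortened scheme together with a back-of-the-envelope estimate of the per-key overhead. The helper names and the one-million-row figure are assumptions for illustration only and are not part of nim-datastore:

```nim
import std/[strutils, sequtils]

const
  manifestPrefix = "m:"   # shortened form of "manifest:"
  blockPrefix = "b:"      # shortened form of "block:"

proc manifestKey(cid: string): string = manifestPrefix & cid
proc blockKey(cid: string): string = blockPrefix & cid

proc isManifestKey(key: string): bool =
  key.startsWith(manifestPrefix)

proc prefixOverhead(prefix: string, rows: int): int =
  ## Extra bytes stored across `rows` keys because of `prefix`.
  prefix.len * rows

let keys = @[manifestKey("abcd1234"), blockKey("ef01"), blockKey("ef02")]

# Selecting manifests becomes a cheap prefix check on the key.
echo keys.filterIt(it.isManifestKey)

# Hypothetical repo with one million block rows:
# long prefix vs. short prefix, in bytes.
echo prefixOverhead("block:", 1_000_000)
echo prefixOverhead(blockPrefix, 1_000_000)
```

With these assumed numbers the long prefix costs three times as many bytes as the short one, before counting any manifest CID embedded in each block key.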

@Bulat-Ziganshin
Contributor

As a first step, I will modify the API so that the datastore and the network part know whether a Cid identifies a dataset or a data block.

@Bulat-Ziganshin
Contributor

Talking about advertiseQueueLoop(), I see these ideas:

  1. optimize it: use batches of CIDs in every place (onBlock, asyncQ.put, network send)
  2. send only differential updates most of the time (requires a timestamp in BlockStore entries)
  3. advertise only manifest CIDs. This will require node.retrieve to request data from NetworkStore by ManifestCID+BlockCID: getBlock(cid, manifestCID = ...)
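
Idea 3 could be sketched along these lines. The Dht type is a hypothetical in-memory stand-in for provider records, CIDs and peers are plain strings, and providersFor only models the provider lookup that a getBlock(cid, manifestCID = ...) call would need; none of this is the actual NetworkStore API:

```nim
import std/[tables, sets]

type
  # Hypothetical stand-in for DHT provider records: CID -> peers.
  Dht = object
    providers: Table[string, HashSet[string]]

proc announce(dht: var Dht, cid, peer: string) =
  ## Records `peer` as a provider for `cid`.
  dht.providers.mgetOrPut(cid, initHashSet[string]()).incl(peer)

proc findProviders(dht: Dht, cid: string): HashSet[string] =
  dht.providers.getOrDefault(cid, initHashSet[string]())

proc providersFor(dht: Dht, cid: string, manifestCid = ""): HashSet[string] =
  ## Looks up providers by the manifest CID when one is given,
  ## otherwise falls back to the block CID itself.
  let key = if manifestCid.len > 0: manifestCid else: cid
  dht.findProviders(key)

var dht = Dht()
# Only the manifest is announced, not its individual blocks.
dht.announce("manifest-1", "peerA")

# A block lookup succeeds only when the manifest CID is supplied.
echo dht.providersFor("block-7", manifestCid = "manifest-1").len
echo dht.providersFor("block-7").len
```

The second lookup finds no providers, which is why retrieval would have to carry the manifest CID alongside each block CID once per-block announcements are dropped.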

4 participants