Skip to content

provider: duplicated CIDs sent to provide queue #901

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gammazero opened this issue Mar 27, 2025 · 2 comments
Open

provider: duplicated CIDs sent to provide queue #901

gammazero opened this issue Mar 27, 2025 · 2 comments
Assignees
Labels
P1 High: Likely tackled by core team if no one steps up

Comments

@gammazero
Copy link
Contributor

gammazero commented Mar 27, 2025

part of https://github.com/ipshipyard/roadmaps/issues/6

The provide queue is getting many duplicate CIDs. It turns out that normal ipfs add and mfs are both adding blocks to the blockservice, and causing duplicated CIDs to be provided.

This may be tolerable since the CIDs are deduplicated after being read from the queue, here, but it does mean the queue has to hold double the CIDs that are actually provided.

It cases where things are added in such a way that follows both call paths, it may be worth passing some parameter that suppresses providing in one of the paths.

Call stack from normal ipfs-add path:

github.com/ipfs/boxo/provider.(*reprovider).Provide(0x14001585080?, {0x10535d940?, 0x14001589e90?}, {{0x14000635ef0?, 0x1?}}, 0x1?)
        github.com/ipfs/[email protected]/provider/reprovider.go:464 +0x2c
github.com/ipfs/boxo/exchange/providing.(*Exchange).NotifyNewBlocks(0x14000d4e6a0, {0x10535d940, 0x14001589e90}, {0x140015ba5a0, 0x1, 0x1?})
        github.com/ipfs/[email protected]/exchange/providing/providing.go:41 +0xbc
github.com/ipfs/boxo/blockservice.(*blockService).AddBlocks(0x1400158cc40, {0x10535d978, 0x14000b67b80}, {0x140015ba5a0, 0x1, 0x1})
        github.com/ipfs/[email protected]/blockservice/blockservice.go:223 +0x3ac
github.com/ipfs/boxo/ipld/merkledag.(*dagService).AddMany(0x14000b3f5a8, {0x10535d978, 0x14000b67b80}, {0x140015ba580, 0x1, 0x10327cdd4?})
        github.com/ipfs/[email protected]/ipld/merkledag/merkledag.go:71 +0xe4
github.com/ipfs/go-ipld-format.(*Batch).asyncCommit.func1(...)
        github.com/ipfs/[email protected]/batch.go:100
created by github.com/ipfs/go-ipld-format.(*Batch).asyncCommit in goroutine 380
        github.com/ipfs/[email protected]/batch.go:98 +0x194
(edited)

Call stack from mfs path:

github.com/ipfs/boxo/provider.(*reprovider).Provide(0x140014ce480?, {0x106c3d940?, 0x14001685860?}, {{0x140016a0930?, 0x1?}}, 0x1?)
        github.com/ipfs/[email protected]/provider/reprovider.go:464 +0x2c
github.com/ipfs/boxo/exchange/providing.(*Exchange).NotifyNewBlocks(0x14000f77ca0, {0x106c3d940, 0x14001685860}, {0x140016a62c0, 0x1, 0x1?})
        github.com/ipfs/[email protected]/exchange/providing/providing.go:41 +0xbc
github.com/ipfs/boxo/blockservice.(*blockService).AddBlock(0x14001409140, {0x106c3d978, 0x14001421c20}, {0x14fcc9740, 0x140014ce480})
        github.com/ipfs/[email protected]/blockservice/blockservice.go:177 +0x250
github.com/ipfs-shipyard/nopfs/ipfs.(*BlockService).AddBlock(0x14001424b70, {0x106c3d978, 0x14001421c20}, {0x14fcc9740, 0x140014ce480})
        github.com/ipfs-shipyard/nopfs/[email protected]/blockservice.go:78 +0x1a8
github.com/ipfs/boxo/ipld/merkledag.(*dagService).Add(0x14001424ba0?, {0x106c3d978?, 0x14001421c20?}, {0x106c53a10?, 0x140014ce480?})
        github.com/ipfs/[email protected]/ipld/merkledag/merkledag.go:63 +0x94
github.com/ipfs/boxo/mfs.(*Directory).getNode(0x140014ce4e0, 0xa8?)
        github.com/ipfs/[email protected]/mfs/dir.go:404 +0x110
github.com/ipfs/boxo/mfs.(*Directory).GetNode(...)
        github.com/ipfs/[email protected]/mfs/dir.go:387
github.com/ipfs/boxo/mfs.(*Root).Close(0x140014d8c50)
        github.com/ipfs/[email protected]/mfs/root.go:197 +0x34
github.com/ipfs/kubo/core/node.Files.func2({0x10675f880?, 0x106c27180?})
        github.com/ipfs/kubo/core/node/core.go:208 +0x20
@gammazero gammazero added the need/triage Needs initial labeling and prioritization label Mar 27, 2025
@gammazero gammazero changed the title Duplicated CIDs sent to provide queue provider: duplicated CIDs sent to provide queue Mar 27, 2025
@hsanjuan hsanjuan self-assigned this Apr 1, 2025
@gammazero gammazero added P2 Medium: Good to have, but can wait until someone steps up P1 High: Likely tackled by core team if no one steps up and removed need/triage Needs initial labeling and prioritization P2 Medium: Good to have, but can wait until someone steps up labels Apr 1, 2025
@gammazero
Copy link
Contributor Author

Providing is happening 3X

  • Provides from adding the file to a buffered DAG service
  • Provides from patching the file into a temp MFS tree
  • Provides from Pinning the MFS tree

In the case of folders they are provided 3x too, also the wrapping folder that is discarded when not wrapping. But instead of the buffered provide, the last provide is when closing the MFS tree.

@hsanjuan
Copy link
Contributor

hsanjuan commented Apr 2, 2025

Ok, so we have multiple issues the way things work right now:

  • The exchange interface must implement a NotifyNewBlocks
  • We have a "providing" exchange which plugs the reprovider to anything that uses the exchange. Particulary the blockservice calls this on AddBlock() and when getting new blocks.
  • The Blockservice is the underlying layer for DAGService which are used for adding...

This means we enqueue to announce:

  • Any block that is fetched for whatever reason.
  • Any block that is added via DagService
  • Any block that is part of MFS
  • Any MFS directory that is flushed
  • Any block that is part of temporary MFS DAGs when adding

When adding particularly, the root CID, and the temp-MFS root CID are submitted several times to the DAG service. If the DAG has folders, they are also submitted several times due to cacheSync operations on MFS (for example to print added directories while adding, which requires reading nodes, which in turns syncs the cache to disk). The issue is not so bad when adding big files (internal blocks are written only once).

All of this is regardless on whether our providing strategy is "all" or "roots" or (soon) "mfs"... so we will be all blocks added even if not intended.

Potential approach

Seeing that most of the issue when adding comes from the temp-MFS, I believe

  • we should create the temporary-MFS DAGs on an offline blockservice that does not to provide, immediately. Once we have finalized adding, there is an outputDirs function which recursively visits the temp-MFS tree to print-out directories (leaving out raw-leaves). We can take the chance that we are reading the temp-MFS tree directories to provide them as we go only once.

  • Since we know whether the added content will be pinned or put on MFS, we can additionally use an offline dagservice when adding content that is not going to be pinned when our strategy is "pinned" or "roots" etc. As a way to avoid providing blocks unnecessarily and take into account the configured strategy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High: Likely tackled by core team if no one steps up
Projects
None yet
Development

No branches or pull requests

2 participants