Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check cids in denylist before providing #215

Merged
merged 11 commits into from
Jul 5, 2023
Merged

feat: check cids in denylist before providing #215

merged 11 commits into from
Jul 5, 2023

Conversation

olizilla
Copy link
Contributor

@olizilla olizilla commented Jun 20, 2023

  • cap the max number of cids we'll accept in a single message.
    • We're seeing 20k spikes in our bitswap-pending-entries metrics per node, so I'm putting in a hard cap of 500 wanted cids per message that we'll process. The caller can ask again if they need more. This also means i can put a sensible cap on how many cids the denylist service should handle in a batch.
  • check batches of cids against our denylist api.
  • cache entries that are on the denylist; they are rarely removed.
  • use cache to avoid asking about items we already know, and as a fallback if denylist service cannot be reached.
  • add bitswap-denied counter metric to see how many CIDs we skip due to being on the denylist

see: batch endpoint for denylist api – storacha/reads#166
see: set DENYLIST_URL in env - elastic-ipfs/bitswap-peer-deployment#99

License: MIT

- cap the max number of cids we'll accept in a single message.
- check batches of cids against our denylist api.
- cache entries that are on the denylist; they are rarely removed.

License: MIT
Signed-off-by: Oli Evans <[email protected]>
and lint

License: MIT
Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
@olizilla olizilla marked this pull request as ready for review June 21, 2023 11:09
src/deny.js Outdated
if (res.ok) {
const denylist = new Set(await res.json())
for (const cidStr of denylist.values()) {
denylistCache.set(cidToKey(CID.parse(cidStr)), true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these cids are never pulled out of the cache... so items that do get removed from a badbits list won't get updated here... but our bitswap peer nodes tend to get restarted every day, so i'm not going to do extra work here... if this PR magically makes our pods less crashy, then we can review this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this comment be also added in the code file? I feel it would be nice to have context there, and maybe an issue created

License: MIT
Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
@@ -28,6 +28,8 @@
"libp2p": "0.42.2",
"lru-cache": "7.14.1",
"mnemonist": "0.39.5",
"multiformats": "^10.0.3",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 because this code already depends on it at that version but previously didn't declare that dep, so would use a random one of 3 different major versions that are in the dep tree. Version 11+ doesn't work with the test helpers due to https://github.com/multiformats/js-multiformats/blob/4a36fb7ee49edb4300267b90301ef0e4300cbc46/src/cid.js#L305

src/deny.js Outdated
denylistCache.set(cidToKey(CID.parse(cidStr)), true)
}
// all `cancel`s go through, and items not on the denylist
return entries.filter(entry => entry.cancel || !denylist.has(entry.cid.toString()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just incase you ever update a previously cahced item to false...

Suggested change
return entries.filter(entry => entry.cancel || !denylist.has(entry.cid.toString()))
return entries.filter(entry => entry.cancel || !denylist.get(entry.cid.toString()))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using get here to update the last used time. In this LRU impl has tests do not update that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, confusing. denylist here is the set of CIDs returned from the denylist api, that represents the CIDs that should not be served. Is a Set not a Map

elsewhere we have denylistCache which is an LRU, and requires us to store a value for a our CID key, and that has a Map-like api, but we must call .get to update the last used time. I will tweak the names to make clearer.

allowReadinessTweak: process.env.ALLOW_READINESS_TWEAK === 'true'
allowReadinessTweak: process.env.ALLOW_READINESS_TWEAK === 'true',

denylistUrl: process.env.DENYLIST_URL ? new URL(process.env.DENYLIST_URL) : undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be added to the .env.sample?
We also need to get this set in the deployment

src/deny.js Outdated
if (res.ok) {
const denylist = new Set(await res.json())
for (const cidStr of denylist.values()) {
denylistCache.set(cidToKey(CID.parse(cidStr)), true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this comment be also added in the code file? I feel it would be nice to have context there, and maybe an issue created

olizilla added a commit to storacha/reads that referenced this pull request Jun 27, 2023
Send an array of CID strings to check if they are on the denylist.
Response is the subset of the array that is on the denylist.

```http
POST /
Content-Type: application/json

["cid1","cid2"]
```

This will be used by e-ipfs to check a batch of cids at once, see:
elastic-ipfs/bitswap-peer#215

License: MIT

---------

Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
License: MIT
Signed-off-by: Oli Evans <[email protected]>
@olizilla
Copy link
Contributor Author

olizilla commented Jul 5, 2023

@vasco-santos @alanshaw this PR changed a little since you last looked so please take a look if you have a moment.

  • added bitswap-denied counter metric to track how many CIDs we ignore due to being on the denylist
  • made the logic in deny.js more readable, particularly around the nuances of how we use the cached info vs the current denylist we get back from the api
  • simplified the cache keys to just use cid strings. trades off catching alt cid formulations at the cache layer for simplicity and not having to cid.parse every item in the response list to dig out the multihash.

olizilla added a commit to elastic-ipfs/bitswap-peer-deployment that referenced this pull request Jul 5, 2023
Ensure DENYLIST_URL is set in env.

see: elastic-ipfs/bitswap-peer#215

License: MIT
Signed-off-by: Oli Evans <[email protected]>
@olizilla olizilla merged commit 0f0a4a3 into main Jul 5, 2023
@olizilla olizilla deleted the denylist branch July 5, 2023 14:57
olizilla added a commit to elastic-ipfs/bitswap-peer-deployment that referenced this pull request Jul 5, 2023
olizilla added a commit to elastic-ipfs/bitswap-peer-deployment that referenced this pull request Jul 5, 2023
Ensure DENYLIST_URL is set in env when deploying bitswap-peer

see: elastic-ipfs/bitswap-peer#215

License: MIT

---------

Signed-off-by: Oli Evans <[email protected]>
olizilla added a commit to elastic-ipfs/bitswap-peer-deployment that referenced this pull request Jul 6, 2023
olizilla added a commit to elastic-ipfs/bitswap-peer-deployment that referenced this pull request Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants