CIP: TipSync #75

Open
oed opened this issue Nov 15, 2020 · 3 comments

@oed
Member

oed commented Nov 15, 2020

cip: 75
title: TipSync
author: Joel Thorstensson (@oed)
discussions-to: https://github.com/ceramicnetwork/CIP/issues/75
status: Idea
category: Standards
type: Core
created: 2020-11-15

Simple Summary

A scalable approach to syncing stream tips in Ceramic using a libp2p protocol and the libp2p DHT.

Abstract

By utilizing the libp2p DHT along with a new libp2p protocol (inspired by bitswap), we design a system that allows a peer to find all other peers that currently pin a given stream and exchange tips with them. This is achieved by having each peer announce in the DHT which streams it pins, combined with a newly introduced protocol for querying tips from any peer.

Motivation

Currently Ceramic uses a libp2p pubsub topic to publish and query tips of all streams in the network, so every query is delivered to every node subscribed to that topic. Since the number of queries in the network is expected to become very large, this approach is likely to face scalability issues soon. This CIP suggests an alternative approach to querying stream tips in order to mitigate the issue.

Specification

The TipSync protocol consists of two components: TipExchange and TipDiscovery. The former is a libp2p protocol for querying tips from connected peers; the latter describes how to discover peers that hold the tip of any given stream.

TipExchange

TipExchange is a libp2p protocol with the following protocol id:

/ceramic/tipx/1.0.0

The algorithm can be described in two simple steps:

  1. When a Ceramic peer wants to query a specific StreamId, it sends a want-tip message to all of its connected peers.
  2. Peers that currently pin the given stream respond with a have-tip message containing the CID of the tip they have.
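A minimal TypeScript sketch of the requester side of these two steps follows. The transport function `send`, the name `queryConnectedPeers`, and the convention that a peer pinning none of the requested streams replies with `null` are hypothetical; the message shapes mirror the Message formats section of this CIP.

```typescript
// Message shapes as in the Message formats section.
interface WantTip { typ: 3; id: string; streams: Array<{ stream: string; paths?: Array<string> }> }
interface HaveTip { typ: 4; id: string; tips: { [streamId: string]: string } }

// Hypothetical transport: sends one message over /ceramic/tipx/1.0.0 and resolves
// with the peer's have-tip, or null if the peer pins none of the requested streams.
type SendFn = (peer: string, msg: WantTip) => Promise<HaveTip | null>

async function queryConnectedPeers(
  peers: string[],
  streamId: string,
  send: SendFn,
  queryId: string
): Promise<Map<string, string>> {
  const msg: WantTip = { typ: 3, id: queryId, streams: [{ stream: streamId }] }
  // Step 1: send the want-tip to every connected peer in parallel.
  // A failing peer is treated like a silent one.
  const replies = await Promise.all(peers.map((p) => send(p, msg).catch(() => null)))
  const tips = new Map<string, string>() // peer -> tip CID
  replies.forEach((reply, i) => {
    // Step 2: only peers that pin the stream answer with a have-tip.
    if (reply !== null && reply.id === queryId) {
      const tip = reply.tips[streamId]
      if (tip !== undefined) tips.set(peers[i], tip)
    }
  })
  return tips
}
```

Different peers may return different CIDs; the returned map keeps them all so the caller can do conflict resolution.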

[Figure: TipExchange protocol]
In the graphic above, Peer A sends a want-tip message to Peers B, C, and D. Peers B and D have the given stream pinned and thus respond with their tip. In this case they respond with different CIDs (why they are out of sync, and how that is resolved, is out of scope here), and it is now up to Peer A to do conflict resolution.

Message formats

The structure of the want-tip and have-tip messages is specified below.

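interface StreamQuery {
  stream: string               // StreamId being queried
  paths?: Array<string>
}

interface WantTip {
  typ: 3                       // message type: want-tip
  id: string                   // query id (assumed: echoed back in the matching HaveTip)
  streams: Array<StreamQuery>
}

interface TipMap {
  [streamId: string]: string   // StreamId -> CID of the tip held by the responder
}

interface HaveTip {
  typ: 4                       // message type: have-tip
  id: string
  tips: TipMap
}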
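A possible responder for these messages, sketched in TypeScript under stated assumptions: the local pin store is modeled as a hypothetical in-memory map from StreamId to tip CID, `handleWantTip` and `isWantTip` are illustrative names, the `paths` field is ignored, and a peer that pins none of the requested streams is assumed to stay silent rather than send an empty have-tip.

```typescript
interface StreamQuery { stream: string; paths?: Array<string> }
interface WantTip { typ: 3; id: string; streams: Array<StreamQuery> }
interface TipMap { [streamId: string]: string }
interface HaveTip { typ: 4; id: string; tips: TipMap }

// Hypothetical local pin store: StreamId -> CID of the tip this node currently holds.
type PinStore = ReadonlyMap<string, string>

// Runtime check for a decoded (e.g. JSON) message before treating it as a WantTip.
function isWantTip(value: unknown): value is WantTip {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Partial<WantTip>
  return (
    v.typ === 3 &&
    typeof v.id === 'string' &&
    Array.isArray(v.streams) &&
    v.streams.every(
      (q: unknown) => typeof q === 'object' && q !== null && typeof (q as StreamQuery).stream === 'string'
    )
  )
}

// Builds the have-tip reply; returns null when no requested stream is pinned here.
function handleWantTip(msg: WantTip, pins: PinStore): HaveTip | null {
  const tips: TipMap = {}
  for (const { stream } of msg.streams) {
    const tip = pins.get(stream)
    if (tip !== undefined) tips[stream] = tip
  }
  if (Object.keys(tips).length === 0) return null
  // Echo the query id so the requester can match the reply (assumed semantics of `id`).
  return { typ: 4, id: msg.id, tips }
}
```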

TipDiscovery

The TipExchange libp2p protocol described above works well for getting the latest tips from already connected peers. However, the given stream might be pinned on a peer to which we are not connected. The TipDiscovery protocol uses the libp2p DHT to find all peers that pin any given stream. The basic idea of the DHT peer lookup is simple: when a Ceramic peer pins a stream, it announces to the DHT that it provides this stream. It also looks up all other nodes that provide this stream and queries them for the latest tip of the stream.

The libp2p DHT can be used to announce to the network that your node provides content for a given CID. In the IPFS network this is primarily used to signal that you hold the data of the given CID. However, we can create a CID that represents the StreamId of a Ceramic stream, and thus have a way of signaling which Ceramic peers pin any given stream.

Representing a StreamId in the DHT

To represent the StreamId as a CID, we use the identity multihash along with the raw multicodec, and put the bytes of the StreamId directly after them. The resulting CID bytes should be constructed as follows, where <StreamId-length> is the byte length of the StreamId encoded as an unsigned varint, as in any multihash:

<CIDv1-multicodec><raw-multicodec><multihash-multicodec><StreamId-length><StreamId-bytes>

0x01 | 0x55 | 0x00 | <StreamId-length> | <StreamId-bytes> 
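The byte layout above can be produced without any dependencies. The TypeScript sketch below uses a hypothetical helper, `streamIdToCidBytes`, that takes the binary form of a StreamId (how that is obtained is outside this sketch) and prepends the CID header. The three header codes are each below 0x80, so each occupies a single varint byte.

```typescript
// CID header codes used by this CIP; each is below 0x80, so each is one varint byte.
const CID_VERSION_1 = 0x01
const RAW_CODEC = 0x55
const IDENTITY_MULTIHASH = 0x00

// Unsigned LEB128 varint, the integer encoding used throughout multiformats.
function encodeVarint(value: number): number[] {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('varint must be a non-negative safe integer')
  }
  const out: number[] = []
  let n = value
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80) // low 7 bits with the continuation bit set
    n = Math.floor(n / 0x80)
  }
  out.push(n)
  return out
}

// Hypothetical helper: wraps the binary StreamId in a CIDv1 with the raw codec and an
// identity multihash, so the StreamId bytes appear verbatim at the end of the CID.
function streamIdToCidBytes(streamIdBytes: Uint8Array): Uint8Array {
  const header = [CID_VERSION_1, RAW_CODEC, IDENTITY_MULTIHASH, ...encodeVarint(streamIdBytes.length)]
  const out = new Uint8Array(header.length + streamIdBytes.length)
  out.set(header, 0)
  out.set(streamIdBytes, header.length)
  return out
}
```

A multiformats library could replace the hand-rolled varint; the sketch only aims to make the layout concrete, including the case where the StreamId is 128 bytes or longer and the length takes two varint bytes.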

Providing the stream

Use the DHT Provide method to provide the CID representing the StreamId. The timeout option should be set to a reasonably short time interval since there is no way to manually remove the DHT record. A Ceramic peer should republish the DHT record before the timeout ends, given that the stream is still pinned.
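One way to keep the provider records fresh, sketched in TypeScript. The `DhtProvider` interface, the `ProviderRepublisher` class, and the interval value are assumptions, not a specific libp2p API. The scheduler is driven by an explicit `tick(now)` call (for example from a timer) so it stays deterministic, and the republish interval must be chosen shorter than the lifetime of the provider record. As raised in the discussion on this issue, every already pinned stream also has to be announced at node startup, which `pinAll` covers.

```typescript
// Hypothetical DHT interface for this sketch.
interface DhtProvider {
  provide(key: Uint8Array): Promise<void>
}

class ProviderRepublisher {
  // StreamId -> time (ms) at which its next DHT announcement is due.
  private readonly nextDue = new Map<string, number>()
  private readonly dht: DhtProvider
  private readonly toKey: (streamId: string) => Uint8Array
  private readonly intervalMs: number

  constructor(dht: DhtProvider, toKey: (streamId: string) => Uint8Array, intervalMs: number) {
    this.dht = dht
    this.toKey = toKey // hypothetical: StreamId -> CID bytes per "Representing a StreamId in the DHT"
    this.intervalMs = intervalMs // must be shorter than the provider record's lifetime
  }

  // On pin: announce the stream on the next tick.
  pin(streamId: string, now: number): void {
    this.nextDue.set(streamId, now)
  }

  // At node startup: schedule the whole pin store.
  pinAll(streamIds: readonly string[], now: number): void {
    for (const id of streamIds) this.pin(id, now)
  }

  // On unpin: stop republishing and let the existing record expire,
  // since a DHT provider record cannot be removed manually.
  unpin(streamId: string): void {
    this.nextDue.delete(streamId)
  }

  // Announces every stream whose deadline has passed; returns how many succeeded.
  async tick(now: number): Promise<number> {
    let announced = 0
    for (const [streamId, due] of Array.from(this.nextDue.entries())) {
      if (due > now) continue
      try {
        await this.dht.provide(this.toKey(streamId))
        // Reschedule only if the stream was not unpinned while we awaited.
        if (this.nextDue.has(streamId)) this.nextDue.set(streamId, now + this.intervalMs)
        announced++
      } catch {
        // Leave the deadline in the past so the next tick retries it.
      }
    }
    return announced
  }
}
```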

Finding providers of the stream

Use the DHT FindProvs method to look up peers that provide the given stream. Connect to each of the found peers (or a subset of them) and send them a want-tip query for the stream.

Querying a stream

The full algorithm for querying a stream would look something like this:

  1. Run the TipExchange protocol with currently connected peers
  2. Traverse the DHT to find all peers pinning the stream
  3. Connect to peers as they are found and run TipExchange with them
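The three steps can be combined roughly as follows. This TypeScript sketch assumes a hypothetical `TipSyncDeps` interface standing in for the libp2p connection layer, the DHT FindProvs traversal, and a single TipExchange round trip. It starts querying connected peers first, then queries DHT providers as they are found, skipping peers already asked. The `onTip` callback lets a caller respond optimistically before all providers have answered.

```typescript
// Hypothetical dependencies; none of these names come from a real library.
interface TipSyncDeps {
  connectedPeers(): string[]
  findProviders(streamId: string): AsyncIterable<string> // DHT FindProvs, yields peer ids as found
  connect(peer: string): Promise<void>
  queryTip(peer: string, streamId: string): Promise<string | null> // one want-tip/have-tip round trip
}

async function queryStream(
  streamId: string,
  deps: TipSyncDeps,
  onTip: (peer: string, tip: string) => void
): Promise<Map<string, string>> {
  const tips = new Map<string, string>() // peer -> tip CID
  const asked = new Set<string>()
  const pending: Promise<void>[] = []

  const ask = async (peer: string, needsConnect: boolean): Promise<void> => {
    try {
      if (needsConnect) await deps.connect(peer)
      const tip = await deps.queryTip(peer, streamId)
      if (tip !== null) {
        tips.set(peer, tip)
        onTip(peer, tip) // early result, usable for an optimistic response
      }
    } catch {
      // Unreachable or misbehaving peer: skip it.
    }
  }

  // Step 1: query peers we are already connected to.
  for (const peer of deps.connectedPeers()) {
    asked.add(peer)
    pending.push(ask(peer, false))
  }
  // Steps 2 and 3: walk the DHT and query each new provider as soon as it is found.
  for await (const peer of deps.findProviders(streamId)) {
    if (asked.has(peer)) continue
    asked.add(peer)
    pending.push(ask(peer, true))
  }
  await Promise.all(pending)
  return tips // possibly several distinct CIDs; conflict resolution is up to the caller
}
```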

Note that we can't be completely sure that we have the most up-to-date state of a stream until our peer has connected to, and run the TipSync protocol with, all peers that pin the given stream. However, it might be reasonable to optimistically respond to a query before that.

Open questions

  • A node that keeps track of a lot of streams would now potentially need to connect to more nodes; in return, nodes can be more certain that they will find the latest state of streams. Is there a trade-off to be made here?
  • When publishing updates to a given stream, should nodes just push the update to peers that care about it? Probably they should keep publishing to the Ceramic pubsub topic for now.

Future work

  • Extend the protocol to stream commit data from peers that have responded with a have-tip message. This could significantly improve the performance of Ceramic.

Rationale

Rationale goes here.

Backwards Compatibility

This feature needs to be rolled out in stages. First, DHT publishing and support for responding to queries are added, while sending the want-tip message over TipSync is gated behind a feature flag. Once enough time has passed (e.g. a month) for most nodes to have upgraded, the feature can be turned on in a new release.

Ceramic peers still connect to the Ceramic pubsub topic to publish updates to streams. They can also respond to queries made by older nodes for some period of time until this old query method is completely phased out.

Implementation

No implementation yet.

Security Considerations

To be completely sure that a query returns the latest state of a stream, the query protocol must find all peers that provide the stream in the DHT and get the tip from each of them. Even if only one peer is left out, that peer may have a more recent update (even if this is unlikely). To improve this situation over time, it can make sense to design the TipSync protocol together with the way tips are published on updates. Note, however, that a trade-off is possible in which the query result is returned before responses have arrived from all peers that pin the given stream.

Copyright

Copyright and related rights waived via CC0.

oed self-assigned this Nov 15, 2020
oed changed the title from "DHT Document Lookup" to "CIP: TipSync" on Oct 11, 2021
@stbrody
Contributor

stbrody commented Oct 12, 2021

When a Ceramic peer pins a stream it tells the DHT that they provide this stream. They also look up all other nodes that are providers of this stream and query them for the latest tip of the stream.

This would presumably have to happen at node startup as well, right? So we'd have to iterate over the entire pin store at startup to inform the DHT of what streams we are providing.

The timeout option should be set to a reasonably short time interval since there is no way to manually remove the DHT record. A Ceramic peer should republish the DHT record before the timeout ends, given that the stream is still pinned.

This would also have to be done for every stream in the pinset. Could get expensive if a node has many streams pinned?

@stbrody
Contributor

stbrody commented Oct 12, 2021

The want-tip message over TipSync is however gated behind a feature flag. Once some time has passed, e.g. a month, such that most nodes have upgraded the feature can be turned on in a new release.

We might not need to do this. At the cost of some extra bandwidth, we could have a period of time where we simply use both query protocols simultaneously and consider any tips found via either lookup approach. Eventually we could start logging warnings if the pubsub lookup is finding better tips than the libp2p lookup. Once no one is seeing those warnings anymore, we can do a release that turns off pubsub lookups by default.

@oed
Member Author

oed commented Oct 13, 2021

This would presumably have to happen at node startup as well, right? So we'd have to iterate over the entire pin store at startup to inform the DHT of what streams we are providing.

Yes, good point.

This would also have to be done for every stream in the pinset. Could get expensive if a node has many streams pinned?

I believe publishing to the DHT is quite cheap. IPFS peers already do this for all of the CIDs which they pin.
