Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add protocol buffers support for messages and gRPC services #1271

Closed
sainad2222 opened this issue Dec 2, 2024 · 11 comments
Closed

Add protocol buffers support for messages and gRPC services #1271

sainad2222 opened this issue Dec 2, 2024 · 11 comments
Assignees

Comments

@sainad2222
Copy link
Contributor

Problem statement
Currently, Openraft does not use Protocol Buffers (proto) messages; instead, it passes Rust structs for internal communication. This necessitates that anyone writing a gRPC network layer must rewrite proto files like the example found in databend's source code

Solution
Openraft can have proto file and generated pb file in the repo along with From and Into trait converters that people can use in writing thier network component
Example from kv-memstore:
Instead of

#[post("/add-learner")]
pub async fn add_learner(app: Data<App>, req: Json<(NodeId, String)>) -> actix_web::Result<impl Responder> {
    let node_id = req.0 .0;
    let node = BasicNode { addr: req.0 .1.clone() };
    let res = app.raft.add_learner(node_id, node, true).await;
    Ok(Json(res))
}

it will look something like this

#[post("/add-learner")]
pub async fn add_learner(app: Data<App>, req: add_learner_request) -> add_learner_response {
    let node_id = req.node_id;
    let node = BasicNode { addr: req.addr.clone() };
    let res = app.raft.add_learner(node_id, node, true).await;
    Ok(res.into())

Scope and tools:
The plan is to utilize prost and tonic for handling proto files and generating Rust code. As I delve deeper into the codebase, I will update this section with more details.

Open questions:

  1. Should protobuf generator be pluggable? Ex: rust-protobuf instead of prost
Copy link

github-actions bot commented Dec 2, 2024

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@sainad2222
Copy link
Contributor Author

/assignme

@schreter
Copy link
Collaborator

schreter commented Dec 2, 2024

The plan is to utilize prost and tonic for handling proto files and generating Rust code.

Though I understand that some people might want to use gRPC, this is way too slow/heavyweight for our use case. Further, the mentioned tools currently pretty much hard-code tokio runtime, to which we painstakingly removed the dependencies from the core.

Please ensure while doing so to put such things into a separate crate not polluting the openraft core itself. It is not a core functionality. But, I assume, this was your plan from the beginning, right? :-)

@sainad2222
Copy link
Contributor Author

I'm unassigning myself from this GitHub issue for now, as I'm unsure whether adding serialization and conversions with an additional layer on top will remain performant, given that the core only works with Rust structs. Furthermore, making changes to the core without introducing breaking changes is proving to be difficult. I've attached my progress so far. Please let me know if there's an alternative approach I could take to achieve this without modifying the core of OpenRaft.
#1272

@sainad2222 sainad2222 removed their assignment Dec 7, 2024
@schreter
Copy link
Collaborator

schreter commented Dec 7, 2024

with an additional layer on top will remain performant, given that the core only works with Rust structs

I didn't really look into your change, but I don't understand this statement.

Adding gRPC to openraft core would certainly make it quite inperformant, since gRPC is way too bloated.

Instead, openraft adopted a possibility to give you almost full control to pass requests in any form you want - including reusable gRPC objects. Just pass your gRPC objects as-is through entries pushed to openraft by defining a thin wrapper satisfying required interfaces - you'll get them back for logging, replication and applying to the state machine.

Similarly, for communication with the remote side, the network traits allow your implementation to cache whatever you like, including connections and gRPC objects to reuse for the next communication (e.g., a gRPC shell for AppendEntries, where you add your gRPC entries). Thus, you have all the possibilities to use the full power of gRPC libraries there.

The only serialization you need to add is for control plane stuff like voting or membership - but that's hardly performance-relevant.

For example, in your change, for each communication you are opening a new connection, formatting some URLs in process. Why not put this connection to a member of NetworkConnection, likely under an Option, so you can lazily connect and reuse properly in case it's killed? In our project, we have exactly that - reusable connection, which is take()n out of the Option member when a request is started and returned back into the Option when the request is done (this is needed, since the request can be aborted by openraft on timeout - then you don't want to reuse a connection which is in an unknown state).

Hope this helps.

Can you elaborate where else do you see an issue?

@sainad2222
Copy link
Contributor Author

In our project, we have exactly that

Could you share the link to your project if it’s open source and you’re comfortable sharing it? I referred to Databend, as it seemed like the only gRPC-based OpenRaft implementation I could find. However, I may have misunderstood how OpenRaft integrates with Databend, given the size and complexity of Databend’s codebase.

That said, I’ll take some time to go through the codebase in more detail and attempt this again. Thank you for your input so far! If possible, I’d really appreciate it if you could leave comments on my PR with suggestions for how I could approach things differently.

@schreter
Copy link
Collaborator

schreter commented Dec 7, 2024

I can't speak for Databend, since I never looked into its codebase. Unfortunately, our project is not open-source, so I can't share the code. Further, I'm not that familiar with gRPC, since we use much leaner zero-copy Cap'n Proto-based serialization in our services, so I can't give you more pointers beyond the obvious - passing references to serialized objects via entries and caching existing connections in the network layer, as I already mentioned above (probably the connection caching will make the biggest difference).

I remember Protobufs (which are behind gRPC) allow keeping materialized instances of serialized objects for later reuse to prevent re-allocating of memory, at least in C++ implementation, but since I'm not familiar with the interfaces, I can't help you much there.

Maybe @drmingdrmer can give you some pointers into Databend's codebase how the serialization and caching is done there.

@drmingdrmer
Copy link
Member

I'm unsure whether adding serialization and conversions with an additional layer on top will remain performant, given that the core only works with Rust structs.

As @schreter mentioned, for most of the types, you do not need to add conversion to use them. But instead, just assign the protobuf generated types to Openraft with declare_raft_types. For example in your code, instead of using all default types with:

openraft::declare_raft_types!(pub TypeConfig);

You should tell Openraft to use the type defined in protobuf directly using:

openraft::declare_raft_types!(
    pub TypeConfig:
        Node  = openraft_proto::protobuf::BasicNode
);

Currently, only the BasicNode type is defined in Protocol Buffers. Similar definitions need to be added for other types including D, R, and Entry.

And in order to allow Openraft to use these protobuf types, these types need to implement corresponding traits as RaftTypeConfig defines. For example the above BaiscNode need to implement trait openraft::node::Node.

There are a few types that can not be configured with declare_raft_types! such as Membership, require manually conversion. Taking Membership as example, the conversion should be implemented when implementing trait RaftEntry for your protobuf defeind Entry:

pub trait RaftEntry<C>: RaftPayload<C> + RaftLogId<C::NodeId>
where
C: RaftTypeConfig,
Self: OptionalSerde + Debug + Display + OptionalSend + OptionalSync,
{
/// Create a new blank log entry.
///
/// The returned instance must return `true` for `Self::is_blank()`.
fn new_blank(log_id: LogId<C::NodeId>) -> Self;
/// Create a new membership log entry.
///
/// The returned instance must return `Some()` for `Self::get_membership()`.
fn new_membership(log_id: LogId<C::NodeId>, m: Membership<C>) -> Self;
}

The other issue with caching a connection in Network is quite straightforward. Refer to databend codebase in:

https://github.com/datafuselabs/databend/blob/8b59fb096c1c684816c70a4c56ad4677384d499f/src/meta/service/src/network.rs#L162

@sainad2222
Copy link
Contributor Author

/assignme

@sainad2222
Copy link
Contributor Author

Hi, felt like we don't need a separate package for grpc inside openraft now that I understood how it works as a library :). So just added an example for grpc. Let me know your thoughts. I'll update readme if everything looks good
We can still move internal_service if we want but I don't see much advantages. Please correct me if I am wrong

@drmingdrmer
Copy link
Member

It looks like an example would be good enough.

Some of the data type in internal_service.proto still need to be defined as protobuf types. Such as AppendEntriesRequest is on the hot path and using message RaftRequestBytes { bytes value = 1; } as the RPC payload results in a double encoding/decoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants