-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specialized nodes #8
Comments
That's a good question, and it's not something I have a great answer for--there are lots of easy ways we could build node specialization, but I'm not sure which way would be easiest and most general for folks going forward. I also know that later on we're going to want some notion of "what datacenter does this node belong to", and that might be communicated in the same way as node roles. One option is to add datacenter and roles fields to the Node roles are tricky because they have semantic meaning not only for the nodes themselves but also for the workload! In your bookkeeper test, you want to have clients only send requests to specific roles. But different algorithms are going to have different ideas about what kinds of roles there are. It'd be weird to, say, hardcode bookkeeper-specific roles into a general-purpose ledger workload. We might consider having two separate-but-overlapping concepts of roles, where maelstrom has some concept of, say, "nodes which can write", "nodes which can read", "nodes which accept no requests", and then the implementation is free to extend that structure from there, but it's sort of... tricky, right? How many of each should you assign? How should they be distributed across datacenters? We need some way to specify that sort of thing, and I'm not sure what the right shape is. So... in light of this uncertainty, I have two possible paths that you might want to follow. One is that if Maelstrom's workloads are working for you as-is, you just... do the proxying. It's like ten lines of code, and that frees you to assign node roles internally, as opposed to having to inform Maelstrom of how to assign and interpret those roles. Note that you can reply to a client's message from ANY node, so it's perfectly valid to wrap up the whole request you receive in a The other option is that you modify Maelstrom to define a workload specifically for testing Bookkeeper, and follow the |
Indeed generalising node roles is not an easy thing to design well. I think for now I will play with using existing workloads and proxying. BookKeeper is after all, log storage, and I can project anything on top of a log, like a KV store. In terms of checkers, there are things like checking that metadata, bookies and clients do not diverge, but any divergence that I care about (data loss and ordering) would be visible from existing checkers. |
I've thought a little more about this--I think that for model-checking purposes, it might be nice to be able to require maelstrom itself as a library, and hook in your own custom workload somewhere. I'm not totally sure what that's going to look like, yet, but I'm keeping it on the back burner. |
I was interesting in modelling Apache BookKeeper, to compare the experience of my modelling it with TLA+. One slight difficulty with BK is that there are 3 node types: metadata_store (zk), bookie (storage node) and bk_client (serving layer where all the consensus logic is). The bk_clients are the only externally visible nodes of the protocol, all requests go to the bk_clients which in turn interact with the metadata_store and bookies.
Currently all nodes are treated the same by Maelstrom, meaning I will need to use proxying of messages, and use node ID to determine node type. When a node initializes I could ensure that the first is a metadata_store, the second two are bk_clients and the rest are bookies. Metadata_store and bookie nodes will need to proxy maelstrom messages to bk_client nodes. But this would mean adding some communication between nodes that does not exist in the real protocol, so each node would always be able to forward messages to the right node.
An alternative is allow node specialization, or at least, mark which nodes form the public API surface of the protocol, in order to avoid this proxying. For my purposes, simply restricting the public API calls to particular nodes would be enough. Using node ID to determine node type is not an issue for me.
The text was updated successfully, but these errors were encountered: