Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending log entries without hearbeat (was:MonadLog vs CLogEntries) #8

Open
qrilka opened this issue Nov 6, 2013 · 17 comments
Open

Comments

@qrilka
Copy link
Contributor

qrilka commented Nov 6, 2013

While thinking about a patch for #7 I've stumbled upon a problem: at the moment I don't really understand current kontiki API.
I.e. I don't understand whether client code should use MonadLog or CLogEntries. The first seems to provide a way to log any Entry appearing on a node. But if CLogEntries does not seem to be uniforrm - it could be received only on Followers and for Leaders some special handling should be used.
Is it a problem with my API understanding or there is some inconsistencies in this API?
BTW haddocks could help here too :)

@NicolasT
Copy link
Owner

NicolasT commented Nov 6, 2013

The MonadLog interface is used by the state machine to access the log as if it's an immutable/pure datastructure somewhere, although in real code this will need IO etc. While the state-machine performs a single transition/step, the log is actually frozen.

Similar to other Commands, modifications to the log could be the result of a state transition, hence the CLogEntries and CTruncateLog Commands.

So:

  • MonadLog is read-only/'pure' access to the log from within the state machine
  • CLogEntries and CTruncateLog are modifications to the log as a result of a state transition
  • When new log entries should be distributed across a cluster, these should be added to the log (and as such returned by the corresponding MonadLog actions), but this is not (and should not be) handled by Kontiki.

The 'udp' demo might give some more insight.

You're right about docs, I should get to #6 asap.

@qrilka
Copy link
Contributor Author

qrilka commented Nov 7, 2013

Actually 'udp' demo seems to be the main source if my current undestanding of kontiki.
And it is quite confusing that new values are being generated inside the Leader itself - it is quite far from Raft intended usage.
And for #7 correction the problem with CLogEntries uniformity (which I describe above) compicate the situation: handleHeartbeatTimeout generates CLogEntries and could generate something like CCommitted but it is code from kontiki library and so Leader can not observe this CCommitted otherwise than to process CSend and take information about commited index out of it. And that seems to be very unintuitive control flow.
The main problem is that every node (including Leader) needs both log entries and commited index for correct Raft use. Log entries are easy as Leader is receiving it (out of somewhere) but commited index is controlled by Raft implementation so Leader have to get it somehow from Raft itself.

@jkozlowski
Copy link
Contributor

I forgot to watch this repo when I forked, so I missed this discussion.

That was my confusion as well that a client request to log an entry does not seem to be encoded in the API, so the leader's log is updated as if under his feet, implicitly by the driver that uses the model.

Also, from my current understanding, there can only be a single transaction in flight really, as there is no notion of "issue a write" and "write completed", so the driver needs to synchronously perform all commands. After reading the paper, I'm not sure if it's the Raft's limitation or the current implementation. I could see that it should be possible to implement.

So how do we proceed? I have this weekend free for some Haskell hacking, I was thinking about testing in the spirit of http://www.haskellforall.com/2013/11/test-stream-programming-using-haskells.html, but if the APIs going to change, I suppose it's a bit pointless...

@NicolasT
Copy link
Owner

NicolasT commented Nov 9, 2013

Regarding CSetCommitIndex: this is issued upon every AppendEntriesRequest received on a follower, or on a leader whenever an AppendEntriesResponse is received (the leader can recalculate the commitindex in there, since that's where it keeps track of the log indices stored on followers). The discussion about this should go into #7 though :-)

@NicolasT
Copy link
Owner

NicolasT commented Nov 9, 2013

Regarding the adding of log entries in the udp demo within handleHeartbeatTimeout: this is obviously on ly for demonstrational purposes, and I agree it's most likely not the right place, adding confusion. I guess this should be handled by some other 'client' thread.

@NicolasT
Copy link
Owner

NicolasT commented Nov 9, 2013

@jkozlowski is right: new entries to be distributed in the log are added to the log on the leader outside this library (and I really want to keep itlike that). Raft is all about distrbuting a (potentially) growing log, but from the protocol perspective this log is static.

Sorry for the multiple comments, working from a tablet, not that easy...

@NicolasT
Copy link
Owner

NicolasT commented Nov 9, 2013

Regarding multiple concurrent transactions: that's certainly possible, and nor the Raft protocol, nor the library design (if #7 is fixed) prohibit this.

Here's the trick: whenever some code appends a transaction to the log, it knows the log index (and can e.g. connect this to some client request). When the driver (in which collaborates with whichever subsystem is manipulating the log) receives a CSetCommitIndex command, it can know which transactions have been distributed correctly (i.e. all transactions with a lower log index).

It is not within the scope of this library to handle this. All Kontiki should provide is a 'model' implementation of the basic Raft consensus algorithm which is all about replication a (most likely growing over time) log/journal.

@jkozlowski
Copy link
Contributor

Yep, that now makes sense, thanks.

@NicolasT
Copy link
Owner

NicolasT commented Nov 9, 2013

@jkozlowski It'd be great to have some testing in line with what @Gabriel439 did there. I don't think the overall API of this library should change too much, except for a new command being added as part of #7, and even without that, the current implementation can be tested partially, even though a commit-index is not exposed (e.g. the master election part of the protocol is fairly distinct from the log).

@qrilka
Copy link
Contributor Author

qrilka commented Nov 9, 2013

@NicolasT, you are right that descussions of CSetCommitIndex should go to #7
And as for your words on udp demo and handleHeartbeatTimeout this code is not in the demo itself and is the part of current kontiki. And I don't see any other way to pass new entries into kontiki so it will take them into account and generate CSend as a result.
Do you see any such way?

@NicolasT
Copy link
Owner

I'm not sure I understand what you want to achieve. You mean you want to trigger a replication manually after extending the log, not waiting for a heartbeat timeout?

@qrilka
Copy link
Contributor Author

qrilka commented Nov 14, 2013

Looking into Raft paper I see:

Once a leader has been elected, it begins servicing
client requests. Each client request contains a command
to be executed by the replicated state machines. The
leader appends the command to its log as a new entry, then
issues AppendEntries RPCs in parallel to each of the other
servers to replicate the entry. 

So client request is supposed be replicated immediately.
But looking into a Go implementation at https://github.com/goraft/raft/blob/master/peer.go#L124 they also send entries only on heartbeat.
But its better to look into original implementation. And there I see append function - https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1549 and that function triggers stateChanged which will result in entries being sent (almost) immediately in peerThreadMain https://github.com/logcabin/logcabin/blob/master/Server/RaftConsensus.cc#L1397

@qrilka
Copy link
Contributor Author

qrilka commented Nov 14, 2013

BTW if looks like this details are not actually about "MonadLog vs CLogEntries" not sure if we should split it into another issue or issue should be renamed. Looks like the second option is better as github does not allow to move comments between issues.

@NicolasT
Copy link
Owner

Aha! I think I know what's going on.

The Raft paper has gone through a couple of revisions (@ongardie might know more). The Kontiki code is based on an early draft version of the paper, and so is @goraft. In that version of the paper, the leader didn't provide write-access to the log, it simply replicated 'some log'. Whilst the newer approach might be easier to explain, I tend to like the old one more (simplifies things a lot!) and think we should stick to that approach (unless there's an important reason not to).

@qrilka
Copy link
Contributor Author

qrilka commented Nov 14, 2013

Maybe you have a link to that draft? (Current paper on https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf also has status of draft)
Commit 51773c3af76c503d8a336cb03aa24ee1b9d7e4f3 in logcabin (which includes append) is from 2012-09-10 i.e. 8 months before kontiki was even started.
Postponing sending to hearbeat could improve throughput but would affect latency negatively.
I don't see much simpification in prohibiting single entry - there should be no problem in appending just a single entry and invoking almost the same code as in handleHeartbeatTimeout (or should it be exactly the same?).
I think that limiting log entries sending only to hearbeat events is to restrictive for a library dealing with Raft

@NicolasT
Copy link
Owner

Agree, adding code to inject/trigger a heartbeat-timeout (or something along those lines) would make sense. But it's strictly speaking not required ;-)

@qrilka
Copy link
Contributor Author

qrilka commented Dec 22, 2013

A the moment I'm not sure if kontiki actually need separate function to implement it as it seems that injecting heartbeats could be enough. I was thinking on implement it outside of kontiki (around actual network protocol the way you have it in udp.hs demo) to see how it will look like. But looks like I have just forgot about it :) Hope to do it when I'll have some time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants