-
Notifications
You must be signed in to change notification settings - Fork 25
Towards being able to handle crashing processes. #317
Conversation
@stevana does this resolve the error |
@kderme: no, that error should never happen -- please open a ticket if you see it. |
, reason2 | ||
, env1 <> env2 | ||
) pairs | ||
go hchan (Ok, ExceptionThrown, env) (Pair cmds1 _cmds2 : pairs) = do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevana is there any reason to continue execution if there is an ExceptionThrown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sequential execution also stops if there is an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should stop execution for a specific process (or Pid
) if there's an ExceptionThrown
on that process (but continue executing on the other processes). I believe that's what I've implemented here.
This is also why we need your genernalisation to allow execution on more than two processes, so that we can introduce faults (exceptions) without stopping all execution altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I guess it depends on how we want to simulate user usage of the api and since this gives more user option I think it's on the right direction. @stevana do you think this is close to being merged? I see some errors like executeCommands: impossible
and some false positives tests, which I believe will be solved with this pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another important thing that this PR adds is the ability to complete a history which contains crashed processes. We do so by appending a user specified response to the end of the history. For example in the case of the memory reference example, if a Write
fails then we can have a history like this:
Process 1: |-- x <- Create --| |---- Read x ---|
Process 2: |---- Write 5 x --...
In this case Read
can return both 0
(the default value of create) or 5
depending on if Write
crashed before or after writing.
To account of this we complete the above history as follows:
Process 1: |-- x <- Create --| |---- Read x ---|
Process 2: |---- Write 5 x -------------------------|
Notice that the Write
is concurrent with the Read
. Now because of the way linearise
works (it tries all possible sequential interleavings of the action calls) it will accept Read
returning both 0
and 5
.
To complete for Write
is easy as it's just an Ack
, and it doesn't really matter what we complete Read
with as it doesn't change the model. If a Create
crashes, we are kind of screwed that's why in this PR I also allowed the pre-condition failed exception to be thrown. No doubt will there be examples for which complete will be harder...
A possible alternative would be to have the user catch all exceptions (due to fault injection) and account of the non-determinism, e.g. have not just a single value for each memory reference, but a set of values. This complicates the model and is a lot more work for the user though.
Yet another possibility might be to have the fault injection be more precise, so that we know if the Write
failed before or after it wrote to the memory. That way we know if it should return 0
or 5
. I'm not sure if having this precision is always possible. It also complicates the model because we need to keep track of exactly what faults are injected and return the correct response in the presence of the fault.
Does this make sense? If not, let me know where I lost you and I can try to explain further.
It's my current understanding of the Linearizability paper and what Jepsen does. It took me a while to get here, and I'm still not certain if I understand things correctly. So I think it would be great if: 1) I could convince you that this approach makes sense, and 2) that we develop more examples that confirm that it indeed works.
d66cb09
to
27e4138
Compare
See #162 for details.