Skip to content
This repository has been archived by the owner on Jun 18, 2021. It is now read-only.

Towards being able to handle crashing processes. #317

Merged
merged 3 commits into from
Jun 19, 2019

Conversation

stevana
Copy link
Collaborator

@stevana stevana commented Jun 11, 2019

See #162 for details.

@kderme
Copy link
Collaborator

kderme commented Jun 12, 2019

@stevana does this resolve the error simplify: impossible, because of the structure of linearise. which sometimes appear when there is some Exception?

@stevana
Copy link
Collaborator Author

stevana commented Jun 13, 2019

@kderme: no, that error should never happen -- please open a ticket if you see it.

, reason2
, env1 <> env2
) pairs
go hchan (Ok, ExceptionThrown, env) (Pair cmds1 _cmds2 : pairs) = do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevana is there any reason to continue execution if there is an ExceptionThrown?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequential execution also stops if there is an exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stop execution for a specific process (or Pid) if there's an ExceptionThrown on that process (but continue executing on the other processes). I believe that's what I've implemented here.

This is also why we need your genernalisation to allow execution on more than two processes, so that we can introduce faults (exceptions) without stopping all execution altogether.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I guess it depends on how we want to simulate user usage of the api and since this gives more user option I think it's on the right direction. @stevana do you think this is close to being merged? I see some errors like executeCommands: impossible and some false positives tests, which I believe will be solved with this pr.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another important thing that this PR adds is the ability to complete a history which contains crashed processes. We do so by appending a user specified response to the end of the history. For example in the case of the memory reference example, if a Write fails then we can have a history like this:

Process 1: |-- x <- Create --|                           |---- Read x ---|
Process 2:                       |---- Write 5 x  --...        

In this case Read can return both 0 (the default value of create) or 5 depending on if Write crashed before or after writing.

To account of this we complete the above history as follows:

Process 1: |-- x <- Create --|                           |---- Read x ---|
Process 2:                       |---- Write 5 x  -------------------------|    

Notice that the Write is concurrent with the Read. Now because of the way linearise works (it tries all possible sequential interleavings of the action calls) it will accept Read returning both 0 and 5.

To complete for Write is easy as it's just an Ack, and it doesn't really matter what we complete Read with as it doesn't change the model. If a Create crashes, we are kind of screwed that's why in this PR I also allowed the pre-condition failed exception to be thrown. No doubt will there be examples for which complete will be harder...

A possible alternative would be to have the user catch all exceptions (due to fault injection) and account of the non-determinism, e.g. have not just a single value for each memory reference, but a set of values. This complicates the model and is a lot more work for the user though.

Yet another possibility might be to have the fault injection be more precise, so that we know if the Write failed before or after it wrote to the memory. That way we know if it should return 0 or 5. I'm not sure if having this precision is always possible. It also complicates the model because we need to keep track of exactly what faults are injected and return the correct response in the presence of the fault.

Does this make sense? If not, let me know where I lost you and I can try to explain further.

It's my current understanding of the Linearizability paper and what Jepsen does. It took me a while to get here, and I'm still not certain if I understand things correctly. So I think it would be great if: 1) I could convince you that this approach makes sense, and 2) that we develop more examples that confirm that it indeed works.

@stevana stevana force-pushed the feat/complete-crashing-history branch from d66cb09 to 27e4138 Compare June 19, 2019 09:19
@stevana stevana merged commit 1bec806 into master Jun 19, 2019
@stevana stevana deleted the feat/complete-crashing-history branch June 19, 2019 12:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants