Failure to check file contents causes node to crash during start up; Public facing issue #25

Open
jenafermiller opened this issue Sep 10, 2015 · 0 comments


A public tracking issue for the bug reported in basho/riak_repl#708. To summarize:

In lines 1364-1375 of gen_leader.erl, line 1366 verifies that the file needed to start the replication leader exists, and line 1371 reads the file. However, if the file is empty, corrupted, or otherwise contains bytes that do not form a valid external term format, line 1372 throws an exception. That exception kills the gen_leader start-up process, which in turn causes the riak_repl process to fail to start, and eventually the entire node is taken down during start-up.

To fix this issue, the current version of line 1372 could be replaced with the following lines, so that any non-integer value or decoding error is ignored and the incarnation falls back to 1. The change does not ignore file read errors or file write errors, so Riak will still be prevented from starting when the disk is full, when the data directory is read-only, etc.

Incarn = case catch binary_to_term(Bin) of
            I when is_integer(I) -> I;
            _ -> 1
         end,
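
The same fallback can also be written with `try ... of ... catch`, which handles the decoding failure explicitly instead of matching on the `{'EXIT', ...}` tuple that `catch` produces. The following is only a minimal sketch: the module name `incarnation_sketch` and the function `decode_incarnation/1` are hypothetical and do not appear in gen_leader.erl. It assumes the only failure worth handling is the `badarg` error that `binary_to_term/1` raises for invalid input, so unlike the `case catch` version it does not swallow other exception classes.

```erlang
-module(incarnation_sketch).
-export([decode_incarnation/1]).

%% Hypothetical helper, not part of gen_leader.erl.
%% Returns the stored incarnation when Bin decodes to an integer;
%% falls back to 1 for empty, corrupted, or non-integer contents.
-spec decode_incarnation(binary()) -> integer().
decode_incarnation(Bin) ->
    try binary_to_term(Bin) of
        I when is_integer(I) -> I;
        _NotAnInteger -> 1
    catch
        %% binary_to_term/1 raises badarg when Bin is not valid
        %% external term format (including the empty binary).
        error:badarg -> 1
    end.
```

For example, `incarnation_sketch:decode_incarnation(term_to_binary(7))` returns `7`, while `incarnation_sketch:decode_incarnation(<<>>)` returns `1`.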