Acks not written #21

Open
SurfingNerd opened this issue Jun 21, 2021 · 2 comments
Labels
bug Something isn't working

Comments

@SurfingNerd
Collaborator

We have observed scenarios where a node was unable to write its ACKs.
With automated testing, the problem became reproducible.
It happens in the following setup:

Number of validators: 4
Test case: 1 validator gets shut down and becomes unavailable.
Expected result:

3/4 nodes write their Acks.
1/4 nodes cannot, because it is offline.
The system flags the offline node as unavailable,
starts a new key generation round, and this round succeeds.
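A minimal sketch of the expected outcome, reduced to set arithmetic. Integer IDs stand in for validator addresses, and `next_round_validators` is a hypothetical helper, not part of the project; how the system actually flags a node as unavailable is not described in this issue.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the project): given the current
/// validator set and the validators whose Acks were written, return the
/// validators to flag as unavailable and the set for the next round.
fn next_round_validators(
    validators: &BTreeSet<u32>,
    acks_written: &BTreeSet<u32>,
) -> (BTreeSet<u32>, BTreeSet<u32>) {
    // Validators that did not write an Ack are flagged as unavailable.
    let unavailable: BTreeSet<u32> = validators.difference(acks_written).copied().collect();
    // Only validators that did write an Ack take part in the next round.
    let next_round: BTreeSet<u32> = validators.intersection(acks_written).copied().collect();
    (unavailable, next_round)
}

fn main() {
    // Setup from the issue: 4 validators, validator 3 is shut down.
    let validators: BTreeSet<u32> = (0..4).collect();
    let acks_written: BTreeSet<u32> = [0u32, 1, 2].iter().copied().collect();

    let (unavailable, next_round) = next_round_validators(&validators, &acks_written);
    println!("unavailable: {:?}", unavailable); // unavailable: {3}
    println!("next round: {:?}", next_round); // next round: {0, 1, 2}
}
```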

Actual behavior:

Key generation does not succeed. 3/3 nodes manage to write their PART,
but 0/3 manage to write their ACK.
In the regression test, this results in a never-ending epoch that waits until 1-3 manage to write their part.

I have identified a potential cause so far: the following code runs into a CallError::ReturnValueInvalid:

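    // One Ack is collected per validator, in sorted address order.
    let mut acks = Vec::new();
    for v in vmap.keys().sorted() {
        acks.push(
            match part_of_address(&*client, *v, &vmap, &mut synckeygen, BlockId::Latest)? {
                Some(ack) => ack,
                // A `None` for any single validator aborts the whole loop,
                // so none of the Acks collected so far are returned.
                None => return Err(CallError::ReturnValueInvalid),
            },
        );
    }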
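The loop above returns on the first validator whose Part yields no Ack, so one unreadable Part discards the Acks for every other Part; that would be consistent with 0/3 nodes writing their ACK. A minimal sketch of that control flow next to a hypothetical variant that skips unreadable Parts instead. `Ack`, `read_part`, and this `CallError` are simplified stand-ins, not the project's real `part_of_address` or error type, and whether skipping a Part is acceptable for the key generation protocol is not established here.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum CallError {
    ReturnValueInvalid,
}

// Stand-in for an Ack derived from a validator's Part.
#[derive(Debug, PartialEq)]
struct Ack(u32);

// Stand-in for `part_of_address`: Ok(None) means no Ack could be derived
// (e.g. the Part could not be read or decrypted).
fn read_part(parts: &BTreeMap<u32, Option<u32>>, v: u32) -> Result<Option<Ack>, CallError> {
    Ok(parts.get(&v).copied().flatten().map(Ack))
}

// Mirrors the loop in the issue: the first `None` aborts the whole batch.
fn collect_acks_strict(parts: &BTreeMap<u32, Option<u32>>) -> Result<Vec<Ack>, CallError> {
    let mut acks = Vec::new();
    for &v in parts.keys() {
        // BTreeMap keys are already iterated in sorted order.
        match read_part(parts, v)? {
            Some(ack) => acks.push(ack),
            None => return Err(CallError::ReturnValueInvalid),
        }
    }
    Ok(acks)
}

// Hypothetical variant: skip validators whose Part yields no Ack and
// report them, so Acks for the remaining Parts are still returned.
fn collect_acks_tolerant(
    parts: &BTreeMap<u32, Option<u32>>,
) -> Result<(Vec<Ack>, Vec<u32>), CallError> {
    let mut acks = Vec::new();
    let mut skipped = Vec::new();
    for &v in parts.keys() {
        match read_part(parts, v)? {
            Some(ack) => acks.push(ack),
            None => skipped.push(v),
        }
    }
    Ok((acks, skipped))
}

fn main() {
    // Validators 0, 1 and 2 wrote Parts; the Part of validator 2 cannot be read.
    let parts: BTreeMap<u32, Option<u32>> =
        [(0u32, Some(10u32)), (1, Some(11)), (2, None)].iter().copied().collect();

    println!("{:?}", collect_acks_strict(&parts)); // Err(ReturnValueInvalid)
    println!("{:?}", collect_acks_tolerant(&parts)); // Ok(([Ack(10), Ack(11)], [2]))
}
```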
@SurfingNerd
Collaborator Author

The node fails to read its own part under specific circumstances:

    2021-07-15 10:10:45  Verifier #0 TRACE engine  preparing to send PARTS for upcomming epoch: 31
    2021-07-15 10:10:45  Verifier #0 TRACE engine  checking for acks...
    2021-07-15 10:10:45  Verifier #0 WARN parity_rpc::v1::helpers::engine_signer  Unable to decrypt message: SStore(EthCryptoPublicKey(InvalidMessage))
    2021-07-15 10:10:45  Verifier #0 ERROR engine  could not retrieve part for 0x1fe6…ff0c call failed. Error: ReturnValueInvalid
    2021-07-15 10:10:45  Verifier #0 ERROR engine  Error sending keygen transactions ReturnValueInvalid
    2021-07-15 10:10:45  Verifier #0 TRACE consensus  calling reward function for block 384 isEpochEnd? false on address: 0x2000…0001

@SurfingNerd
Collaborator Author

Is it the same as #30?

@SurfingNerd SurfingNerd added the bug Something isn't working label Nov 4, 2021