fix: sanity/restart.py: don't kick out validators (#10151)
The test
[`tests/sanity/restart.py`](https://github.com/jancionear/nearcore/blob/7c0b58fd02a7ecc401736628cc977f6c8cff5fbf/pytest/tests/sanity/restart.py)
is failing because one of the validators gets kicked out. Let's disable
kickout to make the test pass again.

The test sets up a network with two validator nodes and epoch size equal
to 10 blocks. It lets the network run for 20 blocks, then stops both
nodes and starts them again. After the restart it checks that the
blockchain still produces blocks.

The problem is that validators might miss some blocks due to the
restart, and then they're kicked out for it. Once one of the two
validators gets kicked out, the blockchain stops producing blocks. The
kickout threshold was set to 80%, so a validator that misses even one
block out of four falls below it (3/4 = 75%) and gets kicked out. With
an epoch size of 10, four is a typical number of blocks that a validator
is asked to produce in an epoch.
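
The threshold check can be sketched as below. This is a simplified model that assumes a validator is kicked out when its share of produced blocks falls strictly below `block_producer_kickout_threshold` percent. The helper `is_kicked_out` is hypothetical and is not nearcore's actual implementation.

```python
def is_kicked_out(produced: int, expected: int, threshold_percent: int) -> bool:
    """Simplified kickout check (hypothetical helper, not nearcore's code).

    Integer arithmetic avoids floating-point rounding: the validator is
    kicked out when produced / expected < threshold_percent / 100.
    """
    if expected == 0:
        # Nothing was expected, so there is nothing to fall short of.
        return False
    return produced * 100 < threshold_percent * expected


# One missed block out of four: 3/4 = 75%, which is below 80%.
print(is_kicked_out(produced=3, expected=4, threshold_percent=80))  # True
# A perfect record stays above the threshold.
print(is_kicked_out(produced=4, expected=4, threshold_percent=80))  # False
```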

Missing a single block shouldn't cause a kickout. Usually this isn't a
problem: on the production network the epoch size is 43200 blocks, not
10, so a few seconds of downtime are negligible. In this test, a few
seconds of downtime amount to half of the epoch.
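
A rough sketch of the comparison, assuming (hypothetically) that a restart costs about 5 block heights of downtime regardless of epoch size; `downtime_fraction` and `DOWNTIME_BLOCKS` are illustrative names, not part of the test.

```python
def downtime_fraction(downtime_blocks: int, epoch_length: int) -> float:
    """Fraction of an epoch covered by a restart lasting `downtime_blocks` blocks."""
    return downtime_blocks / epoch_length


# Hypothetical restart that costs about 5 block heights.
DOWNTIME_BLOCKS = 5

for epoch_length in (10, 43200):
    share = downtime_fraction(DOWNTIME_BLOCKS, epoch_length)
    # The "%" format type multiplies by 100 before printing.
    print(f"epoch_length={epoch_length}: restart covers {share:.4%} of the epoch")
```

The same outage that covers half of a 10-block epoch is a vanishing fraction of a 43200-block epoch.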

To fix the issue, let's disable kickout in this test. Setting the
kickout threshold to 0 ensures that no validator gets kicked out
during the test.
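
Why a threshold of 0 disables kickout can be checked with the same kind of simplified model (the helper `is_kicked_out` is hypothetical and assumes a strict "below threshold" comparison): the right-hand side becomes 0, and `produced * 100` is never negative.

```python
def is_kicked_out(produced: int, expected: int, threshold_percent: int) -> bool:
    # Hypothetical simplified check: kick out when produced/expected < threshold/100.
    return expected > 0 and produced * 100 < threshold_percent * expected


# Exhaustively check small cases: with threshold 0, no combination of
# produced/expected blocks triggers a kickout, even producing nothing.
assert not any(
    is_kicked_out(produced, expected, threshold_percent=0)
    for expected in range(1, 50)
    for produced in range(expected + 1)
)
print("threshold 0: no validator is kicked out")
```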

Here are logs from an example failed test run. I cleaned them up a bit
by removing spammy messages:
[test.log](https://github.com/near/nearcore/files/13320226/test.log)
[node0.log](https://github.com/near/nearcore/files/13320231/node0.log)
[node1.log](https://github.com/near/nearcore/files/13320233/node1.log)

In this run the validator `test0` gets kicked out at height 24 because
it produced 3 blocks out of 4:
```
handle{handler="ApplyChunksDoneMessage" actor="ClientActor"}:postprocess_block{height=24}: epoch_manager: All proposals: [], Kickouts: {AccountId("test0"): NotEnoughBlocks { produced: 3, expected: 4 }}, Block Tracker: {1: ValidatorStats { produced: 9, expected: 9 }, 0: ValidatorStats { produced: 3, expected: 4 }}, Shard Tracker: {1: {1: ValidatorStats { produced: 11, expected: 12 }}, 0: {0: ValidatorStats { produced: 11, expected: 12 }}}
```
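
To pull the numbers out of a kickout entry like the one above, a small regex sketch can help. The pattern is hypothetical and only matches the `Debug` formatting visible in this one log line; other nearcore versions may format kickout reasons differently.

```python
from __future__ import annotations

import re

# Kickout entry copied from the log above (shortened to the relevant part).
LOG_LINE = (
    'Kickouts: {AccountId("test0"): NotEnoughBlocks { produced: 3, expected: 4 }}'
)

# Hypothetical pattern matching the formatting shown in this log line.
KICKOUT_RE = re.compile(
    r'AccountId\("(?P<account>[^"]+)"\): NotEnoughBlocks \{ '
    r'produced: (?P<produced>\d+), expected: (?P<expected>\d+) \}'
)


def parse_kickouts(line: str) -> dict[str, tuple[int, int]]:
    """Return {account_id: (produced, expected)} for every NotEnoughBlocks kickout."""
    return {
        m["account"]: (int(m["produced"]), int(m["expected"]))
        for m in KICKOUT_RE.finditer(line)
    }


for account, (produced, expected) in parse_kickouts(LOG_LINE).items():
    print(f"{account}: produced {produced}/{expected} = {100 * produced / expected:.0f}%")
```

For the line above this reports `test0` at 75%, below the old 80% threshold.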

Then in the next epoch the blockchain is unable to produce the block at
height 34 and the test fails:
```
check_triggers:handle_block_production: client: Cannot produce any block: not enough approvals beyond 34
```

Setting the kickout threshold to 0 makes the test pass.
jancionear authored Nov 10, 2023
1 parent cadf11d commit a32d8a2
Showing 1 changed file with 4 additions and 1 deletion.
pytest/tests/sanity/restart.py (4 additions, 1 deletion):
```diff
@@ -15,9 +15,12 @@
 BLOCKS1 = 20
 BLOCKS2 = 40
 
+# The epoch size is 10 and the validators will miss a few blocks during restarts, so the kickout threshold has to be adjusted.
+# Otherwise validators will get kicked out and the blockchain will grind to a halt. The threshold is set to 0 to avoid any such problems.
+# On production network this isn't a problem because the epoch size is a few orders of magnitude larger.
 nodes = start_cluster(
     2, 0, 2, None,
-    [["epoch_length", 10], ["block_producer_kickout_threshold", 80]], {})
+    [["epoch_length", 10], ["block_producer_kickout_threshold", 0]], {})
 
 started = time.time()
 
```