-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: sanction/reward peer for connect #105
base: master
Are you sure you want to change the base?
feat: sanction/reward peer for connect #105
Conversation
closes Neptune-Crypto#35 High Level Changes: * add doc-comments describing how sanctioning system works * Identify PeerStanding by SocketAddr instead of IP because peer's can share IP (not a unique ID) * Add Unsanction (reward) capability * Sanction peer for connect failure * Unsanction peer for connect success * Establish min peer_standing score equal to 0 - cli.peer_tolerance A peer is banned when the min score is reached. * Establish max peer_standing score equal to cli.peer_tolerance * A peer can become banned for ConnectFail but after a 5 minute period is eligible to attempt connect again and becomes unbanned if successful. see: networking_state::CONNECT_FAILED_TIMEOUT_SECS Cleanups/Tweaks: * rename punish_peer to sanction_peer * rename PeerStanding::standing to score * impl strum::Display for PeerSanctionReason * move some existing methods into GlobalState, NetworkingState * wrap db iterator with block_in_place() to make it async-friendly Dependencies: * adds direct dep on thiserror; it already existed in our dep-tree New Tests: * can_track_peer_standing_by_port * ban_peer_connect_fail_and_unban_connect_success
About the banning of IPs/ports. It was actually intentional that I put the bans on IP. I thought that it was better to ban a whole IP address in case of shenanigans. I would rather use a big hammer than open up for repeated, costly attacks, like sharing invalid blocks and transactions. In the absence of further info, I lean towards banning whole IPs as I think protecting the network against malicious actors is more important than convenience in integration test environments. What do other blockchains do? The main thing I want to protect against is the sharing of invalid blocks and transactions, consensus-related shenanigans. |
yeah, I figured it was intentional, which is why I said it might be a controversial change. ;-) I wrestled with it quite a bit myself. You did not comment on how it impacts running/testing multiple nodes on localhost. If we go back to storing PeerStanding by IP, then about the only way to test standing behavior on localhost would be to use 127.0.0.2, 127.0.0.3, etc. Which people are not used to. So it would need very clear documentation. It's an interesting question what other cryptocurrencies do. I will check into bitcoin and monero's behavior. |
Here is something probably tangential but also probably worth thinking about: using proof-of-work and/or verifiable delay functions as an alternative to or as a soft version of banning. A node could have a policy of accepting connection requests from peers but only if they can prove to have devoted more than x amount of energy or time to that connection request. Suspicious IP range? High requirement. Benign connection failure with an otherwise impeccable standing? Low requirement. |
I looked into bitcoin's behavior a bit. Here's a summary of my understanding:
I like that they store a timestamp for unbanning and have a bantime setting. I think we should do the same. But that can be in a future PR. I haven't checked into monero's behavior yet. It might be interesting to check Eth's behavior also. |
The big picture
At a high-level this PR enables the following scenario:
A
andA
's standing score improvesA
goes offline and our connect attempts fail. Each failed attempt worsensA
standing score until it reaches the minimum score,0 - cli.peer_tolerance
and is banned.A
's standing improves, resulting inA
becoming unbanned, but still with a low score. A single failed connect at this point would getA
banned again.A
again in the future,A
's standing score improves. EventuallyA
's score reaches the maximum0 + cli.peer_tolerance
and is capped.To achieve this, some changes were made to the existing Peer Standing/Sanction system:
Uniquely Identifying Peers
An issue arose early in dev/testing with multiple peers on the same machine, invoked with the
run-multiple-instances.sh
script. Sanctions for all 3 peers were being stored in the same DB record, identified only by IP, which in this case is127.0.0.1
. The Database type was defined as:This equates an IP with a peer, however that doesn't reflect reality, where it is clearly possible to run nodes on different ports, and those nodes could be running different code, different network configs, etc.
As such, I modified to:
and adapted APIs to match. So
PeerStanding
are identified bySocketAddr,
ieIP
+port
.I added a test case
can_track_peer_standing_by_port
that verifiesPeerStanding
can be independently tracked for Peers on the same IP. This test would fail on the older code, if ever adapted for it.note: the node operator can still use the cli
--ban
arg to ban entire IP(s) without specifying ports.I realize this change might be controversial. It can be reverted if necessary, but I feel it is "more correct" and has great utility in enabling PeerStanding to be correctly tracked for multiple nodes running on localhost with different ports.
Open Qs:
How long should we wait after a peer is banned before we can attempt to connect again? Presently it is set to 5 minutes, which may be too short, since our check-for-peers interval is 2 minutes. Perhaps 1 hour? But the other consideration, is that when a node goes offline, it will shortly become banned for ConnectFailed, and when it comes back online, we'd like to connect to it without too much delay. There's kind of a tension. So maybe 10 mins, with that in mind?
If a peer is banned and the LatestSanction is NOT a ConnectFailed, then we never try to connect again. This node is effectively perma-banned with no chance to improve. I did this because it seems to mirror existing behavior. However, maybe we should give every banned node an opportunity to redeem itself once in a while.
We don't have to resolve these q's now. Both could easily be addressed in a future PR.
Commit Msg
closes #35
High Level Changes:
PeerStanding
bySocketAddr
instead ofIpAddr
because peer's can share IP (not a unique ID)Cleanups/Tweaks:
Dependencies:
New Tests:
Testing Performed
Implemented test
ban_peer_connect_fail_and_unban_connect_success
which simulates a full sequence of sanctioning, banning, unsanctioning, unbanning and verifies details at each point.ran 3 localhost nodes. procedure:
run-multiple-instances.sh
script.Here is an edited-for-readability log from node 0, that demonstrates the detailed logging:
peer2-readable.log
Problems after rebase to master
The above testing was performed on my branch created from master at 0d414d9 on Jan 25. Today I rebased it to master @ 9fb265c Feb 5. I again attempted to run 3 nodes with
run-multiple-instances.sh
but now the peers are unable to pass eachother blocks and disconnect from eachother. I get a log error "In order to be transferred, a Block must have a non-None proof field". Here's an example:This appears to be caused by the intervening changes to master, unrelated to this PR.