add autonat v2 spec (#538)
sukunrt authored Oct 31, 2024
1 parent 2e881f1 commit acd5c31
Showing 7 changed files with 345 additions and 3 deletions.
7 changes: 7 additions & 0 deletions autonat/README.md
@@ -0,0 +1,7 @@
# NAT Discovery <!-- omit in toc -->
> How we detect if we're behind a NAT.

Specifications:
- [autonat v1](autonat-v1.md)
- [autonat v2](autonat-v2.md)
3 changes: 0 additions & 3 deletions autonat/autonat-v1.md
@@ -1,6 +1,3 @@
# NAT Discovery <!-- omit in toc -->
> How we detect if we're behind a NAT.
| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|----------------|--------|-----------------|
| 3A | Recommendation | Active | r1, 2023-02-16 |
17 changes: 17 additions & 0 deletions autonat/autonat-v2-amplification-attack-prevention.plantuml
@@ -0,0 +1,17 @@
@startuml
participant Cli
participant Srv

skinparam sequenceMessageAlign center
skinparam defaultFontName monospaced


== Amplification Attack Prevention ==

Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}
Srv -> Cli: [dial] DialDataRequest:{addrIdx: 1, numBytes: 120k}
Cli -> Srv: [dial] {120k bytes}
Srv -> Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd}
Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)}

@enduml
1 change: 1 addition & 0 deletions autonat/autonat-v2-amplification-attack-prevention.svg
299 changes: 299 additions & 0 deletions autonat/autonat-v2.md
@@ -0,0 +1,299 @@
# AutonatV2: spec


| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|--------------------------|--------|-----------------|
| 1A | Working Draft | Active | r2, 2023-04-15 |

Authors: [@sukunrt]

Interest Group: [@marten-seemann], [@marcopolo], [@mxinden]

[@sukunrt]: https://github.com/sukunrt
[@marten-seemann]: https://github.com/marten-seemann
[@mxinden]: https://github.com/mxinden
[@marcopolo]: https://github.com/marcopolo


## Overview

A priori, a node cannot know if it is behind a NAT / firewall or if it is
publicly reachable. Moreover, the node may be publicly reachable on some of its
addresses and not on others. Knowing reachability for its addresses is essential
for the node to be well-behaved in the network: A node doesn't need to advertise
its unreachable addresses to the rest of the network, preventing superfluous
dials from other peers. Furthermore, in case it has no publicly reachable
addresses, it might actively seek to improve its connectivity by finding a relay
server, which would allow other peers to establish a relayed connection.

In `autonat v2`, the client sends a request with a priority-ordered list of
addresses and a nonce. On receiving this request, the server dials the first
address in the list that it is capable of dialing and provides the nonce. Upon
completion of the dial, the server sends the client a response containing the
dial outcome.

As the server dials _exactly_ one address from the list, `autonat v2` allows
nodes to determine reachability for individual addresses. Using `autonat v2`,
nodes can build an address pipeline in which they test individual addresses
discovered by different sources, such as identify, UPnP mappings, and circuit
addresses, for reachability. Having a priority-ordered list of addresses
provides the ability to verify low-priority addresses. Implementations can
generate low-priority address guesses and add them to requests for
high-priority addresses as a nice-to-have. This is especially helpful when
introducing a new transport. Initially, such a transport will not be widely
supported in the network. Requests for verifying such addresses can be reused
to get information about other addresses.

The client can verify that the server did successfully dial an address of the
same transport as it reported in the response by checking the local address of
the connection on which the nonce was received.

Compared to `autonat v1` there are three major differences:
1. `autonat v1` allowed testing reachability for the node. `autonat v2` allows
testing reachability for an individual address.
2. `autonat v2` provides a mechanism for nodes to verify whether the peer
actually successfully dialled an address.
3. `autonat v2` provides a mechanism for nodes to dial an IP address different
from the requesting node's observed IP address without risking amplification
attacks. `autonat v1` disallowed such dials to prevent amplification attacks.


## AutoNAT V2 Protocol

![Autonat V2 Interaction](autonat-v2.svg)

A client node wishing to determine reachability of its addresses sends a
`DialRequest` message to a server on a stream with protocol ID
`/libp2p/autonat/2/dial-request`. Each `DialRequest` is sent on a new stream.

This `DialRequest` message has a list of addresses and a fixed64 `nonce`. The
list is ordered in descending order of priority for verification. AutoNAT V2 is
primarily for testing reachability on the public Internet. The client SHOULD NOT
send any private address as defined in [RFC
1918](https://datatracker.ietf.org/doc/html/rfc1918#section-3) in the list. The
server SHOULD NOT dial any private address.
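
As a rough illustration of the request-building rule above, the following Python sketch drops RFC 1918 addresses from a priority-ordered candidate list and attaches a random 64-bit nonce. The names `is_rfc1918` and `build_dial_request`, the `(ip, multiaddr)` tuple shape, and the dict standing in for a `DialRequest` are assumptions made for this example and are not part of the spec.

```python
import ipaddress
import secrets

# The three private IPv4 ranges listed in RFC 1918, section 3.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip):
    """Return True if `ip` is an IPv4 address inside an RFC 1918 range."""
    addr = ipaddress.ip_address(ip)
    return addr.version == 4 and any(addr in net for net in RFC1918_NETWORKS)


def build_dial_request(candidates):
    """Build a stand-in for DialRequest (hypothetical dict representation).

    `candidates` is a list of (ip, multiaddr_string) pairs, already sorted in
    descending order of priority. Private addresses are dropped; the relative
    order of the remaining addresses is preserved.
    """
    addrs = [multiaddr for ip, multiaddr in candidates if not is_rfc1918(ip)]
    # fixed64 nonce: any value in [0, 2**64).
    return {"addrs": addrs, "nonce": secrets.randbits(64)}
```

A real implementation would encode each address as multiaddr bytes, as the `repeated bytes addrs` field requires; strings are used here only to keep the sketch short.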

Upon receiving this request, the server selects an address from the list to
dial. The server SHOULD use the first address it is willing to dial. The server
MUST NOT dial any address other than this one. If this selected address has an
IP address different from the requesting node's observed IP address, the server
initiates the amplification attack prevention mechanism (see [Amplification
Attack Prevention](#amplification-attack-prevention)). On completion, the
server proceeds to the next step. If the selected address has the same IP
address as the client's observed IP address, the server proceeds to the next
step, skipping the Amplification Attack Prevention steps.
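
The selection step can be sketched as follows. This is a minimal Python sketch, assuming addresses are represented by their IP strings only and that the server's willingness to dial is supplied as a callback; `select_address` and `can_dial` are hypothetical names, not part of the spec.

```python
import ipaddress


def select_address(addr_ips, observed_ip, can_dial):
    """Pick the single address the server will dial.

    Returns (addr_idx, needs_dial_data), or None if the server is unwilling
    to dial every address in the list. `needs_dial_data` is True when the
    selected IP differs from the client's observed IP, i.e. when the
    amplification attack prevention step has to run before dialing.
    """
    observed = ipaddress.ip_address(observed_ip)
    for idx, ip in enumerate(addr_ips):
        if can_dial(ip):
            # Only this address may be dialed; later entries are ignored.
            return idx, ipaddress.ip_address(ip) != observed
    return None
```

For example, a server without IPv6 connectivity could pass `lambda ip: ipaddress.ip_address(ip).version == 4` as `can_dial`, so that IPv6 entries at the head of the list are skipped.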

The server dials the selected address, opens a stream with Protocol ID
`/libp2p/autonat/2/dial-back` and sends a `DialBack` message with the nonce
received in the request. The client on receiving this message replies with
a `DialBackResponse` message with the status set to `OK`. The client MUST
close this stream after sending the response. The dial back response provides
the server assurance that the message was delivered so that it can close the
connection.

Upon completion of the dial back, the server sends a `DialResponse` message to
the client node on the `/libp2p/autonat/2/dial-request` stream. The response
contains `addrIdx`, the index of the address the server selected to dial and
`DialStatus`, a dial status indicating the outcome of the dial back. The
`DialStatus` for an address is set according to [Requirements for
DialStatus](#requirements-for-dialstatus). The response also contains an
appropriate `ResponseStatus` set according to [Requirements For
ResponseStatus](#requirements-for-responsestatus).

The client MUST check that the nonce received in the `DialBack` is the same as
the nonce it sent in the `DialRequest`. If the nonce is different, it MUST
discard this response.
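
A minimal sketch of the client-side nonce check, assuming the client keeps one small state object per outstanding `DialRequest`. The class `PendingDial` and its methods are hypothetical and exist only for this example.

```python
import secrets


class PendingDial:
    """Client-side state for one outstanding DialRequest (hypothetical)."""

    def __init__(self):
        # 64-bit nonce sent in the DialRequest.
        self.nonce = secrets.randbits(64)
        self.dial_back_verified = False

    def on_dial_back(self, received_nonce):
        """Record a DialBack; return True only if its nonce matches."""
        if received_nonce != self.nonce:
            return False
        self.dial_back_verified = True
        return True

    def accept_ok_response(self):
        """An OK dial status is only trusted after a matching DialBack."""
        return self.dial_back_verified
```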

The server MUST close the stream after sending the response. The client MUST
close the stream after receiving the response.


### Requirements for DialStatus

On receiving a `DialRequest`, the server first selects an address that it will
dial.

If the server chooses not to dial any of the requested addresses,
`ResponseStatus` is set to `E_DIAL_REFUSED`. The fields `addrIdx` and
`DialStatus` are meaningless in this case. See [Requirements For
ResponseStatus](#requirements-for-responsestatus).

If the server selects an address for dialing, `addrIdx` is set to the
zero-based index of the address in the list and the `DialStatus` is set
according to the following considerations:

If the server was unable to connect to the client on the selected address,
`DialStatus` is set to `E_DIAL_ERROR`, indicating the selected address is not
publicly reachable.

If the server was able to connect to the client on the selected address, but an
error occurred while sending the nonce on the `/libp2p/autonat/2/dial-back`
stream, `DialStatus` is set to `E_DIAL_BACK_ERROR`. This might happen in
resource-limited situations on the client or server, or when either the client
or the server is misconfigured.

If the server was able to connect to the client and successfully send a nonce on
the `/libp2p/autonat/2/dial-back` stream, `DialStatus` is set to `OK`.
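
The three cases above reduce to a small decision function. The sketch below mirrors the `DialStatus` values from the protobuf definition; the function name `dial_status` and its two boolean inputs are assumptions made for illustration.

```python
from enum import IntEnum


class DialStatus(IntEnum):
    """Numeric values copied from the DialStatus enum in the RPC messages."""
    UNUSED = 0
    E_DIAL_ERROR = 100
    E_DIAL_BACK_ERROR = 101
    OK = 200


def dial_status(connected, nonce_sent):
    """Map the outcome of a dial back to a DialStatus.

    connected:  the server established a connection to the selected address.
    nonce_sent: the nonce was delivered on the dial-back stream.
    """
    if not connected:
        return DialStatus.E_DIAL_ERROR
    if not nonce_sent:
        return DialStatus.E_DIAL_BACK_ERROR
    return DialStatus.OK
```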

### Requirements for ResponseStatus

The `ResponseStatus` sent by the server in the `DialResponse` message MUST be
set according to the following requirements:

`E_REQUEST_REJECTED`: The server didn't serve the request because of rate
limiting, resource limit reached or blacklisting.

`E_DIAL_REFUSED`: The server didn't dial back any address because it was
incapable of dialing or unwilling to dial any of the requested addresses.

`E_INTERNAL_ERROR`: An error not classified within the above error codes
occurred on the server, preventing it from completing the request.

`OK`: the server completed the request successfully. A request is considered
completed successfully when the server either completes a dial (successfully or
unsuccessfully) or rejects all addresses in the request as undialable.

Implementations MUST discard responses with status codes they do not understand.
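
One way a client might apply the discard rule is to parse the numeric status into a known enum and treat anything else as "no usable response". This is a sketch only; `parse_response_status` is a hypothetical helper, and the enum values are copied from the protobuf definition in this document.

```python
from enum import IntEnum


class ResponseStatus(IntEnum):
    """Numeric values copied from DialResponse.ResponseStatus."""
    E_INTERNAL_ERROR = 0
    E_REQUEST_REJECTED = 100
    E_DIAL_REFUSED = 101
    OK = 200


def parse_response_status(raw):
    """Return the known ResponseStatus, or None if the code is not understood.

    A None result means the whole response should be discarded.
    """
    try:
        return ResponseStatus(raw)
    except ValueError:
        return None
```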

### Amplification Attack Prevention

![Interaction](autonat-v2-amplification-attack-prevention.svg)

When a client asks a server to dial an address that is not the client's observed
IP address, the server asks the client to send a non-trivial number of bytes
as the cost of dialing a different IP address. To make amplification attacks
unattractive, servers SHOULD ask for 30k to 100k bytes. Since most handshakes
cost less than 10k bytes in bandwidth, 30kB is sufficient to make attacks
unattractive.

On receiving a `DialRequest`, the server selects the first address it is capable
of dialing. If this selected address has an IP different from the client's
observed IP, the server sends a `DialDataRequest` message with the selected
address's zero-based index and `numBytes` set to a sufficiently large value on
the `/libp2p/autonat/2/dial-request` stream.

Upon receiving a `DialDataRequest` message, the client decides whether to accept
or reject the cost of dial. If the client rejects the cost, the client resets
the stream and the `DialRequest` is considered aborted. If the client accepts
the cost, the client starts transferring `numBytes` bytes to the server. The
client transfers these bytes wrapped in `DialDataResponse` protobufs where the
`data` field in each individual protobuf is limited to 4096 bytes in length.
This allows implementations to use a small buffer for reading and sending the
data. Only the size of the `data` field of `DialDataResponse` protobufs is
counted towards the bytes transferred. Once the server has received at least
`numBytes` bytes, it proceeds to dial the selected address. Servers SHOULD allow
the last `DialDataResponse` message received from the client to be larger than
the minimum required amount. This allows clients to serialize their
`DialDataResponse` message once and reuse it for all requests.
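
The chunked transfer can be sketched as below. This Python sketch assumes the client reuses one full-size payload for every `DialDataResponse`, which means the total may overshoot `numBytes`, as the paragraph above permits. The helper names `dial_data_chunks` and `server_received_enough` are hypothetical.

```python
MAX_DATA_FIELD = 4096  # upper bound on the `data` field of one DialDataResponse


def dial_data_chunks(num_bytes):
    """Yield `data` payloads until at least `num_bytes` have been produced.

    The same 4096-byte payload is reused for every message, so the last
    message may push the total past `num_bytes`.
    """
    chunk = bytes(MAX_DATA_FIELD)  # zero-filled; the content is irrelevant
    sent = 0
    while sent < num_bytes:
        yield chunk
        sent += len(chunk)


def server_received_enough(payload_sizes, num_bytes):
    """Only the sizes of the `data` fields count towards `num_bytes`."""
    return sum(payload_sizes) >= num_bytes
```

For a request of 30,000 bytes this produces eight payloads (32,768 bytes in total), since seven payloads would carry only 28,672 bytes.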


If an attacker asks a server to dial a victim node, the only benefit the
attacker gets is forcing the server and the victim to do a cryptographic
handshake which costs some bandwidth and compute. The attacker by itself can do
a lot of handshakes with the victim without spending any compute by using the
same key repeatedly. The only benefit of going via the server to do this attack
is not spending bandwidth required for a handshake. So the prevention mechanism
only focuses on bandwidth costs. There is a minor benefit of bypassing IP
blocklists, but that's made unattractive by the fact that servers may ask 5x
more data than the bandwidth cost of a handshake.

## Implementation Suggestions

For any given address, client implementations SHOULD do the following:
- Periodically recheck reachability status.
- Query multiple servers to determine reachability.

The suggested heuristic for implementations is to consider an address reachable
if more than 3 servers report a successful dial and to consider an address
unreachable if more than 3 servers report unsuccessful dials. Implementations
are free to use different heuristics than this one.
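
The suggested heuristic could be expressed roughly as follows. The function name `classify`, the string labels, and the handling of conflicting evidence (both counts above the threshold yield `"unknown"`) are assumptions of this sketch; the spec does not define that case.

```python
def classify(successes, failures, threshold=3):
    """Classify an address from per-server dial outcomes.

    successes / failures: number of distinct servers reporting a successful
    or an unsuccessful dial for this address.
    """
    reachable = successes > threshold
    unreachable = failures > threshold
    if reachable and unreachable:
        # Assumption: conflicting evidence is treated as undecided.
        return "unknown"
    if reachable:
        return "reachable"
    if unreachable:
        return "unreachable"
    return "unknown"
```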

Servers SHOULD NOT reuse their listening port when making a dial back. In case
the client has reused their listen port when dialing out to the server, not
reusing the listen port for attempts prevents accidental hole punches. Clients
SHOULD only rely on the nonce and not on the peerID for verifying the dial back
as the server is free to use a separate peerID for the dial backs.

Servers SHOULD determine whether they have IPv6 and IPv4 connectivity.
IPv4-only servers SHOULD refuse requests for dialing IPv6 addresses and
IPv6-only servers SHOULD refuse requests for dialing IPv4 addresses.


## RPC Messages

All RPC messages sent over a stream are prefixed with the message length in
bytes, encoded as an unsigned variable length integer as defined by the
[multiformats unsigned-varint spec][uvarint-spec].
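
To make the framing concrete, here is a small Python sketch of unsigned-varint length prefixing. The function names are hypothetical, and the sketch does not enforce the maximum encoded length that the multiformats spec places on varints.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Decode a uvarint from the start of `buf`; return (value, bytes_read)."""
    value = 0
    shift = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated uvarint")


def frame(payload: bytes) -> bytes:
    """Prefix a serialized RPC message with its length in bytes."""
    return encode_uvarint(len(payload)) + payload
```

A reader on the other side would call `decode_uvarint` on the incoming bytes, then read exactly that many bytes as the serialized protobuf message.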

All RPC messages on stream `/libp2p/autonat/2/dial-request` are of type
`Message`. A `DialRequest` message is sent as a `Message` with the `msg` field
set to `DialRequest`. `DialResponse` and `DialDataRequest` are handled
similarly.

On stream `/libp2p/autonat/2/dial-back`, a `DialBack` message is sent
directly, without the `Message` wrapper.

```proto3
message Message {
    oneof msg {
        DialRequest dialRequest = 1;
        DialResponse dialResponse = 2;
        DialDataRequest dialDataRequest = 3;
        DialDataResponse dialDataResponse = 4;
    }
}

message DialRequest {
    repeated bytes addrs = 1;
    fixed64 nonce = 2;
}

message DialDataRequest {
    uint32 addrIdx = 1;
    uint64 numBytes = 2;
}

enum DialStatus {
    UNUSED = 0;
    E_DIAL_ERROR = 100;
    E_DIAL_BACK_ERROR = 101;
    OK = 200;
}

message DialResponse {
    enum ResponseStatus {
        E_INTERNAL_ERROR = 0;
        E_REQUEST_REJECTED = 100;
        E_DIAL_REFUSED = 101;
        OK = 200;
    }

    ResponseStatus status = 1;
    uint32 addrIdx = 2;
    DialStatus dialStatus = 3;
}

message DialDataResponse {
    bytes data = 1;
}

message DialBack {
    fixed64 nonce = 1;
}

message DialBackResponse {
    enum DialBackStatus {
        OK = 0;
    }

    DialBackStatus status = 1;
}
```

[uvarint-spec]: https://github.com/multiformats/unsigned-varint

20 changes: 20 additions & 0 deletions autonat/autonat-v2.plantuml
@@ -0,0 +1,20 @@
@startuml
participant Cli
participant Srv

skinparam sequenceMessageAlign center
skinparam defaultFontName monospaced


== Dial Request Success==

Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}
Srv -> Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd}
Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)}

== Dial Request Failure==

Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}
Srv ->x Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd}
Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, E_DIAL_ERROR)}
@enduml