Concept docs: nat traversal and relay #24

Merged
merged 9 commits into from
Mar 18, 2019
80 changes: 80 additions & 0 deletions content/concepts/circuit-relay.md
@@ -0,0 +1,80 @@
---
title: Circuit Relay
weight: 3
---

Circuit relay is a [transport protocol](/concepts/transport/) that routes traffic between two peers over a third-party "relay" peer.

We might want to start by simply stating what the circuit relay transport is. Something like: The circuit relay transport routes traffic between two peers over a third-party "relay" peer.

In many cases, peers will be unable to [traverse their NAT](/concepts/nat/) in a way that makes them publicly accessible. Or they may not share common [transport protocols](/concepts/transport/) that would allow them to communicate directly.

Nit: we may not need to mention this, but there's an issue in libp2p where we'll simply reject multiaddrs containing protocols we've never heard of (multiformats/multiaddr#70). This comment is mostly correct, but I figured I'd bring this up.

To enable peer-to-peer architectures in the face of connectivity barriers like NAT, libp2p [defines a protocol called p2p-circuit][spec_relay]. When a peer isn't able to listen on a public address, it can dial out to a relay peer, which will keep a long-lived connection open. Other peers can then dial through the relay peer using a `p2p-circuit` address, and the relay will forward their traffic to its destination.
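To make the relay's role concrete, here is a toy Python model of the forwarding step only. It is not the p2p-circuit wire protocol: it assumes both peers already hold connections to the relay (modelled with `socket.socketpair()`), skips the circuit setup handshake entirely, and uses a hypothetical `relay_once` helper that copies one chunk of opaque bytes from one side to the other.

```python
import socket


def relay_once(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Copy one chunk of bytes from src to dst without inspecting it.

    Hypothetical helper: it returns the number of bytes forwarded.
    """
    data = src.recv(bufsize)
    if data:
        dst.sendall(data)
    return len(data)


# Each socketpair models one long-lived connection to the relay:
# Bob <-> relay, and relay <-> Alice.
bob_side, relay_from_bob = socket.socketpair()
relay_to_alice, alice_side = socket.socketpair()

# The payload stands in for bytes that are already encrypted end to end,
# so the relay only ever handles opaque data.
bob_side.sendall(b"ciphertext-for-alice")
forwarded = relay_once(relay_from_bob, relay_to_alice)
received = alice_side.recv(4096)

for s in (bob_side, relay_from_bob, relay_to_alice, alice_side):
    s.close()
```

A real relay would keep copying in both directions until either side closes its stream.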

The circuit relay protocol is inspired by [TURN](https://tools.ietf.org/html/rfc5766), which is part of the [Interactive Connectivity Establishment](https://tools.ietf.org/html/rfc8445) collection of NAT traversal techniques.

{{% notice "note" %}}
Relay connections are end-to-end encrypted, which means that the peer acting as the relay is unable to read or tamper with any traffic that flows through the connection.
{{% /notice %}}

An important aspect of the relay protocol is that it is not "transparent". In other words, both the source and destination are aware that traffic is being relayed. This is useful, since the destination can see the relay address used to open the connection and can potentially use it to construct a path back to the source. It is also not anonymous: all participants, including the relay node, are identified by their peer id.

#### Relay addresses

A relay circuit is identified using a [multiaddr][definition_muiltiaddress] that includes the [peer id](/concepts/peer-id/) of the peer whose traffic is being relayed (the listening peer or "relay target").

Let's say that I have a peer with the peer id `QmAlice`. I want to give out my address to my friend `QmBob`, but I'm behind a NAT that won't let anyone dial me directly.

The most basic `p2p-circuit` address I can construct looks like this:

`/p2p-circuit/p2p/QmAlice`

I'd really avoid telling people to use this. It's useful for demos but isn't practical otherwise.


The address above is interesting, because it doesn't include any [transport](/concepts/transport/) addresses for either the peer we want to contact (`QmAlice`) or for the relay peer that will convey the traffic. Without that information, the only chance a peer has of dialing me is to discover a relay and hope they have a connection to me.

A better address would be something like `/p2p/QmRelay/p2p-circuit/p2p/QmAlice`. This includes the identity of a specific relay peer, `QmRelay`. If a peer already knows how to open a connection to `QmRelay`, they'll be able to reach us.

Better still is to include the transport addresses for the relay peer in the address. Let's say that I've established a connection to a specific relay with the peer id `QmRelay`. They told me via the identify protocol that they're listening for TCP connections on port `55555` at IPv4 address `7.7.7.7`. I can construct an address that describes a path to me through that specific relay over that transport:

`/ip4/7.7.7.7/tcp/55555/p2p/QmRelay/p2p-circuit/p2p/QmAlice`

Everything prior to the `/p2p-circuit/` above is the address of the relay peer, which includes the transport address and their peer id `QmRelay`. After `/p2p-circuit/` is the peer id for my peer at the other end of the line, `QmAlice`.
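The split described above can be sketched as plain string handling. This is an illustration only: `parse_circuit_addr` and `CircuitAddr` are hypothetical names, and real multiaddr libraries decode an address component by component rather than searching for substrings.

```python
from typing import NamedTuple, Optional


class CircuitAddr(NamedTuple):
    relay_transport: str        # e.g. "/ip4/7.7.7.7/tcp/55555"; "" if not given
    relay_peer: Optional[str]   # e.g. "QmRelay"; None if no relay is named
    target_peer: Optional[str]  # e.g. "QmAlice"; None if the address ends at /p2p-circuit


def parse_circuit_addr(addr: str) -> CircuitAddr:
    """Split a p2p-circuit address into its relay half and its target half."""
    relay_part, sep, target_part = addr.partition("/p2p-circuit")
    if not sep:
        raise ValueError(f"not a p2p-circuit address: {addr!r}")
    if target_part and not target_part.startswith("/p2p/"):
        raise ValueError(f"unexpected target in {addr!r}")
    target_peer = target_part[len("/p2p/"):] or None
    # Everything before /p2p-circuit describes the relay; if it ends in
    # /p2p/<id>, peel the relay's peer id off the transport part.
    head, marker, tail = relay_part.rpartition("/p2p/")
    if marker:
        return CircuitAddr(head, tail, target_peer)
    return CircuitAddr(relay_part, None, target_peer)


full = parse_circuit_addr("/ip4/7.7.7.7/tcp/55555/p2p/QmRelay/p2p-circuit/p2p/QmAlice")
bare = parse_circuit_addr("/p2p-circuit/p2p/QmAlice")
```

Parsing the bare address yields an empty transport and no relay peer, which is exactly why a dialer holding only that address has to go looking for a relay on its own.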

By giving the full relay path to my friend `QmBob`, they're able to quickly establish a relayed connection without having to "ask around" for a relay that has a route to `QmAlice`.

{{% notice "tip" %}}
When [advertising your address](/concepts/peer-routing/), it's best to provide relay addresses that include the transport address of the relay peer. If the relay has many transport addresses, you can advertise a `p2p-circuit` through each of them.
{{% /notice %}}
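Building one circuit address per relay transport address is a simple string join. The helper below is a hypothetical sketch, and the relay's second address (in the `2001:db8::/32` IPv6 documentation range) is made up for the example.

```python
from typing import List


def circuit_addrs(relay_peer: str, relay_transports: List[str], me: str) -> List[str]:
    """Return one p2p-circuit address per transport address the relay listens on."""
    return [f"{t}/p2p/{relay_peer}/p2p-circuit/p2p/{me}" for t in relay_transports]


addrs = circuit_addrs(
    "QmRelay",
    ["/ip4/7.7.7.7/tcp/55555", "/ip6/2001:db8::7/tcp/55555"],
    "QmAlice",
)
```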

#### Autorelay

The circuit relay protocol is only effective if peers can discover willing relay peers that are accessible to both sides of the relayed connection.

While it's possible to simply "hard-code" a list of well-known relays into your application, this adds a point of centralization to your architecture that you may want to avoid. This kind of bootstrap list is also a potential point of failure if the bootstrap nodes become unavailable.

Autorelay is a feature (currently implemented in go-libp2p) that a peer can enable to attempt to discover relay peers using libp2p's [content routing](/concepts/content-routing/) interface.

When Autorelay is enabled, a peer will try to discover one or more public relays and open relayed connections. If successful, the peer will advertise the relay addresses using libp2p's [peer routing](/concepts/peer-routing/) system.

{{% notice "warning" %}}
Autorelay is under active development and should be considered experimental. There are currently no protections against malicious or malfunctioning relays which could advertise relay services and refuse to provide them.
{{% /notice %}}

##### How Autorelay works

The Autorelay service is responsible for:

1. discovering relay nodes around the world,
2. establishing long-lived connections to them, and
3. advertising relay-enabled addresses for ourselves to our peers, thus making ourselves routable through delegated routing.

When the [AutoNAT service](/concepts/nat/#autonat) detects that we're behind a NAT that blocks inbound connections, Autorelay jumps into action, and the following happens:

1. We locate candidate relays by running a DHT provider search for the `/libp2p/relay` namespace.
2. We select three results at random, and establish a long-lived connection to them (`/libp2p/circuit/relay/0.1.0` protocol). Support for using latency as a selection heuristic will be added soon.
3. We enhance our local address list with our newly acquired relay-enabled multiaddrs, with format: `/ip4/1.2.3.4/tcp/4001/p2p/QmRelay/p2p-circuit`, where:
`1.2.3.4` is the relay's public IP address, `4001` is the libp2p port, and `QmRelay` is the peer ID of the relay.
Elements in the multiaddr can change based on the actual transports in use.
4. We announce our new relay-enabled addresses to the peers we're already connected to via the `IdentifyPush` protocol.

The last step is crucial, as it enables peers to learn our updated addresses, and in turn return them when another peer looks us up.
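The selection and address-building steps can be sketched as follows, under simplified assumptions: `candidates` stands in for the result of the `/libp2p/relay` provider search (relay peer id mapped to that relay's transport addresses), `autorelay_addrs` is a hypothetical name, the addresses are from the `192.0.2.0/24` documentation range, and both the long-lived connections and the `IdentifyPush` announcement are left out.

```python
import random
from typing import Dict, List, Optional


def autorelay_addrs(
    candidates: Dict[str, List[str]],
    behind_nat: bool,
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to k relays at random and build the relay-enabled addresses to advertise."""
    if not behind_nat:
        # AutoNAT says we are publicly reachable, so there is nothing to do.
        return []
    rng = rng or random.Random()
    chosen = rng.sample(sorted(candidates), min(k, len(candidates)))
    # Not modelled: opening long-lived connections to `chosen` and pushing
    # the resulting addresses to connected peers.
    return [
        f"{addr}/p2p/{relay}/p2p-circuit"
        for relay in chosen
        for addr in candidates[relay]
    ]


candidates = {
    "QmRelayA": ["/ip4/192.0.2.1/tcp/4001"],
    "QmRelayB": ["/ip4/192.0.2.2/tcp/4001"],
    "QmRelayC": ["/ip4/192.0.2.3/tcp/4001"],
    "QmRelayD": ["/ip4/192.0.2.4/tcp/4001"],
}
advertised = autorelay_addrs(candidates, behind_nat=True, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the choice reproducible in tests; a latency-based heuristic would replace the `rng.sample` call.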

[spec_relay]: https://github.com/libp2p/specs/tree/master/relay
[definition_muiltiaddress]: /reference/glossary/#mulitaddress
8 changes: 8 additions & 0 deletions content/concepts/content-routing.md
@@ -0,0 +1,8 @@
---
title: Content Routing
weight: 5
---

This article is coming soon!

Please [refer to this issue](https://github.com/libp2p/docs/issues/23) to track the progress and make suggestions.
73 changes: 71 additions & 2 deletions content/concepts/nat.md
@@ -3,6 +3,75 @@ title: NAT Traversal
weight: 2
---

The internet is composed of countless networks, bound together into shared address spaces by foundational [transport protocols](/concepts/transport/).

As traffic moves between network boundaries, it's very common for a process called Network Address Translation to occur. Network Address Translation (NAT) maps an address from one address space to another.

NAT allows many machines to share a single public address, and it is essential for the continued functioning of the IPv4 protocol, which would otherwise be unable to serve the needs of the modern networked population with its 32-bit address space.

For example, when I connect to my home wifi, my computer gets an IPv4 address of `10.0.1.15`. This is part of a range of IP addresses reserved for internal use by private networks. When I make an outgoing connection to a public IP address, the router replaces my internal IP with its own public IP address. When data comes back from the other side, the router will translate back to the internal address.

While NAT is usually transparent for outgoing connections, listening for incoming connections requires some configuration. The router listens on a single public IP address, but any number of machines on the internal network could handle the request. To serve requests, your router must be configured to send certain traffic to a specific machine, usually by mapping one or more TCP or UDP ports from the public IP to an internal one.
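A toy translation table shows both behaviours described above: outbound traffic creates a mapping automatically, while unsolicited inbound traffic is dropped unless a port forward has been configured. `ToyNat` is hypothetical and models the most permissive kind of NAT (any remote host may use an existing mapping); many real NATs also key mappings or filtering on the remote address. `203.0.113.5` is a documentation address.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class ToyNat:
    """Toy NAT: one public port per internal endpoint, usable by any remote host."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map: Dict[Endpoint, int] = {}  # internal endpoint -> public port
        self.in_map: Dict[int, Endpoint] = {}   # public port -> internal endpoint

    def outbound(self, internal: Endpoint) -> Endpoint:
        """Rewrite an outgoing packet's source, allocating a public port on first use."""
        if internal not in self.out_map:
            self.out_map[internal] = self.next_port
            self.in_map[self.next_port] = internal
            self.next_port += 1
        return (self.public_ip, self.out_map[internal])

    def forward_port(self, public_port: int, internal: Endpoint) -> None:
        """A manually configured port forward (the kind UPnP or NAT-PMP can set up)."""
        self.in_map[public_port] = internal

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Rewrite an incoming packet's destination, or return None if it is dropped."""
        return self.in_map.get(public_port)


nat = ToyNat("203.0.113.5")
public = nat.outbound(("10.0.1.15", 5000))  # the laptop dials out
reply_to = nat.inbound(public[1])           # the reply is translated back
unsolicited = nat.inbound(4001)             # nothing is mapped to port 4001 yet
nat.forward_port(4001, ("10.0.1.15", 4001))
forwarded = nat.inbound(4001)
```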

While it's usually possible to manually configure routers, not everyone that wants to run a peer-to-peer application or other network service will have the ability to do so.

We want libp2p applications to run everywhere, not just in data centers or on machines with stable public IP addresses. To enable this, here are the main approaches to NAT traversal available in libp2p today.

### Automatic router configuration

Many routers support automatic configuration protocols for port forwarding, most commonly [UPnP][wiki_upnp] or [NAT-PMP][wiki_nat-pmp].

If your router supports one of those protocols, libp2p will attempt to automatically configure a port mapping that will allow it to listen for incoming traffic. This is usually the simplest option if supported by the network and libp2p implementation.

{{% notice "info" %}}
Support for automatic NAT configuration varies by libp2p implementation.
Check the [current implementation status](https://libp2p.io/implementations/#nat-traversal) for details.
{{% /notice %}}

### Hole-punching (STUN)

When an internal machine "dials out" and makes a connection to a public address, the router will map a public port to the internal IP address to use for the connection. In some cases, the router will also accept *incoming* connections on that port and route them to the same internal IP.

libp2p will try to take advantage of this behavior when using IP-backed transports by using the same port for both dialing and listening, using a socket option called [`SO_REUSEPORT`](https://lwn.net/Articles/542629/).
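The socket-level idea can be shown in Python on a single machine: a listener and a dialer share one local port because both set `SO_REUSEPORT` before `bind()`. This sketch assumes Linux or macOS (Windows has no `SO_REUSEPORT`), `reuseport_socket` is a hypothetical helper, and no NAT is involved; it only demonstrates that the remote side observes the dialer at the same port we listen on.

```python
import socket


def reuseport_socket(port: int = 0, host: str = "127.0.0.1") -> socket.socket:
    """TCP socket with SO_REUSEADDR and SO_REUSEPORT set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    return s


# 1. Listen on an OS-assigned port.
listener = reuseport_socket()
listener.listen()
listen_port = listener.getsockname()[1]

# 2. A stand-in "remote peer" to dial.
remote = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
remote.bind(("127.0.0.1", 0))
remote.listen()

# 3. Dial out from the listening port, so a NAT mapping created for this
#    outgoing connection would point at the port we also listen on.
dialer = reuseport_socket(listen_port)
dialer.connect(remote.getsockname())
conn, observed = remote.accept()  # `observed` is the (ip, port) the remote sees

dial_port = dialer.getsockname()[1]
observed_port = observed[1]

for s in (conn, dialer, remote, listener):
    s.close()
```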

If our peer is in a favorable network environment, they will be able to make an outgoing connection and get a publicly-reachable listening port "for free," but they might never know it. Unfortunately, there's no way for the dialing program to discover what port was assigned to the connection on its own.

However, an external peer can tell us what address they observed us on. We can then take that address and advertise it to other peers in our [peer routing network](/concepts/peer-routing/) to let them know where to find us.

This basic premise of peers informing each other of their observed addresses is the foundation of [STUN][wiki_stun] (Session Traversal Utilities for NAT), which [describes][rfc_stun] a client / server protocol for discovering publicly reachable IP address and port combinations.

One of libp2p's core protocols is the [identify protocol][spec_identify], which allows one peer to ask another for some identifying information. When sending over their [public key](/concepts/peer-id/) and some other useful information, the peer being identified includes the set of addresses that it has observed for the peer asking the question.

This external discovery mechanism serves the same role as STUN, but without the need for a set of "STUN servers".
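The core of the exchange is that the accepting side can read the dialer's address straight off the connection. Below is a toy version over loopback TCP; it is not the identify wire format (which carries the observed address inside a protobuf message), and `observed_multiaddr` is a hypothetical helper. On loopback there is no NAT, so the observed address equals our local one; behind a NAT it would be the router's public mapping.

```python
import socket
from typing import Tuple


def observed_multiaddr(peer: Tuple[str, int]) -> str:
    """Format an IPv4 (ip, port) pair as a TCP multiaddr string."""
    ip, port = peer[0], peer[1]
    return f"/ip4/{ip}/tcp/{port}"


# The peer that will identify us is listening...
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()

# ...we connect to it...
client = socket.create_connection(server.getsockname())
conn, peer_addr = server.accept()

# ...and it reports the address it saw our connection come from.
conn.sendall(observed_multiaddr(peer_addr).encode())
learned = client.recv(1024).decode()
local_port = client.getsockname()[1]

for s in (conn, client, server):
    s.close()
```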

The identify protocol allows some peers to communicate across NATs that would otherwise be impenetrable.

### AutoNAT

While the [identify protocol][spec_identify] described above lets peers inform each other about their observed network addresses, not all networks will allow incoming connections on the same port used for dialing out.

Once again, other peers can help us observe our situation, this time by attempting to dial us at our observed addresses. If this succeeds, we can rely on other peers being able to dial us as well and we can start advertising our listen address.

A libp2p protocol called AutoNAT lets peers request dial-backs from peers providing the AutoNAT service.
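The dial-back at the heart of AutoNAT amounts to "try to connect to the address the peer claims, and report the result". The sketch below shows only that check, on loopback; `dial_back` is a hypothetical name, and neither the AutoNAT message format nor how a node combines answers from several peers is modelled.

```python
import socket
from typing import Tuple


def dial_back(addr: Tuple[str, int], timeout: float = 1.0) -> bool:
    """Try to open a TCP connection to `addr`; True if it succeeds."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


# A peer that really is listening at the address it claims.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
open_result = dial_back(listener.getsockname())

# An address nobody listens on: grab a free port, then release it.
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(("127.0.0.1", 0))
closed_addr = probe.getsockname()
probe.close()
closed_result = dial_back(closed_addr)

listener.close()
```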

{{% notice "info" %}}
AutoNAT is currently implemented in go-libp2p via [go-libp2p-autonat](https://github.com/libp2p/go-libp2p-autonat).
{{% /notice %}}


### Circuit Relay (TURN)

In some cases, peers will be unable to traverse their NAT in a way that makes them publicly accessible.

libp2p provides a [Circuit Relay protocol](/concepts/circuit-relay/) that allows peers to communicate indirectly via a helpful intermediary peer.

This serves a similar function to the [TURN protocol](https://tools.ietf.org/html/rfc5766) in other systems.

[wiki_upnp]: https://en.wikipedia.org/wiki/Universal_Plug_and_Play
[wiki_nat-pmp]: https://en.wikipedia.org/wiki/NAT_Port_Mapping_Protocol
[wiki_stun]: https://en.wikipedia.org/wiki/STUN
[rfc_stun]: https://tools.ietf.org/html/rfc3489
[lwn_reuseport]: https://lwn.net/Articles/542629/

<!-- TODO: update identify spec link after PR merge -->
[spec_identify]: https://github.com/libp2p/specs/pull/97
5 changes: 3 additions & 2 deletions content/reference/glossary.md
@@ -86,8 +86,9 @@ Multiplexing (or "muxing"), refers to the process of combining multiple streams

Multiplexing allows peers to offer many [protocols](#protocol) over a single connection, which reduces network overhead and makes [NAT traversal](#nat-traversal) more efficient and effective.

libp2p supports several implementations of stream multiplexing. The [mplex specification](https://github.com/libp2p/specs/tree/master/mplex) defines a simple protocol with implementations in several languages. Other supported multiplexing protocols include [yamux](https://github.com/hashicorp/yamux) and [spdy](https://www.chromium.org/spdy/spdy-whitepaper).

See [Stream Muxer Implementations](https://libp2p.io/implementations/#stream-muxers) for status of multiplexing across libp2p language implementations.
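To show what multiplexing looks like on the wire, here is a sketch of mplex-style framing, assuming the layout described in the mplex specification: each frame is an unsigned varint header of `(stream_id << 3) | flag`, an unsigned varint length, then the payload. Only three flag values are defined here, and the function names are illustrative, not taken from any libp2p implementation.

```python
from typing import List, Tuple

# Flag values as listed in the mplex spec (only the ones used below).
NEW_STREAM, MESSAGE_RECEIVER, MESSAGE_INITIATOR = 0, 1, 2


def uvarint(n: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)


def read_uvarint(buf: bytes, i: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[i]; return (value, next index)."""
    shift = result = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7


def encode_frame(stream_id: int, flag: int, data: bytes) -> bytes:
    return uvarint((stream_id << 3) | flag) + uvarint(len(data)) + data


def decode_frames(buf: bytes) -> List[Tuple[int, int, bytes]]:
    """Split a byte string into (stream_id, flag, payload) frames."""
    frames, i = [], 0
    while i < len(buf):
        header, i = read_uvarint(buf, i)
        length, i = read_uvarint(buf, i)
        frames.append((header >> 3, header & 0x07, buf[i:i + length]))
        i += length
    return frames


# Two logical streams interleaved over a single connection.
wire = (
    encode_frame(1, NEW_STREAM, b"")
    + encode_frame(1, MESSAGE_INITIATOR, b"hello")
    + encode_frame(2, MESSAGE_INITIATOR, b"world")
)
frames = decode_frames(wire)
```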

### multistream

@@ -175,7 +176,7 @@ Kademlia routing algorithm to efficiently locate peers.

### Peer-to-peer (p2p)

A peer-to-peer (p2p) network is one in which the participants (referred to as [peers](#peer) or [nodes](#node)) communicate with one another directly, on more or less "equal footing". This does not necessarily mean that all peers are identical; some may have different roles in the overall network. However, one of the defining characteristics of a peer-to-peer network is that it does not require a privileged set of "servers" which behave completely differently from their "clients", as is the case in the predominant [client / server model](#client-server).


### Pubsub