Skip to content

Commit c9587ab

Browse files
committed
connectd: don't start connecting in parallel in peer_conn_closed.
The crash below from @zerofeerouting left me confused. The invalid value in fmt_wireaddr_internal is a telltale sign of use-after-free. This backtrace shows us destroying the conn *twice*: what's happening? Well, tal carefully protects against destroying twice: it's not that unusual to free something in a destructor which has already been freed. So this indicates that there are *two* io_conn hanging off one struct connecting, which isn't supposed to happen! We deliberately call try_connect_one_addr() initially, then inside the io_conn destructor. But due to races in connectd vs lightningd connection state, we added a fix which allows a connect command to sit around while the peer is cleaning up (6cc9f37) and get fired off when it's done. But what if, in the chaos, we are already connecting again? Now we'll end up with *two* connections. Fortunately, we have a `conn` pointer inside struct connecting, which (with a bit of additional care) we can ensure is only non-NULL while we're actually trying to connect. This lets us check that before firing off a new connection attempt in peer_conn_closed. ``` lightning_connectd: FATAL SIGNAL 6 (version v0.11.2rc2-2-g8f7e939) 0x5614a4915ae8 send_backtrace common/daemon.c:33 0x5614a4915b72 crashdump common/daemon.c:46 0x7ffa14fcd72f ??? ???:0 0x7ffa14dc87bb ??? ???:0 0x7ffa14db3534 ??? ???:0 0x5614a491fc71 fmt_wireaddr_internal common/wireaddr.c:255 0x5614a491fc7a fmt_wireaddr_internal_ common/wireaddr.c:257 0x5614a491ea6b type_to_string_ common/type_to_string.c:32 0x5614a490beaa destroy_io_conn connectd/connectd.c:754 0x5614a494a2f1 destroy_conn ccan/ccan/io/poll.c:246 0x5614a494a313 destroy_conn_close_fd ccan/ccan/io/poll.c:252 0x5614a4953804 notify ccan/ccan/tal/tal.c:240 0x5614a49538d6 del_tree ccan/ccan/tal/tal.c:402 0x5614a4953928 del_tree ccan/ccan/tal/tal.c:412 0x5614a4953e07 tal_free ccan/ccan/tal/tal.c:486 0x5614a4908b7a try_connect_one_addr connectd/connectd.c:870 0x5614a490bef1 destroy_io_conn connectd/connectd.c:759 0x5614a494a2f1 destroy_conn ccan/ccan/io/poll.c:246 0x5614a494a313 destroy_conn_close_fd ccan/ccan/io/poll.c:252 0x5614a4953804 notify ccan/ccan/tal/tal.c:240 0x5614a49538d6 del_tree ccan/ccan/tal/tal.c:402 0x5614a4953e07 tal_free ccan/ccan/tal/tal.c:486 0x5614a4948f08 io_close ccan/ccan/io/io.c:450 0x5614a4948f59 do_plan ccan/ccan/io/io.c:401 0x5614a4948fe1 io_ready ccan/ccan/io/io.c:417 0x5614a494a8e6 io_loop ccan/ccan/io/poll.c:453 0x5614a490c12f main connectd/connectd.c:2164 0x7ffa14db509a ??? ???:0 0x5614a4904e99 ??? ???:0 0xffffffffffffffff ??? ???:0 ``` Signed-off-by: Rusty Russell <[email protected]>
1 parent 8f7e939 commit c9587ab

File tree

2 files changed

+8
-4
lines changed

2 files changed

+8
-4
lines changed

CHANGELOG.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ All notable changes to this project will be documented in this file.
44
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
55
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
66

7-
## [0.11.2] - 2022-06-20: Simon's Carefully Chosen Release Name III
7+
## [0.11.2] - 2022-06-24: Simon's Carefully Chosen Release Name III
88

99
Regressions since 0.10.2 which could not wait for the 0.12 release,
1010
which especially hurt larger nodes.
@@ -13,6 +13,7 @@ which especially hurt larger nodes.
1313

1414
- Protocol: treat LND "internal error" as warnings, not force close events (like v0.10) ([#5326])
1515
- connectd: no longer occasional crashes when peers reconnect. ([#5300])
16+
- connectd: another crash fix on trying to reconnect to disconnecting peer. ([#5340])
1617
- topology: Under some circumstances we were considering the limits on the wrong direction for a channel ([#5286])
1718
- routing: Fixed an issue where we would exclude the entire channel if either direction was disabled, or we hadn't seen an update yet. ([#5286])
1819
- connectd: large memory usage with many peers fixed. ([#5312])
@@ -25,6 +26,8 @@ which especially hurt larger nodes.
2526
[#5326]: https://github.com/ElementsProject/lightning/pull/5326
2627
[#5328]: https://github.com/ElementsProject/lightning/pull/5328
2728
[#5331]: https://github.com/ElementsProject/lightning/pull/5331
29+
[#5340]: https://github.com/ElementsProject/lightning/pull/5340
30+
2831
[0.11.2]: https://github.com/ElementsProject/lightning/releases/tag/v0.11.2
2932

3033
## [0.11.1] - 2022-05-13: Simon's Carefully Chosen Release Name II

connectd/connectd.c

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -756,6 +756,7 @@ static void destroy_io_conn(struct io_conn *conn, struct connecting *connect)
756756
&connect->addrs[connect->addrnum]),
757757
connect->connstate, errstr));
758758
connect->addrnum++;
759+
connect->conn = NULL;
759760
try_connect_one_addr(connect);
760761
}
761762

@@ -857,8 +858,7 @@ static void try_connect_one_addr(struct connecting *connect)
857858
struct sockaddr_in6 *sa6;
858859
#endif
859860

860-
/* In case we fail without a connection, make destroy_io_conn happy */
861-
connect->conn = NULL;
861+
assert(!connect->conn);
862862

863863
/* Out of addresses? */
864864
if (connect->addrnum == tal_count(connect->addrs)) {
@@ -1879,6 +1879,7 @@ static void try_connect_peer(struct daemon *daemon,
18791879
connect->seconds_waited = seconds_waited;
18801880
connect->addrhint = tal_steal(connect, addrhint);
18811881
connect->errors = tal_strdup(connect, "");
1882+
connect->conn = NULL;
18821883
list_add_tail(&daemon->connecting, &connect->list);
18831884
tal_add_destructor(connect, destroy_connecting);
18841885

@@ -1931,7 +1932,7 @@ void peer_conn_closed(struct peer *peer)
19311932
tal_free(peer);
19321933

19331934
/* If we wanted to connect to it, but found it was exiting, try again */
1934-
if (connect)
1935+
if (connect && !connect->conn)
19351936
try_connect_one_addr(connect);
19361937
}
19371938

0 commit comments

Comments
 (0)