gossip: bloated network graph, slow to prune #4070

@phlip9

Description

To set the stage, our LSP (running v0.1.3) was getting progressively less healthy. After investigating some high-CPU issues in gossip handling, it turned out our NetworkGraph was not actually getting pruned--or at least not quickly enough. The LSP had accumulated a 185 MiB NetworkGraph with 47k nodes and 309k channels, over 3.5x the size of the live network graph.

For context, we've been using utxo_lookup: None for P2PGossipSync. I think when the code was written, there was no async UtxoLookup. We've been relying on NetworkGraph::remove_stale_channels_and_tracking to prune the network graph.
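
For concreteness, the relevant wiring looks roughly like this (a simplified sketch, not our actual code; `setup_gossip` and the generics are just illustrative):

```rust
use std::ops::Deref;
use std::sync::Arc;

use bitcoin::Network;
use lightning::routing::gossip::{NetworkGraph, P2PGossipSync};
use lightning::routing::utxo::UtxoLookup;
use lightning::util::logger::Logger;

/// Gossip sync with no UTXO validation, plus a periodic prune that is the
/// only thing removing channels from the graph in our deployment.
fn setup_gossip<L: Deref + Clone>(network: Network, logger: L)
where
    L::Target: Logger,
{
    let graph = Arc::new(NetworkGraph::new(network, logger.clone()));

    // utxo_lookup: None -- channel announcements are never checked on-chain.
    let _gossip_sync = P2PGossipSync::new(
        Arc::clone(&graph),
        None::<Arc<dyn UtxoLookup>>,
        logger,
    );

    // Called periodically from a background task.
    graph.remove_stale_channels_and_tracking();
}
```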

There might also be something degenerate going on when our user nodes pull their graph from the LSP and then run P2PGossipSync against the LSP. Or when the LSP tries to gossip its ancient, bloated graph to other peers. Turning off user node gossip sync is definitely a priority, since it seems to be causing problems.

Curious what was going on, I pulled the raw network graph off the node and poked around (lexe-lsp.network_graph.20250912.bin.zip if you're curious). 250k channels already had their channel updates pruned (i.e., both one_to_two: None and two_to_one: None), but their announcement_received_timestamps were still too recent for the channels themselves to be pruned.
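
Something like this sketch is enough to reproduce the count (illustrative, not my exact script; it just deserializes the graph and counts channels with both directional updates gone):

```rust
use std::io::Cursor;
use std::ops::Deref;

use lightning::routing::gossip::NetworkGraph;
use lightning::util::logger::Logger;
use lightning::util::ser::ReadableArgs;

/// Count channels whose directional updates are already gone (both
/// `one_to_two` and `two_to_one` are `None`) but which are still in the graph.
fn count_update_pruned_channels<L: Deref>(bytes: &[u8], logger: L) -> usize
where
    L::Target: Logger,
{
    let graph = NetworkGraph::<L>::read(&mut Cursor::new(bytes), logger)
        .expect("failed to deserialize NetworkGraph");
    graph
        .read_only()
        .channels()
        .unordered_iter()
        .filter(|(_scid, chan)| chan.one_to_two.is_none() && chan.two_to_one.is_none())
        .count()
}
```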

Of these channels with pruned updates but a fresh announcement_received_timestamp, there was kind of a weird distribution: most of them last received an announcement around the same time, about 8 days ago. Not sure what that's about...

lexe-lsp.network_graph.prunable_channels_by_last_announcement_recvd.jpeg

Anyway, desperate to get prod healthy again, I cooked this up: graph: reduce time-to-prune for chans w/ no recent announce to 5d. The LSP is also now forced to prune immediately at startup. The diff is hacky, but it at least got the LSP healthy again. The first prune took quite a while, with later prunes going much faster:

Pruned network graph in (161.474806812 s)  nodes=18373 pruned_nodes=28540 channels=57560 pruned_channels=251810
Pruned network graph in  (39.365067000 ms) nodes=18381 pruned_nodes=0     channels=57578 pruned_channels=2
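
For anyone else hitting this before a proper fix: instead of patching LDK like my diff does, a similar one-shot effect at startup can probably be had by calling remove_stale_channels_and_tracking_with_time with a timestamp shifted into the future (hacky sketch, assuming the default staleness window is still two weeks; `force_startup_prune` is an illustrative name):

```rust
use std::ops::Deref;
use std::time::{SystemTime, UNIX_EPOCH};

use lightning::routing::gossip::NetworkGraph;
use lightning::util::logger::Logger;

/// Prune aggressively once at startup by pretending "now" is ~9 days in the
/// future, so a ~2-week staleness window behaves like ~5 days for this call.
/// Note this also shortens the separate removed-entries tracking window.
fn force_startup_prune<L: Deref>(graph: &NetworkGraph<L>)
where
    L::Target: Logger,
{
    const NINE_DAYS_SECS: u64 = 9 * 24 * 60 * 60;
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_secs();
    graph.remove_stale_channels_and_tracking_with_time(now + NINE_DAYS_SECS);
}
```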

If anyone has any thoughts on what might be causing the graph to get so bloated, that would be appreciated. If necessary, we can also impl UtxoLookup now that it's async. The 160s prune also looks worth optimizing.
