
Conversation

@rustyrussell (Contributor)

This is a result of work with large nodes, especially with bookkeeper dealing with 1.6M events at once, which led to a point release.

This starts by undoing those workarounds, and using synthetic data (eventually, 5M records) to provoke large loads and optimize out the bottlenecks. The final result is much faster, and far lower latency.

I noticed this in the logs:

```
listinvoices: description/bolt11/bolt12 not found (

{"jsonrpc":"2)
```

And we make the same formatting mistake in several places.
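The exact fix isn't shown here, but the class of mistake is printing a length-delimited slice of the JSON buffer with a mismatched pointer/length pair. A minimal sketch, with hypothetical token and function names (the real code uses jsmn-style tokens):

```
#include <stdio.h>

/* Simplified stand-in for a jsmn-style token holding byte offsets. */
struct tok {
	int start, end;
};

/* Buggy pattern: the length is right but the pointer is the start of the
 * whole buffer, so the log shows a cut-off prefix like ({"jsonrpc":"2). */
static void log_wrong(const char *buf, const struct tok *t)
{
	printf("listinvoices: description/bolt11/bolt12 not found (%.*s)\n",
	       t->end - t->start, buf);
}

/* Fixed pattern: print the token's own bytes. */
static void log_right(const char *buf, const struct tok *t)
{
	printf("listinvoices: description/bolt11/bolt12 not found (%.*s)\n",
	       t->end - t->start, buf + t->start);
}
```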

Signed-off-by: Rusty Russell <[email protected]>
Nobody has hit this yet, but we're about to with our tests.

The size of the db is going to be the total size of its tables: bigger nodes,
bigger db.

Signed-off-by: Rusty Russell <[email protected]>
This reverts `bookkeeper: only read listchannelmoves 1000 entries at a time.` commit,
so we can properly fix the scalability in the coming patches.

tests/test_coinmoves.py::test_generate_coinmoves (100,000):
	Time (from start to end of l2 node):	207 seconds
	Worst latency:				106 seconds

Signed-off-by: Rusty Russell <[email protected]>
… at once."

This reverts commit 1dda0c0 so we can test
what it's like to be flooded with logs again.

This benefits from other improvements we've made this release to plugin
input handling (i.e. converting to use common/jsonrpc_io), so it doesn't
make much difference.

tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	211 seconds
	Worst latency:				108 seconds

Signed-off-by: Rusty Russell <[email protected]>
We start with 100,000 entries.  We will scale this to 2M as we fix the
O(N^2) bottlenecks.

I measure the node time after we modify the db, like so:

	while guilt push && rm -rf /tmp/ltests* && uv run make -s RUST=0; do RUST=0 VALGRIND=0 TIMEOUT=100 TEST_DEBUG=1 eatmydata uv run pytest -vvv -p no:logging tests/test_coinmoves.py::test_generate_coinmoves > /tmp/`guilt top`-sql 2>&1; done

Then analyzed the results with:
	FILE=/tmp/synthetic-data.patch-sql
	START=$(grep 'lightningd-2 .* Server started with public key' $FILE | tail -n1 | cut -d\  -f2 | cut -d. -f1)
	END=$(grep 'lightningd-2 .* JSON-RPC shutdown' $FILE | tail -n1 | cut -d\  -f2 | cut -d. -f1)
	echo $(( $(date +%s -d $END) - $(date +%s -d $START) ))
	grep 'E       assert' $FILE

tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	85 seconds
	Worst latency:				75 seconds

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell added this to the v25.12 milestone on Nov 10, 2025
@daywalker90 (Collaborator)

@cdecker what do you think about rustyrussell#16 ?

…rink it.

We make a copy, then attach a destructor to the hook in case that plugin exits, so we
can NULL it out in the local copy.  When we have 300,000 requests pending, this means
we have 300,000 destructors, which don't scale (it's a singly-linked list).

Simply NULL out (rather than shrink) the array in the `plugin_hook`.
Then we can keep using that.
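A minimal sketch of the approach (struct and function names are illustrative, not the actual lightningd hook code): keep the array length fixed, NULL out dead entries, and skip NULLs when iterating, so no per-request destructors are needed.

```
#include <stddef.h>

struct plugin;

struct plugin_hook_sketch {
	struct plugin **plugins;	/* NULL entry == plugin has exited */
	size_t num_plugins;
};

/* Called when a plugin dies: O(num_plugins), once per hook. */
static void hook_forget_plugin(struct plugin_hook_sketch *hook,
			       const struct plugin *dead)
{
	for (size_t i = 0; i < hook->num_plugins; i++)
		if (hook->plugins[i] == dead)
			hook->plugins[i] = NULL;
}

/* Callers simply skip the NULLed-out slots. */
static void hook_call_each(struct plugin_hook_sketch *hook,
			   void (*cb)(struct plugin *p, void *arg), void *arg)
{
	for (size_t i = 0; i < hook->num_plugins; i++)
		if (hook->plugins[i])
			cb(hook->plugins[i], arg);
}
```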

tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	34 seconds **WAS 85**
	Worst latency:				24 seconds **WAS 75**

Signed-off-by: Rusty Russell <[email protected]>
When we have many commands, this is where we spend all our time, and it's
just for an old assertion.

tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	13 seconds **WAS 34**
	Worst latency:				4.0 seconds **WAS 24**

Signed-off-by: Rusty Russell <[email protected]>
…quests.

If we have USDT compiled in, scanning the array of spans becomes
prohibitive if we have really large numbers of requests.  In the
bookkeeper code, when catching up with 1.6M channel events, this
became clear in profiling.

Use a hash table instead.
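A rough sketch of the idea (not the actual trace code; names and sizes are assumptions): key live spans by id in a bucketed hash table so lookup no longer scans every pending request.

```
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SPAN_BUCKETS 512

struct span {
	uint64_t id;
	struct span *next;	/* chain within a bucket */
};

static struct span *span_buckets[SPAN_BUCKETS];

static size_t span_bucket(uint64_t id)
{
	/* Cheap multiplicative mix; real code would use a proper hash. */
	return (size_t)((id * 0x9E3779B97F4A7C15ULL) % SPAN_BUCKETS);
}

static void span_add(struct span *s)
{
	size_t b = span_bucket(s->id);
	s->next = span_buckets[b];
	span_buckets[b] = s;
}

static struct span *span_find(uint64_t id)
{
	for (struct span *s = span_buckets[span_bucket(id)]; s; s = s->next)
		if (s->id == id)
			return s;
	return NULL;
}
```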

Before:
tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	269 seconds (vs 14 with HAVE_USDT=0)
	Worst latency:				4.0 seconds

After:
tests/test_coinmoves.py::test_generate_coinmoves (100,000, sqlite3):
	Time (from start to end of l2 node):	14 seconds
	Worst latency:				4.3 seconds

Signed-off-by: Rusty Russell <[email protected]>
If we only have 8 or fewer spans at once (as is the normal case), don't
do allocation, which might interfere with tracing.

This doesn't change our test_generate_coinmoves() benchmark.
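Sketch of the small-pool idea (sizes and names are assumptions): serve the common case from a static array so tracing itself doesn't trigger heap allocation, falling back to malloc only when more than 8 spans are live.

```
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SPAN_POOL 8

struct span_slot {
	bool used;
	uint64_t id;
};

static struct span_slot span_pool[SPAN_POOL];

static struct span_slot *span_slot_new(uint64_t id)
{
	for (size_t i = 0; i < SPAN_POOL; i++) {
		if (!span_pool[i].used) {
			span_pool[i].used = true;
			span_pool[i].id = id;
			return &span_pool[i];
		}
	}
	/* Rare: more than SPAN_POOL spans at once; fall back to the heap. */
	struct span_slot *s = malloc(sizeof(*s));
	if (s) {
		s->used = true;
		s->id = id;
	}
	return s;
}
```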

Signed-off-by: Rusty Russell <[email protected]>
Now that we've rid ourselves of the worst offenders, we can make this a real
stress test.  We remove plugin io saving and low-level logging, to avoid
benchmarking test artifacts.

Here are the results:

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	518 seconds
	Worst latency:				353 seconds

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, Postgres):
	Time (from start to end of l2 node):	 417 seconds
	Worst latency:				 96.6 seconds

Signed-off-by: Rusty Russell <[email protected]>
Profiling shows us spending all our time in tal_arr_remove when dealing
with a giant number of output streams.  This applies both for RPC output
and plugin output.

Use linked list instead.
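A sketch of the data-structure change (struct names are illustrative), using ccan/list as the rest of the codebase does: removing a finished stream becomes O(1) instead of memmoving the tail of a huge tal array.

```
#include <ccan/list/list.h>

struct out_stream {
	struct list_node list;		/* in owner->out_streams */
	/* ...buffered output... */
};

struct out_owner {
	struct list_head out_streams;	/* list_head_init() at setup */
};

static void stream_start(struct out_owner *owner, struct out_stream *os)
{
	list_add_tail(&owner->out_streams, &os->list);
}

static void stream_done(struct out_stream *os)
{
	/* O(1), no matter how many other streams are pending. */
	list_del(&os->list);
}
```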

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	239 seconds **WAS 518**
	Worst latency:				56.9 seconds **WAS 353**

Signed-off-by: Rusty Russell <[email protected]>
This potentially saves us some reads (not measurably, though), at the cost
of less fairness.  It's important to measure, because a single large request
will increase the buffer size for successive requests, so we can see this
pattern in real usage.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	227 seconds (was 239)
	Worst latency:				62.4 seconds (was 56.9)

Signed-off-by: Rusty Russell <[email protected]>
…oks are called.

We're going to use this on the "rpc_command" hook, to allow xpay to specify that it
only wants to be called on "pay" commands.
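Roughly, on the lightningd side it becomes a cheap check before waking the plugin. A sketch with assumed field names, not the actual hook structures:

```
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct hook_subscriber {
	const char **filter_cmds;	/* NULL means "all commands" */
	size_t num_filter_cmds;
};

static bool subscriber_wants(const struct hook_subscriber *sub,
			     const char *cmdname)
{
	if (!sub->filter_cmds)
		return true;
	for (size_t i = 0; i < sub->num_filter_cmds; i++)
		if (strcmp(sub->filter_cmds[i], cmdname) == 0)
			return true;
	return false;
}
```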

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Changelog-Added: Plugins: the `rpc_command` hook can now specify a "filter" on what commands it is interested in.
tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 135 seconds **WAS 227**
	Worst latency:				 12.1 seconds **WAS 62.4**

Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Changelog-Added: pyln-client: optional filters can be given when hooks are registered (for supported hooks)
Changelog-Added: Plugins: "filters" can be specified on the `custommsg` hook to limit what message types the hook will be called for.
Signed-off-by: Rusty Russell <[email protected]>
… issues.

A client can do this by sending a large request, so this allows us to see what
happens if they do that, even though 1MB (2MB buffer) is more than we need.

This drives our performance through the floor: see next patch which gets
us back on track.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 271 seconds **WAS 135**
	Worst latency:				 105 seconds **WAS 12.1**

Signed-off-by: Rusty Russell <[email protected]>
We would keep parsing if we were out of tokens, even if we had actually
finished one object!

These are comparisons against the "xpay: use filtering on rpc_command
so we only get called on 'pay'." commit, not the disastrous previous one!

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 126 seconds (was 135)
	Worst latency:				 5.1 seconds **WAS 12.1**

Signed-off-by: Rusty Russell <[email protected]>
This rotates through fds explicitly, to avoid unfairness.
This doesn't really make a difference until we start using it.

Signed-off-by: Rusty Russell <[email protected]>
Now that ccan/io rotates through callbacks, we can call io_always() to yield.

Though it doesn't fire on our benchmark, it's a good thing to do.
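For illustration (the plan names and per-connection state are assumptions): yielding with io_always() returns to the poll loop between batches, and since ccan/io now rotates through ready connections, other clients get a turn.

```
#include <ccan/io/io.h>
#include <stddef.h>

struct batch_state {
	size_t remaining;	/* batches left to emit */
};

static struct io_plan *write_next_batch(struct io_conn *conn,
					struct batch_state *bs);

static struct io_plan *yield_then_continue(struct io_conn *conn,
					   struct batch_state *bs)
{
	/* Hand control back to the event loop before the next batch. */
	return io_always(conn, write_next_batch, bs);
}

static struct io_plan *write_next_batch(struct io_conn *conn,
					struct batch_state *bs)
{
	if (bs->remaining == 0)
		return io_close(conn);
	bs->remaining--;
	/* ...queue one batch of output here (omitted)... */
	return yield_then_continue(conn, bs);
}
```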

Signed-off-by: Rusty Russell <[email protected]>
Now that ccan/io rotates through callbacks, we can call io_always() to
yield.

We're now fast enough that this doesn't have any effect on this test,
but it's still good to have.

Signed-off-by: Rusty Russell <[email protected]>
…nmoves each time.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 102 seconds **WAS 126**
	Worst latency:				 4.5 seconds **WAS 5.1**

Signed-off-by: Rusty Russell <[email protected]>
We have a reasonable number of commands now, and we *already* keep a
strmap for the usage strings.  So simply keep the usage and the command
in the map, and skip the array.
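A sketch using ccan/strmap, as the existing usage-string map does (the entry struct here is an assumption): dispatch becomes a single lookup instead of scanning the command array.

```
#include <ccan/strmap/strmap.h>
#include <stddef.h>

struct json_command;

struct command_entry {
	const struct json_command *cmd;
	const char *usage;
};

struct command_map {
	STRMAP_MEMBERS(struct command_entry *);
};

static const struct json_command *find_command(struct command_map *map,
					       const char *name)
{
	const struct command_entry *e = strmap_get(map, name);
	return e ? e->cmd : NULL;
}
```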

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 95 seconds (was 102)
	Worst latency:				 4.5 seconds

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, Postgres):
	Time (from start to end of l2 node):	 231 seconds
	Worst latency:				 4.8 seconds

Note these values compare against 25.09.2:

	sqlite3:
	Time (from start to end of l2 node):	 403 seconds

	Postgres:
	Time (from start to end of l2 node):	 671 seconds

Signed-off-by: Rusty Russell <[email protected]>
Now that we've dealt with the other issues, the latency spike (4 seconds on my laptop)
for querying 2M elements remains.

Restore the limited sampling which we reverted, but make it 10,000 now.

This doesn't help our worst-case latency, because sql still asks for all 2M entries on
first access.  We address that next.
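The pattern, generically (the fetch helper and its parameters are hypothetical, not the real listchannelmoves interface): pull at most 10,000 records per call and resume from the last index, so catching up on millions of rows never requires one giant response.

```
#include <stddef.h>
#include <stdint.h>

#define MOVE_CHUNK 10000

struct move {
	uint64_t index;
	/* ...rest of the coin movement record... */
};

/* Stub standing in for the real RPC call (hypothetical signature):
 * fetch up to 'limit' moves starting at 'start', return how many. */
static size_t fetch_moves(uint64_t start, size_t limit, struct move *out)
{
	(void)start; (void)limit; (void)out;
	return 0;
}

static void catch_up_moves(void)
{
	uint64_t index = 0;
	static struct move batch[MOVE_CHUNK];

	for (;;) {
		size_t n = fetch_moves(index, MOVE_CHUNK, batch);
		/* ...process batch[0..n-1]... */
		index += n;
		if (n < MOVE_CHUNK)
			break;		/* caught up */
	}
}
```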

Signed-off-by: Rusty Russell <[email protected]>
This avoids latency spikes when we ask lightningd to give us 2M entries.

tests/test_coinmoves.py::test_generate_coinmoves (2,000,000, sqlite3):
	Time (from start to end of l2 node):	 88 seconds (was 95)
	Worst latency:				 0.028 seconds **WAS 4.5**

Signed-off-by: Rusty Russell <[email protected]>
This is slow, but will make sure we find out if we add latency spikes in future.

tests/test_coinmoves.py::test_generate_coinmoves (5,000,000, sqlite3):
	Time (from start to end of l2 node):	 236 seconds
	Worst latency:				 0.15 seconds

tests/test_coinmoves.py::test_generate_coinmoves (5,000,000, Postgres):
	Time (from start to end of l2 node):	 557 seconds
	Worst latency:				 0.16 seconds

Signed-off-by: Rusty Russell <[email protected]>
Changelog-Fixed: lightningd: multiple significant speedups for large nodes, especially preventing "freezes" under exceptionally high load.
This lets us measure the improvement (if any) from not actually creating empty transactions.

Signed-off-by: Rusty Russell <[email protected]>
We always start a transaction before processing, but there are cases where
we don't need to.  Switch to doing it on-demand.

This doesn't make a big difference for sqlite3, but it can for Postgres because
of the latency: 12% or so.  Every bit helps!
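A minimal sketch of the on-demand pattern (field and callback names are illustrative, not the actual db layer): BEGIN is only issued the first time a statement needs it, so handlers that never touch the db skip the round trip.

```
#include <stdbool.h>

struct db_sketch {
	bool in_transaction;
	bool (*begin_tx)(struct db_sketch *db);
	bool (*commit_tx)(struct db_sketch *db);
};

/* Call before any statement that needs transactional context. */
static bool db_begin_if_needed(struct db_sketch *db)
{
	if (db->in_transaction)
		return true;
	if (!db->begin_tx(db))
		return false;
	db->in_transaction = true;
	return true;
}

/* At the end of processing: commit only if something was started. */
static bool db_commit_if_open(struct db_sketch *db)
{
	if (!db->in_transaction)
		return true;
	db->in_transaction = false;
	return db->commit_tx(db);
}
```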

30 runs, min-max(mean+/-stddev):

	Postgres before:  8.842773-9.769030(9.19531+/-0.21)
	Postgres after: 8.007967-8.321856(8.14172+/-0.066)

	sqlite3 before: 7.486042-8.371831(8.15544+/-0.19)
	sqlite3 after: 7.973411-8.576135(8.3025+/-0.12)

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell force-pushed the guilt/bkpr-post-migration branch from 9bb2e61 to aec622c on November 10, 2025 23:55

@cdecker (Member) left a comment:


Very nice improvements!

Comment on lines +437 to +438
assert(!streq(cmd->command->name, "xxxxX"));
assert(!streq(cmd->usage, "xxxxX"));


Where do these constants come from? Are they used somewhere for testing? Otherwise I'd remove them.

return;

db_prepare_for_changes(db);
ok = db->config->begin_tx_fn(db);


Technically we could save 1/2 RTT by pipelining the transaction start with the first command, i.e., we fire and forget the BEGIN command, and immediately queue the actual query that caused us to init a tx behind it. That saves us the return-path from server for BEGIN. At that point we're splitting hairs though ^^
