disable in-packet parallel processing for gossip #4838

Draft · wants to merge 1 commit into base: master
Conversation

alexpyattaev

Problem

Verifying signatures for individual CRDS values within a packet in parallel is not very useful, as we do not have unlimited CPU cores and there is already parallel processing at the per-packet level.

Summary of Changes

  • Disable parallel processing within a gossip packet
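The proposed shape can be sketched with std scoped threads standing in for rayon's `par_iter` (a minimal sketch; `CrdsValue` and its `verify` method here are hypothetical stand-ins for the real agave types, not the actual API):

```rust
use std::thread;

// Hypothetical stand-ins for the real CRDS value type and its sigverify.
struct CrdsValue {
    valid: bool,
}

impl CrdsValue {
    fn verify(&self) -> bool {
        self.valid
    }
}

// Per-packet parallelism only: each packet goes to one worker thread, and
// the values inside a packet are verified serially, short-circuiting on
// the first failure (the shape of the change this PR proposes).
fn verify_packets(packets: &[Vec<CrdsValue>]) -> Vec<bool> {
    thread::scope(|s| {
        let handles: Vec<_> = packets
            .iter()
            .map(|pkt| s.spawn(move || pkt.iter().all(CrdsValue::verify)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let packets = vec![
        vec![CrdsValue { valid: true }, CrdsValue { valid: true }],
        vec![CrdsValue { valid: true }, CrdsValue { valid: false }],
    ];
    assert_eq!(verify_packets(&packets), vec![true, false]);
    println!("ok");
}
```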

@alexpyattaev
Author

@behzadnouri do you think it is a good idea? It was discussed on discord in context of "why would we use threads if someone sends a packet full of bogus CRDS values", i.e. it would be cheaper to verify them serially and exit on first failure.

@behzadnouri

> @behzadnouri do you think it is a good idea? It was discussed on discord in context of "why would we use threads if someone sends a packet full of bogus CRDS values", i.e. it would be cheaper to verify them serially and exit on first failure.

When doing #4259 I actually tested removing the inner par_iter on a staked testnet node, and there was a slight regression. So I left it as is.

@alexpyattaev
Author

What kind of regression did you observe? i.e. which metrics degraded?

@behzadnouri

behzadnouri commented Feb 7, 2025

> What kind of regression did you observe? i.e. which metrics degraded?

https://github.com/anza-xyz/agave/blob/f36a62dce/gossip/src/cluster_info.rs#L2288

> "why would we use threads if someone sends a packet full of bogus CRDS values", i.e. it would be cheaper to verify them serially and exit on first failure.

I think the counterargument is that, if we verify CrdsValue's serially, the spammer can easily send a Vec<CrdsValue> in which every value sigverifies except the very last one. So with serial verification we will still sigverify more CrdsValues, not fewer.
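The adversarial case can be demonstrated concretely (a minimal sketch; `Val` and its counter-instrumented `verify` are hypothetical, standing in for a real signature check so we can count how much work is done):

```rust
use std::cell::Cell;

// Hypothetical value whose "signature check" bumps a shared counter so we
// can observe how much work serial short-circuit verification does.
struct Val<'a> {
    ok: bool,
    work: &'a Cell<usize>,
}

impl Val<'_> {
    fn verify(&self) -> bool {
        self.work.set(self.work.get() + 1);
        self.ok
    }
}

fn main() {
    let work = Cell::new(0);
    // Adversarial packet: every value verifies except the very last one.
    let packet: Vec<Val> = (0..64).map(|i| Val { ok: i != 63, work: &work }).collect();
    // Serial verification with early exit on the first failure...
    let all_ok = packet.iter().all(Val::verify);
    // ...still performs all 64 signature checks: short-circuiting saves
    // nothing when only the final value in the packet is bogus.
    assert!(!all_ok);
    assert_eq!(work.get(), 64);
    println!("checks performed: {}", work.get());
}
```

The early exit only helps when failures appear early; a spammer controls where the failure appears, so the worst case stays at one full packet's worth of sigverifies.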

@alexpyattaev
Author

That is true. In the end, however, we are receiving > 20K packets per second, so there are obviously enough of them to make parallelizing within a packet irrelevant. And unnecessary parallelization is pure overhead.

@behzadnouri

> we are getting > 20K packets per second, there is obviously enough of them to make it irrelevant to parallelize within a packet.

But the run_socket_consume function runs a lot more often than just once every second.
So it does not necessarily have a lot of packets to process each time it runs.

> And parallelizing unnecessarily is pure overhead.

That was my thinking too, and like I said, I tried removing the inner par_iter in #4259 for that exact same reason. But the measurements showed a slight regression.
So I intuitively agree with you, but I guess this is one of those cases where test results do not match intuition.

@alexpyattaev
Author

This is the distribution of the number of packets that are available for processing (MNB data, a few huge spikes filtered out).
(plot: test2 — distribution of packet queue sizes)

So the vast majority of the time we are dealing with fewer than 100 packets in that queue.

(plot: test3 — zoomed-in view of the same distribution)

Beyond that, we are looking at a median of maybe 35 packets, 40 packets on average, and 100000 at the maximum. This is enough to spread across the 8 threads that gossip currently uses for sigverify, with several whole packets per thread. Under low load we might win a few microseconds of latency, but under high load, making the code simpler and working in larger units should be beneficial. I'll have a working gossip traffic generator done soon, so I'll be able to test what happens under high load.
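The back-of-envelope arithmetic behind that claim can be spelled out (a sketch using the batch sizes and thread count quoted above; `packets_per_thread` is a hypothetical helper, not agave code):

```rust
// With a batch of N packets split evenly over T sigverify threads, each
// thread gets ceil(N / T) whole packets. At the observed median (~35) and
// mean (~40) batch sizes with 8 threads, every thread already has several
// packets to chew on without splitting any single packet further.
fn packets_per_thread(batch: usize, threads: usize) -> usize {
    (batch + threads - 1) / threads // ceiling division
}

fn main() {
    assert_eq!(packets_per_thread(35, 8), 5); // median batch
    assert_eq!(packets_per_thread(40, 8), 5); // average batch
    assert_eq!(packets_per_thread(100, 8), 13); // "large" batch
    println!("ok");
}
```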

@behzadnouri

behzadnouri commented Feb 8, 2025

> This is the distribution of the number of packets that are available for processing (MNB data, a few huge spikes filtered out).
> So vast majority of the time we are dealing with less than 100 packets in that queue.

The distribution depends a lot on the node's stake.
There also seems to be a current issue where unstaked nodes do not get enough push messages, so they receive tons of pull responses every so often when they send out a pull request. That causes much spikier traffic.

> Under low load, maybe we could win a few microseconds of latency, but under high load making code simpler and working in larger units should be benefitial.

I repeated the test again on a staked node on testnet.
Like before, if we remove the inner par_iter, there is a 5%+ regression in verify_gossip_packets_time.
The left side is master, the right is this code.

(chart: verify_gossip_packets_time — master vs this branch)
