As I understand it, when we submit events to witnesses, we use a round-robin approach that, for N=5 witnesses, works like this:
```
# first loop - get initial receipts
receipts = []
for each witness:
    call witness with the existing receipts, and ask for its new receipt
    add the new receipt to the receipts array

# second loop - share the other receipts
for each witness except the last:
    submit the receipts from all of the other witnesses
```
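For concreteness, here is a minimal Python sketch of that sequential flow; `get_receipt` and `send_receipts` are hypothetical stand-ins for the real witness client calls, and only the control flow matters:

```python
def get_receipt(witness, event, receipts):
    """Stand-in for the real call: send the event plus the receipts gathered
    so far, and get this witness's receipt back."""
    return (witness, f"receipt over {event}")

def send_receipts(witness, receipts):
    """Stand-in for the real call: forward other witnesses' receipts to one witness."""
    pass

def publish_round_robin(event, witnesses):
    receipts = []
    # First loop: each witness sees the receipts gathered so far and adds its own,
    # so the last witness already sees everybody else's receipts on this call.
    for witness in witnesses:
        receipts.append(get_receipt(witness, event, receipts))

    # Second loop: every witness except the last is called again with the
    # receipts from the other witnesses.
    for witness in witnesses[:-1]:
        send_receipts(witness, [r for r in receipts if r[0] != witness])
    return receipts

publish_round_robin("icp-event", [f"wit{i}" for i in range(5)])  # 5 + 4 = 9 calls
```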
This results in the following:
| metric | count | explanation |
| --- | --- | --- |
| requests to witnesses | 9 | every witness is called twice, except the final witness in the first loop, which sees everybody else's receipts on its first call |
| bandwidth, requests to witnesses | 9 * request_overhead + 20 * receipt_size | each witness ends up seeing 4 other receipts |
| bandwidth, responses from witnesses | 9 * response_overhead + 5 * receipt_size | each witness returns one receipt; all but the last respond twice |
| latency | 9 * request_response_roundtrip | 9 calls made, one after the other |
| timeout | 9 * reasonable_timeout_per_call | timeouts accumulate across the sequential calls |
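If I have the bookkeeping right, these counts generalize to any N witnesses roughly as follows (formulas are my reading of the table above, not measured numbers):

```python
def sequential_metrics(n: int) -> dict:
    """Rough cost model for the current round-robin scheme with n witnesses."""
    requests = 2 * n - 1            # n first-loop calls + (n - 1) second-loop calls
    receipts_sent = n * (n - 1)     # each witness eventually sees the other n-1 receipts
    receipts_returned = n           # one receipt comes back per witness
    serial_roundtrips = 2 * n - 1   # calls happen one after another
    return dict(requests=requests, receipts_sent=receipts_sent,
                receipts_returned=receipts_returned, serial_roundtrips=serial_roundtrips)

print(sequential_metrics(5))  # {'requests': 9, 'receipts_sent': 20, 'receipts_returned': 5, 'serial_roundtrips': 9}
```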
I suggest we change to the following algorithm:
1. call all witnesses in parallel and ask for a receipt
2. wait until toad (the witness threshold) witnesses have responded, or until a timeout expires
3. call all witnesses in parallel with all of the other receipts
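A minimal asyncio sketch of that two-phase flow is below. The `request_receipt` and `submit_receipts` stubs are hypothetical stand-ins for the real witness client calls, and keripy's actual I/O runs on hio coroutines rather than asyncio, so this only illustrates the concurrency pattern:

```python
import asyncio
import random

async def request_receipt(witness: str, event: bytes) -> bytes:
    """Phase 1 (stubbed): submit the event to one witness and return its receipt."""
    await asyncio.sleep(random.uniform(0.05, 0.2))  # stand-in for a network round trip
    return f"receipt-from-{witness}".encode()

async def submit_receipts(witness: str, receipts: list[bytes]) -> None:
    """Phase 2 (stubbed): forward the other witnesses' receipts to one witness."""
    await asyncio.sleep(random.uniform(0.05, 0.2))

async def publish(event: bytes, witnesses: list[str], toad: int,
                  timeout: float = 5.0) -> list[bytes]:
    # Phase 1: ask every witness for a receipt, all in parallel.
    tasks = [asyncio.create_task(request_receipt(w, event)) for w in witnesses]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:            # give up on slow witnesses after the timeout
        task.cancel()
    receipts = [t.result() for t in done if t.exception() is None]
    if len(receipts) < toad:
        raise RuntimeError(f"only {len(receipts)} receipts, need at least {toad}")

    # Phase 2: share all collected receipts with every witness, again in parallel.
    await asyncio.gather(*(submit_receipts(w, receipts) for w in witnesses),
                         return_exceptions=True)  # tolerate individual failures
    return receipts

if __name__ == "__main__":
    wits = [f"wit{i}" for i in range(5)]
    print(asyncio.run(publish(b"icp-event", wits, toad=3)))
```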
If I am analyzing correctly, this would give us the following metrics:
| metric | count | explanation |
| --- | --- | --- |
| requests to witnesses | 10 | every witness is called twice |
| bandwidth, requests to witnesses | 10 * request_overhead + 20 * receipt_size | each witness ends up seeing 4 other receipts |
| bandwidth, responses from witnesses | 10 * response_overhead + 5 * receipt_size | each witness returns one receipt but responds twice |
| latency | 2 * request_response_roundtrip | 2 phases, each fully parallel |
| timeout | 2 * reasonable_timeout_per_call | cumulative timeout |
Is my analysis accurate? If so, we consume almost identical bandwidth (10 vs 9 requests, and the same amount of data transferred), but latency drops by almost a factor of 5 (9 sequential round trips down to 2). That would make many KERI operations noticeably faster as users perceive them.
A possible downside to this approach is that error handling gets more complicated, and the chances of things ending up in escrow go up (unless we don't begin phase 2 until all witnesses respond or time out). And of course, running things in parallel is a bit more complicated. But making it noticeably faster to talk to witnesses still seems to me like it might be worth the tradeoff.
What do you think?
Given the added complexity, I suggest it not be a switchover, but that the new algorithm be a configuration choice. That way unit tests and simple configurations can continue to use the existing algorithm, while deployments that want the new one can choose it. It should be as simple as adding a different Doer for the new algorithm and selecting that Doer at config time instead of the current default.
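Something along these lines, purely as a sketch; the class names and the config key are made up for illustration and are not keripy's actual API:

```python
from hio.base import doing

class RoundRobinReceiptor(doing.Doer):
    """Existing behavior: submit to witnesses one at a time (current default)."""

class ParallelReceiptor(doing.Doer):
    """New behavior: two parallel phases, gated on toad receipts."""

def make_receiptor(config: dict) -> doing.Doer:
    # Select the Doer at configuration time: unit tests and simple setups keep
    # the default, while deployments can opt in to the parallel algorithm.
    if config.get("witnessSubmission") == "parallel":
        return ParallelReceiptor()
    return RoundRobinReceiptor()

receiptor = make_receiptor({"witnessSubmission": "parallel"})
```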