As I understand it, when we submit events to witnesses, we use a round-robin approach that, for N=5 witnesses, works like this:
```
# first loop - get initial receipts
receipts = []
for each witness:
    call witness with the existing receipts, and ask for its new receipt
    add the new receipt to the receipts array

# second loop - share the other receipts
for each witness except the last:
    submit the receipts from all of the other witnesses
```
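For concreteness, here is a minimal Python sketch of that sequential flow; `get_receipt` and `send_receipts` are hypothetical stand-ins for the real witness client calls, and only the control flow matters:

```python
def get_receipt(witness, event, receipts):
    """Stand-in for the real call: send the event plus the receipts gathered
    so far, and get this witness's receipt back."""
    return (witness, f"receipt over {event}")

def send_receipts(witness, receipts):
    """Stand-in for the real call: forward other witnesses' receipts to one witness."""
    pass

def publish_round_robin(event, witnesses):
    receipts = []
    # First loop: each witness sees the receipts gathered so far and adds its own,
    # so the last witness already sees everybody else's receipts on this call.
    for witness in witnesses:
        receipts.append(get_receipt(witness, event, receipts))

    # Second loop: every witness except the last is called again with the
    # receipts from the other witnesses.
    for witness in witnesses[:-1]:
        send_receipts(witness, [r for r in receipts if r[0] != witness])
    return receipts

publish_round_robin("icp-event", [f"wit{i}" for i in range(5)])  # 5 + 4 = 9 calls
```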
This results in the following:
| metric | count | explanation |
| --- | --- | --- |
| requests to witnesses | 9 | every witness is called twice, except the final witness in the first loop, which sees everybody else's receipts on its first call |
| bandwidth, requests to witnesses | 9 * request_overhead + 20 * receipt_size | each witness ends up seeing 4 other receipts |
| bandwidth, responses from witnesses | 9 * response_overhead + 5 * receipt_size | each witness returns one receipt; all but the last respond twice |
| latency | 9 * request_response_roundtrip | 9 calls made, one after the other |
| timeout | 9 * reasonable_timeout_per_call | timeouts accumulate across the sequential calls |
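If I have the bookkeeping right, these counts generalize to any N witnesses roughly as follows (formulas are my reading of the table above, not measured numbers):

```python
def sequential_metrics(n: int) -> dict:
    """Rough cost model for the current round-robin scheme with n witnesses."""
    requests = 2 * n - 1            # n first-loop calls + (n - 1) second-loop calls
    receipts_sent = n * (n - 1)     # each witness eventually sees the other n-1 receipts
    receipts_returned = n           # one receipt comes back per witness
    serial_roundtrips = 2 * n - 1   # calls happen one after another
    return dict(requests=requests, receipts_sent=receipts_sent,
                receipts_returned=receipts_returned, serial_roundtrips=serial_roundtrips)

print(sequential_metrics(5))  # {'requests': 9, 'receipts_sent': 20, 'receipts_returned': 5, 'serial_roundtrips': 9}
```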
I suggest we change to the following algorithm:
1. call all witnesses in parallel and ask for a receipt
2. wait until toad (the witness threshold) witnesses have responded, or until a timeout expires
3. call all witnesses in parallel with all of the other receipts
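A minimal asyncio sketch of that two-phase flow is below. The `request_receipt` and `submit_receipts` stubs are hypothetical stand-ins for the real witness client calls, and keripy's actual I/O runs on hio coroutines rather than asyncio, so this only illustrates the concurrency pattern:

```python
import asyncio
import random

async def request_receipt(witness: str, event: bytes) -> bytes:
    """Phase 1 (stubbed): submit the event to one witness and return its receipt."""
    await asyncio.sleep(random.uniform(0.05, 0.2))  # stand-in for a network round trip
    return f"receipt-from-{witness}".encode()

async def submit_receipts(witness: str, receipts: list[bytes]) -> None:
    """Phase 2 (stubbed): forward the other witnesses' receipts to one witness."""
    await asyncio.sleep(random.uniform(0.05, 0.2))

async def publish(event: bytes, witnesses: list[str], toad: int,
                  timeout: float = 5.0) -> list[bytes]:
    # Phase 1: ask every witness for a receipt, all in parallel.
    tasks = [asyncio.create_task(request_receipt(w, event)) for w in witnesses]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:            # give up on slow witnesses after the timeout
        task.cancel()
    receipts = [t.result() for t in done if t.exception() is None]
    if len(receipts) < toad:
        raise RuntimeError(f"only {len(receipts)} receipts, need at least {toad}")

    # Phase 2: share all collected receipts with every witness, again in parallel.
    await asyncio.gather(*(submit_receipts(w, receipts) for w in witnesses),
                         return_exceptions=True)  # tolerate individual failures
    return receipts

if __name__ == "__main__":
    wits = [f"wit{i}" for i in range(5)]
    print(asyncio.run(publish(b"icp-event", wits, toad=3)))
```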
If I am analyzing correctly, this would give us the following metrics:
| metric | count | explanation |
| --- | --- | --- |
| requests to witnesses | 10 | every witness is called twice |
| bandwidth, requests to witnesses | 10 * request_overhead + 20 * receipt_size | each witness ends up seeing 4 other receipts |
| bandwidth, responses from witnesses | 10 * response_overhead + 5 * receipt_size | each witness returns one receipt but responds twice |
| latency | 2 * request_response_roundtrip | 2 phases, each fully parallel |
| timeout | 2 * reasonable_timeout_per_call | cumulative timeout |
Is my analysis accurate? If so, we consume almost identical bandwidth (10 vs 9 requests, and the same amount of data transferred), but latency drops by almost a factor of 5 (9 sequential round trips down to 2). That would make many KERI operations noticeably faster as users perceive them.
A possible downside to this approach is that error handling gets more complicated, and the chances of things ending up in escrow go up (unless we don't begin phase 2 until all witnesses respond or time out). And of course, running things in parallel is a bit more complicated. But making it noticeably faster to talk to witnesses still seems to me like it might be worth the tradeoff.
What do you think?
Given the added complexity, I suggest it not be a switchover, but that the new algorithm be a configuration choice. That way unit tests and simple configurations can continue to use the existing algorithm, while deployments that want the new one can choose it. It should be as simple as adding a different Doer for the new algorithm and selecting that Doer at config time instead of the current default.
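Something along these lines, purely as a sketch; the class names and the config key are made up for illustration and are not keripy's actual API:

```python
from hio.base import doing

class RoundRobinReceiptor(doing.Doer):
    """Existing behavior: submit to witnesses one at a time (current default)."""

class ParallelReceiptor(doing.Doer):
    """New behavior: two parallel phases, gated on toad receipts."""

def make_receiptor(config: dict) -> doing.Doer:
    # Select the Doer at configuration time: unit tests and simple setups keep
    # the default, while deployments can opt in to the parallel algorithm.
    if config.get("witnessSubmission") == "parallel":
        return ParallelReceiptor()
    return RoundRobinReceiptor()

receiptor = make_receiptor({"witnessSubmission": "parallel"})
```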