
optimize how we submit to witnesses #855

Open
dhh1128 opened this issue Aug 29, 2024 · 2 comments

Comments

dhh1128 commented Aug 29, 2024

Feature request description/rationale

As I understand it, when we submit events to witnesses, we use a round-robin approach that, for N=5 witnesses, works like this:

# first loop - get initial receipts
receipts = []
for each witness:
    call the witness with the existing receipts, and ask for its new receipt
    append the new receipt to the receipts array

# second loop - share the other receipts
for each witness except the last:
    submit the receipts from all the other witnesses

This results in the following:

| metric | count | explanation |
| --- | --- | --- |
| requests to witnesses | 9 | every witness is called twice except the final witness in the first loop, which sees everybody else's receipts on the first call |
| bandwidth, requests to witnesses | 9 * request_overhead + 20 * receipt_size | each witness ends up seeing 4 other receipts |
| bandwidth, responses from witnesses | 9 * response_overhead + 5 * receipt_size | each witness returns one receipt but responds twice |
| latency | 9 * request_response_roundtrip | 9 calls made, one after the other |
| timeout | 9 * reasonable_timeout_per_call | cumulative timeout |
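The sequential round-robin above can be sketched in Python. This is a toy model, not KERIpy's actual code: `get_receipt` and `share_receipts` are hypothetical stand-ins for the real witness calls, and a "receipt" is just a tuple here.

```python
# Hypothetical sketch of the current sequential round-robin submission.
# Names and call signatures are illustrative, not KERIpy's real API.

def submit_round_robin(witnesses, get_receipt, share_receipts):
    # First loop: call each witness in turn, passing the receipts
    # gathered so far, and collect its new receipt.
    receipts = []
    calls = 0
    for w in witnesses:
        receipts.append(get_receipt(w, receipts))
        calls += 1
    # Second loop: every witness except the last still needs the
    # receipts it did not see during the first loop.
    for w in witnesses[:-1]:
        share_receipts(w, [r for r in receipts if r[0] != w])
        calls += 1
    return receipts, calls

# Toy stand-ins: a "receipt" is just (witness, event) here.
witnesses = ["w0", "w1", "w2", "w3", "w4"]
receipts, calls = submit_round_robin(
    witnesses,
    get_receipt=lambda w, seen: (w, "evt"),
    share_receipts=lambda w, others: None,
)
print(calls)  # 9 requests for N=5, matching the table above
```

Note the asymmetry the table describes: the last witness in the first loop already sees the other four receipts, so it is skipped in the second loop, giving 2N - 1 = 9 sequential calls.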

I suggest we change to the following algorithm:

call all witnesses in parallel and ask each for a receipt
wait until at least toad (the witness threshold) witnesses have responded, or until timeout
call all witnesses with all the other receipts

If I am analyzing correctly, this would give us the following metrics:

| metric | count | explanation |
| --- | --- | --- |
| requests to witnesses | 10 | every witness is called twice |
| bandwidth, requests to witnesses | 10 * request_overhead + 20 * receipt_size | each witness ends up seeing 4 other receipts |
| bandwidth, responses from witnesses | 10 * response_overhead + 5 * receipt_size | each witness returns one receipt but responds twice |
| latency | 2 * request_response_roundtrip | 2 phases, each fully parallel |
| timeout | 2 * reasonable_timeout_per_call | cumulative timeout |
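The two-phase parallel algorithm might look like the following sketch. It uses a thread pool purely as a stand-in for whatever async machinery (e.g. hio coroutines) KERIpy would actually use; the function names, the `toad` parameter, and the receipt representation are all illustrative assumptions.

```python
# Hedged sketch of the proposed two-phase parallel submission.
# A thread pool stands in for KERIpy's real async machinery.
from concurrent.futures import ThreadPoolExecutor, wait

def submit_parallel(witnesses, get_receipt, share_receipts, toad, timeout=5.0):
    with ThreadPoolExecutor(max_workers=len(witnesses)) as pool:
        # Phase 1: ask every witness for a receipt in parallel,
        # then wait until they respond or the timeout expires.
        phase1 = [pool.submit(get_receipt, w) for w in witnesses]
        done, pending = wait(phase1, timeout=timeout)
        receipts = [f.result() for f in done]
        if len(receipts) < toad:
            raise TimeoutError("fewer than toad witnesses responded")
        # Phase 2: fan the collected receipts back out, again in
        # parallel, sending each witness the receipts it has not seen.
        phase2 = [pool.submit(share_receipts, w,
                              [r for r in receipts if r[0] != w])
                  for w in witnesses]
        wait(phase2, timeout=timeout)
    return receipts

# Toy stand-ins, as in the sequential sketch above.
witnesses = ["w0", "w1", "w2", "w3", "w4"]
receipts = submit_parallel(
    witnesses,
    get_receipt=lambda w: (w, "evt"),
    share_receipts=lambda w, others: None,
    toad=3,
)
print(len(receipts))  # 5
```

This is where the escrow caveat below shows up concretely: if phase 2 starts as soon as `toad` witnesses have answered, any stragglers from phase 1 receive the fan-out before they have receipted the event themselves.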

Is my analysis accurate? If so, we consume almost identical bandwidth (10 vs 9 calls, the same amount of data transferred), but our latency decreases by almost a factor of 5 (9 sequential round trips down to 2). This would speed up many KERI operations as users perceive them.

A possible downside to this approach is that error handling gets more complicated, and the chances of events landing in escrow go up (unless we don't begin phase 2 until all witnesses respond or time out). And of course, running things in parallel is a bit more complicated. But making it noticeably faster to talk to witnesses still seems to me like it might be worth the tradeoff.

What do you think?

dhh1128 commented Aug 29, 2024

Tagging @SmithSamuelM since I discussed this verbally with him.

@SmithSamuelM
Collaborator

Given the added complexity, I suggest this not be a switchover but that the new algorithm be a configuration choice. That way unit tests and simple configurations can continue to use the existing algorithm, while deployments that want the new one can opt in. It should be as simple as adding a different Doer for the new algorithm and selecting that Doer at config time instead of the current default.
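Sam's suggestion amounts to a factory that picks the Doer once at configuration time. A minimal sketch, with stub classes and a hypothetical `witness_submit` config key standing in for real hio/KERIpy Doers:

```python
# Illustrative sketch of config-time strategy selection. The Doer
# classes and the "witness_submit" config key are hypothetical,
# not KERIpy's actual API.

class RoundRobinSubmitDoer:
    """Stand-in for the current default sequential submitter."""
    def __init__(self, witnesses):
        self.witnesses = witnesses

class ParallelSubmitDoer:
    """Stand-in for the proposed two-phase parallel submitter."""
    def __init__(self, witnesses):
        self.witnesses = witnesses

def make_submitter(config, witnesses):
    # Default stays the round-robin algorithm, so unit tests and
    # simple configurations are unaffected; deployments opt in.
    if config.get("witness_submit") == "parallel":
        return ParallelSubmitDoer(witnesses)
    return RoundRobinSubmitDoer(witnesses)

print(type(make_submitter({}, ["w0"])).__name__)
print(type(make_submitter({"witness_submit": "parallel"}, ["w0"])).__name__)
```

The key property is that the choice is made once, at wiring time, and the rest of the system only ever sees "a Doer", so nothing downstream needs to know which algorithm is running.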
