Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent duplicate event when anchoring reg or cred in multisigs #271

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rodolfomiranda
Copy link
Collaborator

@rodolfomiranda rodolfomiranda commented Jul 27, 2024

This PR fixes an inconsistency that occurs when a member of a multisig joins an anchoring event (for a registry creation or credential issuance/revocation) and when it's KEL already contains that event, resulting in a new event with the same anchor.
That situation can be easily replicated if the threshold of the multisig is less that the total number of members, and the last member joins with a time delay that allows the propagation of the event.

To fix that problem, the code now checks if the last event already has the anchor event and avoid creating a new event in the KEL.

Copy link

codecov bot commented Jul 27, 2024

Codecov Report

Attention: Patch coverage is 89.47368% with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.84%. Comparing base (850af59) to head (a3e2b21).

Files Patch % Lines
src/keri/app/credentialing.ts 88.88% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #271      +/-   ##
==========================================
+ Coverage   83.82%   83.84%   +0.02%     
==========================================
  Files          48       48              
  Lines        4229     4260      +31     
  Branches     1034     1062      +28     
==========================================
+ Hits         3545     3572      +27     
- Misses        656      684      +28     
+ Partials       28        4      -24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lenkan
Copy link
Collaborator

lenkan commented Aug 12, 2024

In my opinion, we should try to reduce the amount of API-calls we do in each method. So I am wondering if there is a way to resolve this issue without adding an additional call.

What is the actual scenario here?

  1. Member 1 creates the acdc, issuance and anchoring events.
  2. Member 2 creates the acdc, issuance and anchoring events.
  3. Member 3 does what exactly, also what happens when they do this thing?

I would expect keria to respond with HTTP 400 Bad Request if member 3 tries to create an inconsistent state.

Maybe member 3 can detect that the issuance is already anchored and it only needs to "import" the credential?

@rodolfomiranda
Copy link
Collaborator Author

What is currently happening (as implemented in KERIA and Signify), for example when creating a Registry:

  • member 1 creates an anchoring event for the registry and sends exn to other members
  • member 2 receives exn from member 1, creates anchoring event for the registry and sends exn to other members
  • member 1 receives exn from member 2. Since it's the designated member, it verifies that signature threshold was satisfied, sends event+signatures to witnesses, collect witness receipts and distribute them among members. Finally creates the registry in db
  • member 2 receives receipts, verifies threshold and creates the registry in DB.
  • member 3 receives receipts (KERIA) and creates the event in the KEL. Member 3 also receives the exn to join the registry creation (Signify) so it starts creating the anchoring event from its last event in the KEL that already have the anchor, so it ends up creating another event with the exact anchor information but in sn+1. It never receives enough signatures to validate the event and so it never creates the registry. Also its KEL has one event more than the other members.

For KERIA, the event that member3 is trying to create is valid. It would be tricky to make it aware of those edge cases automatically, but not imposible. I added the "check" at the client side for simplicity and also because it gives more control on the decision, but I know it's not the perfect solution. Every time we automate decisions, we'll find edge cases.

Regarding the additional API call, I also agree that we should reduce those, but it this case the client needs to get the latest status of the KEL to proceed correctly. It'd also face an extra call even if KERIA responds with 400 I think.


// check if last event already has the anchor in it
// and avoid creating a new event if it does
const lastEvent = events[events.length - 1];
Copy link
Contributor

@iFergal iFergal Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with registry events but I'm also curious if there's a way to generally prevent duplicate anchoring events in a KEL directly in KERIA, at least with the same signing keys.

The trouble here is if there are many concurrent things happening, and the duplicate event is e.g. not actually the lastEvent but the event before that (events.length - 2 or so on)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this should be handled server side and in a way that avoids any race conditions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might actually just be something we can add to keripy directly, maybe Sam or Phil have some ideas

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that something can also be done in the keripy/keria side. Those types of race conditions are always tricky and recover from them is not trivial. There's some code on keripy that assigns one of the members as the lead. And I remember Phil mentioning that they cover for some race conditions, but probably not all cases.

Anyway, the goal of this PR is to solve a specific use case that is that one member of a multisig initiates the creation of a new event and the others join; however one of them joins late, after the event was already completed (thresholds satisfied) with the signatures of the other members. We want this tardy member to create the correct event, not a new one. And we can catch the error on the client side because we know that the same anchor is already on the KEL. There are no race conditions in this use case. One member start the event, and the rest of the members join. This require an extra call, but I think it worth it since prevent other problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would still be cleaner if the client side did the redundant signing while late joining and KERIA just accepts it and doesn't add it to the KEL (as it's already a duplicate).

Doing on the client side here doesn't cover the case I mentioned for events.length - 2 and also doesn't cover the case of the final threshold signature appearing at the same time as the client is signing but after they did this check (which is a race condition).

Albeit we could have it as a stop gap solution perhaps

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, then again, that would leave an inconsistent KEL locally since ultimately the controller of the KEL is on the client side, hmm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right in both, this PR only covers the use case of events.lenght - 1. And for the race condition, we need something on keripy that I was trying to avoid, or postpone it for later, for simplicity and urgent need.
We do also need a way to recover from the duplication in case it happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed the nitty gritty details of it but if it's urgent, perhaps it's OK so long as we open the appropriate issues now and tackle it soon (and not let it fall into the pile of issues :P)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of this discussion, I'm interested to see what @rodolfomiranda and @iFergal think about @lenkan PR solution #286

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lenkan's PR makes sense to me and avoids my race condition concerns here

@lenkan
Copy link
Collaborator

lenkan commented Aug 30, 2024

@rodolfomiranda Sorry for delay here. We are hitting this too. I think we will be able to end up in the same situation anyway if we are unlucky with the timing? What do you think? Is there a way to move this check closer to the DB transaction to avoid race conditions? I am not as familiar as you with the internals of KERIA.

@lenkan
Copy link
Collaborator

lenkan commented Oct 3, 2024

I have had some more time to investigate and create other reproductions of similar issues. At the moment, I think it would be better if these methods accept the sequence number as an argument instead of trying to calculate it. That way you can pass it in from the exn message that you received from the other group participants. Does that make sense? We had a similar discussion about this here: #222 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants