This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

PROD: Race condition error: Prior Document Contact Already Marked as Successful #319

Open · Marc-Petersen opened this issue Feb 27, 2019 · 2 comments

@Marc-Petersen commented Feb 27, 2019

The following is an example of a race condition that generates an error on the service endpoint:

  1. AMS 27996604 (version 9) is built stating that it is replacing document version 8. This was an update to the Pathway Element (system HAR 3800000009280).

  2. AMS 27996609 (version 10) is built, also stating that it is replacing document version 8. This document was for HAR 3800000009279. (This is a result of the race condition described in an earlier post.)

  3. DXC responds to version 10 with a success status in AMS 27997072.

  4. DXC responds to version 9 also with a success status in AMS 27997074.

  5. AMS 28381801 (version 11) is built stating that it is replacing document version 10, which is the most recent successful DXS record contact per Epic.

  6. DXC responds to version 11 (in AMS 28381802) with an INTEGRITY_CHECK error stating that the most recent document version should be version 9 (presumably because the version 9 document was processed by the service endpoint after the version 10 document).

Resolution path:
What we'd expect from the service endpoint in step 4 is for document version 9 to be rejected with an error stating that the correct document to replace is version 10, not version 8 (which has already been replaced by version 10).
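For illustration, here is a minimal sketch of the version check we would expect the endpoint to apply at step 4, assuming a simple in-memory model; the class and method names are hypothetical, not actual LPR code:

```java
// Hypothetical sketch of the expected replace-version check (not LPR code).
final class DocumentSet {
    private int currentVersion;

    DocumentSet(int currentVersion) {
        this.currentVersion = currentVersion;
    }

    /** Accepts an update only if it replaces the set's current version. */
    synchronized void applyUpdate(int replacesVersion, int newVersion) {
        if (replacesVersion != currentVersion) {
            // Step 4 above: version 9 claims to replace version 8, but the
            // set is already at version 10, so the update must be rejected.
            throw new IllegalStateException(
                "INTEGRITY_CHECK: the correct document to replace is version "
                    + currentVersion + ", not " + replacesVersion);
        }
        currentVersion = newVersion;
    }
}
```

Under such a check, step 4 would replay as `new DocumentSet(10).applyUpdate(8, 9)` and fail, which is exactly the rejection described above.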

If further documentation is needed, in the form of examples or anything else, just ask.

@TueCN TueCN self-assigned this Feb 27, 2019
@TueCN (Contributor) commented Feb 27, 2019

I am not familiar with the terms AMS and HAR, but I assume the AMS numbers do not have semantic meaning? (I don't see how the numbers are related).

Anyway, you are correct that LPR should respond with an INTEGRITY_CHECK error at step 4.

I have reproduced the issue and I am looking into it.


Severity: My initial analysis suggests that this bug cannot result in data loss.
Due to database transaction isolation, the two concurrent updates cannot both successfully commit updates to the same rows. The database will roll back the transaction that commits last, and LPR will return a SOAP fault saying: `javax.persistence.OptimisticLockException: Row was updated or deleted by another transaction`.
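As an illustrative sketch of this mechanism (the entity and field names are hypothetical, not LPR's actual schema), a JPA entity carrying a @Version column behaves exactly this way: the transaction that flushes last fails its version check and surfaces the exception quoted above.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

// Hypothetical entity illustrating JPA optimistic locking (not LPR's schema).
@Entity
public class DocumentRow {
    @Id
    private long id;

    private String content;

    // The JPA provider increments this column on every successful commit.
    // If a concurrent transaction committed first, the UPDATE's version
    // check matches zero rows and the provider throws
    // javax.persistence.OptimisticLockException
    // ("Row was updated or deleted by another transaction").
    @Version
    private long rowVersion;
}
```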

This means that the scope of this issue appears to be limited to LPR not being compliant: it does not fully respect the rules outlined in https://scandihealth.github.io/lpr3-docs/aspects/index.html#documents-and-versioning, since it allows non-conflicting appends to non-current document versions when processing document updates concurrently.

We will look into making the service instead return the intended INTEGRITY_CHECK error in this situation.

@TueCN TueCN added the bug label Feb 27, 2019
@TueCN (Contributor) commented Mar 1, 2019

RESOLUTION
LPR now serializes concurrent updates to the same set.
This means that whichever of the concurrent updates first "locks" the set wins; the other requests must wait until that update is complete before they can continue.

As a result, any subsequent update that did not expect the first update (as in this issue) will fail with an INTEGRITY_CHECK RegistryError: PARENT_DOCUMENT_ID_MISMATCH.
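For illustration, here is a sketch of what such serialization could look like with a JPA persistence layer; the entity, fields, and helper names are hypothetical stand-ins, not LPR's actual implementation:

```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.LockModeType;

// Hypothetical model of a document set (not LPR's actual schema).
@Entity
class DocumentSetEntity {
    @Id long id;
    long currentVersion;
}

class DocumentUpdater {
    /** Applies one update; must run inside an open transaction. */
    void applyUpdate(EntityManager em, long setId,
                     long expectedParentVersion, long newVersion) {
        // Take a row lock on the set: a concurrent update blocks here until
        // the first transaction commits, so whichever update "locks" the
        // set first wins.
        DocumentSetEntity set = em.find(DocumentSetEntity.class, setId,
                                        LockModeType.PESSIMISTIC_WRITE);

        // Once the lock is acquired, we see the winner's committed state,
        // so a stale parent version is detected deterministically.
        if (set.currentVersion != expectedParentVersion) {
            throw new IllegalStateException(
                "INTEGRITY_CHECK RegistryError: PARENT_DOCUMENT_ID_MISMATCH");
        }
        set.currentVersion = newVersion;
    }
}
```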

@TueCN TueCN added this to the Next Release milestone Mar 1, 2019
@TueCN TueCN assigned Marc-Petersen and unassigned TueCN Mar 26, 2019