
fix!: reject duplicate submissions #5047

Merged: 41 commits merged on Jan 15, 2025


@rajpatel24 (Contributor) commented Aug 5, 2024

Summary

Implemented logic to detect and reject duplicate submissions.

Description

We have identified a race condition in submission processing that causes duplicate submissions with identical UUIDs and XML hashes. The issue is most problematic when multiple remote devices submit forms simultaneously over unreliable networks.

To address this, this PR makes the following changes:

  • Race Condition Resolution: A locking mechanism now guards the check for an existing instance and the creation of a new one, so two concurrent requests can no longer both create the same submission.

  • UUID Enforcement: Submissions without a UUID are now explicitly disallowed. This ensures that every submission is uniquely identifiable and further mitigates the risk of duplicate entries.

  • Introduction of `root_uuid`:

    • To keep a submission's UUID consistent throughout its lifecycle and prevent duplicate submissions with the same UUID, a new `root_uuid` column has been added to the `Instance` model with a unique constraint (`root_uuid` per `xform`).

      • If `<meta><rootUuid>` is present in the submission XML, it is stored in the `root_uuid` column.

      • If `<meta><rootUuid>` is not present, the value from `<meta><instanceID>` is used instead.

    • This keeps `root_uuid` constant across the lifecycle of a submission, providing a reliable identifier for all of its instances.

  • UUID Handling Improvement: Updated the logic to strip only the `uuid:` prefix while preserving custom, non-UUID ID schemes (e.g., `domain.com:1234`). This complies with the OpenRosa spec and prevents collisions between IDs that differ only in their custom prefix.

  • Error Handling:

    • 202 Accepted: Returned when the content is identical to an existing, successfully processed submission.
    • 409 Conflict: Returned when a duplicate UUID is detected with differing content.
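The UUID rules above (require `<meta><instanceID>`, fall back to it when `<meta><rootUuid>` is missing, strip only a leading `uuid:`) can be sketched in plain Python. This is a minimal illustration, not the PR's implementation: `extract_uuids`, `strip_uuid_prefix` and `MissingUUIDError` are hypothetical names, XML namespaces are ignored by matching local tag names, and stripping the prefix from `rootUuid` as well is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


class MissingUUIDError(ValueError):
    """Hypothetical error for submissions without <meta><instanceID>."""


def _local_name(tag: str) -> str:
    # "{http://openrosa.org/xforms}meta" -> "meta"; plain tags pass through.
    return tag.rsplit('}', 1)[-1]


def _meta_value(root: ET.Element, name: str) -> Optional[str]:
    """Return the stripped text of <meta><name>, ignoring namespaces."""
    for meta in root.iter():
        if _local_name(meta.tag) != 'meta':
            continue
        for child in meta:
            if _local_name(child.tag) == name and (child.text or '').strip():
                return child.text.strip()
    return None


def strip_uuid_prefix(value: str) -> str:
    # Remove only a leading "uuid:"; custom schemes such as
    # "domain.com:1234" are returned unchanged.
    prefix = 'uuid:'
    return value[len(prefix):] if value.startswith(prefix) else value


def extract_uuids(xml_text: str) -> Tuple[str, str]:
    """Return (uuid, root_uuid), raising MissingUUIDError if there is no UUID."""
    root = ET.fromstring(xml_text)
    instance_id = _meta_value(root, 'instanceID')
    if instance_id is None:
        raise MissingUUIDError('submission has no <meta><instanceID>')
    uuid = strip_uuid_prefix(instance_id)
    root_uuid_raw = _meta_value(root, 'rootUuid')
    # Fall back to the instanceID when <meta><rootUuid> is absent.
    root_uuid = strip_uuid_prefix(root_uuid_raw) if root_uuid_raw else uuid
    return uuid, root_uuid
```

For an edited submission whose XML carries `<instanceID>uuid:new-2</instanceID>` and `<rootUuid>uuid:abc-123</rootUuid>`, the sketch returns `('new-2', 'abc-123')`, so the edit still maps to the original `root_uuid`.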

These changes should improve the robustness of the submission process and prevent both race conditions and invalid submissions.

Notes

  • Implemented a fix to address the race condition that leads to duplicate submissions with the same UUID and XML hash.
  • Incorporated improvements from existing work, ensuring consistency and robustness in handling concurrent submissions.
  • The fix aims to prevent duplicate submissions, even under high load and unreliable network conditions.
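The check-then-create race and the 202/409 split described above can be illustrated with an in-process sketch. It assumes a dictionary in place of the `Instance` table and a per-key `threading.Lock` in place of whatever lock the PR uses; `save_submission` is hypothetical, SHA-256 for the XML hash is an assumption, and returning 201 Created for a new submission is assumed from the usual OpenRosa success response. The PR excerpt derives its lock key from the form ID, submission UUID and XML hash, whereas this sketch keys on `(xform_id, root_uuid)` for simplicity.

```python
import hashlib
import threading
from collections import defaultdict
from http import HTTPStatus
from typing import DefaultDict, Dict, Tuple

Key = Tuple[int, str]

# Hypothetical stand-in for the Instance table: (xform_id, root_uuid) -> XML hash.
_instances: Dict[Key, str] = {}
# One lock per key; _registry_lock guards the creation of those locks.
_registry_lock = threading.Lock()
_key_locks: DefaultDict[Key, threading.Lock] = defaultdict(threading.Lock)


def _lock_for(key: Key) -> threading.Lock:
    with _registry_lock:
        return _key_locks[key]


def save_submission(xform_id: int, root_uuid: str, xml_text: str) -> HTTPStatus:
    """Store a submission once; report duplicates instead of re-inserting."""
    xml_hash = hashlib.sha256(xml_text.encode()).hexdigest()  # assumed hash
    key = (xform_id, root_uuid)
    # Lookup and insert run under the same lock, so two concurrent requests
    # for the same key cannot both see "missing" and both insert.
    with _lock_for(key):
        existing = _instances.get(key)
        if existing is None:
            _instances[key] = xml_hash
            return HTTPStatus.CREATED   # 201: new submission (assumed)
        if existing == xml_hash:
            return HTTPStatus.ACCEPTED  # 202: identical duplicate
        return HTTPStatus.CONFLICT      # 409: same UUID, different content
```

Because the lookup and the insert happen under one lock, N concurrent identical requests yield exactly one `CREATED` and N-1 `ACCEPTED`. A `threading.Lock` only serializes threads within one process; several worker processes would need a shared lock (for example, a database-level lock), with the unique constraint as a backstop.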

Related issues

Supersedes kobotoolbox/kobocat#876 and kobotoolbox/kobocat#859

@rajpatel24 rajpatel24 changed the base branch from main to beta August 5, 2024 12:43
@noliveleger noliveleger self-requested a review August 5, 2024 13:43
@rajpatel24 rajpatel24 force-pushed the 862-reject_duplicate_submissions branch from 28eefe5 to 039ed95 August 5, 2024 16:20
@rajpatel24 rajpatel24 changed the title from "862 Reject Duplicate Submissions" to "Reject Duplicate Submissions" Aug 5, 2024
@rajpatel24 rajpatel24 force-pushed the 862-reject_duplicate_submissions branch from 039ed95 to 6a177da August 7, 2024 13:51
@noliveleger noliveleger self-assigned this Aug 21, 2024
@noliveleger noliveleger changed the base branch from beta to kobocat-django-app-part-2-refactor-mock-deployment-backend August 21, 2024 19:47
 # Conflicts:
 #	kobo/apps/openrosa/apps/logger/exceptions.py
 #	kobo/apps/openrosa/apps/logger/models/instance.py
 #	kobo/apps/openrosa/apps/logger/models/xform.py
 #	kobo/apps/openrosa/apps/logger/xform_instance_parser.py
 #	kobo/apps/openrosa/libs/utils/logger_tools.py
@noliveleger noliveleger changed the base branch from kobocat-django-app-part-2-refactor-mock-deployment-backend to beta-refactored August 26, 2024 15:05
@rajpatel24 rajpatel24 force-pushed the 862-reject_duplicate_submissions branch 2 times, most recently from bf597d5 to 2958052 September 3, 2024 10:51
- Add test case for duplicate submission with an attachment
- Improve logic to extract UUID from xml
- Add logic to reject submission without UUID
@rajpatel24 rajpatel24 force-pushed the 862-reject_duplicate_submissions branch from 2958052 to f3c89f6 September 3, 2024 17:29

kobo/apps/openrosa/apps/logger/tests/test_parsing.py (outdated, resolved)
kobo/apps/openrosa/apps/logger/tests/test_parsing.py (outdated, resolved)
kobo/apps/openrosa/apps/logger/xform_instance_parser.py (outdated, resolved)
kpi/tests/api/v2/test_api_submissions.py (resolved)
kobo/apps/openrosa/libs/utils/logger_tools.py (outdated, resolved)
kobo/apps/openrosa/libs/utils/logger_tools.py (outdated, resolved)
Comment on lines +222 to +237
```python
int_lock = int.from_bytes(
    hashlib.shake_128(
        f'{xform_id}!!{submission_uuid}!!{xml_hash}'.encode()
    ).digest(7), 'little'
)
acquired = False
```

@rajpatel24 please write a comment to explain why we use this thing.
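A possible answer to the reviewer's question, under stated assumptions: the excerpt turns the three identifiers into a deterministic integer, which is the kind of key that database-level locks such as PostgreSQL's `pg_try_advisory_lock(bigint)` expect. The excerpt does not show where `int_lock` is used, so the advisory-lock reading is an assumption, and `submission_lock_key` is a hypothetical wrapper around the same expression.

```python
import hashlib


def submission_lock_key(xform_id: int, submission_uuid: str, xml_hash: str) -> int:
    # Same derivation as the excerpt: hash the three identifiers together...
    digest = hashlib.shake_128(
        f'{xform_id}!!{submission_uuid}!!{xml_hash}'.encode()
    ).digest(7)
    # ...and read the 7 bytes (56 bits) as an unsigned little-endian integer,
    # so 0 <= key < 2**56, which always fits in a signed 64-bit integer.
    return int.from_bytes(digest, 'little')
```

Reading 7 bytes keeps the key non-negative and well below 2**63, so it fits a signed 64-bit `bigint` without overflow. Every worker process computes the same key for the same submission, which a per-process in-memory lock could not guarantee.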

@Guitlle (Contributor) left a comment:

The code looks good. We don't have preview steps, though, should we?

@rajpatel24 rajpatel24 force-pushed the 862-reject_duplicate_submissions branch from 578b2be to b0a4208 January 9, 2025 12:27
@rajpatel24 rajpatel24 requested a review from noliveleger January 9, 2025 13:34
@noliveleger (Contributor) commented:

> The code looks good. We don't have preview steps, though, should we?

Indeed, but this PR predates the new PR workflow.
I'm relying on the tests to reproduce the "preview steps".

@noliveleger (Contributor) left a comment:

@rajpatel24 I'll let you have the honors of merging this PR.

FYI: I pushed one tiny commit fixing the management command, to avoid another round of back-and-forth.

@noliveleger noliveleger changed the title from "Reject Duplicate Submissions" to "feat!: reject duplicate submissions" Jan 13, 2025
@noliveleger noliveleger changed the title from "feat!: reject duplicate submissions" to "fix!: reject duplicate submissions" Jan 13, 2025
@noliveleger noliveleger removed their assignment Jan 13, 2025
@rajpatel24 rajpatel24 merged commit 9598180 into main Jan 15, 2025
4 checks passed
@rajpatel24 rajpatel24 deleted the 862-reject_duplicate_submissions branch January 15, 2025 07:59
magicznyleszek added a commit that referenced this pull request Jan 17, 2025
commit 11ee676
Author: Rebecca Graber
Date:   Thu Jan 16 08:47:09 2025 -0500

    feat(projectHistoryLogs): log new submissions (#5416)

    ### 📣 Summary
    Create logs when new submissions are added to projects.

    ### 👷 Description for instance maintainers
    Allow null user_uids in AuditLogs so we can log anonymous submissions.

    ### 💭 Notes
    We previously had no need for null users in audit logs because the
    actions we logged were all restricted to authenticated users, but since
    we allow anonymous submissions, we needed a way to log those.

    ### 👀 Preview steps

    Feature/no-change template:
    1. ℹ️ have an account and a project. Make sure the account username is
    not `admin` (see [this notion
    task](https://www.notion.so/kobotoolbox/Anonymous-submissions-dont-work-if-user-named-admin-owns-asset-1767e515f65480608dfcee76ba9b3710?pvs=4))
    2. Deploy the project
    3. Add a submission to the project
    4. Go to `api/v2/assets/<asset-uid>/history`
    5. 🟢 There should be a new project history log with
    `action='add-submission'` and all the usual metadata, plus
    ```
    "submission": {
        "submitted_by": "user1"
    }
    ```
    6. Enable submissions without username/email to the project
    7. To make sure you're submitting anonymously, copy and paste the Enketo
    link into a new private tab and add a new submission
    8. 🟢 Reload the endpoint. There should be a new audit log with
    `action='add-submission'`
      a. The user should be
      `http://kf.kobo.local:8080/api/v2/users/AnonymousUser/`
      b. The user_uid will be the uid of the anonymous user in the database
      c. The username should be `AnonymousUser`
      d. The metadata should contain `{"submission": {"submitted_by":
      "AnonymousUser"}}` in addition to the usual metadata

commit bbfdaf1
Author: olive-KTB
Date:   Thu Jan 16 02:49:14 2025 +0100

    update gitlab-ci.yml

commit bc56f8f
Author: Akuukis
Date:   Wed Jan 15 10:50:39 2025 +0200

    refactor(frontend): Mantine Component Library PoC (#5344)

    ### 💭 Notes

    Please read the PR commit-by-commit, here's a guide.

    1. [3105f47](3105f47)
    Set up the Mantine Component Library. Please install the new
    dependencies and add the recommended VSCode extensions.
    2. [453f2c1](453f2c1)
    Here's a preview showing that Mantine works in general; the API for the
    example Button is different but broadly similar.

    | our button | Mantine default button |
    |--------|--------|
    | ![image](https://github.com/user-attachments/assets/1f3e5736-c5eb-4cbf-a56e-a393dab561d3) | ![image](https://github.com/user-attachments/assets/c8f2a9dc-cf1a-44c3-881a-ff05da433060) |

    ![image](https://github.com/user-attachments/assets/e2abe407-3113-4900-9027-43186137ce13)

    3. [d782e38](d782e38)
    Example of a custom-styled component, Button. IMHO it achieves a
    pixel-perfect match in Storybook and in the example above, except for
    line breaks, the spinner and hover animations. The click animation
    matches out of the box. Icon-only buttons are omitted because Mantine
    uses a different component, `ActionIcon`, for those.

    | Original on left / Mantine implementation on right |
    | --- |
    | ![image](https://github.com/user-attachments/assets/9a58b7af-3ff4-4b18-995c-6c57ff4264cb) |

    5. [eaf45e3](eaf45e3)
    Wrapped Button to add support for a built-in Tooltip. No idea if we want
    to move forward with these two coupled, but I found it useful for
    comparison, since it re-implements part of the old Button behavior
    that's represented in Storybook.

    | Original | Mantine implementation |
    | --- | --- |
    | ![image](https://github.com/user-attachments/assets/7bf2a504-902b-4801-a2dd-3e7648166316) | ![image](https://github.com/user-attachments/assets/41bc4dff-acf3-4328-ab6f-f7fcccbde20b) |

    ### 👀 Preview steps

    1. ℹ️ open Kobo home
    4. 🟢 [on main] notice the original "new" button
    6. 🟢 [on PR] notice the new "new" button with the default Mantine style,
    only slightly different

    ---------

    Co-authored-by: Leszek Pietrzak
    Co-authored-by: Leszek
    Co-authored-by: James Kiger
    Co-authored-by: Paulo Amorim

commit 9598180
Author: Raj Patel
Date:   Wed Jan 15 13:29:03 2025 +0530

    fix!: reject duplicate submissions (#5047)

    ---------

    Co-authored-by: Olivier Leger

commit d22b8b5
Merge: 89bd9b7 b4aa1b7
Author: John N. Milner
Date:   Tue Jan 14 15:20:49 2025 -0500

    Merge remote-tracking branch 'origin/release/2.024.36'

commit 89bd9b7
Author: Rebecca Graber
Date:   Tue Jan 14 15:11:37 2025 -0500

    fix(auditLogs): correctly serialize audit logs from deleted users (#5418)

    ### 📣 Summary
    Fixes a 500 error from the various audit log endpoints when there are
    actions by deleted users.

    ### 📖 Description
    Return empty user and username fields in the response if the user was
    deleted after the log was created. This applies to `/api/v2/audit-logs`,
    `api/v2/assets/<uid>/history`, and `api/v2/project-history-logs`.

    ### 💭 Notes
    Small fix in the serializer. Also updates the ProjectHistoryLog
    serializer to inherit from the AuditLogSerializer so we don't have to
    duplicate the method fields.

    ### 👀 Preview steps

    Bug template:
    1. ℹ️ have a super user account and a project
    2. Create a new user (user1) and give them the `Edit Form` permission on
    the project.
    3. Log in as user1 and make an edit to the project.
    4. Log out user1 and log back in as the super user
    5. Delete user1. You can do this from the admin page if you delete the
    user from the User list, then from the Trash Bin.
    6. Go to:
      a. `api/v2/audit-logs`
      b. `api/v2/project-history-logs`
      c. `api/v2/assets/<uid>/history`
    7. 🔴 [on main] All will return a 500 error (`AttributeError: 'NoneType'
    object has no attribute 'username'`)
    8. 🟢 [on PR] The endpoint will return the expected logs. For all user1's
    actions, the user and username fields will be empty, but the user_uid
    should still refer to the old user.

commit b4aa1b7
Author: jnm <[email protected]>
Date:   Tue Jan 14 15:01:18 2025 -0500

    feat: export background-geopoint as GPS field (#5420)

    See kobotoolbox/formpack#327. This change just updates the formpack
    commit hash used by KPI.

commit dddd619
Author: Olivier Léger
Date:   Tue Jan 14 14:10:31 2025 -0500

    fix: catch additional XLSForm validation errors during deployment (#5419)

    ### 📣 Summary
    Enhanced error handling to catch more validation errors in XLSForm
    during deployment.

    ### 📖 Description
    Validation error handling for XLSForm deployment has been enhanced to
    catch a wider range of issues. This prevents the display of a generic
    500 error in the deployment modal and instead returns the explicit error
    message.

    ### Notes
    Supersedes #5417, #5411 and #5403

commit 9189ac9
Author: Olivier Léger
Date:   Tue Jan 14 09:40:34 2025 -0500

    fix: handle case sensitivity for "Settings" sheet name with explicit error TASK-1353 (#5417)

    ### 📣 Summary
    Improved error handling for case-sensitive "Settings" sheet names.

    ### 📖 Description
    This update addresses an issue where a sheet named "Settings" with
    uppercase or mixed case letters causes unexpected behavior. An explicit
    error message is now raised to alert users of the case sensitivity,
    ensuring they can resolve the issue easily.

commit 8e8d6bb
Author: Rebecca Graber
Date:   Mon Jan 13 15:02:04 2025 -0500

    test: rename admin user in fixture (#5415)

    ### 💭 Notes
    Developer-facing changes only. Changes the username of the admin user to
    `adminuser` in preparation for disallowing the name `admin` as part of
    https://www.notion.so/kobotoolbox/Anonymous-submissions-dont-work-if-user-named-admin-owns-asset-1767e515f65480608dfcee76ba9b3710
3 participants