Skip to content

Commit

Permalink
fix!: reject duplicate submissions (#5047)
Browse files Browse the repository at this point in the history
## Summary
Implemented logic to detect and reject duplicate submissions.

## Description

We have identified a race condition in the submission processing that
causes duplicate submissions with identical UUIDs and XML hashes. This
issue is particularly problematic under conditions with multiple remote
devices submitting forms simultaneously over unreliable networks.

To address this issue, a PR has been raised with the following proposed
changes:

- Race Condition Resolution: A locking mechanism has been added to
prevent the race condition when checking for existing instances and
creating new ones. This aims to eliminate duplicate submissions.

- UUID Enforcement: Submissions without a UUID are now explicitly
disallowed. This ensures that every submission is uniquely identifiable
and further mitigates the risk of duplicate entries.

- Introduction of `root_uuid`:

- To ensure a consistent submission UUID throughout its lifecycle and
prevent duplicate submissions with the same UUID, a new `root_uuid`
column has been added to the `Instance` model with a unique constraint
(`root_uuid` per `xform`).

- If the `<meta><rootUuid>` is present in the submission XML, it is
stored in the `root_uuid` column.

- If `<meta><rootUuid>` is not present, the value from
`<meta><instanceID>` is used instead.

- This approach guarantees that the `root_uuid` remains constant across
the lifecycle of a submission, providing a reliable identifier for all
instances.

- UUID Handling Improvement: Updated the logic to strip only the `uuid:`
prefix while preserving custom, non-UUID ID schemes (e.g.,
domain.com:1234). This ensures compliance with the OpenRosa spec and
prevents potential ID collisions with custom prefixes.

- Error Handling:
- 202 Accepted: Returns when content is identical to an existing
submission and successfully processed.
- 409 Conflict: Returns when a duplicate UUID is detected but with
differing content.

These changes should improve the robustness of the submission process
and prevent both race conditions and invalid submissions.

## Notes

- Implemented a fix to address the race condition that leads to
duplicate submissions with the same UUID and XML hash.
- Incorporated improvements from existing work, ensuring consistency and
robustness in handling concurrent submissions.
- The fix aims to prevent duplicate submissions, even under high load
and unreliable network conditions.

## Related issues

Supersedes
[kobotoolbox/kobocat#876](kobotoolbox/kobocat#876)
and kobotoolbox/kobocat#859

---------

Co-authored-by: Olivier Leger <[email protected]>
  • Loading branch information
rajpatel24 and noliveleger authored Jan 15, 2025
1 parent d22b8b5 commit 9598180
Show file tree
Hide file tree
Showing 37 changed files with 915 additions and 264 deletions.
1 change: 1 addition & 0 deletions .github/workflows/pytest.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ jobs:
KOBOCAT_URL: http://kobocat
KOBOCAT_INTERNAL_URL: http://kobocat
KOBOFORM_URL: http://kpi
SKIP_TESTS_WITH_CONCURRENCY: 'True'
strategy:
matrix:
python-version: ['3.10']
Expand Down
Original file line number Diff line number Diff line change
@@ -1,18 +1,33 @@
import multiprocessing
import os
from collections import defaultdict
from functools import partial

import pytest
import requests
import simplejson as json
from django.conf import settings
from django.contrib.auth.models import AnonymousUser
from django.core.files.base import ContentFile
from django.core.files.uploadedfile import InMemoryUploadedFile
from django.test.testcases import LiveServerTestCase
from django_digest.test import DigestAuth
from rest_framework.authtoken.models import Token

from kobo.apps.kobo_auth.shortcuts import User
from kobo.apps.openrosa.apps.main.models import UserProfile
from kobo.apps.openrosa.libs.tests.mixins.request_mixin import RequestMixin
from rest_framework import status

from kobo.apps.openrosa.apps.api.tests.viewsets.test_abstract_viewset import (
TestAbstractViewSet,
)
from kobo.apps.openrosa.apps.api.viewsets.xform_submission_api import XFormSubmissionApi
from kobo.apps.openrosa.apps.logger.models import Attachment
from kobo.apps.openrosa.apps.main import tests as main_tests
from kobo.apps.openrosa.libs.constants import CAN_ADD_SUBMISSIONS
from kobo.apps.openrosa.libs.utils.guardian import assign_perm
from kobo.apps.openrosa.libs.utils import logger_tools
from kobo.apps.openrosa.libs.utils.logger_tools import (
OpenRosaResponseNotAllowed,
OpenRosaTemporarilyUnavailable,
Expand Down Expand Up @@ -212,7 +227,7 @@ def test_post_submission_authenticated_bad_json(self):
self.assertTrue('error' in rendered_response.data)
self.assertTrue(
rendered_response.data['error'].startswith(
'Received empty submission'
'Instance ID is required'
)
)
self.assertTrue(rendered_response.status_code == 400)
Expand Down Expand Up @@ -418,7 +433,7 @@ def test_submission_account_inactive(self):

def test_submission_blocking_flag(self):
# Set 'submissions_suspended' True in the profile to test if
# submission do fail with the flag set
# submission does fail with the flag set
self.xform.user.profile.submissions_suspended = True
self.xform.user.profile.save()

Expand Down Expand Up @@ -536,3 +551,108 @@ def test_submission_customizable_confirmation_message(self):
self.assertContains(
response, 'Successful submission.', status_code=201
)


class ConcurrentSubmissionTestCase(RequestMixin, LiveServerTestCase):
"""
Inherit from LiveServerTestCase to be able to test concurrent requests
to submission endpoint in different transactions (and different processes).
Otherwise, DB is populated only on the first request but still empty on
subsequent ones.
"""

def setUp(self):
self.user = User.objects.create_user(
username='bob', password='bob', email='[email protected]'
)
self.token, _ = Token.objects.get_or_create(user=self.user)
UserProfile.objects.get_or_create(user=self.user)

def publish_xls_form(self):
path = os.path.join(
settings.OPENROSA_APP_DIR,
'apps',
'main',
'tests',
'fixtures',
'transportation',
'transportation.xls',
)

with open(path, 'rb') as f:
xls_file = ContentFile(f.read(), name='transportation.xls')

self.xform = logger_tools.publish_xls_form(xls_file, self.user)

@pytest.mark.skipif(
settings.SKIP_TESTS_WITH_CONCURRENCY,
reason='GitLab does not seem to support multi-processes'
)
def test_post_concurrent_same_submissions(self):
DUPLICATE_SUBMISSIONS_COUNT = 2 # noqa

self.publish_xls_form()
username = 'bob'
survey = 'transport_2011-07-25_19-05-49'
results = defaultdict(int)

with multiprocessing.Pool() as pool:
for result in pool.map(
partial(
submit_data,
live_server_url=self.live_server_url,
survey_=survey,
username_=username,
token_=self.token.key
),
range(DUPLICATE_SUBMISSIONS_COUNT),
):
results[result] += 1

assert results[status.HTTP_201_CREATED] == 1
assert results[status.HTTP_202_ACCEPTED] == DUPLICATE_SUBMISSIONS_COUNT - 1


def submit_data(identifier, survey_, username_, live_server_url, token_):
"""
Submit data to live server.
It has to be outside `ConcurrentSubmissionTestCase` class to be pickled by
`multiprocessing.Pool().map()`.
"""
media_file = '1335783522563.jpg'
main_directory = os.path.dirname(main_tests.__file__)
path = os.path.join(
main_directory,
'fixtures',
'transportation',
'instances',
survey_,
media_file,
)
with open(path, 'rb') as f:
f = InMemoryUploadedFile(
f,
'media_file',
media_file,
'image/jpg',
os.path.getsize(path),
None,
)
submission_path = os.path.join(
main_directory,
'fixtures',
'transportation',
'instances',
survey_,
f'{survey_}.xml',
)
with open(submission_path) as sf:
files = {'xml_submission_file': sf, 'media_file': f}
headers = {'Authorization': f'Token {token_}'}
response = requests.post(
f'{live_server_url}/{username_}/submission',
files=files,
headers=headers
)
return response.status_code
42 changes: 40 additions & 2 deletions kobo/apps/openrosa/apps/logger/exceptions.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,7 @@
class AccountInactiveError(Exception):
pass


class BuildDbQueriesAttributeError(Exception):
pass

Expand All @@ -10,8 +14,14 @@ class BuildDbQueriesNoConfirmationProvidedError(Exception):
pass


class AccountInactiveError(Exception):
pass
class ConflictingSubmissionUUIDError(Exception):
def __init__(self, message='Submission with this instance ID already exists'):
super().__init__(message)


class DuplicateInstanceError(Exception):
def __init__(self, message='Duplicate Instance'):
super().__init__(message)


class DuplicateUUIDError(Exception):
Expand All @@ -22,9 +32,37 @@ class FormInactiveError(Exception):
pass


class InstanceEmptyError(Exception):
def __init__(self, message='Empty instance'):
super().__init__(message)


class InstanceInvalidUserError(Exception):
def __init__(self, message='Could not determine the user'):
super().__init__(message)


class InstanceIdMissingError(Exception):
def __init__(self, message='Could not determine the instance ID'):
super().__init__(message)


class InstanceMultipleNodeError(Exception):
pass


class InstanceParseError(Exception):
def __init__(self, message='The instance could not be parsed'):
super().__init__(message)


class MissingValidationStatusPayloadError(Exception):
pass


class TemporarilyUnavailableError(Exception):
pass


class XLSFormError(Exception):
pass
Original file line number Diff line number Diff line change
Expand Up @@ -14,4 +14,7 @@
</kids>
<gps>-1.2627557 36.7926442 0.0 30.0</gps>
<web_browsers>chrome ie</web_browsers>
<meta>
<instanceID>uuid:364f173c688e482486a48661700466gg</instanceID>
</meta>
</new_repeats>
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,7 @@
</kids>
<gps>-1.2627557 36.7926442 0.0 30.0</gps>
<web_browsers>chrome ie</web_browsers>
<meta>
<instanceID>uuid:364f173c688e482486a48661700466gg</instanceID>
</meta>
</new_repeats>
Original file line number Diff line number Diff line change
@@ -1 +1 @@
<?xml version='1.0' ?><survey_names id="survey_names"><start>2012-08-17T11:24:53.254+03</start><name>HD</name><end>2012-08-17T11:24:57.897+03</end></survey_names>
<?xml version='1.0' ?><survey_names id="survey_names"><start>2012-08-17T11:24:53.254+03</start><name>HD</name><end>2012-08-17T11:24:57.897+03</end><meta><instanceID>uuid:729f173c688e482486a48661700455ff</instanceID></meta></survey_names>
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,7 @@
<has_children>0</has_children>
<gps>-1.2836198 36.8795437 0.0 1044.0</gps>
<web_browsers>firefox chrome safari</web_browsers>
<meta>
<instanceID>uuid:364f173c688e482486a48661700466gg</instanceID>
</meta>
</tutorial>
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,7 @@
<has_children>0</has_children>
<gps>-1.2836198 36.8795437 0.0 1044.0</gps>
<web_browsers>firefox chrome safari</web_browsers>
<meta>
<instanceID>uuid:639f173c688e482486a48661700456ty</instanceID>
</meta>
</tutorial>
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
<?xml version='1.0' ?>
<tutorial id="tutorial">
<name>Larry
Again
</name><!-- newline to make sure we can handle it -->
<age>23</age>
<picture>1335783522563.jpg</picture>
<has_children>0</has_children>
<gps>-1.2836198 36.8795437 0.0 1044.0</gps>
<web_browsers>firefox chrome safari</web_browsers>
<meta>
<instanceID>uuid:729f173c688e482486a48661700455ff</instanceID>
</meta>
<formhub>
<uuid>c2f940a2f1e8490d8d3c3030452ed401</uuid>
</formhub>
</tutorial>
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
<?xml version='1.0' ?>
<tutorial id="tutorial">
<name>Larry
Again
</name><!-- newline to make sure we can handle it -->
<age>23</age>
<picture>1335783522563.jpg</picture>
<has_children>0</has_children>
<gps>-1.2836198 36.8795437 0.0 1044.0</gps>
<web_browsers>firefox chrome safari</web_browsers>
<meta>
<instanceID>uuid:031c585a-88b7-4962-a566-849a7dd15891</instanceID>
<deprecatedID>uuid:729f173c688e482486a48661700455ff</deprecatedID>
</meta>
<formhub>
<uuid>c2f940a2f1e8490d8d3c3030452ed401</uuid>
</formhub>
</tutorial>
Loading

0 comments on commit 9598180

Please sign in to comment.