Attempt to spread documents more evenly across annotators
Rather than picking the next document for each annotator completely at random, we now prefer documents that have fewer existing annotations. This is achieved by first sorting the list of documents by the number of COMPLETED+PENDING annotations and then randomizing only within each group. That is, we first try (in random order) the documents with no existing annotations; if none of those is suitable, we try (again in random order) the documents with one annotation, then two, and so on, until we either find a valid document to assign or run out of documents to try.

The effect should be that at any given time the full set of documents is "evenly" annotated, or as close to even as possible if the number of completed annotations does not divide evenly into num_docs*annotations_per_doc.

Fixes #372
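The ordering described above ("sort by occupied count, randomize within each count, take the first suitable document") can be sketched in plain Python without Django. This is a minimal illustration, not the Teamware code: the `Doc` class, the status strings (including `REJECTED`), `candidate_order` and `pick_document` are all hypothetical stand-ins. The random tie-breaker plays the role of the `'?'` in `order_by('occupied_annotations', '?')`, and `can_annotate` stands in for `user_can_annotate_document(user)`.

```python
import random
from dataclasses import dataclass, field

# Hypothetical status labels standing in for Annotation.COMPLETED / Annotation.PENDING etc.
COMPLETED = "completed"
PENDING = "pending"
REJECTED = "rejected"
OCCUPYING = {COMPLETED, PENDING}  # only these statuses count as "occupied"


@dataclass
class Doc:
    name: str
    statuses: list = field(default_factory=list)  # status of each existing annotation

    def occupied(self) -> int:
        """Count annotations that are COMPLETED or PENDING."""
        return sum(1 for s in self.statuses if s in OCCUPYING)


def candidate_order(docs, rng=random):
    """Order docs by occupied count, randomizing only within each count."""
    # Sort key is (occupied count, random tie-breaker), so Doc objects are never compared
    keyed = [(d.occupied(), rng.random(), d) for d in docs]
    keyed.sort(key=lambda t: (t[0], t[1]))
    return [d for _, _, d in keyed]


def pick_document(docs, can_annotate, rng=random):
    """Return the first document the annotator may take, or None if none is suitable."""
    for doc in candidate_order(docs, rng):
        if can_annotate(doc):
            return doc
    return None
```

For example, with `docs = [Doc("a", [COMPLETED, PENDING]), Doc("b", [REJECTED]), Doc("c", [COMPLETED]), Doc("d")]`, `[d.occupied() for d in candidate_order(docs)]` is always `[0, 0, 1, 2]`; only the relative order of `b` and `d` (both with zero occupied annotations) varies between calls.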
ianroberts committed Aug 4, 2023
1 parent 262acbd commit 04118ad
Showing 1 changed file with 12 additions and 3 deletions.
backend/models.py: 15 changes (12 additions, 3 deletions)
@@ -737,9 +737,18 @@ def assign_annotator_task(self, user, doc_type=DocumentType.ANNOTATION):
         Annotation task performs an extra check for remaining annotation task (num_annotation_tasks_remaining),
         testing and training does not do this check as the annotator must annotate all documents.
         """
-        if (DocumentType.ANNOTATION and self.num_annotation_tasks_remaining > 0) or \
-                DocumentType.TEST or DocumentType.TRAINING:
-            for doc in self.documents.filter(doc_type=doc_type).order_by('?'):
+        if (doc_type == DocumentType.ANNOTATION and self.num_annotation_tasks_remaining > 0) or \
+                doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING:
+            if doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING:
+                queryset = self.documents.filter(doc_type=doc_type).order_by('?')
+            else:
+                # Prefer documents which have fewer complete or pending annotations, in order to
+                # spread the annotators as evenly as possible across the available documents
+                queryset = self.documents.filter(doc_type=doc_type).alias(
+                    occupied_annotations=Count("annotations", filter=Q(annotations__status=Annotation.COMPLETED)
+                                               | Q(annotations__status=Annotation.PENDING))
+                ).order_by('occupied_annotations', '?')
+            for doc in queryset:
                 # Check that annotator hasn't annotated and that
                 # doc hasn't been fully annotated
                 if doc.user_can_annotate_document(user):
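Besides the new ordering, the diff also rewrites the guard condition. The old form tested the `DocumentType` constants themselves rather than `doc_type`, so it appears to have been true for every call (at least one of `TEST`/`TRAINING` is truthy), which bypassed the `num_annotation_tasks_remaining` check. The sketch below contrasts the two forms; the integer values given to this stand-in `DocumentType` class and the helper names `old_condition`/`new_condition` are hypothetical and may not match `backend/models.py`.

```python
# Hypothetical stand-in for DocumentType; the real values in backend/models.py may differ.
class DocumentType:
    ANNOTATION = 0
    TRAINING = 1
    TEST = 2


def old_condition(doc_type, remaining):
    # Pre-fix form: evaluates the constants, never looks at doc_type
    return bool((DocumentType.ANNOTATION and remaining > 0) or
                DocumentType.TEST or DocumentType.TRAINING)


def new_condition(doc_type, remaining):
    # Post-fix form: compares doc_type against each constant
    return (doc_type == DocumentType.ANNOTATION and remaining > 0) or \
        doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING
```

With no annotation tasks remaining, `old_condition(DocumentType.ANNOTATION, 0)` still returns `True`, whereas `new_condition(DocumentType.ANNOTATION, 0)` returns `False`; test and training documents pass the new check regardless of `remaining`.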