Attempt to spread documents more evenly across annotators

Rather than picking the next document for each annotator completely at random, we now prefer documents that have fewer existing annotations. This is achieved by first sorting the documents by their number of COMPLETED+PENDING annotations and then randomizing only within each group. That is, we first try (in random order) the documents with no existing annotations; if none of those is suitable we try (again in random order) the documents with one annotation, then two, and so on, until we either find a valid document to assign or run out of documents to try. Test and training documents are still tried in fully random order, and the guard condition now compares doc_type explicitly against each DocumentType value.

The effect should be that at any given time the full set of documents is "evenly" annotated, or as close to even as possible when the number of completed annotations does not divide evenly into num_docs*annotations_per_doc.

Fixes #372
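The ordering described above can be illustrated outside Django with a minimal sketch. The helper names (preference_order, simulate) and the document ids are hypothetical and not part of the codebase, and the sketch assumes every document is always acceptable to the annotator, i.e. it leaves out the user_can_annotate_document check:

```python
import random


def preference_order(annotation_counts, rng=random):
    """Order document ids by ascending annotation count, with ties in random order.

    Mirrors the idea of order_by('occupied_annotations', '?'): the count is the
    primary sort key and a random number is the tie-breaker within each group.
    """
    return sorted(
        annotation_counts,
        key=lambda doc_id: (annotation_counts[doc_id], rng.random()),
    )


def simulate(num_docs, num_assignments, seed=0):
    """Repeatedly assign the most-preferred document and return the final counts.

    Simplifying assumption: the first candidate is always valid for the annotator.
    """
    rng = random.Random(seed)
    counts = {f"doc{i}": 0 for i in range(num_docs)}
    for _ in range(num_assignments):
        chosen = preference_order(counts, rng)[0]
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    counts = simulate(num_docs=5, num_assignments=12)
    # No document is ever more than one annotation ahead of another.
    print(sorted(counts.values()))
```

Because each assignment goes to a document with the current minimum count, the counts in this simplified model never differ by more than one. In the real code the per-user validity check can skip candidates, so the spread is only approximately even.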
GateNLP · Aug 4, 2023 · 04118ad
1 parent 262acbd
Showing 1 changed file with 12 additions and 3 deletions.
diff --git a/backend/models.py b/backend/models.py
@@ -737,9 +737,18 @@ def assign_annotator_task(self, user, doc_type=DocumentType.ANNOTATION):
         Annotation task performs an extra check for remaining annotation task (num_annotation_tasks_remaining),
         testing and training does not do this check as the annotator must annotate all documents.
         """
-        if (DocumentType.ANNOTATION and self.num_annotation_tasks_remaining > 0) or \
-                DocumentType.TEST or DocumentType.TRAINING:
-            for doc in self.documents.filter(doc_type=doc_type).order_by('?'):
+        if (doc_type == DocumentType.ANNOTATION and self.num_annotation_tasks_remaining > 0) or \
+                doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING:
+            if doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING:
+                queryset = self.documents.filter(doc_type=doc_type).order_by('?')
+            else:
+                # Prefer documents which have fewer complete or pending annotations, in order to
+                # spread the annotators as evenly as possible across the available documents
+                queryset = self.documents.filter(doc_type=doc_type).alias(
+                    occupied_annotations=Count("annotations", filter=Q(annotations__status=Annotation.COMPLETED)
+                                                                     | Q(annotations__status=Annotation.PENDING))
+                ).order_by('occupied_annotations', '?')
+            for doc in queryset:
                 # Check that annotator hasn't annotated and that
                 # doc hasn't been fully annotated
                 if doc.user_can_annotate_document(user):