diff --git a/.github/workflows/create-release.yml b/.github/workflows/create-release.yml
index a3bb9a43..a353cb8e 100644
--- a/.github/workflows/create-release.yml
+++ b/.github/workflows/create-release.yml
@@ -35,7 +35,7 @@ jobs:
       - name: Create release artifacts
         run: |
           sed "s/DEFAULT_IMAGE_TAG=latest/DEFAULT_IMAGE_TAG=${GITHUB_REF_NAME#v}/" install/get-teamware.sh > ./get-teamware.sh
-          tar cvzf install.tar.gz README.md docker-compose*.yml generate-docker-env.sh create-django-db.sh nginx custom-policies Caddyfile
+          tar cvzf install.tar.gz README.md docker-compose*.yml generate-docker-env.sh create-django-db.sh nginx custom-policies Caddyfile backup_manual.sh backup_restore.sh

       - name: Create release
         uses: softprops/action-gh-release@v1
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 57c56ac2..39693672 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,28 @@

 ### Fixed

+In versions from 0.2.0 to 2.1.0 inclusive the default `docker-compose.yml` file fails to back up the database, due to a mismatch between the version of the database server and the version of the backup client. This is now fixed, but in order to create a proper database backup before attempting to upgrade you will need to manually edit your `docker-compose.yml` file and change
+
+```yaml
+  pgbackups:
+    image: prodrigestivill/postgres-backup-local:12
+```
+
+to
+
+```yaml
+  pgbackups:
+    image: prodrigestivill/postgres-backup-local:14
+```
+
+(change the "12" to "14"), then run `docker compose up -d` (or `docker-compose up -d`) again to upgrade just the backup tool. Once the correct backup tool is running you can start an immediate backup using
+
+```
+docker compose run --rm -it pgbackups /backup.sh
+```
+
+(or `docker-compose` if your version of Docker does not support compose v2).
+
 ## [2.1.0] 2023-05-03

 ### Added
diff --git a/CITATION.cff b/CITATION.cff
index e3e14ade..c2dabc4c 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -26,7 +26,7 @@ identifiers:
   - description: The collection of archived snapshots of all versions of GATE Teamware 2
     type: doi
-    value: 10.5281/zenodo.7821718
+    value: 10.5281/zenodo.7899193
 keywords:
   - NLP
   - machine learning
diff --git a/README.md b/README.md
index b7d9927f..d13e4263 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,13 @@

 ![](/frontend/public/static/img/gate-teamware-logo.svg "GATE Teamware")

-[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7821718.svg)](https://doi.org/10.5281/zenodo.7821718)
+[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7899193.svg)](https://doi.org/10.5281/zenodo.7899193)

 A web application for collaborative document annotation.

 Full documentation can be [found here][docs].

-GATE teamware provides a flexible web app platform for managing classification of documents by human annotators.
+GATE Teamware provides a flexible web app platform for managing classification of documents by human annotators.

 ## Key Features
 * Configure annotation options using a highly flexible JSON config.
@@ -37,6 +37,16 @@
 bash ./get-teamware.sh

 [A Helm chart](https://github.com/GateNLP/charts/tree/main/gate-teamware) is also available to allow deployment on Kubernetes.
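The changelog entry above gives the pre-upgrade backup fix as prose plus separate snippets; the sketch below strings the same steps together for a compose v2 installation. The `sed` one-liner is only an illustration of the manual edit described in the changelog (it assumes the compose file is in the current directory) - edit `docker-compose.yml` by hand if you prefer.

```bash
# Switch the backup sidecar to the client that matches the Postgres 14 server
# (equivalent to hand-editing the pgbackups image tag as shown in the changelog)
sed -i 's|postgres-backup-local:12|postgres-backup-local:14|' docker-compose.yml

# Recreate the backup container with the new image, then take an immediate backup
docker compose up -d
docker compose run --rm -it pgbackups /backup.sh
```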
+### Upgrading
+
+**When upgrading GATE Teamware it is strongly recommended to ensure you have a recent backup of your database before starting the upgrade procedure.** Database schema changes should be applied automatically as part of the upgrade but unexpected errors may cause data corruption - **always** take a backup before starting any significant changes to your database, so you can roll back in the event of failure.
+
+Check the [changelog](CHANGELOG.md) - any breaking changes and special considerations for upgrades to particular versions will be documented there.
+
+To upgrade a GATE Teamware installation that you installed using `get-teamware.sh`, simply download and run the latest version of the script in the same folder. It will detect your existing configuration and prompt you for any new settings that have been introduced in the new version. Note that any manual changes you have made to the `docker-compose.yml` and other files will not be carried over automatically to the new version; you will have to port the necessary changes to the new files by hand.
+
+Upgrading a Kubernetes deployment generally consists simply of installing the new chart version with `helm upgrade`. As above, check the GATE Teamware changelog and the [chart readme](https://github.com/GateNLP/charts/tree/main/gate-teamware) for any special considerations, new or changed configuration values, etc., and ensure you have a recent database backup before starting the upgrade process.
+
 ## Building locally
 Follow these steps to run the app on your local machine using `docker-compose`:
 1. Clone this repository by running `git clone https://github.com/GateNLP/gate-teamware.git` and move into the `gate-teamware` directory.
@@ -63,7 +73,7 @@ Teamware is developed by the [GATE](https://gate.ac.uk) team, an academic resear

 ## Citation
 For published work that has used Teamware, please cite this repository. One way is to include a citation such as:
-> Karmakharm, T., Wilby, D., Roberts, I., & Bontcheva, K. (2022). GATE Teamware (Version 0.1.4) [Computer software]. https://github.com/GateNLP/gate-teamware
+> Karmakharm, T., Wilby, D., Roberts, I., & Bontcheva, K. (2022). GATE Teamware (Version 2.1.0) [Computer software]. https://github.com/GateNLP/gate-teamware

 Please use the `Cite this repository` button at the top of the [project's GitHub repository](https://github.com/GATENLP/gate-teamware) to get an up to date citation.
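As a companion to the Upgrading section above, here is a minimal sketch of the docker-compose upgrade path, assuming `backup_manual.sh` and the rest of the install bundle are in the current directory. The Helm line is commented out and uses placeholder release and chart names, since the exact chart reference depends on how you installed it (see the chart readme).

```bash
# 1. Take a database backup first
./backup_manual.sh

# 2. Download the latest get-teamware.sh into this folder (as described above), then re-run it;
#    it detects the existing configuration and prompts only for newly introduced settings
bash ./get-teamware.sh

# 3. Bring the stack up on the new images
docker compose up -d

# Kubernetes deployments upgrade the chart instead:
# helm upgrade <release-name> <chart-reference> -f <your-values.yaml>
```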
diff --git a/backend/management/commands/download_annotations.py b/backend/management/commands/download_annotations.py new file mode 100644 index 00000000..31053a30 --- /dev/null +++ b/backend/management/commands/download_annotations.py @@ -0,0 +1,58 @@ +import json +from django.core.management.base import BaseCommand, CommandError +from django.template.loader import render_to_string +from backend.rpcserver import JSONRPCEndpoint +from backend.views import DownloadAnnotationsView +import argparse + +class Command(BaseCommand): + + help = "Download annotation data" + + + + def add_arguments(self, parser): + parser.add_argument("output_path", type=str, help="Path of file output") + parser.add_argument("project_id", type=str, help="ID of the project") + parser.add_argument("doc_type", type=str, help="Document type all, training, test, or annotation") + parser.add_argument("export_type", type=str, help="Type of export json, jsonl or csv") + parser.add_argument("anonymize", type=self.str2bool, help="Data should be anonymized or not ") + parser.add_argument("-j", "--json_format", type=str, help="Type of json format: raw (default) or gate ") + parser.add_argument("-n", "--num_entries_per_file", type=int, help="Number of entries to generate per file, default 500") + + + def handle(self, *args, **options): + + annotations_downloader = DownloadAnnotationsView() + + output_path = options["output_path"] + project_id = options["project_id"] + doc_type = options["doc_type"] + export_type = options["export_type"] + anonymize = options["anonymize"] + json_format = options["json_format"] if options["json_format"] else "raw" + num_entries_per_file = options["num_entries_per_file"] if options["num_entries_per_file"] else 500 + + print(f"Writing annotations to {output_path} \n Project: {project_id}\n Document type: {doc_type}\n Export type: {export_type} \n Anonymized: {anonymize} \n Json format: {json_format} \n Num entries per file: {num_entries_per_file}\n") + + with open(output_path, "wb") as z: + annotations_downloader.write_zip_to_file(file_stream=z, + project_id=project_id, + doc_type=doc_type, + export_type=export_type, + json_format=json_format, + anonymize=anonymize, + documents_per_file=num_entries_per_file) + + + def str2bool(self, v): + if isinstance(v, bool): + return v + if v.lower() in ('yes', 'true', 't', 'y', '1'): + return True + elif v.lower() in ('no', 'false', 'f', 'n', '0'): + return False + else: + raise argparse.ArgumentTypeError('Boolean value expected.') + + diff --git a/backend/models.py b/backend/models.py index f0b93af7..249638a9 100644 --- a/backend/models.py +++ b/backend/models.py @@ -64,6 +64,13 @@ class ServiceUser(AbstractUser): agreed_privacy_policy = models.BooleanField(default=False) is_deleted = models.BooleanField(default=False) + def lock_user(self): + """ + Lock this user with a SELECT FOR UPDATE. This method must be called within a transaction, + the lock will be released when the transaction commits or rolls back. 
+ """ + return type(self).objects.filter(id=self.id).select_for_update().get() + @property def has_active_project(self): return self.annotatorproject_set.filter(status=AnnotatorProject.ACTIVE).count() > 0 @@ -485,6 +492,9 @@ def reject_annotator(self, user, finished_time=timezone.now()): annotator_project.status = AnnotatorProject.COMPLETED annotator_project.rejected = True annotator_project.save() + + Annotation.clear_all_pending_user_annotations(user) + except ObjectDoesNotExist: raise Exception(f"User {user.username} is not an annotator of the project.") @@ -589,6 +599,9 @@ def get_annotator_task(self, user): user from annotator list if there's no more tasks or user reached quota. """ + # Lock required to prevent concurrent calls from assigning two different tasks + # to the same user + user = user.lock_user() annotation = self.get_current_annotator_task(user) if annotation: # User has existing task @@ -623,7 +636,7 @@ def get_current_annotator_task(self, user): annotation = current_annotations.first() if annotation.document.project != self: - return RuntimeError( + raise RuntimeError( "The annotation doesn't belong to this project! Annotator should only work on one project at a time") return annotation @@ -724,9 +737,18 @@ def assign_annotator_task(self, user, doc_type=DocumentType.ANNOTATION): Annotation task performs an extra check for remaining annotation task (num_annotation_tasks_remaining), testing and training does not do this check as the annotator must annotate all documents. """ - if (DocumentType.ANNOTATION and self.num_annotation_tasks_remaining > 0) or \ - DocumentType.TEST or DocumentType.TRAINING: - for doc in self.documents.filter(doc_type=doc_type).order_by('?'): + if (doc_type == DocumentType.ANNOTATION and self.num_annotation_tasks_remaining > 0) or \ + doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING: + if doc_type == DocumentType.TEST or doc_type == DocumentType.TRAINING: + queryset = self.documents.filter(doc_type=doc_type).order_by('?') + else: + # Prefer documents which have fewer complete or pending annotations, in order to + # spread the annotators as evenly as possible across the available documents + queryset = self.documents.filter(doc_type=doc_type).alias( + occupied_annotations=Count("annotations", filter=Q(annotations__status=Annotation.COMPLETED) + | Q(annotations__status=Annotation.PENDING)) + ).order_by('occupied_annotations', '?') + for doc in queryset: # Check that annotator hasn't annotated and that # doc hasn't been fully annotated if doc.user_can_annotate_document(user): diff --git a/backend/tests/test_models.py b/backend/tests/test_models.py index 33a3fba1..0dfa9bcb 100644 --- a/backend/tests/test_models.py +++ b/backend/tests/test_models.py @@ -411,6 +411,26 @@ def test_reject_annotator(self): self.assertEqual(AnnotatorProject.COMPLETED, annotator_project.status) self.assertEqual(True, annotator_project.rejected) + def test_remove_annotator_clears_pending(self): + annotator = self.annotators[0] + # Start a task - should be one pending annotation + self.project.get_annotator_task(annotator) + self.assertEqual(1, annotator.annotations.filter(status=Annotation.PENDING).count()) + + # remove annotator from project - pending annotations should be cleared + self.project.remove_annotator(annotator) + self.assertEqual(0, annotator.annotations.filter(status=Annotation.PENDING).count()) + + def test_reject_annotator_clears_pending(self): + annotator = self.annotators[0] + # Start a task - should be one pending annotation + 
self.project.get_annotator_task(annotator) + self.assertEqual(1, annotator.annotations.filter(status=Annotation.PENDING).count()) + + # reject annotator from project - pending annotations should be cleared + self.project.reject_annotator(annotator) + self.assertEqual(0, annotator.annotations.filter(status=Annotation.PENDING).count()) + def test_num_documents(self): self.assertEqual(self.project.num_documents, self.num_docs) diff --git a/backend/tests/test_rpc_endpoints.py b/backend/tests/test_rpc_endpoints.py index 096f3e2f..45db135e 100644 --- a/backend/tests/test_rpc_endpoints.py +++ b/backend/tests/test_rpc_endpoints.py @@ -8,6 +8,7 @@ from django.utils import timezone import json +import logging from backend.models import Annotation, Document, DocumentType, Project, AnnotatorProject, UserDocumentFormatPreference from backend.rpc import create_project, update_project, add_project_document, add_document_annotation, \ @@ -28,7 +29,7 @@ from backend.tests.test_rpc_server import TestEndpoint - +LOGGER = logging.getLogger(__name__) class TestUserAuth(TestCase): @@ -1379,7 +1380,7 @@ def setUp(self): self.num_training_docs = 5 self.training_docs = [] for i in range(self.num_training_docs): - self.docs.append(Document.objects.create(project=self.proj, + self.training_docs.append(Document.objects.create(project=self.proj, doc_type=DocumentType.TRAINING, data={ "text": f"Document {i}", @@ -1396,7 +1397,7 @@ def setUp(self): self.num_test_docs = 10 self.test_docs = [] for i in range(self.num_test_docs): - self.docs.append(Document.objects.create(project=self.proj, + self.test_docs.append(Document.objects.create(project=self.proj, doc_type=DocumentType.TEST, data={ "text": f"Document {i}", @@ -1609,10 +1610,11 @@ def test_annotations_per_doc_not_enforced_for_training_or_test(self): self.proj.save() docs_annotated_per_user = [] - for (i, (ann_user, _)) in enumerate(self.annotators): + for (ann_user, _) in self.annotators: # Add to project self.assertTrue(add_project_annotator(self.manager_request, self.proj.id, ann_user.username)) + for (i, (ann_user, _)) in enumerate(self.annotators): # Every annotator should be able to complete every training document, even though # max annotations per document is less than the total number of annotators self.assertEqual(self.num_training_docs, @@ -1623,6 +1625,7 @@ def test_annotations_per_doc_not_enforced_for_training_or_test(self): self.assertEqual(self.num_training_docs, self.proj.get_annotator_document_score(ann_user, DocumentType.TRAINING)) + for (i, (ann_user, _)) in enumerate(self.annotators): # Every annotator should be able to complete every test document, even though # max annotations per document is less than the total number of annotators self.assertEqual(self.num_test_docs, @@ -1633,6 +1636,7 @@ def test_annotations_per_doc_not_enforced_for_training_or_test(self): self.assertEqual(self.num_training_docs, self.proj.get_annotator_document_score(ann_user, DocumentType.TRAINING)) + for (i, (ann_user, _)) in enumerate(self.annotators): # Now attempt to complete task normally num_annotated = self.complete_annotations(self.num_docs, "Annotation", annotator=i) docs_annotated_per_user.append(num_annotated) @@ -1662,15 +1666,30 @@ def complete_annotations(self, num_annotations_to_complete, expected_doc_type_st # Expect to get self.num_training_docs tasks num_completed_tasks = 0 + if expected_doc_type_str == 'Annotation': + all_docs = self.docs + elif expected_doc_type_str == 'Training': + all_docs = self.training_docs + else: + all_docs = self.test_docs + + 
annotated_docs = {doc.pk: ' ' for doc in all_docs} for i in range(num_annotations_to_complete): task_context = get_annotation_task(ann_req) if task_context: self.assertEqual(expected_doc_type_str, task_context.get("document_type"), f"Document type does not match in task {task_context!r}, " + "annotator {ann.username}, document {i}") + annotated_docs[task_context['document_id']] = "\u2714" complete_annotation_task(ann_req, task_context["annotation_id"], {"sentiment": answer}) num_completed_tasks += 1 + # Draw a nice markdown table of exactly which documents each annotator was given + if annotator == 0: + LOGGER.debug("Annotator | " + (" | ".join(str(i) for i in annotated_docs.keys()))) + LOGGER.debug(" | ".join(["--"] * (len(annotated_docs)+1))) + LOGGER.debug(ann.username + " | " + (" | ".join(str(v) for v in annotated_docs.values()))) + return num_completed_tasks class TestAnnotationChange(TestEndpoint): diff --git a/backend/views.py b/backend/views.py index c4becef3..99616bc7 100644 --- a/backend/views.py +++ b/backend/views.py @@ -58,36 +58,10 @@ def get(self, request, project_id, doc_type, export_type, json_format, entries_p def generate_download(self, project_id, doc_type="all", export_type="json", json_format="raw", anonymize=True, chunk_size=512, documents_per_file=500): - project = Project.objects.get(pk=project_id) - with tempfile.TemporaryFile() as z: - with ZipFile(z, "w") as zip: - all_docs = project.documents.all() - if doc_type == "training": - all_docs = project.documents.filter(doc_type=DocumentType.TRAINING) - elif doc_type == "test": - all_docs = project.documents.filter(doc_type=DocumentType.TEST) - elif doc_type == "annotation": - all_docs = project.documents.filter(doc_type=DocumentType.ANNOTATION) - - - num_docs = all_docs.count() - num_slices = math.ceil(num_docs/documents_per_file) - - for slice_index in range(num_slices): - start_index = slice_index*documents_per_file - end_index = ((slice_index+1)*documents_per_file) - if end_index >= num_docs: - end_index = num_docs - - slice_docs = all_docs[start_index:end_index] - - with tempfile.NamedTemporaryFile("w+") as f: - self.write_docs_to_file(f, slice_docs, export_type, json_format, anonymize) - zip.write(f.name, f"project-{project_id}-{doc_type}-{slice_index:04d}.{export_type}") + self.write_zip_to_file(z, project_id, doc_type, export_type, json_format, anonymize, documents_per_file) # Stream file output - z.seek(0) while True: c = z.read(chunk_size) @@ -96,6 +70,34 @@ def generate_download(self, project_id, doc_type="all", export_type="json", json else: break + + def write_zip_to_file(self, file_stream, project_id, doc_type="all", export_type="json", json_format="raw", anonymize=True, documents_per_file=500): + + project = Project.objects.get(pk=project_id) + with ZipFile(file_stream, "w") as zip: + all_docs = project.documents.all() + if doc_type == "training": + all_docs = project.documents.filter(doc_type=DocumentType.TRAINING) + elif doc_type == "test": + all_docs = project.documents.filter(doc_type=DocumentType.TEST) + elif doc_type == "annotation": + all_docs = project.documents.filter(doc_type=DocumentType.ANNOTATION) + + num_docs = all_docs.count() + num_slices = math.ceil(num_docs / documents_per_file) + + for slice_index in range(num_slices): + start_index = slice_index * documents_per_file + end_index = ((slice_index + 1) * documents_per_file) + if end_index >= num_docs: + end_index = num_docs + + slice_docs = all_docs[start_index:end_index] + + with tempfile.NamedTemporaryFile("w+") as f: + 
self.write_docs_to_file(f, slice_docs, export_type, json_format, anonymize) + zip.write(f.name, f"project-{project_id}-{doc_type}-{slice_index:04d}.{export_type}") + def write_docs_to_file(self, file, documents, export_type, json_format, anonymize): if export_type == "json": self.write_docs_as_json(file, documents, json_format, anonymize) diff --git a/backup_manual.sh b/backup_manual.sh index afe1d0c6..1cb24721 100755 --- a/backup_manual.sh +++ b/backup_manual.sh @@ -2,4 +2,17 @@ set -e -docker-compose run --rm --entrypoint="/bin/bash" pgbackups ./backup.sh \ No newline at end of file +declare -a COMPOSE +if docker compose >/dev/null 2>&1 ; then + # We have compose v2 + COMPOSE[0]="docker" + COMPOSE[1]="compose" +elif which docker-compose > /dev/null 2>&1 ; then + # We have compose v1 + COMPOSE[0]="docker-compose" +else + echo "Unable to find docker compose or docker-compose on your PATH" + exit 1 +fi + +"${COMPOSE[@]}" run --rm -it pgbackups /backup.sh \ No newline at end of file diff --git a/backup_restore.sh b/backup_restore.sh index df9f3eb6..f7e9658c 100755 --- a/backup_restore.sh +++ b/backup_restore.sh @@ -2,19 +2,32 @@ set -e +declare -a COMPOSE +if docker compose >/dev/null 2>&1 ; then + # We have compose v2 + COMPOSE[0]="docker" + COMPOSE[1]="compose" +elif which docker-compose > /dev/null 2>&1 ; then + # We have compose v1 + COMPOSE[0]="docker-compose" +else + echo "Unable to find docker compose or docker-compose on your PATH" + exit 1 +fi + export $(grep -v '^#' .env | xargs) # Get db container up and running -docker-compose up -d db +"${COMPOSE[@]}" up -d db # Drop the schema as this is created in the backup dump -docker-compose run --rm -e DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE -e DB_USERNAME=postgres -e "DB_PASSWORD=$PG_SUPERUSER_PASSWORD" -T --entrypoint "python" backend manage.py dbshell -- -c 'DROP SCHEMA public CASCADE;' +"${COMPOSE[@]}" run --rm -e DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE -e DB_USERNAME=postgres -e "DB_PASSWORD=$PG_SUPERUSER_PASSWORD" -T --entrypoint "python" backend manage.py dbshell -- -c 'DROP SCHEMA public CASCADE;' # Run the backup restore -zcat "$1" | docker-compose run --rm -e DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE -e DB_USERNAME=postgres -e "DB_PASSWORD=$PG_SUPERUSER_PASSWORD" -T --entrypoint "python" backend manage.py dbshell -- +zcat "$1" | "${COMPOSE[@]}" run --rm -e DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE -e DB_USERNAME=postgres -e "DB_PASSWORD=$PG_SUPERUSER_PASSWORD" -T --entrypoint "python" backend manage.py dbshell -- # Reinstate the correct user permissions -docker-compose run --rm -e DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE -e DB_USERNAME=postgres -e "DB_PASSWORD=$PG_SUPERUSER_PASSWORD" -T --entrypoint "python" backend manage.py dbshell -- -c "\ +"${COMPOSE[@]}" run --rm -e DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE -e DB_USERNAME=postgres -e "DB_PASSWORD=$PG_SUPERUSER_PASSWORD" -T --entrypoint "python" backend manage.py dbshell -- -c "\ GRANT ALL ON SCHEMA public TO \"$DB_USERNAME\"; GRANT ALL ON SCHEMA public TO public; GRANT CONNECT ON DATABASE \"$DJANGO_DB_NAME\" TO \"$DB_BACKUP_USER\"; @@ -22,4 +35,4 @@ GRANT USAGE ON SCHEMA public TO \"$DB_BACKUP_USER\";\ " # shut down the stack -docker-compose down +"${COMPOSE[@]}" down diff --git a/cypress/e2e/connection-error.spec.js b/cypress/e2e/connection-error.spec.js new file mode 100644 index 00000000..d1c9ae41 --- /dev/null +++ b/cypress/e2e/connection-error.spec.js @@ -0,0 +1,31 @@ + + +describe("Test connection errors", () =>{ + + it("Test 
initialise with 404 response", () => { + + cy.intercept("POST", "/rpc/", + { + statusCode: 404, + body: { + + } + } + ).as('rpcCalls') // and assign an alias + cy.visit("/") + cy.contains("TEAMWARE").should("be.visible") + }) + + it("Test initialise with blank response", () =>{ + cy.intercept( + { + method: 'POST', // Route all GET requests + url: '/rpc/', // that have a URL that matches '/users/*' + }, + [] // and force the response to be: [] + ).as('rpcCalls') // and assign an alias + cy.visit("/") + cy.contains("TEAMWARE").should("be.visible") + }) + +}) diff --git a/docker-compose.yml b/docker-compose.yml index 2d3ef61e..5015defd 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -67,7 +67,7 @@ services: - ./create-django-db.sh:/docker-entrypoint-initdb.d/001-create-django-db.sh pgbackups: - image: prodrigestivill/postgres-backup-local:12 + image: prodrigestivill/postgres-backup-local:14 restart: always user: ${BACKUPS_USER_GROUP} volumes: diff --git a/docs/docs/.vuepress/components/AnnotationRendererPreview.vue b/docs/docs/.vuepress/components/AnnotationRendererPreview.vue index 868d9c54..98ecc1af 100644 --- a/docs/docs/.vuepress/components/AnnotationRendererPreview.vue +++ b/docs/docs/.vuepress/components/AnnotationRendererPreview.vue @@ -7,11 +7,15 @@ +

+      Document {{documentIndex + 1}} of {{documents.length}}
+      <!-- remaining template markup elided: a "next document" control wired to nextDocument(), with the renderer bound to currentDocument -->
- @@ -35,11 +39,23 @@ export default { }, data(){ return { - annotationOutput: {} - + annotationOutput: {}, + documentIndex: 0 } }, + computed: { + documents() { + if(Array.isArray(this.document)) { + return this.document + } else { + return [this.document] + } + }, + currentDocument() { + return this.documents[this.documentIndex] + } + }, props: { preAnnotation: { default(){ @@ -74,6 +90,12 @@ export default { ] } }, + }, + methods: { + nextDocument() { + this.documentIndex = (this.documentIndex + 1) % this.documents.length; + this.$refs.annotationRenderer.clearForm() + } } } diff --git a/docs/docs/developerguide/README.md b/docs/docs/developerguide/README.md index 2e6b1697..ef38fc97 100644 --- a/docs/docs/developerguide/README.md +++ b/docs/docs/developerguide/README.md @@ -175,7 +175,7 @@ $ ./backup_restore.sh path/to/my/backup.sql.gz This will first launch the database container, then via Django's `dbshell` command, running in the `backend` service, execute a number of SQL commands before and after running all the SQL from the backup file. -4. Redeploy the stack, via `./deploy.sh staging` or `./deploy.sh production`, whichever is the case. +4. Redeploy the stack, via `./deploy.sh staging`, `./deploy.sh production`, or simply `docker compose up -d`, whichever is the case. 5. The database *should* be restored. ## Configuration @@ -188,7 +188,7 @@ and this must be overridden depending on use. ### Database A SQLite3 database is used during development and during integration testing. -For staging and production, postgreSQL is used, running from a `postgres-12` docker container. Settings are found in `teamware/settings/base.py` and `deployment.py` as well as being set as environment variables by `./generate-docker-env.sh` and passed to the container as configured in `docker-compose.yml`. +For staging and production, postgreSQL is used, running from a `postgres-14` docker container. Settings are found in `teamware/settings/base.py` and `deployment.py` as well as being set as environment variables by `./generate-docker-env.sh` and passed to the container as configured in `docker-compose.yml`. In Kubernetes deployments the PostgreSQL database is installed using the Bitnami `postresql` public chart. diff --git a/docs/docs/developerguide/releases.md b/docs/docs/developerguide/releases.md index ec14d1f2..8a08d12b 100644 --- a/docs/docs/developerguide/releases.md +++ b/docs/docs/developerguide/releases.md @@ -9,7 +9,7 @@ Note: Releases are always made from the `master` branch of the repository. 1. **Update the changelog** - This has to be done manually, go through any pull requests to `dev` since the last release. - In github pull requests page, use the search term `is:pr merged:>=yyyy-mm-dd` to find all merged PR from the date since the last version change. - Include the changes in the `CHANGELOG.md` file; the changelog section _MUST_ begin with a level-two heading that starts with the relevant version number in square brackets (`## [N.M.P] Optional descriptive suffix`) as the GitHub workflow that creates a release from the eventual tag depends on this pattern to find the right release notes. Each main item within the changelog should have a link to the originating PR e.g. \[#123\](https://github.com/GateNLP/gate-teamware/pull/123). -1. **Update and check the version numbers** - from the teamware directory run `python version.py check` to check whether all version numbers are up to date. 
If not, update the master `VERSION` file and run `python version.py update` to update all other version numbers and commit the result. Note that `version.py` requires `pyyaml` for reading `CITATION.cff`, `pyyaml` is included in Teamware's dependencies. +1. **Update and check the version numbers** - from the teamware directory run `python version.py check` to check whether all version numbers are up to date. If not, update the master `VERSION` file and run `python version.py update` to update all other version numbers and commit the result. Alternatively, run `python version.py update ` where `` is the version number to update to, e.g. `python version.py update 2.1.0`. Note that `version.py` requires `pyyaml` for reading `CITATION.cff`, `pyyaml` is included in Teamware's dependencies. 1. **Create a version of the documentation** - Run `npm run docs:create_version`, this will archive the current version of the documentation using the version number in `package.json`. 1. **Create a pull request from `dev` to `master`** including any changes to `CHANGELOG.md`, `VERSION`. 1. **Create a tag** - Once the dev-to-master pull request has been merged, create a tag from the resulting `master` branch named `vN.M.P` (i.e. the new version number prefixed with the letter `v`). This will trigger two GitHub workflows: diff --git a/docs/docs/manageradminguide/config_examples.js b/docs/docs/manageradminguide/config_examples.js index e1b8ea32..d6e40454 100644 --- a/docs/docs/manageradminguide/config_examples.js +++ b/docs/docs/manageradminguide/config_examples.js @@ -204,6 +204,70 @@ export default { ] }, + configConditional1: [ + { + "name": "uri", + "type": "radio", + "title": "Select the most appropriate URI", + "options":[ + {"fromDocument": "candidates"}, + {"value": "other", "label": "Other"} + ] + }, + { + "name": "otherValue", + "type": "text", + "title": "Please specify another value", + "if": "annotation.uri == 'other'", + "regex": "^(https?|urn):", + "valError": "Please specify a URI (starting http:, https: or urn:)" + } + ], + configConditional2: [ + { + "name": "htmldisplay", + "type": "html", + "text": "{{{text}}}" + }, + { + "name": "sentiment", + "type": "radio", + "title": "Sentiment", + "description": "Please select a sentiment of the text above.", + "options": [ + {"value": "negative", "label": "Negative"}, + {"value": "neutral", "label": "Neutral"}, + {"value": "positive", "label": "Positive"} + ] + }, + { + "name": "reason", + "type": "text", + "title": "Why do you disagree with the suggested value?", + "if": "annotation.sentiment !== document.preanno.sentiment" + } + ], + docsConditional2: [ + { + "text": "I love the thing!", + "preanno": { + "sentiment": "positive" + } + }, + { + "text": "I hate the thing!", + "preanno": { + "sentiment": "negative" + } + }, + { + "text": "The thing is ok, I guess...", + "preanno": { + "sentiment": "neutral" + } + } + ], + doc1: {text: "Sometext with html"}, doc2: { diff --git a/docs/docs/manageradminguide/project_config.md b/docs/docs/manageradminguide/project_config.md index 7263d616..d3d4ea3d 100644 --- a/docs/docs/manageradminguide/project_config.md +++ b/docs/docs/manageradminguide/project_config.md @@ -135,7 +135,7 @@ Another field can be added to collect more information, e.g. a text field for op Note that for the above case, the `optional` field is added ensure that allows user to not have to input any value. -This `optional` field can be used on all components. +This `optional` field can be used on all components. 
Any component may optionally have a field named `if`, containing an expression that is used to determine whether or not the component appears based on information in the document and/or the values entered in the other components. For example the user could be presented with a set of options that includes an "other" choice, and if the annotator chooses "other" then an additional free text field appears for them to fill in. The `if` option is described in more detail under the [conditional components](#conditional-components) section below. Some fields are available to configure which are specific to components, e.g. the `options` field are only available for the `radio`, `checkbox` and `selector` components. See details below on the usage of each specific component. @@ -513,6 +513,166 @@ The separators can be more than one character, and you can set `"valueLabelSepar Static and `fromDocument` options may be freely interspersed in any order, so you can have a fully-dynamic set of options by specifying _only_ a `fromDocument` entry with no static options, or you can have static options that are listed first followed by dynamic options, or dynamic options first followed by static, etc. +### Conditional components + +By default all components listed in the project configuration will be shown for all documents. However this is not always appropriate, for example you may have some components that are only relevant to certain documents, or only relevant for particular combinations of values in _other_ components. To allow for these kinds of scenarios any component can have a field named `if` specifying the conditions under which that component should be shown. + +The `if` field is an _expression_ that is able to refer to fields in both the current _document_ being annotated and the current state of the other annotation components. The expression language is largely based on a subset of the standard JavaScript expression syntax but with a few additional syntax elements to ease working with array data and regular expressions. + +The following simple example shows how you might implement an "Other (please specify)" pattern, where the user can select from a list of choices but also has the option to supply their own answer if none of the choices are appropriate. The free text field is only shown if the user selects the "other" choice. + + + +**Project configuration** + +```json +[ + { + "name": "uri", + "type": "radio", + "title": "Select the most appropriate URI", + "options":[ + {"fromDocument": "candidates"}, + {"value": "other", "label": "Other"} + ] + }, + { + "name": "otherValue", + "type": "text", + "title": "Please specify another value", + "if": "annotation.uri == 'other'", + "regex": "^(https?|urn):", + "valError": "Please specify a URI (starting http:, https: or urn:)" + } +] +``` + +**Document** + +```json +{ + "text": "President Bush visited the air base yesterday...", + "candidates": [ + { + "value": "http://dbpedia.org/resource/George_W._Bush", + "label": "George W. Bush (Jnr)" + }, + { + "value": "http://dbpedia.org/resource/George_H._W._Bush", + "label": "George H. W. Bush (Snr)" + } + ] +} +``` + + +Note that validation rules (such as `optional`, `minSelected` or `regex`) are not applied to components that are hidden by an `if` expression - hidden components will never be included in the annotation output, even if they would be considered "required" had they been visible. 
+ +Components can also be made conditional on properties of the _document_, or a combination of the document and the annotation values, for example + + + +**Project configuration** + +```json +[ + { + "name": "htmldisplay", + "type": "html", + "text": "{{{text}}}" + }, + { + "name": "sentiment", + "type": "radio", + "title": "Sentiment", + "description": "Please select a sentiment of the text above.", + "options": [ + {"value": "negative", "label": "Negative"}, + {"value": "neutral", "label": "Neutral"}, + {"value": "positive", "label": "Positive"} + ] + }, + { + "name": "reason", + "type": "text", + "title": "Why do you disagree with the suggested value?", + "if": "annotation.sentiment !== document.preanno.sentiment" + } +] +``` + +**Documents** + +```json +[ + { + "text": "I love the thing!", + "preanno": { "sentiment": "positive" } + }, + { + "text": "I hate the thing!", + "preanno": { "sentiment": "negative" } + }, + { + "text": "The thing is ok, I guess...", + "preanno": { "sentiment": "neutral" } + } +] +``` + + + +The full list of supported constructions is as follows: + +- the `annotation` variable refers to the current state of the annotation components for this document + - the current value of a particular component can be accessed as `annotation.componentName` or `annotation['component name']` - the brackets version will always work, the dot version works if the component's `name` is a valid JavaScript identifier + - if a component has not been set since the form was last cleared the value may be `null` or `undefined` - the expression should be written to cope with both + - the value of a `text`, `textarea`, `radio` or `selector` component will be a single string (or null/undefined), the value of a `checkbox` component will be an _array_ of strings since more than one value may be selected. If no value is selected the array may be null, undefined or empty, the expression must be prepared to handle any of these +- the `document` variable refers to the current document that is being annotated + - again properties of the document can be accessed as `document.propertyName` or `document['property name']` + - continue the same pattern for nested properties e.g. `document.scores.label1` + - individual elements of array properties can be accessed by zero-based index (e.g. `document.options[0]`) +- various comparison operators are available: + - `==` and `!=` (equal and not-equal) + - `<`, `<=`, `>=`, `>` (less-than, less-or-equal, greater-or-equal, greater-than) + - these operators follow JavaScript rules, which are not always intuitive. Generally if both arguments are strings then they will be compared by lexicographic order, but if either argument is a number then the other one will also be converted to a number before comparing. So if the `score` component is set to the value "10" (a string of two digits) then `annotation.score < 5` would be _false_ (10 is converted to number and compared to 5) but `annotation.score < '5'` would be _true_ (the string "10" sorts before the string "5") + - `in` checks for the presence of an item in an array or a key in an object + - e.g. 
`'other' in annotation.someCheckbox` checks if the `other` option has been ticked in a checkbox component (whose value is an array) + - this is different from normal JavaScript rules, where `i in myArray` checks for the presence of an array _index_ rather than an array _item_ +- other operators + - `+` (concatenate strings, or add numbers) + - if either argument is a string then both sides are converted to strings and concatenated together + - otherwise both sides are treated as numbers and added + - `-`, `*`, `/`, `%` (subtraction, multiplication, division and remainder) + - `&&`, `||` (boolean AND and OR) + - `!` (prefix boolean NOT, e.g. `!annotation.selected` is true if `selected` is false/null/undefined and false otherwise) + - conditional operator `expr ? valueIfTrue : valueIfFalse` (exactly as in JavaScript, first evaluates the test `expr`, then either the `valueIfTrue` or `valueIfFalse` depending on the outcome of the test) +- `value =~ /regex/` tests whether the given string value contains any matches for the given [regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions#writing_a_regular_expression_pattern) + - use `^` and/or `$` to anchor the match to the start and/or end of the value, for example `annotation.example =~ /^a/i` checks whether the `example` annotation value _starts with_ "a" or "A" (the `/i` flag makes the expression case-insensitive) + - since the project configuration is entered as JSON, any backslash characters within the regex must be doubled to escape them from the JSON parser, i.e. `"if": "annotation.option =~ /\\s/"` would check if `option` contains any space characters (for which the regular expression literal is `/\s/`) +- _Quantifier_ expressions let you check whether `any` or `all` of the items in an array or key/value pairs in an object match a predicate expression. The general form is `any(x in expr, predicate)` or `all(x in expr, predicate)`, where `expr` is an expression that resolves to an array or object value, `x` is a new identifier, and `predicate` is the expression to test each item against. The `predicate` expression can refer to the `x` identifier + - `any(option in annotation.someCheckbox, option > 3)` + - `all(e in document.scores, e.value < 0.7)` (assuming `scores` is an object mapping labels to scores, e.g. `{"scores": {"positive": 0.5, "negative": 0.3}}`) + - when testing a predicate against an _object_ each entry has `.key` and `.value` properties giving the key and value of the current entry + - on a null, undefined or empty array/object, `any` will return _false_ (since there are no items that pass the test) and `all` will return _true_ (since there are no items that _fail_ the test) + - the predicate is optional - `any(arrayExpression)` resolves to `true` if any item in the array has a value that JavaScript considers to be "truthy", i.e. anything other than the number 0, the empty string, null or undefined. So `any(annotation.myCheckbox)` is a convenient way to check whether _at least one_ option has been selected in a `checkbox` component. + +If the `if` expression for a particular component is _syntactically invalid_ (missing operands, mis-matched brackets, etc.) then the condition will be ignored and the component will always be displayed as though it did not have an `if` expression at all. 
Conversely, if the expression is valid but an error occurs while _evaluating_ it, this will be treated the same as if the expression returned `false`, and the associated component will not be displayed. The behaviour is this way around as the most common reason for errors during evaluation is attempting to refer to annotation components that have not yet been filled in - if this is not appropriate in your use case you must account for the possibility within your expression. For example, suppose `confidence` is a `radio` or `selector` component with values ranging from 1 to 5, then another component that declares + +``` +"if": "annotation.confidence && annotation.confidence < 4"` +``` + +will hide this component if `confidence` is unset, displaying it only if `confidence` is set to a value less than 4, whereas + +``` +"if": "!annotation.confidence || annotation.confidence < 4" +``` + +will hide this component only if `confidence` is actually _set_ to a value of 4 or greater - it will _show_ this component if `confidence` is unset. Either approach may be correct depending on your project's requirements. + +To assist managers in authoring project configurations with `if` conditions, the "preview" mode on the project configuration page will display details of any errors that occur when parsing the expressions, or when evaluating them against the **Document input preview** data. You are encouraged to test your expressions thoroughly against a variety of inputs to ensure they behave as intended, before opening your project to annotators. +
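Finally, the new `download_annotations` management command added in this patch (see `backend/management/commands/download_annotations.py`) has no usage example; the sketch below is derived from its argument parser, with the output path, project ID and option values purely illustrative.

```bash
# Positional arguments: output_path, project_id, doc_type (all|training|test|annotation),
# export_type (json|jsonl|csv), anonymize (true/false).
# Options: -j raw|gate selects the JSON format, -n sets documents per file inside the zip.
python manage.py download_annotations ./project-42-annotations.zip 42 annotation json true -j raw -n 500

# In a docker-compose deployment the same command can be run inside the backend service,
# mirroring the pattern used by backup_restore.sh; note the output file is written inside
# the container unless you mount a volume for it.
docker compose run --rm --entrypoint python backend manage.py download_annotations /tmp/out.zip 42 all json true
```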