Add a CLI, add some light unit testing, and change config from Python to Yaml/Json #2

Merged · 7 commits · Oct 13, 2023
Changes from all commits
37 changes: 37 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,37 @@
name: CI
on:
  pull_request:
  push:
    branches:
      - main

# The goal here is to cancel older workflows when a PR is updated (because it's pointless work)
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
  cancel-in-progress: true

jobs:
  unittest:
    name: unit tests
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        # don't go crazy with the Python versions as they eat up CI minutes
        python-version: ["3.10"]

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[tests]

      - name: Test with pytest
        run: |
          python -m pytest
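
Note that `pip install .[tests]` assumes the package defines a `tests` extra with pytest in it (not shown in this excerpt). Purely as an illustration of the "light unit testing" the PR title mentions, a hypothetical check of the CLI wiring could look like this (not the PR's actual test file):

```python
import pytest

from chart_review import cli


def test_accuracy_requires_positional_args():
    # argparse raises SystemExit when the accuracy subcommand is missing
    # its required positional arguments (the three annotator names)
    with pytest.raises(SystemExit):
        cli.main_cli(["accuracy"])
```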
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
/.idea/
__pycache__/
83 changes: 61 additions & 22 deletions README.md
@@ -30,42 +30,81 @@ The most common chart-review measures agreement of the _**class_label**_ from a
* 2 human reviewers _vs_ each other

---
**EACH STUDY HAS STUDY-SPECIFIC COHORT**
### How to Install
1. Clone this repo.
2. Install it locally like so: `pipx install .`

`config.py` defines study specific variables.
`chart-review` is not yet released on PyPI.

* study_folder = `/opt/cumulus/chart-review/studyname`
* class_labels = `['case', 'control', 'unknown', '...']`
* Annotators
* NoteRanges
---
### How to Run

#### Set Up Project Folder

Chart Review operates on a project folder that holds your config & data.
1. Make a new folder.
2. Export your Label Studio annotations and put that in the folder as `labelstudio-export.json`.
3. Add a `config.yaml` file (or `config.json`) that looks something like this (read more on this format below):
**Contributor:** I like your idea and I strongly prefer config.json over YAML.

**Contributor Author:** Ah, fair - but note:

* JSON is technically a subset of YAML. (That is, a YAML parser can also read JSON.)
* So what I've done here is use a YAML parser and look for both config.yaml and config.json -- it will read either one.
* The reason I personally prefer YAML for config files is that you can have comments, which are often very useful for explaining why a config is the way it is (and also JSON can be annoyingly fussy about stuff like trailing commas, but that's less important than the comments thing).

So the way I made this PR, either yaml or json works - whichever the researcher in question is more comfy with.

How do you feel about that? (Or do you feel like standardizing on a specific syntax is worth disallowing yaml?)
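
A minimal sketch of that lookup, assuming PyYAML (the `load_config` helper below is hypothetical, for illustration only, not the PR's actual code):

```python
import os

import yaml  # PyYAML; JSON is (nearly) a subset of YAML, so one parser covers both


def load_config(project_dir: str) -> dict:
    """Read config.yaml if present, otherwise fall back to config.json."""
    for filename in ("config.yaml", "config.json"):
        path = os.path.join(project_dir, filename)
        if os.path.exists(path):
            with open(path, encoding="utf8") as f:
                return yaml.safe_load(f)
    raise FileNotFoundError(f"No config.yaml or config.json found in {project_dir}")
```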


```yaml
labels:
  - cough
  - fever

annotators:
  jane: 2
  john: 6
  jack: 8

ranges:
  jane: 242-250 # inclusive
  john: [260-271, 277]
  jack: [jane, john]
```

#### Run

Call `chart-review` with the sub-command you want and its arguments:

Enum **Annotators** maps a SimpleName to LabelStudioUserId
* human subject matter expert _like_ "Rena"
* computer method _like_ "NLP"
* coded data sources _like_ "ICD10"
`chart-review accuracy --project-dir /path/to/project/dir jane john jack`

Pass `--help` to see more options.

---
### Config File Format

`config.yaml` defines study-specific variables.

* Class labels: `labels: ['cough', 'fever']`
* Annotators: `annotators: {'jane': 3, 'john': 8}`
* Note ranges: `ranges: {'jane': 40-50, 'john': [2, 3, 4, 5]}`

`annotators` maps a name to a Label Studio User ID
* human subject matter expert _like_ `jane`
* computer method _like_ `nlp`
* coded data sources _like_ `icd10`

Enum **NoteRanges** maps a selection of NoteID from the corpus
* corpus = range(1, end+1)
* annotator1_vs_2 = Iterable
* annotator2_vs_3 = Iterable
* annotator3_vs_1 = Iterable
`ranges` maps a selection of Note IDs from the corpus
* `corpus: start:end`
* `annotator1_vs_2: [list, of, notes]`
* `annotator2_vs_3: corpus`
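
As a rough illustration of how an inclusive range spec like `242-250` from the example config above could be expanded into note IDs (the `parse_note_range` helper is hypothetical, not part of this PR):

```python
def parse_note_range(spec: str) -> list[int]:
    """Expand an inclusive 'start-end' spec like '242-250' into explicit note IDs."""
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))  # inclusive of both endpoints


print(parse_note_range("242-250"))  # [242, 243, ..., 250]
```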

---
**BASE COHORT METHODS**

`cohort.py`
* from chartreview import _labelstudio_, _mentions_, _agree_
* from chart_review import _labelstudio_, _mentions_, _agree_

class **Cohort** defines the base class to analyze study cohorts.
* init(`config.py`)

`mentions.py`
`simplify.py`
* **rollup**(...) : return _LabelStudioExport_ with 1 "rollup" annotation replacing individual mentions
* other methods are rarely used currently
* overlaps(...) : test if two mentions overlap (True/False)
* calc_term_freq(...) : term frequency of highlighted mention text
* calc_term_label_confusion : report of exact mentions with 2+ class_labels

`mentions.py` (methods are rarely used currently)
* overlaps(...) : test if two mentions overlap (True/False)
* calc_term_freq(...) : term frequency of highlighted mention text
* calc_term_label_confusion : report of exact mentions with 2+ class_labels

`agree.py` get confusion matrix comparing annotators {truth, reviewer}
* **confusion_matrix** (truth, reviewer, ...) returns List[TruePos, TrueNeg, FalsePos, FalseNeg]
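
For context, a hedged sketch of the kind of agreement scores such a matrix typically feeds into (illustrative only; `score_counts` is not the module's actual function):

```python
def score_counts(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> dict:
    """Standard scores derived from confusion-matrix counts."""
    sens = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0  # recall
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0   # precision
    f1 = 2 * sens * ppv / (sens + ppv) if (sens + ppv) else 0.0
    return {"F1": f1, "Sens": sens, "PPV": ppv}
```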
File renamed without changes.
4 changes: 2 additions & 2 deletions chartreview/agree.py → chart_review/agree.py
@@ -1,8 +1,8 @@
from typing import Dict, List
from collections.abc import Iterable
from ctakesclient.typesystem import Span
from chartreview import mentions
from chartreview import simplify
from chart_review import mentions
from chart_review import simplify

def confusion_matrix(simple: dict, gold_ann: str, review_ann: str, note_range: Iterable, label_pick=None) -> Dict[str, list]:
"""
71 changes: 71 additions & 0 deletions chart_review/cli.py
@@ -0,0 +1,71 @@
"""Run chart-review from the command-line"""

import argparse
import sys

from chart_review import cohort
from chart_review.commands.accuracy import accuracy


###############################################################################
#
# CLI Helpers
#
###############################################################################

def add_project_args(parser: argparse.ArgumentParser) -> None:
parser.add_argument(
"--project-dir",
default=".",
help="Directory holding project files, like config.yaml and labelstudio-export.json (default: current dir)",
)


def define_parser() -> argparse.ArgumentParser:
"""Fills out an argument parser with all the CLI options."""
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(required=True)

add_accuracy_subparser(subparsers)

return parser


###############################################################################
#
# Accuracy
#
###############################################################################

def add_accuracy_subparser(subparsers) -> None:
parser = subparsers.add_parser("accuracy")
add_project_args(parser)
parser.add_argument("one")
parser.add_argument("two")
parser.add_argument("base")
parser.set_defaults(func=run_accuracy)


def run_accuracy(args: argparse.Namespace) -> None:
reader = cohort.CohortReader(args.project_dir)
accuracy(reader, args.one, args.two, args.base)


###############################################################################
#
# Main CLI entrypoints
#
###############################################################################

def main_cli(argv: list[str] = None) -> None:
"""Main entrypoint that wraps all the core program logic"""
try:
parser = define_parser()
args = parser.parse_args(argv)
args.func(args)
except Exception as exc:
sys.exit(str(exc))


if __name__ == "__main__":
main_cli()
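
For reference, the same entrypoint can be exercised directly with an argv list; a small illustrative example (the project path is a placeholder for a real folder holding config.yaml and labelstudio-export.json):

```python
from chart_review.cli import main_cli

# Equivalent to: chart-review accuracy --project-dir /path/to/project/dir jane john jack
main_cli(["accuracy", "--project-dir", "/path/to/project/dir", "jane", "john", "jack"])
```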
42 changes: 13 additions & 29 deletions chartreview/cohort.py → chart_review/cohort.py
@@ -1,30 +1,26 @@
from typing import List
import os
from collections.abc import Iterable
from enum import Enum, EnumMeta
from chartreview.common import guard_str, guard_iter, guard_in
from chartreview import common
from chartreview import simplify
from chartreview import mentions
from chartreview import agree
from chart_review.common import guard_str, guard_iter, guard_in
from chart_review import common
from chart_review import config
from chart_review import simplify
from chart_review import mentions
from chart_review import agree

class CohortReader:

    def __init__(self, project_dir: str, annotator: EnumMeta, note_range: EnumMeta, class_labels: List[str]):
    def __init__(self, project_dir: str):
        """
        :param project_dir: str like /opt/labelstudio/study_name
        :param annotator: Enum.name is human-readable name like "rena" and Enum.value is LabelStudio "complete_by"
        :param note_range: Enum.name is human-readable name like "andy_alon" and Enum.value is LabelStudio "annotation.id"
        :param class_labels: defined by "clinical annotation guidelines"
        """
        self.project_dir = project_dir
        self.config = config.ProjectConfig(project_dir)
        self.labelstudio_json = self.path('labelstudio-export.json') #TODO: refactor labelstudio.py
        self.annotator = annotator
        self.note_range = note_range
        self.class_labels = class_labels
        self.annotator = self.config.annotators
        self.note_range = self.config.note_ranges
        self.class_labels = self.config.class_labels
        self.annotations = None

        common.print_line(f'Loading(...) \n {self.labelstudio_json}')

        saved = common.read_json(self.labelstudio_json)
        if isinstance(saved, list):
            self.annotations = simplify.simplify_full(self.labelstudio_json, self.annotator)
@@ -38,7 +34,7 @@ def __init__(self, project_dir: str, annotator: EnumMeta, note_range: EnumMeta,
            self.annotations = compat

    def path(self, filename):
        return f'{self.project_dir}/{filename}'
        return os.path.join(self.project_dir, filename)

    def calc_term_freq(self, annotator) -> dict:
        """
@@ -113,15 +109,3 @@ def score_reviewer_table_dict(self, gold_ann, review_ann, note_range) -> dict:
            table[label] = self.score_reviewer(gold_ann, review_ann, note_range, label)

        return table

    def get_config(self) -> dict:
        as_dict = dict()
        as_dict['class_labels'] = self.class_labels
        as_dict['project_dir'] = self.project_dir
        as_dict['annotation_file'] = self.labelstudio_json
        as_dict['annotator'] = {i.name: i.value for i in self.annotator}
        as_dict['note_range'] = {i.name: ','.join([str(j) for j in list(i.value)]) for i in self.note_range}
        return as_dict

    def write_config(self):
        common.write_json(self.path('config.json'), self.get_config())
41 changes: 41 additions & 0 deletions chart_review/commands/accuracy.py
@@ -0,0 +1,41 @@
"""Methods for high-level accuracy calculations."""
**Contributor Author:** This file is basically a generic version of the calculation in paper.py


import os

from chart_review import agree, cohort, common


def accuracy(reader: cohort.CohortReader, first_ann: str, second_ann: str, base_ann: str) -> None:
    """
    High-level accuracy calculation between three reviewers.

    The results will be written to the project directory.

    :param reader: the cohort configuration
    :param first_ann: the first annotator to compare
    :param second_ann: the second annotator to compare
    :param base_ann: the base annotator to compare the others against
    """
    # Grab ranges
    first_range = reader.config.note_ranges[first_ann]
    second_range = reader.config.note_ranges[second_ann]

    # All labels first
    first_matrix = reader.confusion_matrix(first_ann, base_ann, first_range)
    second_matrix = reader.confusion_matrix(second_ann, base_ann, second_range)
    whole_matrix = agree.append_matrix(first_matrix, second_matrix)
    table = agree.score_matrix(whole_matrix)

    # Now do each label separately
    for label in reader.class_labels:
        first_matrix = reader.confusion_matrix(first_ann, base_ann, first_range, label)
        second_matrix = reader.confusion_matrix(second_ann, base_ann, second_range, label)
        whole_matrix = agree.append_matrix(first_matrix, second_matrix)
        table[label] = agree.score_matrix(whole_matrix)

    # And write out the results
    output_stem = os.path.join(reader.project_dir, f"accuracy-{first_ann}-{second_ann}-{base_ann}")
    common.write_json(f"{output_stem}.json", table)
    print(f"Wrote {output_stem}.json")
    common.write_text(f"{output_stem}.csv", agree.csv_table(table, reader.class_labels))
    print(f"Wrote {output_stem}.csv")
File renamed without changes.