Commit

Merge pull request #20 from smart-on-fhir/mikix/docs

docs: add initial user docs

mikix authored Jan 18, 2024
2 parents c55f057 + f13de0c commit a332de8

Showing 9 changed files with 400 additions and 149 deletions.
4 changes: 4 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,4 @@

### Checklist
- [ ] Consider if documentation (like in `docs/`) needs to be updated
- [ ] Consider if tests should be added
24 changes: 24 additions & 0 deletions .github/workflows/pages.yaml
@@ -0,0 +1,24 @@
name: Update Cumulus docs
on:
  push:
    branches: ["main"]
    paths: ["docs/**"]

jobs:
  update-docs:
    name: Update Cumulus docs
    runs-on: ubuntu-latest
    steps:
      - name: Send workflow dispatch
        uses: actions/github-script@v7
        with:
          # This token is set to expire in May 2024.
          # You can make a new one with write access to Actions on the cumulus repo.
          github-token: ${{ secrets.CUMULUS_DOC_TOKEN }}
          script: |
            await github.rest.actions.createWorkflowDispatch({
              owner: 'smart-on-fhir',
              repo: 'cumulus',
              ref: 'main',
              workflow_id: 'pages.yaml',
            })
30 changes: 30 additions & 0 deletions CONTRIBUTING.md
@@ -1,5 +1,35 @@
# Contributing to Chart Review

First off, thank you!
Read on below for tips on getting involved with the project.

## Talk to Us

If something annoys you, it probably annoys other folks too.
Don't be afraid to suggest changes or improvements!

Not every suggestion will align with project goals,
but even if not, it can help to talk it out.

Look at [open issues](https://github.com/smart-on-fhir/chart-review/issues),
and if you don't see your concern,
[file a new issue](https://github.com/smart-on-fhir/chart-review/issues/new)!

## Set up your dev environment

To use the same dev environment as us, you'll want to run these commands:
```sh
pip install .[dev]
pre-commit install
```

This will install dependencies & build tools,
as well as set up a `black` auto-formatter commit hook.

## Vocabulary

Here is a quick introduction to some terminology you'll see in the source code.

### Labels
- **Label**: a tag that can be applied to a word, like "Fever" or "Ideation".
These are often applied by humans during a chart review in Label Studio,
179 changes: 30 additions & 149 deletions README.md
@@ -1,165 +1,46 @@
# chart-review
Measure agreement between two "_reviewers_" from the "_confusion matrix_"
# Chart Review

**Measure agreement between chart annotators.**

Whether your chart annotations come from humans, machine-learning, or coded data like ICD-10,
`chart-review` can compare them to reveal interesting statistics like:

**Accuracy**
* F1-score ([agreement](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/))
* [Sensitivity and Specificity](https://en.wikipedia.org/wiki/Sensitivity_and_specificity)
* [Positive (PPV) or Negative Predictive Value (NPV)](https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values#Relationship)
* False Negative Rate (FNR)

**Confusion Matrix**
* TP = True Positive
* TN = True Negative
* FP = False Positive (type I error)
* FN = False Negative (type II error)

**Power Calculations** for sample size estimation
* Power = 1 - FNR
* FNR = FN / (FN + TP)
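
As a quick illustration (not part of the `chart-review` API), these formulas can be computed
directly from confusion-matrix counts:

```python
# Illustrative sketch only: derive FNR and power from raw confusion-matrix counts.
def false_negative_rate(fn: int, tp: int) -> float:
    """FNR = FN / (FN + TP)"""
    return fn / (fn + tp)


def power(fn: int, tp: int) -> float:
    """Power = 1 - FNR"""
    return 1 - false_negative_rate(fn, tp)


# Example: 4 false negatives and 91 true positives
print(round(power(fn=4, tp=91), 3))  # 0.958
```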


---
**CHART-REVIEW** here is defined as "reading" and "annotating" (highlighting) medical notes in order to measure the accuracy of a measurement.
Such measurements can establish the reliability of ICD-10 coding, or the utility of NLP to automate a labor-intensive process.

Agreement among 2+ human subject-matter-expert reviewers is considered the de facto gold standard for ground-truth labeling, but it cannot be done manually at scale.

The most common chart-review measures agreement on the _**class_label**_ across a carefully selected list of notes:
* 1 human reviewer _vs_ ICD10 codes
* 1 human reviewer _vs_ NLP results
* 2 human reviewers _vs_ each other

---
### How to Install
1. Clone this repo.
2. Install it locally like so: `pipx install .`

`chart-review` is not yet released on PyPI.

---
### How to Run

#### Set Up Project Folder

Chart Review operates on a project folder that holds your config & data.
1. Make a new folder.
2. Export your Label Studio annotations and put that in the folder as `labelstudio-export.json`.
3. Add a `config.yaml` file (or `config.json`) that looks something like this (read more on this format below):

```yaml
labels:
  - cough
  - fever

annotators:
  jane: 2
  john: 6
  jack: 8

ranges:
  jane: 242-250  # inclusive
  john: [260-271, 277]
  jack: [jane, john]
```

## Documentation

For guides on installing & using Chart Review,
[read our documentation](https://docs.smarthealthit.org/cumulus/chart-review/).

#### Run
Call `chart-review` with the sub-command you want and its arguments:

For Jane as truth for Jack's annotations:
```shell
chart-review accuracy jane jack
```

For Jack as truth for John's annotations:
```shell
chart-review accuracy jack john
```

## Example

```shell
$ ls
config.yaml labelstudio-export.json

$ chart-review accuracy jane john
accuracy-jane-john:
F1     Sens   Spec   PPV    NPV    TP  FN  TN  FP  Label
0.889  0.8    1.0    1.0    0.5    4   1   1   0   *
1.0    1.0    1.0    1.0    1.0    1   0   1   0   Cough
0      0      0      0      0      2   0   0   0   Fatigue
0      0      0      0      0      1   1   0   0   Headache
```

Pass `--help` to see more options.

---
### Config File Format

`config.yaml` defines study-specific variables.

* Class labels: `labels: ['cough', 'fever']`
* Annotators: `annotators: {'jane': 3, 'john': 8}`
* Note ranges: `ranges: {'jane': 40-50, 'john': [2, 3, 4, 5]}`

`annotators` maps a name to a Label Studio User ID
* human subject matter expert _like_ `jane`
* computer method _like_ `nlp`
* coded data sources _like_ `icd10`

`ranges` maps a selection of Note IDs from the corpus
* `corpus: start:end`
* `annotator1_vs_2: [list, of, notes]`
* `annotator2_vs_3: corpus`
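
For illustration only (these range names are made up), a `ranges` section mixing all three
forms might look like:

```yaml
ranges:
  corpus: 242-250                # a span of Note IDs (inclusive)
  jane_vs_john: [242, 245, 250]  # an explicit list of Note IDs
  jane_vs_icd10: corpus          # re-use another named range
```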

#### External Annotations

You may have annotations from NLP or coded FHIR data that you want to compare against.
Easy!

Set up your config to point at a CSV file in your project folder that holds two columns:
- DocRef ID (real or anonymous)
- Label

```yaml
annotators:
  human: 1
  external_nlp:
    filename: my_nlp.csv
```

When `chart-review` runs, it will inject the external annotations and match up the DocRef IDs
to Label Studio notes based on metadata in your Label Studio export.
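
As a purely hypothetical illustration (the IDs, labels, and header names here are invented),
such a CSV might look like:

```
docref_id,label
a1b2c3d4,cough
a1b2c3d4,fever
e5f6a7b8,headache
```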

---
**BASE COHORT METHODS**

`cohort.py`
* from chart_review import _labelstudio_, _mentions_, _agree_

class **Cohort** defines the base class to analyze study cohorts.
* init(`config.py`)

`simplify.py`
* **rollup**(...) : returns a _LabelStudioExport_ with one "rollup" annotation replacing individual mentions

`term_freq.py` (methods are rarely used currently)
* overlaps(...) : test if two mentions overlap (True/False)
* calc_term_freq(...) : term frequency of highlighted mention text
* calc_term_label_confusion : report of exact mentions with 2+ class_labels

`agree.py` gets a confusion matrix comparing two annotators {truth, annotator}
* **confusion_matrix** (truth, annotator, ...) returns List[TruePos, TrueNeg, FalsePos, FalseNeg]
* **score_matrix** (matrix) returns dict with keys {F1, Sens, Spec, PPV, NPV, TP,FP,TN,FN}
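
For intuition, here is a minimal sketch of the kind of calculation **score_matrix** performs
(not the actual implementation, just the standard formulas behind those keys):

```python
# Sketch only: summary statistics from true/false positive/negative counts.
def score_matrix_sketch(tp: int, tn: int, fp: int, fn: int) -> dict:
    sens = tp / (tp + fn) if tp + fn else 0  # sensitivity (recall)
    spec = tn / (tn + fp) if tn + fp else 0  # specificity
    ppv = tp / (tp + fp) if tp + fp else 0   # positive predictive value (precision)
    npv = tn / (tn + fn) if tn + fn else 0   # negative predictive value
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0
    return {"F1": f1, "Sens": sens, "Spec": spec, "PPV": ppv, "NPV": npv,
            "TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Using the counts from the accuracy docs example below gives F1 ≈ 0.929, Sens ≈ 0.958, etc.
print(score_matrix_sketch(tp=91, tn=99, fp=10, fn=4))
```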

`labelstudio.py` handles LabelStudio JSON

Class **LabelStudioExport**
* init(`labelstudio-export.json`)

Class **LabelStudioNote**
* init(...)

`publish.py` tables and figures for PubMed manuscripts
* table_csv(...)
* table_json(...)

---
**NICE TO HAVES LATER**

* **_confusion matrix_** type support using Pandas
* **score_matrix** would be nicer as a strongly typed Pandas class

---
### Set up your dev environment

To use the same dev environment as us, you'll want to run these commands:
```sh
pip install .[dev]
pre-commit install
```

## Contributing

We love 💖 contributions!

If you have a good suggestion 💡 or found a bug 🐛,
[read our brief contributors guide](CONTRIBUTING.md)
for pointers to filing issues and what to expect.
6 changes: 6 additions & 0 deletions docs/README.md
@@ -0,0 +1,6 @@
# Chart Review Documentation

These documents are meant to be built as one part of the larger body of
[Cumulus documentation](https://docs.smarthealthit.org/cumulus).

To test changes here locally, read more at the [Cumulus docs repo](https://github.com/smart-on-fhir/cumulus).
46 changes: 46 additions & 0 deletions docs/accuracy.md
@@ -0,0 +1,46 @@
---
title: Accuracy Command
parent: Chart Review
nav_order: 5
# audience: lightly technical folks
# type: how-to
---

# The Accuracy Command

The `accuracy` command prints agreement statistics, like F1 scores and confusion matrices,
for every label in your project, comparing two annotators.

Provide two annotator names (the first name will be considered the ground truth) and
your accuracy scores will be printed to the console.

## Example

```shell
$ chart-review accuracy jane john
accuracy-jane-john:
F1     Sens   Spec   PPV    NPV    TP  FN  TN  FP  Label
0.929  0.958  0.908  0.901  0.961  91  4   99  10  *
0.895  0.895  0.938  0.895  0.938  17  2   30  2   cough
0.815  0.917  0.897  0.733  0.972  11  1   35  4   fever
0.959  1.0    0.812  0.921  1.0    35  0   13  3   headache
0.966  0.966  0.955  0.966  0.955  28  1   21  1   stuffy-nose
```

## Options

### `--config=PATH`

Use this to point to a secondary (non-default) config file.
Useful if you have multiple label setups (e.g. one grouped into a binary label and one not).

### `--project-dir=DIR`

Use this to run `chart-review` outside of your project dir.
Config files, external annotations, etc. will be looked for in that directory.

### `--save`

Use this to write a JSON and CSV file to the project directory,
rather than printing to the console.
Useful for passing results around in a machine-parsable format.
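
For example, the options can be combined. Something like this (the paths here are just
illustrative) would compare Jane and John using an alternate config and save the results
to files instead of printing them:

```shell
chart-review accuracy --project-dir=my-study --config=binary-config.yaml --save jane john
```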