diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 0000000..6bcf50f
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,4 @@
+
+### Checklist
+- [ ] Consider if documentation (like in `docs/`) needs to be updated
+- [ ] Consider if tests should be added
diff --git a/.github/workflows/pages.yaml b/.github/workflows/pages.yaml
new file mode 100644
index 0000000..c37311b
--- /dev/null
+++ b/.github/workflows/pages.yaml
@@ -0,0 +1,24 @@
+name: Update Cumulus docs
+on:
+  push:
+    branches: ["main"]
+    paths: ["docs/**"]
+
+jobs:
+  update-docs:
+    name: Update Cumulus docs
+    runs-on: ubuntu-latest
+    steps:
+      - name: Send workflow dispatch
+        uses: actions/github-script@v7
+        with:
+          # This token is set to expire in May 2024.
+          # You can make a new one with write access to Actions on the cumulus repo.
+          github-token: ${{ secrets.CUMULUS_DOC_TOKEN }}
+          script: |
+            await github.rest.actions.createWorkflowDispatch({
+              owner: 'smart-on-fhir',
+              repo: 'cumulus',
+              ref: 'main',
+              workflow_id: 'pages.yaml',
+            })
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 2d81844..2d9cc34 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,5 +1,42 @@
+# Contributing to Chart Review
+
+First off, thank you!
+Read on below for tips on getting involved with the project.
+
+## Talk to Us
+
+If something annoys you, it probably annoys other folks too.
+Don't be afraid to suggest changes or improvements!
+
+Not every suggestion will align with project goals,
+but even if not, it can help to talk it out.
+
+Look at [open issues](https://github.com/smart-on-fhir/chart-review/issues),
+and if you don't see your concern,
+[file a new issue](https://github.com/smart-on-fhir/chart-review/issues/new)!
+
+## Set up your dev environment
+
+To use the same dev environment as us, you'll want to run these commands:
+```sh
+pip install .[dev]
+pre-commit install
+```
+
+This will install dependencies & build tools,
+as well as set up a `black` auto-formatter commit hook.
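+
+If you ever want to run the hooks against every file
+(not just the files in a commit), this standard `pre-commit` invocation works:
+```sh
+# Run all configured hooks (including black) across the whole repo:
+pre-commit run --all-files
+```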
+
 ## Vocabulary
 
+Here is a quick introduction to some terminology you'll see in the source code.
+
 ### Labels
 - **Label**: a tag that can be applied to a word, like "Fever" or "Ideation".
 These are often applied by humans during a chart review in Label Studio,
diff --git a/README.md b/README.md
index 3dd58b9..224a0a3 100644
--- a/README.md
+++ b/README.md
@@ -1,165 +1,46 @@
-# chart-review
-Measure agreement between two "_reviewers_" from the "_confusion matrix_"
+# Chart Review
+
+**Measure agreement between chart annotations.**
+
+Whether your chart annotations come from humans, machine learning, or coded data like ICD-10,
+`chart-review` can compare them to reveal interesting statistics like:
 
 **Accuracy**
 * F1-score ([agreement](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/))
 * [Sensitivity and Specificity](https://en.wikipedia.org/wiki/Sensitivity_and_specificity)
-* [Positive (PPV) or Negative Predictive Value (NPV)](https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values#Relationship))
-* False Negative Rate (FNR)
+* [Positive (PPV) or Negative Predictive Value (NPV)](https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values#Relationship)
+* False Negative Rate (FNR)
 
-**Confusion Matrix**
-* TP = True Positive (type I error)
-* TN = True Negative (type II error)
-* FP = False Positive
-* FN = False Negative
+**Confusion Matrix**
+* TP = True Positive
+* TN = True Negative
+* FP = False Positive (type I error)
+* FN = False Negative (type II error)
 
-**Power Calculations** for sample size estimation
-* Power = 1 - FNR
-* FNR = FN / (FN + TP)
-
-
----
-**CHART-REVIEW** here is defined as "reading" and "annotating" (highlighting) medical notes to measure accuracy of a measurement.
-Measurements can establish the reliability of ICD10, or the reliable utility of NLP to automate labor intensive process.
-
-Agreement among 2+ human subject matter expert reviewers is considered the defacto gold-standard for ground-truth labeling, but cannot be done manually at scale.
-
-The most common chart-review measures agreement of the _**class_label**_ from a careful list of notes
-* 1 human reviewer _vs_ ICD10 codes
-* 1 human reviewer _vs_ NLP results
-* 2 human reviewers _vs_ each other
-
----
-### How to Install
-1. Clone this repo.
-2. Install it locally like so: `pipx install .`
-
-`chart-review` is not yet released on PyPI.
-
----
-### How to Run
-
-#### Set Up Project Folder
-
-Chart Review operates on a project folder that holds your config & data.
-1. Make a new folder.
-2. Export your Label Studio annotations and put that in the folder as `labelstudio-export.json`.
-3. Add a `config.yaml` file (or `config.json`) that looks something like this (read more on this format below):
-
-```yaml
-labels:
-  - cough
-  - fever
-
-annotators:
-  jane: 2
-  john: 6
-  jack: 8
-
-ranges:
-  jane: 242-250  # inclusive
-  john: [260-271, 277]
-  jack: [jane, john]
-```
-
-#### Run
-
-Call `chart-review` with the sub-command you want and its arguments:
-
-For Jane as truth for Jack's annotations:
-```shell
-chart-review accuracy jane jack
-```
-
-For Jack as truth for John's annotations:
-```shell
-chart-review accuracy jack john
-```
-
-Pass `--help` to see more options.
-
----
-### Config File Format
-
-`config.yaml` defines study specific variables.
-
- * Class labels: `labels: ['cough', 'fever']`
 * Annotators: `annotators: {'jane': 3, 'john': 8}`
 * Note ranges: `ranges: {'jane': 40-50, 'john': [2, 3, 4, 5]}`
-
-`annotators` maps a name to a Label Studio User ID
-* human subject matter expert _like_ `jane`
-* computer method _like_ `nlp`
-* coded data sources _like_ `icd10`
-
-`ranges` maps a selection of Note IDs from the corpus
-* `corpus: start:end`
-* `annotator1_vs_2: [list, of, notes]`
-* `annotator2_vs_3: corpus`
-
-#### External Annotations
-
-You may have annotations from NLP or coded FHIR data that you want to compare against.
-Easy!
-
-Set up your config to point at a CSV file in your project folder that holds two columns:
-- DocRef ID (real or anonymous)
-- Label
-
-```yaml
-annotators:
-  human: 1
-  external_nlp:
-    filename: my_nlp.csv
-```
-
-When `chart-review` runs, it will inject the external annotations and match up the DocRef IDs
-to Label Studio notes based on metadata in your Label Studio export.
-
----
-**BASE COHORT METHODS**
-
-`cohort.py`
-* from chart_review import _labelstudio_, _mentions_, _agree_
-
-class **Cohort** defines the base class to analyze study cohorts.
- * init(`config.py`)
-
-`simplify.py`
-* **rollup**(...) : return _LabelStudioExport_ with 1 "rollup" annotation replacing individual mentions
-
-`term_freq.py` (methods are rarely used currently)
-* overlaps(...) : test if two mentions overlap (True/False)
-* calc_term_freq(...) : term frequency of highlighted mention text
-* calc_term_label_confusion : report of exact mentions with 2+ class_labels
-
-`agree.py` get confusion matrix comparing annotators {truth, annotator}
-* **confusion_matrix** (truth, annotator, ...) returns List[TruePos, TrueNeg, FalsePos, FalseNeg]
-* **score_matrix** (matrix) returns dict with keys {F1, Sens, Spec, PPV, NPV, TP,FP,TN,FN}
-
-`labelstudio.py` handles LabelStudio JSON
-
-Class **LabelStudioExport**
-* init(`labelstudio-export.json`)
-
-Class **LabelStudioNote**
-* init(...)
-
-`publish.py` tables and figures for PubMed manuscripts
-* table_csv(...)
-* table_json(...)
-
----
-**NICE TO HAVES LATER**
-
-* **_confusion matrix_** type support using Pandas
-* **score_matrix** would be nicer to use a Pandas strongly typed class
-
----
-### Set up your dev environment
-
-To use the same dev environment as us, you'll want to run these commands:
-```sh
-pip install .[dev]
-pre-commit install
-```
+## Documentation
+
+For guides on installing & using Chart Review,
+[read our documentation](https://docs.smarthealthit.org/cumulus/chart-review/).
+
+## Example
+
+```shell
+$ ls
+config.yaml  labelstudio-export.json
+
+$ chart-review accuracy jane john
+accuracy-jane-john:
+F1     Sens  Spec  PPV  NPV  TP  FN  TN  FP  Label
+0.889  0.8   1.0   1.0  0.5  4   1   1   0   *
+1.0    1.0   1.0   1.0  1.0  1   0   1   0   Cough
+0      0     0     0    0    2   0   0   0   Fatigue
+0      0     0     0    0    1   1   0   0   Headache
+```
+
+## Contributing
+
+We love 💖 contributions!
+
+If you have a good suggestion 💡 or found a bug 🐛,
+[read our brief contributors guide](CONTRIBUTING.md)
+for pointers to filing issues and what to expect.
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..9f99e5a
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,6 @@
+# Chart Review Documentation
+
+These documents are meant to be built as one part of the larger body of
+[Cumulus documentation](https://docs.smarthealthit.org/cumulus).
+
+To test changes here locally, read more at the [Cumulus docs repo](https://github.com/smart-on-fhir/cumulus).
diff --git a/docs/accuracy.md b/docs/accuracy.md
new file mode 100644
index 0000000..0476744
--- /dev/null
+++ b/docs/accuracy.md
@@ -0,0 +1,52 @@
+---
+title: Accuracy Command
+parent: Chart Review
+nav_order: 5
+# audience: lightly technical folks
+# type: how-to
+---
+
+# The Accuracy Command
+
+The `accuracy` command will print agreement statistics like F1 scores and confusion matrices
+for every label in your project, between two annotators.
+
+Provide two annotator names (the first name will be considered the ground truth) and
+your accuracy scores will be printed to the console.
+
+## Example
+
+```shell
+$ chart-review accuracy jane john
+accuracy-jane-john:
+F1     Sens   Spec   PPV    NPV    TP  FN  TN  FP  Label
+0.929  0.958  0.908  0.901  0.961  91  4   99  10  *
+0.895  0.895  0.938  0.895  0.938  17  2   30  2   cough
+0.815  0.917  0.897  0.733  0.972  11  1   35  4   fever
+0.959  1.0    0.812  0.921  1.0    35  0   13  3   headache
+0.966  0.966  0.955  0.966  0.955  28  1   21  1   stuffy-nose
+```
+
+## Options
+
+### `--config=PATH`
+
+Use this to point to a secondary (non-default) config file.
+Useful if you have multiple label setups (e.g. one grouped into a binary label and one not).
+
+### `--project-dir=DIR`
+
+Use this to run `chart-review` outside of your project dir.
+Config files, external annotations, etc. will be looked for in that directory.
+
+### `--save`
+
+Use this to write a JSON and CSV file to the project directory,
+rather than printing to the console.
+Useful for passing results around in a machine-parsable format.
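+
+For example, this writes the scores as JSON and CSV files into your project directory
+instead of printing them (the exact output file names may vary):
+```shell
+chart-review accuracy --save jane john
+```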
diff --git a/docs/config.md b/docs/config.md
new file mode 100644
index 0000000..35c9d95
--- /dev/null
+++ b/docs/config.md
@@ -0,0 +1,192 @@
+---
+title: Configuration
+parent: Chart Review
+nav_order: 3
+# audience: lightly technical folks
+# type: reference
+---
+
+# Configuration
+
+## File Format
+
+You can write your config file in either
+[JSON](https://en.wikipedia.org/wiki/JSON)
+or [YAML](https://en.wikipedia.org/wiki/YAML),
+whichever you're more comfortable with.
+
+By default, Chart Review will look for either `config.json` or `config.yaml`
+in your project directory and use whichever it finds.
+
+For the remainder of this document, examples will be shown in YAML.
+
+## Alternative Configs
+
+You may want to experiment with different label setups for your project.
+That's easy.
+
+Just provide `--config=./path/to/config.yaml` and your
+secondary config will be used instead of the default config.
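+
+For example, if a second config file in your project directory groups your labels
+into a single binary label (say, a hypothetical `binary-config.yaml`),
+you could score against it like so:
+```shell
+chart-review accuracy --config=./binary-config.yaml jane john
+```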
+
+## Required Fields
+
+The only truly required field is `annotators`,
+which provides a mapping from names to Label Studio ID values.
+
+Every other field has some reasonable default.
+
+## Field Definitions
+
+### `annotators`
+
+This is a required mapping of human-readable names to Label Studio IDs.
+
+#### Example
+
+Here, Alice has user ID 3 in Label Studio and Bob has user ID 2.
+
+```yaml
+annotators:
+  alice: 3
+  bob: 2
+```
+
+#### External Annotators
+
+{: .note }
+This feature requires you to upload notes
+to Label Studio using Cumulus ETL's `upload-notes` command.
+That way the document IDs get stored correctly as Label Studio metadata.
+
+Sometimes you are working with externally-derived annotations,
+for example from NLP or ICD10 codes.
+
+That's easy to integrate!
+Just make a CSV file with two columns:
+the first holds an identifier for the document and the second holds the label.
+
+- The document identifier can be an Encounter or DocumentReference ID
+  (either the original ID or the anonymized version that Cumulus ETL creates).
+- The label should be the same kind of label you define in your config.
+- An ID can appear multiple times with different labels. All the labels will apply to that note.
+- If there are no labels for a given ID, include a line for that ID but with an empty label field.
+  That way, Chart Review will know to include that ID in its math, but with no labels.
+
+##### Example CSV
+```csv
+encounter_id,label
+abcd123,Cough
+abcd123,Fever
+efgh456,
+ijkl789,Cough
+```
+
+##### Example Config
+```yaml
+annotators:
+  icd10:
+    filename: icd10.csv
+```
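+
+Once configured, an external annotator should work like any other annotator name.
+For example, to treat the ICD10 codes as ground truth for Alice's annotations
+(using the names defined above):
+```shell
+chart-review accuracy icd10 alice
+```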
+
+### `grouped-labels`
+
+This lets you bundle certain labels together into a smaller set.
+For example, you may have many labels for specific heart conditions, but
+are ultimately only interested in the binary determination of whether a patient is affected at all.
+
+This grouping happens after implied labels are expanded and before any scoring is done.
+
+The new group labels do not need to be a part of your source `labels` list.
+
+#### Example
+```yaml
+grouped-labels:
+  ill: [insomnia, chickenpox, ebola]
+```
+
+### `ignore`
+
+This lets you totally exclude some notes from annotation scoring.
+
+Sometimes notes were included in the chart review but are determined to be invalid for the
+purposes of the current study.
+If put in this ignore list, they won't affect the score.
+
+You can use either the Label Studio note ID directly,
+an Encounter ID (original or anonymized),
+or a DocumentReference ID (original or anonymized).
+
+#### Example
+```yaml
+ignore:
+  - abcd123
+  - 42
+```
+
+### `implied-labels`
+
+This lets you expand certain labels to a fuller set of implied labels.
+For example, you may have specific labels like `heart-attack`
+that also imply the `heart-condition` label.
+
+This expansion happens before labels are grouped and before any scoring is done.
+
+#### Example
+
+```yaml
+implied-labels:
+  cat: [animal, has-tail]
+  lion: cat
+```
+
+### `labels`
+
+This lets you restrict scoring to just this specific set of labels.
+
+Sometimes your source annotations have extra labels that aren't a part of your current analysis.
+If a label isn't in this list, it will not be scored.
+
+If this is not defined, all found labels will be used and scored.
+
+#### Example
+
+```yaml
+labels:
+  - animal
+  - cat
+  - has-tail
+  - lion
+```
+
+### `ranges`
+
+This is a mapping of note ranges for each annotator.
+By default, note ranges are automatically detected by looking at the Label Studio export.
+But it may be useful to manually define the note range in unusual cases.
+
+- You can provide a list of Label Studio note IDs.
+- You can reference other defined ranges.
+- You can specify a range of IDs with a hyphen.
+
+#### Example
+
+```yaml
+ranges:
+  alice: 13-54
+  bob: [5, 7, 14]
+  cathy: [alice, bob]
+```
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..be9a795
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,60 @@
+---
+title: Chart Review
+has_children: true
+# audience: non-programmers new to this project
+# type: explanation
+---
+
+# Chart Review
+
+**Measure agreement between chart annotations.**
+
+Whether your chart annotations come from humans, machine learning, or coded data like ICD-10,
+Chart Review can compare them to reveal interesting statistics like:
+
+**Accuracy**
+* F1-score ([agreement](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1090460/))
+* [Sensitivity and Specificity](https://en.wikipedia.org/wiki/Sensitivity_and_specificity)
+* [Positive (PPV) or Negative Predictive Value (NPV)](https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values#Relationship)
+* False Negative Rate (FNR)
+
+**Confusion Matrix**
+* TP = True Positive
+* TN = True Negative
+* FP = False Positive (type I error)
+* FN = False Negative (type II error)
+
+**Power Calculations** for sample size estimation
+* Power = 1 - FNR
+* FNR = FN / (FN + TP)
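+
+For example, if a report shows FN = 1 and TP = 4 for some label,
+then FNR = 1 / (1 + 4) = 0.2 and Power = 1 - 0.2 = 0.8.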
+
+## Is This Part of Cumulus?
+
+Chart Review is developed by the same team
+and is designed to work with the
+[Cumulus project](https://docs.smarthealthit.org/cumulus/),
+but Chart Review is useful even outside of Cumulus.
+
+Some features (notably those dealing with external annotations)
+require Label Studio metadata that Cumulus ETL creates when it pushes notes
+to Label Studio using its `upload-notes` feature.
+
+But calculating accuracy between human annotators can be done entirely without the use of Cumulus.
+
+## Installing & Using
+
+```shell
+pip install chart-review
+chart-review --help
+```
+
+Read the [first-time setup docs](setup.md) for more.
+
+## Source Code
+
+Chart Review is open source.
+If you'd like to browse its code or contribute changes yourself,
+the code is on [GitHub](https://github.com/smart-on-fhir/chart-review).
diff --git a/docs/setup.md b/docs/setup.md
new file mode 100644
index 0000000..b3a2136
--- /dev/null
+++ b/docs/setup.md
@@ -0,0 +1,34 @@
+---
+title: Setup
+parent: Chart Review
+nav_order: 1
+# audience: lightly technical folks
+# type: how-to
+---
+
+# Setting Up Chart Review
+
+## Installing
+
+`pip install chart-review`
+
+## Make Project Directory
+
+1. Make a new directory to hold your project files.
+2. Export your Label Studio project and put the resulting JSON file
+   in this directory with the name `labelstudio-export.json`.
+3. Create a [config file](config.md) in this directory.
+
+## Run Chart Review
+
+The only current command is `accuracy`,
+which will print agreement statistics between two annotators.
+
+Read more about it in its own [accuracy command documentation](accuracy.md).
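+
+Putting it all together, a first run might look like this
+(the directory and annotator names are illustrative):
+```shell
+cd my-project
+chart-review accuracy jane john
+```
\ No newline at end of file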