diff --git a/README.md b/README.md
index c39c4dc..31e59a9 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,17 @@
 # Cumulus Library - Covid
 
-A collection of tables for generating bioinfomatics data for studying COVID-19 symptoms.
-Part of the [SMART on FHIR Cumulus Project](https://smarthealthit.org/cumulus-a-universal-sidecar-for-a-smart-learning-healthcare-system/)
+A collection of tables for generating bioinformatics data for studying COVID-19 symptoms.
+Part of the [SMART on FHIR Cumulus Project](https://smarthealthit.org/cumulus/).
 
-For more information, [browse the documentation](https://docs.smarthealthit.org/cumulus/library).
+For more information, browse the [Cumulus Library documentation](https://docs.smarthealthit.org/cumulus/library).
 
 ## Usage
 
 To install the module, simply run `pip install cumulus-library-covid`.
 
-This will add a `covid_symptoms` study target to `cumulus-library`.
+This will add a `covid_symptom` study target to `cumulus-library`.
+
+See [RUNNING.md](RUNNING.md) for more details.
 
 ## Publications
 
diff --git a/RUNNING.md b/RUNNING.md
new file mode 100644
index 0000000..91598eb
--- /dev/null
+++ b/RUNNING.md
@@ -0,0 +1,158 @@
+# Running the COVID study
+
+This guide will help you reproduce the COVID study from scratch.
+
+This includes not only the SQL in this Cumulus Library study,
+but also the chart review side of things.
+
+## Prerequisites
+
+- An existing Cumulus stack, with an already-built `core` study.
+  - See the general [Cumulus documentation](https://docs.smarthealthit.org/cumulus/)
+    for setting that up.
+- Install this repo with `pip install cumulus-library-covid`
+
+## 1. Prepare your data
+
+This study operates on DocumentReference resources
+(it runs NLP on the referenced clinical notes).
+So we need to make sure you've got those handy.
+
+Gather some DocumentReference ndjson from your EHR.
+You can either re-export the documents of interest,
+or use ndjson from a previous export.
+
+If you are choosing a subset of documents,
+make sure to pull resources between March 2020 and June 2022.
+That's the study period of interest.
+
+Place the ndjson in a folder using filenames like `*.DocumentReference.*.ndjson`.
+
+## 2. Run the ETL & Library study
+
+- There are [separate instructions](https://docs.smarthealthit.org/cumulus/etl/studies/covid-symptom.html)
+  for running the ETL and this COVID study's NLP
+- You should probably re-run your Cumulus AWS Glue crawler at this point,
+  to pick up this new NLP table and its schema.
+- Then run this study with [cumulus-library](https://docs.smarthealthit.org/cumulus/library/)
+  like so: `cumulus-library ... -t covid_symptom`
+
+You should now have all the interesting results sitting in Athena.
+
+## 3. Export from Athena
+
+In Athena's web console, run these commands and download the CSV results,
+using the given filenames (we will refer back to these filenames later):
+- **ctakes.csv**: `select encounter_ref, symptom_display from covid_symptom__symptom_ctakes_negation`
+- **docrefs.csv**: `select distinct docref_id from covid_symptom__symptom_ctakes_negation`
+- **icd10.csv**: `select encounter_ref, substring(icd10_display, 7) as symptom_display from covid_symptom__symptom_icd10`
+
+And with that, the natural language processing of notes is finished.
+The rest of this guide will be about setting up a chart review for human comparison with NLP.
+
+## 4. Configure Label Studio
+
+- Install Label Studio according to [their docs](https://labelstud.io/guide/install.html).
+- Create a new project, named however you like.
+  - Skip the Data Import tab.
+  - On the Label Setup tab, click "Custom template" on the bottom left and enter this config:
+```
+
+
+
+
+
+```
+
+Once created, you will be looking at an empty project page.
+Take note of the new URL, you'll need to know the Label Studio project ID later
+(the number after `/projects/` in the URL).
+
+## 5. Upload notes to Label Studio
+
+- Review the Cumulus ETL [upload-notes docs](https://docs.smarthealthit.org/cumulus/etl/chart-review.html)
+- You'll want to run `upload-notes` with the following options:
+```shell
+cumulus-etl upload-notes ... \
+ \
+