Data curation repository based on 4yr_project_glucose by Zhiyao Luo.
Update: This project was presented at the Institute of Biomedical Engineering, University of Oxford on Sep 2, 2024. You can find the presentation slides here.
To run this project locally, you need:
- Access to the MIMIC-III Clinical Database, which requires becoming a credentialed user and signing a data use agreement (find out more from the MIMIC documentation);
- Access to the Curated Data for Describing Blood Glucose Management in the Intensive Care Unit, which has similar requirements;
- Access to RxNav-in-a-Box via a Unified Medical Language System (UMLS) license agreement; and
- Access to the Logical Observation Identifiers Names and Codes (LOINC) database.
Note: The RxNorm and RxClass APIs used inside RxNav-in-a-Box must be run locally. Read more at Classifying prescriptions.
You can read the documentation articles without running the project locally: the essential plots they reference are committed to the repository for this purpose.
From the RxNav-in-a-Box README.txt:
- "12 gigabytes of memory to devote to a container platform (e.g., Docker)
- 100 gigabytes of disk space
- Docker Desktop, or another OCI-compatible platform (in which case you may take the included docker-compose.yml file as an example)."
From the mimic-code repository:
"Loading the data into a PostgreSQL database requires around ~47 GB of space. The addition of [optional] indexes adds another 26 GB. You will likely want to reserve 100 GB for the entire database."
If running both the RxNav-in-a-Box and MIMIC-III databases locally, ensure that you have enough disk space and memory:
- >200GB disk space, and
- >12GB memory.
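As a quick pre-flight check, the disk requirement above can be verified with Python's standard library. This is a minimal sketch rather than part of the project: the function names are hypothetical, and it checks only the drive containing the given path. Memory is not checked, because the standard library has no portable way to query it.

```python
import shutil

# Threshold taken from the requirements listed above.
REQUIRED_DISK_GB = 200


def free_disk_gb(path="."):
    """Return the free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 10**9


def has_enough_disk(path=".", required_gb=REQUIRED_DISK_GB):
    """Return True if more than `required_gb` gigabytes are free at `path`."""
    return free_disk_gb(path) > required_gb


if __name__ == "__main__":
    free = free_disk_gb()
    status = "OK" if has_enough_disk() else "insufficient"
    print(f"Free disk space: {free:.0f} GB ({status}, need > {REQUIRED_DISK_GB} GB)")
```

If the PostgreSQL data directory or Docker's storage lives on a different drive, pass that location as `path` instead.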
Docker Desktop is recommended for running RxNav-in-a-Box locally.
- Clone this repository.
- Download the `.zip` file containing the dataset Curated Data for Describing Blood Glucose Management in the Intensive Care Unit (version 1.0.1) from physionet.org, and uncompress it in the same directory as the clone of this repository.
- Download the `.zip` file containing the LOINC database from loinc.org, and uncompress it in the same directory as the clone of this repository.
- Follow the instructions on mimic.mit.edu to install MIMIC-III to a local PostgreSQL database. Update the `.env` file with your database credentials.
- Set up your Python virtual environment.
- Install required packages using `pip install -r requirements.txt`.
- Set your environment variables in `.env`.
- Run the `curation` module inside your virtual environment using `python -m curation`.
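To illustrate how database credentials in a `.env` file could be turned into a PostgreSQL connection URL, here is a minimal sketch. The variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and both helper functions are assumptions made for illustration. Use the names defined in this repository's own `.env` file.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def postgres_url(env):
    """Build a PostgreSQL URL from hypothetical DB_* variables."""
    return (
        f"postgresql://{env['DB_USER']}:{env['DB_PASSWORD']}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )


# Hypothetical .env contents; the real variable names may differ.
example = """
# Database credentials
DB_USER=postgres
DB_PASSWORD=secret
DB_HOST=localhost
DB_PORT=5432
DB_NAME=mimic
"""

print(postgres_url(parse_env(example)))
# postgresql://postgres:secret@localhost:5432/mimic
```

A password containing characters such as `@` or `/` would need URL-encoding (for example with `urllib.parse.quote`) before being placed in the URL.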
- Curating demographics
- Classifying prescriptions using RxNorm and RxClass
- Classifying lab events using LOINC
- Calculating weights and heights
- Caching
| name or flags | type | default | description |
|---|---|---|---|
| `-l`, `--log-level` | `str` | `warning` | The log level. |
| `-m`, `--max-identifier-count` | `int` | `-1` | The maximum number of unique ICU stay identifiers. A value less than or equal to zero applies no limit, so all identifiers are used. |
| `-c`, `--chunk_size` | `int` | `1000` | The chunk size to use when querying the database. |
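The options above can be mirrored with `argparse` as follows. This is a sketch reconstructed from the table alone, not the project's actual parser: `build_parser` and `effective_limit` are hypothetical names, and only the flags, types, and defaults come from the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(prog="python -m curation")
    parser.add_argument("-l", "--log-level", type=str, default="warning",
                        help="The log level.")
    parser.add_argument("-m", "--max-identifier-count", type=int, default=-1,
                        help="Maximum number of unique ICU stay identifiers; "
                             "values <= 0 apply no limit.")
    parser.add_argument("-c", "--chunk_size", type=int, default=1000,
                        help="Chunk size to use when querying the database.")
    return parser


def effective_limit(max_identifier_count):
    """Translate the documented 'less than or equal to zero' rule into None."""
    return max_identifier_count if max_identifier_count > 0 else None


args = build_parser().parse_args(["-m", "100", "--log-level", "info"])
print(args.log_level, args.max_identifier_count, args.chunk_size)
# info 100 1000
```

Under these assumptions, the equivalent invocation would be `python -m curation -m 100 --log-level info`, which limits the run to 100 ICU stay identifiers and keeps the default chunk size.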
- Johnson A, Pollard T, Mark R. MIMIC-III Clinical Database (version 1.4). PhysioNet. 2016. Available from: https://doi.org/10.13026/C2XW26.
- Robles Arévalo A, Mateo-Collado R, Celi L A. Curated Data for Describing Blood Glucose Management in the Intensive Care Unit (version 1.0.1). PhysioNet. 2021. Available from: https://doi.org/10.13026/517s-2q57.
- Nelson SJ, Zeng K, Kilbourne J, Powell T, Moore R. Normalized names for clinical drugs: RxNorm at 6 years. J Am Med Inform Assoc [Internet]. 2011 [cited 2024 Aug 1];18(4):441–8. Available from: https://academic.oup.com/jamia/article-lookup/doi/10.1136/amiajnl-2011-000116
- Vreeman DJ, McDonald CJ, Huff SM. LOINC® - A Universal Catalog of Individual Clinical Observations and Uniform Representation of Enumerated Collections. Int J Funct Inform Personal Med. 2010;3(4):273–91.
Tahmid Azam, [email protected]