Resources for ILAE Big Data Commission
- Epilepsy and OMOP
- OHDSI_Resources
- OMOP CDM Basic Data Dictionary
- Projects Best Suited for Observational Research and OHDSI Network Studies
- Analytic Use Cases and Examples
- Incremental Loading
- Data Content Ontology
- Current CDM
- Commonly Used CDM Tables Overview
- ETL Basics
- OHDSI Analysis Tools
- Data Science Handbook
- Programming Resources: Jupyter Notebooks, Python, R, and SQL
- Recommended Trainings
- Analysis with SQL
- Analysis with R
- Diversity, Equity, and Inclusion Resources
Title | Journal | Creation Date | Authors |
---|---|---|---|
Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network | Epilepsia | 2017/07/07 06:00 | Duke, Jon D | Ryan, Patrick B | Suchard, Marc A | Hripcsak, George | Jin, Peng | Reich, Christian | Schwalm, Marie-Sophie | Khoma, Yuriy | Wu, Yonghui | Xu, Hua | Shah, Nigam H | Banda, Juan M | Schuemie, Martijn J |
Characterization of Anti-seizure Medication Treatment Pathways in Pediatric Epilepsy Using the Electronic Health Record-Based Common Data Model | Frontiers in neurology | 2020/06/02 06:00 | Kim, Hunmin | Yoo, Sooyoung | Jeon, Yonghoon | Yi, Soyoung | Kim, Seok | Choi, Sun Ah | Hwang, Hee | Kim, Ki Joong |
Patient characteristics and antiseizure medication pathways in newly diagnosed epilepsy: Feasibility and pilot results using the common data model in a single-center electronic medical record database | Epilepsy & behavior : E&B | 2022/03/11 20:13 | Spotnitz, Matthew | Ostropolets, Anna | Castano, Victor G | Natarajan, Karthik | Waldman, Genna J | Argenziano, Michael | Ottman, Ruth | Hripcsak, George | Choi, Hyunmi | Youngerman, Brett E |
Identification of patients with drug-resistant epilepsy in electronic medical record data using the Observational Medical Outcomes Partnership Common Data Model | Epilepsia | 2022/09/15 03:02 | Castano, Victor G | Spotnitz, Matthew | Waldman, Genna J | Joiner, Evan F | Choi, Hyunmi | Ostropolets, Anna | Natarajan, Karthik | McKhann, Guy M | Ottman, Ruth | Neugut, Alfred I | Hripcsak, George | Youngerman, Brett E |
Conversion of the Canadian Observational Study on Epilepsy (CANOE) REDCap Registry to the OMOP Common Data Model | 2023 OHDSI Symposium Showcase | 2023/10/28 | Boyce, Danielle | Josephson, Colin Bruce | Jiang, Ray | Wiebe, Samuel |
For a sample interactive OMOP data dictionary, please click on the image below:
Source: https://www.ohdsi.org/wp-content/uploads/2023/01/SOS-challenge-intro-24jan2023.pdf
Analytic use case | Type | Structure | Example |
---|---|---|---|
Clinical characterization | Disease Natural History | Amongst patients who are diagnosed with <insert your favorite disease> , what are the patient’s characteristics from their medical history? |
Amongst patients with rheumatoid arthritis, what are their demographics (age, gender), prior conditions, medications, and health service utilization behaviors? |
Treatment utilization | Amongst patients who have <insert your favorite disease> , which treatments were patients exposed to amongst <list of treatments for disease> and in which sequence? |
Amongst patients with depression, which treatments were patients exposed to SSRI, SNRI, TCA, bupropion, esketamine and in which sequence? | |
Outcome incidence | Amongst patients who are new users of <insert your favorite drug> , how many patients experienced <insert your favorite known adverse event from the drug profile> within <time horizon following exposure start> ? |
Amongst patients who are new users of methylphenidate, how many patients experienced psychosis within 1 year of initiating treatment? | |
Population-level effect estimation | Safety surveillance | Does exposure to <insert your favorite drug> increase the risk of experiencing <insert an adverse event> within <time horizon following exposure start> ? |
Does exposure to ACE inhibitor increase the risk of experiencing Angioedema within 1 month after exposure start? |
Comparative effectiveness | Does exposure to <insert your favorite drug> have a different risk of experiencing <insert any outcome (safety or benefit)> within <time horizon following exposure start> , relative to <insert your comparator treatment> ? |
Does exposure to ACE inhibitor have a different risk of experiencing acute myocardial infarction while on treatment, relative to thiazide diuretic? | |
Patient level prediction | Disease onset and progression | For a given patient who is diagnosed with <insert your favorite disease> , what is the probability that they will go on to have <another disease or related complication> within <time horizon from diagnosis> ? |
For a given patient who is newly diagnosed with atrial fibrillation, what is the probability that they will go onto to have ischemic stroke in next 3 years? |
Treatment response | For a given patient who is a new user of <insert your favorite chronically-used drug> , what is the probability that they will <insert desired effect> in <time window> ? |
For a given patient with T2DM who start on metformin, what is the probability that they will maintain HbA1C <6.5% after 3 years? | |
Treatment safety | For a given patient who is a new user of <insert your favorite drug> , what is the probability that they will experience <insert adverse event> within <time horizon following exposure> ? |
For a given patient who is a new user of warfarin, what is the probability that they will have GI bleed in 1 year? |
This document provides a detailed overview of several essential clinical terminologies and coding systems used in healthcare. Each system has a specific role and is crucial for standardized communication in healthcare settings. The information includes development history, usage, and updates of these systems.
For more in-depth information, links to the respective official websites are provided.
- Development: Originally by the College of American Pathologists, now under SNOMED International.
- Adoption: Used in over 50 countries.
- Concepts: Over 340,000 active concepts in 19 hierarchies.
- Usage: Encodes clinical information including diseases, findings, and procedures.
- Updates: Biannual, with more frequent updates planned.
- More Information: SNOMED International
- Developer: Regenstrief Institute.
- Function: Identifiers for laboratory and clinical observations.
- Content: Over 90,000 terms.
- Collaboration: With SNOMED CT for coded content development.
- Updates: Biannual.
- More Information: LOINC
- Developer: National Library of Medicine (NLM).
- Function: Standard nomenclature for medications.
- Integration: Links to various drug vocabularies.
- Access: Requires UMLS user license for proprietary content.
- More Information: RxNorm - NLM
- Endorsement: World Health Organization (WHO).
- Versions: ICD-10 widely used with national extensions; ICD-11 adopted for future use.
- Purpose: Epidemiology, health management, clinical purposes.
- Updates: Annual, freely available.
- More Information: WHO ICD
- Developer: American Medical Association (AMA).
- Use: Encoding of medical services and procedures in the USA.
- Categories: Three categories of codes.
- Requirement: License from AMA for use.
- More Information: CPT - AMA
- Function: Bioinformatic resources for human diseases and phenotypes analysis.
- Components: Phenotype vocabulary, disease-phenotype annotations, algorithms.
- Applications: Genomic interpretation, gene-disease discovery, precision medicine.
- Content: Over 13,000 terms in 5 hierarchies.
- Availability: Freely available, multiple releases per year.
- More Information: Human Phenotype Ontology
- Initiation: By the US National Library of Medicine in 1986.
- Goal: To aid in the retrieval and integration of electronic biomedical information.
- Challenge Addressed: Different vocabularies expressing the same information differently.
- Availability: Free, but requires a license due to additional licensing requirements of some contents.
- More Information: UMLS - NLM
-
Process: Finding the closest match of a code from one ontology in another.
-
Matching: Exact equivalence is rare; approximate matching is common.
-
Challenges: Labor-intensive and requires understanding the maps' nature and limitations.
-
Alternative Approach: Mapping multiple ontologies to a central core terminology, as used by the OHDSI consortium.
-
More Information: BioPortal
graph LR
ICD9("ICD9") -->|Transformation to OMOP CDM| SNOMED("STANDARD<br>Vocabulary Concept Code<br>SNOMED")
ICD10("ICD10") -->|Transformation to OMOP CDM| SNOMED
Domain | Source Vocabulary | Standard Vocabulary |
---|---|---|
Conditions | ICD9, ICD10 | SNOMED |
Measurements | LOINC or institutional specific codes | LOINC |
Drugs | NDC | RxNORM |
Procedures | ICD9, ICD10, CPT | SNOMED |
-
ICD = International Classification of Diseases
-
SNOMED = Systematized Nomenclature of Medicine
-
LOINC = Logical Observation Identifiers Names and Codes
-
NDC = National Drug Code
-
CPT = Current Procedural Terminology
Incremental loading in the context of OHDSI refers to the process of adding new or updated data to an existing OHDSI database without the need to completely rebuild or refresh the entire dataset. This can be particularly useful for large datasets where full loads can be time-consuming and inefficient. The process involves extracting only the changes since the last load and then transforming and loading this delta of data into the existing OMOP Common Data Model (CDM) used by OHDSI tools.
For instance, in the development of an ETL (Extract, Transform, Load) process for the bulk and incremental load of German patient data into the OMOP CDM using FHIR as referenced by OHDSI, it suggests that the incremental loading is an essential part of keeping the database up-to-date in an efficient manner. OHDSI Symposium Showcase #44
This group also described a Near Real-Time Incremental OMOP-CDM ETL System
This is also described by Dr. DuWayne Willett, CMIO of UTSW, at around minute 30 of this video:
...and in this OHDSI symposium presentation: .
## Current CDM ![CDM54 Image](https://github.com/DBJHU/DBJHU.github.io/blob/main/cdm54.png)Source: OHDSI Common Data Model
The OMOP common data model (CDM) is a relational database made up of different tables that relate to each other by foreign keys (XXXX_ID values; e.g., PERSON_ID or PROVIDER_ID). The OMOP tables in your data export are as follows:
Table | Description |
---|---|
Person | Contains basic demographic information describing a participant, including biological sex, birth date, race, and ethnicity. |
Visit_occurrence | Captures encounters with healthcare providers or similar events. Contains the type of visit a person has (outpatient care, inpatient care, or long-term care), as well as the date and duration information. Rows in other tables can reference this table, for example, condition_occurrences related to a specific visit. |
Condition_occurrence | Indicates the presence of a disease or medical condition stated as a diagnosis, a sign, or symptom, which is either observed by a provider or reported by the patient. |
Drug_exposure | Captures records about the utilization of a medication. Drug exposures include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies. Radiological devices ingested or applied locally do not count as drugs. Drug exposure is inferred from clinical events associated with orders, prescriptions written, pharmacy dispensing, procedural administrations, and other patient-reported information. |
Measurement | Contains both orders and results of a systematic and standardized examination or testing of a participant or participant's sample, including laboratory tests, vital signs, quantitative findings from pathology reports, etc. |
Procedure_occurrence | Contains records of activities or processes ordered by or carried out by a healthcare provider on the patient to have a diagnostic or therapeutic purpose. |
Observation | Captures clinical facts about a person obtained in the context of an examination, questioning, or a procedure. Any data that cannot be represented by another domain, such as social and lifestyle facts, medical history, and family history, are recorded here. |
Device_exposure | Captures information about a person's exposure to a foreign physical object or instrument which is used for diagnostic or therapeutic purposes. Devices include implantable objects, blood transfusions, medical equipment and supplies, other instruments used in medical procedures, and material used in clinical care. |
Death | Contains the clinical events surrounding how and when a participant dies. |
The Book of OHDSI - Chapter 15: Data Quality
https://www.ohdsi.org/wp-content/uploads/2019/09/OMOP-Common-Data-Model-Extract-Transform-Load.pdf https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html
-
Dataset profiling and documentation
- Create data model documentation, sample data, data dictionaries, code lists, and other relevant information
- Execute database profiling scan (WhiteRabbit) on source database
- Prepare mapping approach/documents based on scan reports from database profiling scan
-
Generation of the ETL Design
- Mapping workshop with all relevant parties to:
- Understand the source
- Define the scope of source data to be transformed
- Define acceptance criteria for OMOP output
- Output: draft mapping document
- Finalize mapping document:
- Integrate all notes/documentation from workshop
- Work through mappings and verify, update, fill in gaps
- Meetings/emails with data contact/technical contact (TC) as needed
- Mapping workshop with all relevant parties to:
-
Source Data Integrations and Semantic Mapping
- Source Code mapping:
- Identify which codes are already mapped to standard vocabulary
- Identify code types for codes that need to be mapped
- Translation of code description/phrases to English, if/as needed
- Create proposed code mappings
- Generate mappings for data coming out of flowsheets (together with consortium)
- Review/approval of code mappings, often done by medical experts affiliated with Data Owner (DO).
- Identify medical imaging available and define mappings to Imaging Extension
- Identify waveform data available and map using consortium-defined guidelines
- Use OHNLP to extract OMOP data from unstructured sources
- Source Code mapping:
-
Technical architecture design
- Continuous Integration, Continuous Deployment (CI/CD):
- Decide on ETL dev/deployment flow
- Put version control mechanisms in place
- OHDSI Ecosystem:
- Evaluate infrastructure needed
- Create infrastructure design documentation
- Continuous Integration, Continuous Deployment (CI/CD):
-
Technical ETL Development
- Implement ETL (Preferred Language/Structure?)
- Update ETL based on testing/QA/feedback (8, 9)
-
Setting up of Infrastructure
- Deploy core servers and associated services based on infrastructure design in (4)
-
Installation of the OHDSI tools
- Install and configure all software (database server, Achilles/DQD/Ares, Atlas/WebAPI, R Studio server, HADES, notebooks/tooling related to analytics, and any other software to suit a site’s specific needs).
-
ETL Testing and Validation
- ETL Execution:
- Test ETL using sample/development data (with limited external data access)
- Test ETL using DO data (with full external data access)
- Verify and document QA
- Submit Achilles/DQD/AresIndexer results to central location regularly
- ETL Development Planning and Management:
- Review ETL testing and progress (TCs/meetings)
- ETL Execution:
-
Data Quality Assessment
- QA/Acceptance testing:
- Evaluate accuracy and completeness of mapping
- Review and approval by DO
- QA/Acceptance testing:
-
Documentation
- Mapping Documentation and Themis Checks
- Transformation/Technical Documentation
-
Project Management Througout
- Organization of tasks, milestones, and follow-up
R, SQL, Python, or any preferred data analysis software. Examples provided below are for R and SQL. [The Book of OHDSI Chapter 9] (https://ohdsi.github.io/TheBookOfOhdsi/SqlAndR.html) provides an overview of analysis of OHDSI data in R and SQL; note that you will not be able to avail yourselves of OHDSI software tools when analyzing your exported data for the reason explained above.
Open, rigorous and reproducible research: A practitioner’s handbook From Standord Data Science
Software Carpentry is a website that provides free online lessons to researchers wanting to enhance their programming skills for data analysis. This website offers free online lessons on a variety of useful topics including:
Additional resources:
- DataCamp
- Khan Academy
- Codecademy - Learn Python 2
- Python Data Science Handbook
- R for Data Science
- [Introduction to Programming (NIAID, NIH)](https://bioinformatics.niaid.nih.gov/resources - 70.3.1)
- [Python Programming (NIAID, NIH)](https://bioinformatics.niaid.nih.gov/resources - 70.3.2)
- [Data Analysis with Python and Pandas (NIAID, NIH)](https://bioinformatics.niaid.nih.gov/resources - 70.3.3)
- [Data Visualization with Python (NIAID, NIH)](https://bioinformatics.niaid.nih.gov/resources - 70.3.4)
- Source:NIH All of US Study
Hello! Please familiarize yourself with the following tools and resources which will help you throughout this course and your OHDSI journey.
Check out the OHDSI Forums Introduce yourself on the "Welcome to OHDSI" thread.
Bookmark The Book of OHDSI
Check out OHOP CDM FAQ
Join the OHDSI Microsoft Teams environment.
Check out the MIMIC-IV demo data set in OMOP CDM format!
Register with EHDEN Academy
Visit the Atlas Demo and Athena.
Bookmark the OHDSI YouTube tutorials and workshops
Visit the OHDSI Community Dashboard
Bookmark OMOP Common Data Model (ohdsi.github.io)
Learn about GitHub if you don't already know.
Plan to attend an OHDSI Community call
Learn about OHDSI Workgroups
Follow OHDSI on social media: Twitter LinkedIn
Subscribe to the OHDSI Newsletter
Learn about past and upcoming OHDSI events
Learn about OHDSI software
Look up individual concepts in Athena
Check out useful OHDSI-related documentation here: NIH ALL of US OMOP Documentation
Clinical Registries in OHDSI - September 2022
Matentzoglu N, Balhoff JP, Bello SM, Bizon C, BrushM, Callahan TJ et al. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database. 2022. 2022:baac035, DOI: 10.1093/database/baac035.
Mapping Commons. SSSOM: Simple Standard for Sharing Ontological Mappings. Wiki [Internet]. Available from: https://mapping-commons.github.io/sssom/about.
Mapping Commons. SSSOM: Simple Standard for Sharing Ontological Mappings. GitHub [Internet]. Available from: https://github.com/mapping-commons/sssom.
https://www.w3.org/2004/02/skos/
-
July 6, 2023:
Data Quality Dashboard output demo
By: Jared Houghtaling
-
July 27, 2023:
Flowsheet follow-up
By: Polina Talapova & Jared Houghtaling
-
August 3, 2023:
OMOP Standardized Vocabularies - Part 1
By: Jared Houghtaling and Polina Talapova
-
August 17, 2023:
OMOP Standardized Vocabularies - Part 2
By: Polina Talapova
-
August 24, 2023:
How to download and set-up a DDL (Demo)
By: Jared Houghtaling
-
August 31, 2023:
Demo of WhiteRabbit and RabbitInAHat
By: Jared Houghtaling
-
September 7, 2023:
ARES usefulness for ETL at Tufts
By: Jared Houghtaling
-
September 14, 2023:
Google form introduction for site progress tracking
By: Jared Houghtaling
-
September 21, 2023:
Sample ETL Process
By: Jared Houghtaling
October 12, 2023:
Google Form for Site Progress Tracking
With Jared Houghtaling and Andrew Williams
-
October 26, 2023:
Review and Prioritization of DQD Results, and Discussion of DQD Issue Severity
With Jared Houghtaling
-
November 2, 2023:
Principles of Mapping and Vocab Gaps Identification
With Polina Talapova
-
November 9, 2023:
Usagi & STCM Demo
With Polina Talapova & Jared Houghtailing
- How racial biases in medical algorithms lead to inequities in care | PBS News Weekend
- AI Reveals its Biases by Generating What it Thinks Professors Look Like | PetaPixel
- Does “AI” stand for augmenting inequality in the era of covid-19 healthcare? | The BMJ
- The AUC Data Science | Initiative (aucenter.edu)
- ReCode-Report.pdf (data.org)
- ADDI Researcher Roundtable: The Importance of Diversity in Dementia Research on Vimeo
- Advancing Antiracism, Diversity, Equity, and Inclusion in STEMM Organizations: Beyond Broadening Participation |The National Academies Press
- Data Literacy: The Composition Effect | Education | St. Louis Fed (stlouisfed.org)
- A lecture by Pilar Ossorio at MLHC Professor of Law and Bioethics at the University of Wisconsin Law School
- This paper by Dunkelau and Leuschel that summarizes Fairness-aware Machine Learning
- Paper by Gichoya et al about minimixing bias in AI
- A conversation with Cathy O’Neil, author of the critically acclaimed Weapons of Math Destruction - YouTube
- Nicole G. Weiskopf (NYU) and Carolyn Thompson (UCSD) – Bias in EHR - YouTube
- This Obermeyer et al paper on dissecting racial bias in an algorithm used to manage the health of populations - Science, 2019
- Fair ML Keynote talk + Microsoft talk + slides - Science, 2019
- Maria Hightower, M.D., M.B.A., MPH Chief Digital Technology officer of the University of Chicago Medicine comments on the racial bias in AI and the algorithm described above - Science, 2019
- How scientists are subtracting race from medical risk calculators - Science, 2021
- This paper summarizes how being aware of a protected class can lead to differences in fairness outcomes - Science
- Race After Technology, by Ruha Benjamin [Professor of African American studies at Princeton University], summarizes how technology [from data collection, data imputation, government policy, etc] can play a role in different outcomes in society - Science
The OMOP Query Library is a library of commonly-used SQL queries for the OMOP Common Data Model (CDM).
Below are some sample R queries that demonstrate how to read in OMOP tables from CSV files, join them based on the person_id
and visit_occurrence_id
fields, and search for specific criteria.
Note: Adjust the file paths and column names accordingly based on the actual structure and location of your CSV files. The queries below are a generic representation and may need adjustments based on the specifics of your data set.
# Read the CSV files into R data frames
person_df <- read.csv("path_to_person_table.csv", header=TRUE, stringsAsFactors=FALSE)
visit_occurrence_df <- read.csv("path_to_visit_occurrence_table.csv", header=TRUE, stringsAsFactors=FALSE)
condition_occurrence_df <- read.csv("path_to_condition_occurrence_table.csv", header=TRUE, stringsAsFactors=FALSE)
Join tables based on person_id
:
When a person has multiple visits in the visit_occurrence
table, joining the person
table with the visit_occurrence
table will result in multiple rows for that person, each corresponding to a different visit. This is a standard one-to-many join operation.
## Join person with visit_occurrence on 'person_id'
person_visit_df <- merge(person_df, visit_occurrence_df, by="person_id")
# Join the person-visit result with condition_occurrence on both 'person_id' and 'visit_occurrence_id'
full_df <- merge(person_visit_df, condition_occurrence_df, by=c("person_id", "visit_occurrence_id"))
# Define a list of person_ids to search for
search_person_ids <- c(1, 2, 3, 4, 5)
# Filter the data frame to only include rows with person_ids in the list
filtered_by_person_df <- subset(full_df, person_id %in% search_person_ids)
# Define a specific condition concept code to search for
search_condition_concept_id <- 1234567
# Filter the data frame to only include rows with the specified condition concept code
filtered_by_condition_df <- subset(full_df, condition_concept_id == search_condition_concept_id)
# Define a date range to search for
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2020-12-31")
## Filter the data frame to only include rows within the date range
filtered_by_date_df <- subset(full_df, visit_start_date >= start_date & visit_start_date <= end_date)