DBJHU/ILAE

ILAE - TEMPORARY REPOSITORY

Resources for ILAE Big Data Commission

Table of Contents

Epilepsy and OMOP

| Title | Journal | Creation Date | Authors |
| --- | --- | --- | --- |
| Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network | Epilepsia | 2017/07/07 06:00 | Duke, Jon D; Ryan, Patrick B; Suchard, Marc A; Hripcsak, George; Jin, Peng; Reich, Christian; Schwalm, Marie-Sophie; Khoma, Yuriy; Wu, Yonghui; Xu, Hua; Shah, Nigam H; Banda, Juan M; Schuemie, Martijn J |
| Characterization of Anti-seizure Medication Treatment Pathways in Pediatric Epilepsy Using the Electronic Health Record-Based Common Data Model | Frontiers in neurology | 2020/06/02 06:00 | Kim, Hunmin; Yoo, Sooyoung; Jeon, Yonghoon; Yi, Soyoung; Kim, Seok; Choi, Sun Ah; Hwang, Hee; Kim, Ki Joong |
| Patient characteristics and antiseizure medication pathways in newly diagnosed epilepsy: Feasibility and pilot results using the common data model in a single-center electronic medical record database | Epilepsy & behavior : E&B | 2022/03/11 20:13 | Spotnitz, Matthew; Ostropolets, Anna; Castano, Victor G; Natarajan, Karthik; Waldman, Genna J; Argenziano, Michael; Ottman, Ruth; Hripcsak, George; Choi, Hyunmi; Youngerman, Brett E |
| Identification of patients with drug-resistant epilepsy in electronic medical record data using the Observational Medical Outcomes Partnership Common Data Model | Epilepsia | 2022/09/15 03:02 | Castano, Victor G; Spotnitz, Matthew; Waldman, Genna J; Joiner, Evan F; Choi, Hyunmi; Ostropolets, Anna; Natarajan, Karthik; McKhann, Guy M; Ottman, Ruth; Neugut, Alfred I; Hripcsak, George; Youngerman, Brett E |
| Conversion of the Canadian Observational Study on Epilepsy (CANOE) REDCap Registry to the OMOP Common Data Model | 2023 OHDSI Symposium Showcase | 2023/10/28 | Boyce, Danielle; Josephson, Colin Bruce; Jiang, Ray; Wiebe, Samuel |

OMOP CDM Basic Data Dictionary

For a sample interactive OMOP data dictionary, please click on the image below: OMOP Data Dictionary Thumbnail

Projects Best Suited for Observational Research and OHDSI Network Studies

Source: https://www.ohdsi.org/wp-content/uploads/2023/01/SOS-challenge-intro-24jan2023.pdf

Analytic Use Cases and Examples

| Analytic use case | Type | Structure | Example |
| --- | --- | --- | --- |
| Clinical characterization | Disease Natural History | Amongst patients who are diagnosed with <insert your favorite disease>, what are the patients' characteristics from their medical history? | Amongst patients with rheumatoid arthritis, what are their demographics (age, gender), prior conditions, medications, and health service utilization behaviors? |
| | Treatment utilization | Amongst patients who have <insert your favorite disease>, which treatments were patients exposed to amongst <list of treatments for disease> and in which sequence? | Amongst patients with depression, which treatments were patients exposed to amongst SSRI, SNRI, TCA, bupropion, and esketamine, and in which sequence? |
| | Outcome incidence | Amongst patients who are new users of <insert your favorite drug>, how many patients experienced <insert your favorite known adverse event from the drug profile> within <time horizon following exposure start>? | Amongst patients who are new users of methylphenidate, how many patients experienced psychosis within 1 year of initiating treatment? |
| Population-level effect estimation | Safety surveillance | Does exposure to <insert your favorite drug> increase the risk of experiencing <insert an adverse event> within <time horizon following exposure start>? | Does exposure to ACE inhibitor increase the risk of experiencing angioedema within 1 month after exposure start? |
| | Comparative effectiveness | Does exposure to <insert your favorite drug> have a different risk of experiencing <insert any outcome (safety or benefit)> within <time horizon following exposure start>, relative to <insert your comparator treatment>? | Does exposure to ACE inhibitor have a different risk of experiencing acute myocardial infarction while on treatment, relative to thiazide diuretic? |
| Patient level prediction | Disease onset and progression | For a given patient who is diagnosed with <insert your favorite disease>, what is the probability that they will go on to have <another disease or related complication> within <time horizon from diagnosis>? | For a given patient who is newly diagnosed with atrial fibrillation, what is the probability that they will go on to have ischemic stroke in the next 3 years? |
| | Treatment response | For a given patient who is a new user of <insert your favorite chronically-used drug>, what is the probability that they will <insert desired effect> in <time window>? | For a given patient with T2DM who starts on metformin, what is the probability that they will maintain HbA1C <6.5% after 3 years? |
| | Treatment safety | For a given patient who is a new user of <insert your favorite drug>, what is the probability that they will experience <insert adverse event> within <time horizon following exposure>? | For a given patient who is a new user of warfarin, what is the probability that they will have GI bleed in 1 year? |
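As a rough illustration of the "Outcome incidence" structure, the sketch below counts how many new users have a first outcome within a time horizon after exposure start. The function name, the dictionaries of first exposure and first outcome dates, and the toy data are all assumptions made for this example; a real study would also need to handle observation periods, censoring, and prior outcomes, which this sketch ignores.

```python
from datetime import date, timedelta


def incidence_within(first_exposure, first_outcome, horizon_days):
    """Return (cases, new_users): how many new users have a first outcome
    after exposure start and no later than horizon_days afterwards."""
    horizon = timedelta(days=horizon_days)
    cases = 0
    for person_id, start in first_exposure.items():
        outcome = first_outcome.get(person_id)
        # Count only outcomes that occur after the exposure starts
        # and inside the time horizon.
        if outcome is not None and start < outcome <= start + horizon:
            cases += 1
    return cases, len(first_exposure)


# Hypothetical toy data: person_id -> date of first exposure / first outcome.
first_exposure = {
    1: date(2020, 1, 1),
    2: date(2020, 2, 1),
    3: date(2020, 3, 1),
    4: date(2020, 4, 1),
}
first_outcome = {
    1: date(2020, 6, 1),   # within 1 year of exposure start
    2: date(2021, 6, 1),   # later than 1 year after exposure start
    4: date(2019, 12, 1),  # before exposure start, so not counted
}

cases, new_users = incidence_within(first_exposure, first_outcome, 365)
print(f"{cases} of {new_users} new users had the outcome within 1 year")
```

Here the count is a simple proportion of new users; it is not a rate per person-time.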

Important Paper About Implementation of the OMOP CDM!

Erica A Voss, Clair Blacketer, Sebastiaan van Sandijk, Maxim Moinat, Michael Kallfelz, Michel van Speybroeck, Daniel Prieto-Alhambra, Martijn J Schuemie, Peter R Rijnbeek. European Health Data & Evidence Network—learnings from building out a standardized international health data network. Journal of the American Medical Informatics Association, 2023, ocad214.

Data Content Ontology

Dr. Rachel Richesson presents "Learning to Use EHR Data in Learning Health Systems"

Overview of Major Clinical Terminologies and Coding Systems

This document provides a detailed overview of several essential clinical terminologies and coding systems used in healthcare. Each system has a specific role and is crucial for standardized communication in healthcare settings. The information includes development history, usage, and updates of these systems.

For more in-depth information, links to the respective official websites are provided.

SNOMED Clinical Terms (SNOMED CT)

  • Development: Originally by the College of American Pathologists, now under SNOMED International.
  • Adoption: Used in over 50 countries.
  • Concepts: Over 340,000 active concepts in 19 hierarchies.
  • Usage: Encodes clinical information including diseases, findings, and procedures.
  • Updates: Biannual, with more frequent updates planned.
  • More Information: SNOMED International

Logical Observation Identifiers Names and Codes (LOINC)

  • Developer: Regenstrief Institute.
  • Function: Identifiers for laboratory and clinical observations.
  • Content: Over 90,000 terms.
  • Collaboration: With SNOMED CT for coded content development.
  • Updates: Biannual.
  • More Information: LOINC

RxNorm

  • Developer: National Library of Medicine (NLM).
  • Function: Standard nomenclature for medications.
  • Integration: Links to various drug vocabularies.
  • Access: Requires UMLS user license for proprietary content.
  • More Information: RxNorm - NLM

International Classification of Disease (ICD)

  • Endorsement: World Health Organization (WHO).
  • Versions: ICD-10 widely used with national extensions; ICD-11 adopted for future use.
  • Purpose: Epidemiology, health management, clinical purposes.
  • Updates: Annual, freely available.
  • More Information: WHO ICD

Current Procedural Terminology (CPT)

  • Developer: American Medical Association (AMA).
  • Use: Encoding of medical services and procedures in the USA.
  • Categories: Three categories of codes.
  • Requirement: License from AMA for use.
  • More Information: CPT - AMA

Human Phenotype Ontology (HPO)

  • Function: Bioinformatic resources for human diseases and phenotypes analysis.
  • Components: Phenotype vocabulary, disease-phenotype annotations, algorithms.
  • Applications: Genomic interpretation, gene-disease discovery, precision medicine.
  • Content: Over 13,000 terms in 5 hierarchies.
  • Availability: Freely available, multiple releases per year.
  • More Information: Human Phenotype Ontology

Unified Medical Language System (UMLS)

  • Initiation: By the US National Library of Medicine in 1986.
  • Goal: To aid in the retrieval and integration of electronic biomedical information.
  • Challenge Addressed: Different vocabularies expressing the same information differently.
  • Availability: Free, but requires a license due to additional licensing requirements of some contents.
  • More Information: UMLS - NLM

Ontology Mapping in BioPortal Applications

  • Process: Finding the closest match of a code from one ontology in another.

  • Matching: Exact equivalence is rare; approximate matching is common.

  • Challenges: Labor-intensive and requires understanding the maps' nature and limitations.

  • Alternative Approach: Mapping multiple ontologies to a central core terminology, as used by the OHDSI consortium.

  • More Information: BioPortal

OMOP Domains by Source to Standard Vocabulary

graph LR
    ICD9("ICD9") -->|Transformation to OMOP CDM| SNOMED("STANDARD<br>Vocabulary Concept Code<br>SNOMED")
    ICD10("ICD10") -->|Transformation to OMOP CDM| SNOMED
| Domain | Source Vocabulary | Standard Vocabulary |
| --- | --- | --- |
| Conditions | ICD9, ICD10 | SNOMED |
| Measurements | LOINC or institutional specific codes | LOINC |
| Drugs | NDC | RxNorm |
| Procedures | ICD9, ICD10, CPT | SNOMED |
  • ICD = International Classification of Diseases

  • SNOMED = Systematized Nomenclature of Medicine

  • LOINC = Logical Observation Identifiers Names and Codes

  • NDC = National Drug Code

  • CPT = Current Procedural Terminology
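The source-to-standard transformation above can be sketched as a lookup through the vocabulary tables. The example below is a minimal sketch, assuming simplified `concept` and `concept_relationship` tables held in an in-memory SQLite database; every `concept_id` is a made-up placeholder, the code pairing is illustrative rather than copied from the OHDSI vocabularies, and the helper `to_standard` is hypothetical.

```python
import sqlite3

# Toy vocabulary rows. All concept_id values are made-up placeholders,
# not real Athena identifiers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_code TEXT,
    vocabulary_id TEXT,
    standard_concept TEXT   -- 'S' marks a standard concept
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
INSERT INTO concept VALUES
    (9001, 'G40.909',  'ICD10CM', NULL),
    (9002, '345.90',   'ICD9CM',  NULL),
    (9100, '84757009', 'SNOMED',  'S');
INSERT INTO concept_relationship VALUES
    (9001, 9100, 'Maps to'),
    (9002, 9100, 'Maps to');
""")


def to_standard(code, vocabulary_id):
    """Return the standard concept_ids a source code maps to, or [0] if unmapped."""
    rows = conn.execute(
        """
        SELECT target.concept_id
        FROM concept AS source
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = source.concept_id
         AND cr.relationship_id = 'Maps to'
        JOIN concept AS target
          ON target.concept_id = cr.concept_id_2
         AND target.standard_concept = 'S'
        WHERE source.concept_code = ? AND source.vocabulary_id = ?
        """,
        (code, vocabulary_id),
    ).fetchall()
    # By OMOP convention, concept_id 0 stands for "no matching concept".
    return [row[0] for row in rows] or [0]


print(to_standard("G40.909", "ICD10CM"))
print(to_standard("345.90", "ICD9CM"))
print(to_standard("LOCAL-1", "ICD10CM"))
```

Both source codes resolve to the same standard concept here, which is the point of the transformation: analyses can then be written once against the standard vocabulary.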

Incremental Loading

Incremental loading in the context of OHDSI refers to the process of adding new or updated data to an existing OHDSI database without the need to completely rebuild or refresh the entire dataset. This can be particularly useful for large datasets where full loads can be time-consuming and inefficient. The process involves extracting only the changes since the last load and then transforming and loading this delta of data into the existing OMOP Common Data Model (CDM) used by OHDSI tools.
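The delta idea can be sketched with a simple "watermark": remember the time of the last load, pull only rows changed after it, and upsert them into the target. This is a minimal sketch under stated assumptions, not the method used by any OHDSI tool; the `updated_at` field, the in-memory dictionaries, and the function name are hypothetical, and deletions in the source are not handled.

```python
from datetime import datetime


def incremental_load(source_rows, target, last_load_time):
    """Upsert rows changed since last_load_time into target.

    Returns the new watermark and the number of rows applied.
    """
    # Extract only the delta: rows modified after the previous load.
    delta = [row for row in source_rows if row["updated_at"] > last_load_time]
    for row in delta:
        # Insert new records and overwrite changed ones, keyed by record id.
        target[row["measurement_id"]] = row
    # Advance the watermark only as far as the data actually seen.
    new_watermark = max((row["updated_at"] for row in delta), default=last_load_time)
    return new_watermark, len(delta)


# Hypothetical source snapshot and an already-loaded target.
source = [
    {"measurement_id": 1, "value": 5.1, "updated_at": datetime(2024, 1, 1)},
    {"measurement_id": 2, "value": 7.0, "updated_at": datetime(2024, 1, 5)},
    {"measurement_id": 3, "value": 6.2, "updated_at": datetime(2024, 1, 9)},
]
target = {
    1: {"measurement_id": 1, "value": 5.1, "updated_at": datetime(2024, 1, 1)},
    2: {"measurement_id": 2, "value": 6.5, "updated_at": datetime(2024, 1, 2)},
}

watermark, applied = incremental_load(source, target, datetime(2024, 1, 2))
print(applied, watermark, sorted(target))
```

In this toy run, record 2 is updated in place and record 3 is added, while record 1 is left untouched because it has not changed since the last load.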

For instance, an OHDSI Symposium showcase (#44) describes the development of an ETL (Extract, Transform, Load) process for the bulk and incremental load of German patient data into the OMOP CDM using FHIR, which suggests that incremental loading is an essential part of keeping the database up to date efficiently.

This group also described a Near Real-Time Incremental OMOP-CDM ETL System.

This is also described by Dr. DuWayne Willett, CMIO of UTSW, at around minute 30 of this video:

OHDSI Symposium Presentation

...and in this OHDSI symposium presentation: OHDSI Symposium Presentation.

Current CDM

![CDM54 Image](https://github.com/DBJHU/DBJHU.github.io/blob/main/cdm54.png)

Source: OHDSI Common Data Model

Commonly Used CDM Tables Overview

The OMOP common data model (CDM) is a relational database made up of different tables that relate to each other by foreign keys (XXXX_ID values; e.g., PERSON_ID or PROVIDER_ID). The OMOP tables in your data export are as follows:

| Table | Description |
| --- | --- |
| Person | Contains basic demographic information describing a participant, including biological sex, birth date, race, and ethnicity. |
| Visit_occurrence | Captures encounters with healthcare providers or similar events. Contains the type of visit a person has (outpatient care, inpatient care, or long-term care), as well as the date and duration information. Rows in other tables can reference this table, for example, condition_occurrences related to a specific visit. |
| Condition_occurrence | Indicates the presence of a disease or medical condition stated as a diagnosis, a sign, or symptom, which is either observed by a provider or reported by the patient. |
| Drug_exposure | Captures records about the utilization of a medication. Drug exposures include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies. Radiological devices ingested or applied locally do not count as drugs. Drug exposure is inferred from clinical events associated with orders, prescriptions written, pharmacy dispensing, procedural administrations, and other patient-reported information. |
| Measurement | Contains both orders and results of a systematic and standardized examination or testing of a participant or participant's sample, including laboratory tests, vital signs, quantitative findings from pathology reports, etc. |
| Procedure_occurrence | Contains records of activities or processes ordered by or carried out by a healthcare provider on the patient to have a diagnostic or therapeutic purpose. |
| Observation | Captures clinical facts about a person obtained in the context of an examination, questioning, or a procedure. Any data that cannot be represented by another domain, such as social and lifestyle facts, medical history, and family history, are recorded here. |
| Device_exposure | Captures information about a person's exposure to a foreign physical object or instrument which is used for diagnostic or therapeutic purposes. Devices include implantable objects, blood transfusions, medical equipment and supplies, other instruments used in medical procedures, and material used in clinical care. |
| Death | Contains the clinical events surrounding how and when a participant dies. |
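To make the foreign-key relationships concrete, the sketch below builds three heavily trimmed tables in an in-memory SQLite database and joins them on PERSON_ID and VISIT_OCCURRENCE_ID. The column subset and all row values are assumptions chosen for illustration (the condition concept IDs are placeholders); the real CDM tables have many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    visit_start_date TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    visit_occurrence_id INTEGER REFERENCES visit_occurrence(visit_occurrence_id),
    condition_concept_id INTEGER
);
-- Toy rows; the condition_concept_id values are placeholders.
INSERT INTO person VALUES (1, 1980), (2, 1995);
INSERT INTO visit_occurrence VALUES
    (10, 1, '2020-03-01'), (11, 1, '2021-06-15'), (12, 2, '2020-09-09');
INSERT INTO condition_occurrence VALUES
    (100, 1, 10, 1234567), (101, 2, 12, 7654321);
""")

# One row per visit; the LEFT JOIN keeps visits that have no recorded condition.
rows = conn.execute("""
    SELECT p.person_id, v.visit_start_date, c.condition_concept_id
    FROM person AS p
    JOIN visit_occurrence AS v
      ON v.person_id = p.person_id
    LEFT JOIN condition_occurrence AS c
      ON c.visit_occurrence_id = v.visit_occurrence_id
    ORDER BY v.visit_occurrence_id
""").fetchall()

for row in rows:
    print(row)
```

Person 1 appears twice because they have two visits, and the second visit carries no condition, so its condition column is empty.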

OMOP Data Quality

The Book of OHDSI - Chapter 15: Data Quality

Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw ST, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC). 2016 Sep 11;4(1):1244. doi: 10.13063/2327-9214.1244. PMID: 27713905; PMCID: PMC5051581.

ETL Basics

https://www.ohdsi.org/wp-content/uploads/2019/09/OMOP-Common-Data-Model-Extract-Transform-Load.pdf

https://ohdsi.github.io/TheBookOfOhdsi/ExtractTransformLoad.html

ETL STEPS

  1. Dataset profiling and documentation

    • Create data model documentation, sample data, data dictionaries, code lists, and other relevant information
    • Execute database profiling scan (WhiteRabbit) on source database
    • Prepare mapping approach/documents based on scan reports from database profiling scan
  2. Generation of the ETL Design

    • Mapping workshop with all relevant parties to:
      1. Understand the source
      2. Define the scope of source data to be transformed
      3. Define acceptance criteria for OMOP output
      • Output: draft mapping document
    • Finalize mapping document:
      • Integrate all notes/documentation from workshop
      • Work through mappings and verify, update, fill in gaps
      • Meetings/emails with data contact/technical contact (TC) as needed
  3. Source Data Integrations and Semantic Mapping

    • Source Code mapping:
      • Identify which codes are already mapped to standard vocabulary
      • Identify code types for codes that need to be mapped
      • Translation of code description/phrases to English, if/as needed
      • Create proposed code mappings
    • Generate mappings for data coming out of flowsheets (together with consortium)
    • Review/approval of code mappings, often done by medical experts affiliated with Data Owner (DO).
    • Identify medical imaging available and define mappings to Imaging Extension
    • Identify waveform data available and map using consortium-defined guidelines
    • Use OHNLP to extract OMOP data from unstructured sources
  4. Technical architecture design

    • Continuous Integration, Continuous Deployment (CI/CD):
      • Decide on ETL dev/deployment flow
      • Put version control mechanisms in place
    • OHDSI Ecosystem:
      • Evaluate infrastructure needed
      • Create infrastructure design documentation
  5. Technical ETL Development

    • Implement ETL (Preferred Language/Structure?)
    • Update ETL based on testing/QA/feedback (8, 9)
  6. Setting up of Infrastructure

    • Deploy core servers and associated services based on infrastructure design in (4)
  7. Installation of the OHDSI tools

    • Install and configure all software (database server, Achilles/DQD/Ares, Atlas/WebAPI, R Studio server, HADES, notebooks/tooling related to analytics, and any other software to suit a site’s specific needs).
  8. ETL Testing and Validation

    • ETL Execution:
      • Test ETL using sample/development data (with limited external data access)
      • Test ETL using DO data (with full external data access)
      • Verify and document QA
      • Submit Achilles/DQD/AresIndexer results to central location regularly
    • ETL Development Planning and Management:
      • Review ETL testing and progress (TCs/meetings)
  9. Data Quality Assessment

    • QA/Acceptance testing:
      • Evaluate accuracy and completeness of mapping
      • Review and approval by DO
  10. Documentation

    • Mapping Documentation and Themis Checks
    • Transformation/Technical Documentation
  11. Project Management Throughout

    • Organization of tasks, milestones, and follow-up
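As one small illustration of step 3 (identifying which source codes are already mapped and which still need mapping), the sketch below reports mapping coverage over a list of source codes. It is a hedged sketch only: the function name, the example codes, and the target concept IDs are hypothetical, and real projects would typically do this with tools such as Usagi and the OMOP vocabulary tables.

```python
from collections import Counter


def mapping_coverage(source_codes, code_map):
    """Split source codes into mapped and unmapped, and report the share
    of source records whose code has a mapping."""
    counts = Counter(source_codes)
    mapped = {code: code_map[code] for code in counts if code in code_map}
    # Keep record counts for unmapped codes so the most frequent ones
    # can be prioritized for manual mapping and expert review.
    unmapped = {code: n for code, n in counts.items() if code not in code_map}
    total = sum(counts.values())
    covered = total - sum(unmapped.values())
    return mapped, unmapped, (covered / total if total else 0.0)


# Hypothetical source records and an existing code-to-concept mapping.
source_codes = ["G40.909", "G40.909", "LOCAL-EEG-01", "R56.9"]
code_map = {"G40.909": 9100, "R56.9": 9101}  # placeholder concept IDs

mapped, unmapped, coverage = mapping_coverage(source_codes, code_map)
print(mapped)
print(unmapped)
print(f"{coverage:.0%} of records are mapped")
```

Reporting coverage by record count rather than by distinct code helps show how much of the data an unmapped code actually affects.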

OHDSI Analysis Tools

You can analyze the data with R, SQL, Python, or any preferred data analysis software. The examples provided below are for R and SQL. [The Book of OHDSI Chapter 9](https://ohdsi.github.io/TheBookOfOhdsi/SqlAndR.html) provides an overview of analyzing OHDSI data in R and SQL; note that you will not be able to use the OHDSI software tools when analyzing your exported data for the reason explained above.

Data Science Handbook

Open, rigorous and reproducible research: A practitioner’s handbook, from Stanford Data Science

Programming Resources

Jupyter Notebooks, Python, SQL, and R Programming Resources

Software Carpentry provides free online lessons for researchers who want to strengthen their programming skills for data analysis. The lessons cover a variety of useful topics including:

Additional resources:

OHDSI Resources

Hello! Please familiarize yourself with the following tools and resources which will help you throughout this course and your OHDSI journey.

Check out the OHDSI Forums. Introduce yourself on the "Welcome to OHDSI" thread.

Bookmark The Book of OHDSI

Check out the OMOP CDM FAQ

Join the OHDSI Microsoft Teams environment.

Check out the MIMIC-IV demo data set in OMOP CDM format!

Register with EHDEN Academy

Visit the Atlas Demo and Athena.

Bookmark the OHDSI YouTube tutorials and workshops

Visit the OHDSI Community Dashboard

Bookmark OMOP Common Data Model (ohdsi.github.io)

Learn about GitHub if you don't already know.

Plan to attend an OHDSI Community call

Learn about OHDSI Workgroups

Follow OHDSI on social media: Twitter LinkedIn

Subscribe to the OHDSI Newsletter

Learn about past and upcoming OHDSI events

Learn about OHDSI software

Look up individual concepts in Athena

Check out useful OHDSI-related documentation here: NIH All of Us OMOP Documentation

Special Topic: Clinical Registries Using OHDSI

OHDSI and Clinical Registries: Sanity for Health Systems (Aug. 22 Community Call)

Clinical Registries in OHDSI - September 2022

Special topic: SSSOM

https://www.ohdsi.org/wp-content/uploads/2023/10/Talapova-Polina_Mapping_of_Critical_Care_EHR_Flowsheet_data_to_the_OMOP_CDM_via_SSSOM_2023symposium-Polina-Talapova.pdf

Matentzoglu N, Balhoff JP, Bello SM, Bizon C, Brush M, Callahan TJ, et al. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database. 2022;2022:baac035. doi: 10.1093/database/baac035.

Mapping Commons. SSSOM: Simple Standard for Sharing Ontological Mappings. Wiki [Internet]. Available from: https://mapping-commons.github.io/sssom/about.

Mapping Commons. SSSOM: Simple Standard for Sharing Ontological Mappings. GitHub [Internet]. Available from: https://github.com/mapping-commons/sssom.

https://www.w3.org/2004/02/skos/

Recommended Trainings

OHDSI Community

Broadsea 3.0
By: Lee Evans

Tufts Bridge2AI Standards Module

  • June 15, 2023:
    Data Quality Dashboard
    By: Jared Houghtaling

  • July 6, 2023:
    Data Quality Dashboard output demo
    By: Jared Houghtaling

  • July 13, 2023:
    Achilles output demo
    By: Jared Houghtaling

  • July 27, 2023:
    Flowsheet follow-up
    By: Polina Talapova & Jared Houghtaling

  • August 3, 2023:
    OMOP Standardized Vocabularies - Part 1
    By: Jared Houghtaling and Polina Talapova

  • August 17, 2023:
    OMOP Standardized Vocabularies - Part 2
    By: Polina Talapova

  • August 24, 2023:
    How to download and set-up a DDL (Demo)
    By: Jared Houghtaling

  • August 31, 2023:
    Demo of WhiteRabbit and RabbitInAHat
    By: Jared Houghtaling

  • September 7, 2023:
    ARES usefulness for ETL at Tufts
    By: Jared Houghtaling

  • September 14, 2023:
    Google form introduction for site progress tracking
    By: Jared Houghtaling

  • September 21, 2023:
    Sample ETL Process
    By: Jared Houghtaling

  • October 12, 2023:
    Google Form for Site Progress Tracking
    With Jared Houghtaling and Andrew Williams

  • October 26, 2023:
    Review and Prioritization of DQD Results, and Discussion of DQD Issue Severity
    With Jared Houghtaling

  • November 2, 2023:
    Principles of Mapping and Vocab Gaps Identification
    With Polina Talapova

  • November 9, 2023:
    Usagi & STCM Demo
    With Polina Talapova & Jared Houghtaling

Diversity, Equity, and Inclusion Resources

Analysis with SQL (OHDSI/OMOP)

The OMOP Query Library is a library of commonly-used SQL queries for the OMOP Common Data Model (CDM).

Analysis with R

Below are some sample R queries that demonstrate how to read in OMOP tables from CSV files, join them based on the person_id and visit_occurrence_id fields, and search for specific criteria.

Note: Adjust the file paths and column names accordingly based on the actual structure and location of your CSV files. The queries below are a generic representation and may need adjustments based on the specifics of your data set.

Reading CSV files into R data frames:

# Read the CSV files into R data frames
person_df <- read.csv("path_to_person_table.csv", header=TRUE, stringsAsFactors=FALSE)
visit_occurrence_df <- read.csv("path_to_visit_occurrence_table.csv", header=TRUE, stringsAsFactors=FALSE)
condition_occurrence_df <- read.csv("path_to_condition_occurrence_table.csv", header=TRUE, stringsAsFactors=FALSE)

Join tables based on person_id:

When a person has multiple visits in the visit_occurrence table, joining the person table with the visit_occurrence table will result in multiple rows for that person, each corresponding to a different visit. This is a standard one-to-many join operation.

# Join person with visit_occurrence on 'person_id'
person_visit_df <- merge(person_df, visit_occurrence_df, by="person_id")

Joining the Person-Visit table with the Condition Occurrence table:

# Join the person-visit result with condition_occurrence on both 'person_id' and 'visit_occurrence_id'
full_df <- merge(person_visit_df, condition_occurrence_df, by=c("person_id", "visit_occurrence_id"))

Search by a list of person_ids:

# Define a list of person_ids to search for
search_person_ids <- c(1, 2, 3, 4, 5)

# Filter the data frame to only include rows with person_ids in the list
filtered_by_person_df <- subset(full_df, person_id %in% search_person_ids)

Search by a specific condition concept ID:

# Define a specific condition concept ID to search for
search_condition_concept_id <- 1234567

# Filter the data frame to only include rows with the specified condition concept ID
filtered_by_condition_df <- subset(full_df, condition_concept_id == search_condition_concept_id)

Search by a date range:

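# Define a date range to search for
start_date <- as.Date("2020-01-01")
end_date <- as.Date("2020-12-31")

# read.csv() loads dates as character strings; convert them before comparing
# (assumes the dates are stored as YYYY-MM-DD)
full_df$visit_start_date <- as.Date(full_df$visit_start_date)

# Filter the data frame to only include rows within the date range
filtered_by_date_df <- subset(full_df, visit_start_date >= start_date & visit_start_date <= end_date)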
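For readers who prefer Python, the same read, join, and filter steps can be sketched with only the standard library. This is a rough equivalent under stated assumptions rather than a recommended workflow: the CSV content is inlined toy data with a trimmed set of columns, and with real exports you would read from your own files (many analysts would use a data frame library instead).

```python
import csv
import io
from datetime import date

# Inline toy CSV content standing in for exported OMOP tables.
visit_csv = """visit_occurrence_id,person_id,visit_start_date
10,1,2020-03-01
11,1,2021-06-15
12,2,2020-09-09
"""
condition_csv = """condition_occurrence_id,person_id,visit_occurrence_id,condition_concept_id
100,1,10,1234567
101,2,12,7654321
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Index visits by visit_occurrence_id so each condition can find its visit.
visits = {row["visit_occurrence_id"]: row for row in read_rows(visit_csv)}

# Inner join on both person_id and visit_occurrence_id.
joined = []
for cond in read_rows(condition_csv):
    visit = visits.get(cond["visit_occurrence_id"])
    if visit is not None and visit["person_id"] == cond["person_id"]:
        joined.append({**visit, **cond})

# Filter by date range, then by a placeholder condition concept ID.
start, end = date(2020, 1, 1), date(2020, 12, 31)
in_2020 = [
    row for row in joined
    if start <= date.fromisoformat(row["visit_start_date"]) <= end
]
matching = [row for row in in_2020 if int(row["condition_concept_id"]) == 1234567]

print([row["person_id"] for row in matching])
```

As with the R merge, this is an inner join, so condition records without a matching visit are dropped from the result.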
