TriCA

Trinary chart-reviewed phenotype integrated cost-effective augmented estimation

Outline

Description
TriCA Overview
Example Dataset

Description

This README is for the journal peer review of the TriCA paper, which introduces a method for cost-effective, augmented estimation in association studies. The TriCA method is particularly useful when 'undecided' cases arise during manual chart reviews. It optimally combines binary algorithm-derived phenotypes for the entire cohort with trinary chart-reviewed phenotypes from a small subset, selected through outcome-dependent sampling. This approach offers unbiased estimates with greater efficiency compared to existing methods.

To demonstrate its effectiveness (similar as discussed in the SIMULATION STUDY section of the paper), we use simulated data from 3,000 patients to analyze the association of two continuous covariates and one treatment indicator. Method 3 (which modifies the proposed approach in Tong et al. (2020)) and the proposed TriCA method are implemented in the 2-multinom-estimate.R file by functions Aug.Unif.estimate and Aug.Bias.estimate respectively.

Statement of significance

Summary	Description
Problem	No methods have been proposed to include ‘undecided’ records from manual chart review phenotypes when identifying risk factors in association studies, particularly in rare disease scenarios
What is Already Known	Electronic health records are a valuable resource for identifying risk factors through association studies. While phenotyping algorithms are efficient for obtaining clinical outcomes, they can be error prone. Manual chart review, considered the gold standard, provides unbiased estimates but is labor-intensive and limited to a small subset of patients, potentially introducing ‘undecided’ cases. Existing methods often discard these indeterminate cases, which can reduce the efficiency of estimates, particularly in rare event conditions.
What this Paper Adds	We develop an augmented estimator, TriCA, that optimally combines the algorithm-derived binary phenotypes with the chart-review trinary phenotypes selected through a biased sampling strategy. By incorporating the undecided cases from manual chart review, TriCA provides unbiased estimates with higher statistical efficiency compared to existing methods.

TriCA Overview

Example Dataset

file: data_outcome_dependent_sampled.csv and data_uniform_sampled.csv

Full Dataset

3000 rows
6 columns: (Y, S, X)
- Y: outcome/true phenotype, categorical data with 3 levels.
  - 1: No, 2: Yes, 3: Unknown. (1 is the reference level) which is different from the manuscript where 0:No, 1:Yes, 2:Unknown.
  - p(Y=2) ~ 5%
  - generate from X with parameter beta = (-3.8, 1, 1, 1, 0.5, -0.4, 0.6, -1.6)
- S: surrogate phenotype, categorical data with 2 levels.
  - 1: No, 2: Yes. (1 is the reference level)
  - generated with p(s=1|y=1)=0.9, p(s=2|y=2)=0.9, p(s=1|y=3)=0.6
- X = (X1, X2, X3) covariates.
  - X1, X2 ~ N(0,1)
  - X3 ~ BER(0.5): indicate treatment/control

Validation Dataset

Indicator : the y column of full dataset is set to be NA if it's not included in the validation.
the validation dataset are from the SAME full dataset

1 : uniform

Idea : sampling uniformly from full dataset
n_vu = 600 : number of observations in uniform validation set
rho = n_v/ n = 1/5

2 : outcome dependent

Idea : uniformly select n1 samples from the S-positive(S=2) patients and n0 samples the S-negative(S=1) patients to construct V
n_v = 600
- n0 = 300 : number of samples from S0 (S=1)
- n1 = 300 : number of samples from S1 (S=2)
W: weights [calculated when estimating Aug.Biased]
- for S = 1, w = n0 / ns0 (# of sample from S=1 / # of obs with S=1) where n0 = 300
- for S = 2, w = n1 / ns1 (# of sample from S=2 / # of obs with S=2) where n1 = 300

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.gitignore		.gitignore
1-data-generate.R		1-data-generate.R
2-multinom-estimate.R		2-multinom-estimate.R
README.md		README.md
Visual Abstract.png		Visual Abstract.png
data_outcome_dependent_sampled.csv		data_outcome_dependent_sampled.csv
data_uniform_sampled.csv		data_uniform_sampled.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TriCA

Outline

Description

TriCA Overview

Example Dataset

Full Dataset

Validation Dataset

1 : uniform

2 : outcome dependent

About

Releases

Packages

Languages

Penncil/TriCA

Folders and files

Latest commit

History

Repository files navigation

TriCA

Outline

Description

TriCA Overview

Example Dataset

Full Dataset

Validation Dataset

1 : uniform

2 : outcome dependent

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages