Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Cumulative incidence #4

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

WIP: Cumulative incidence #4

wants to merge 1 commit into from

Conversation

ararslan
Copy link
Member

@ararslan ararslan commented Oct 9, 2017

Still do to:

  • Tests
  • Add to the documentation
  • Refactor to avoid near entire code duplication with Kaplan-Meier

Started on this the same time I started the package, never bothered to push it to GitHub. Maybe having it up here will remind me to continue.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have! Just a few comments based on a quick look.

fit(CumulativeIncidence, times, events, competing)

Given a vector of times to events, a vector of indicators for the event of interest,
and a vector of indicators for the competing event, compute the an estimate of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the an"


* `times`: Distinct event times
* `nevents`: Number of observed events of interest at each time
* `ncensor`: Number of right censorings for the event of interest at each time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call that ncensored? That's only two characters more.

Below, could also be ncompeting, though it's a bit longer so I'm not sure.

In the presence of competing risks, `1-KM`, where `KM` is the Kaplan-Meier estimate,
is uninterpretable and is a biased estimate of the failure probability. The
cumulative incidence estimator of Kalbfleisch and Prentice (1980) is a function
of the hazards of both the event of interest and the competing event, and provides
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"events"? Same below.

compete::AbstractVector{<:Integer}) where {T<:Real}
nobs = length(times)
if !(nobs == length(status) == length(compete))
throw(DimensionMismatch("the input vectors must have the same length"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to print the sizes which were actually passed.

end
p = sortperm(times)
t = times[p]
s = BitVector(status[p])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that constructor even existed. Would be more efficient and clearer as broadcast(i -> status[i] != 0, p).

end

function StatsBase.fit(::Type{CumulativeIncidence},
times::AbstractVector{T},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any plans for a generalized CompetingEventTime which would hold both the time and type of (non-)event?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, hadn't thought of that, that's an interesting idea. We may want to consider adding that.

@piever
Copy link
Contributor

piever commented Oct 14, 2017

Nice! I think it's very important to also refactor the code, as there is one extra measure, which should be computed in a very similar way (cumulative hazard) that you get in more or less the same way. It really is the same computation as Kaplan-Meier but instead of:

km *= 1 - d_i/n_i

you have:

cum_haz += d_i/n_i

It's important to have that one as one can generalize it to easily compute the cumulative baseline hazard for a Cox model, for which then one can recover the other estimates in the context of a Cox model (for example, the survival function is the exponential of minus the cumulative hazard).

@tbeason
Copy link
Contributor

tbeason commented Jul 25, 2020

I've picked this up and am committed to getting a refreshed PR completed soon, using this PR as a starting point.

Question: How should ties in times be handled? I know there are lots of different ways, but perhaps random jittering might be the easiest? In my application this would be fine because I have millions of observations, so how I break ties won't matter much I think. This was not addressed in the original PR.

Plans:

  • I only have one competing risk, so I will only be coding it up for that case (like this existing PR).
  • I will make a general CompetingEventTime that handles times and status.
  • I will make a CumulativeIncidence type with estimator_update function to be consistent with current implementation of KaplanMeier, and define _estimator(::Type{CumulativeIncidence},tte::AbstractVector{CompetingEventTime{T}}) where {T}

Sound ok?

@ararslan
Copy link
Member Author

How should ties in times be handled?

My approach here was to use the cumulative incidence estimator described in the linked paper from Kalbfleisch and Prentice. I can't recall exactly how it handles ties (it's been years since I've done any survival analysis), but it should be described there. If not, there's likely a paper from John Crowley that describes an appropriate method. Sorry I don't have better information at the moment. I'll try to dig back through some papers saved on an old computer to see whether I have anything helpful soon.

@tbeason
Copy link
Contributor

tbeason commented Jul 27, 2020

So I guess it turns out that ties do not need to be handled for the basic CIF computation, you just need to aggregate across them like you had done. Only when you move to modelling the CIF with covariates and computing likelihoods do you need to start to worry about how ties are broken. That is my understanding now.

EDIT to add: None of my comparisons against SAS/Stata had any ties in the data, so a further check would be to make sure we report the same as they do in the presence of ties.

@ararslan
Copy link
Member Author

The company I worked for at the time I started working on this (before it even lived on GitHub) used SAS exclusively, but we had to implement cumulative incidence ourselves since at the time (SAS 9.3 or 9.4) an appropriate estimator was not implemented in SAS. If I recall it was a fairly straightforward computation in SAS using the KM estimated survivor function output from proc lifetest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants