-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Cumulative incidence #4
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have! Just a few comments based on a quick look.
fit(CumulativeIncidence, times, events, competing) | ||
|
||
Given a vector of times to events, a vector of indicators for the event of interest, | ||
and a vector of indicators for the competing event, compute the an estimate of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the an"
|
||
* `times`: Distinct event times | ||
* `nevents`: Number of observed events of interest at each time | ||
* `ncensor`: Number of right censorings for the event of interest at each time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call that ncensored
? That's only two characters more.
Below, could also be ncompeting
, though it's a bit longer so I'm not sure.
In the presence of competing risks, `1-KM`, where `KM` is the Kaplan-Meier estimate, | ||
is uninterpretable and is a biased estimate of the failure probability. The | ||
cumulative incidence estimator of Kalbfleisch and Prentice (1980) is a function | ||
of the hazards of both the event of interest and the competing event, and provides |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"events"? Same below.
compete::AbstractVector{<:Integer}) where {T<:Real} | ||
nobs = length(times) | ||
if !(nobs == length(status) == length(compete)) | ||
throw(DimensionMismatch("the input vectors must have the same length")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to print the sizes which were actually passed.
end | ||
p = sortperm(times) | ||
t = times[p] | ||
s = BitVector(status[p]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that constructor even existed. Would be more efficient and clearer as broadcast(i -> status[i] != 0, p)
.
end | ||
|
||
function StatsBase.fit(::Type{CumulativeIncidence}, | ||
times::AbstractVector{T}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans for a generalized CompetingEventTime
which would hold both the time and type of (non-)event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, hadn't thought of that, that's an interesting idea. We may want to consider adding that.
Nice! I think it's very important to also refactor the code, as there is one extra measure, which should be computed in a very similar way (cumulative hazard) that you get in more or less the same way. It really is the same computation as Kaplan-Meier but instead of:
you have:
It's important to have that one as one can generalize it to easily compute the cumulative baseline hazard for a Cox model, for which then one can recover the other estimates in the context of a Cox model (for example, the survival function is the exponential of minus the cumulative hazard). |
I've picked this up and am committed to getting a refreshed PR completed soon, using this PR as a starting point. Question: How should ties in Plans:
Sound ok? |
My approach here was to use the cumulative incidence estimator described in the linked paper from Kalbfleisch and Prentice. I can't recall exactly how it handles ties (it's been years since I've done any survival analysis), but it should be described there. If not, there's likely a paper from John Crowley that describes an appropriate method. Sorry I don't have better information at the moment. I'll try to dig back through some papers saved on an old computer to see whether I have anything helpful soon. |
So I guess it turns out that ties do not need to be handled for the basic CIF computation, you just need to aggregate across them like you had done. Only when you move to modelling the CIF with covariates and computing likelihoods do you need to start to worry about how ties are broken. That is my understanding now. EDIT to add: None of my comparisons against SAS/Stata had any ties in the data, so a further check would be to make sure we report the same as they do in the presence of ties. |
The company I worked for at the time I started working on this (before it even lived on GitHub) used SAS exclusively, but we had to implement cumulative incidence ourselves since at the time (SAS 9.3 or 9.4) an appropriate estimator was not implemented in SAS. If I recall it was a fairly straightforward computation in SAS using the KM estimated survivor function output from |
Still do to:
Started on this the same time I started the package, never bothered to push it to GitHub. Maybe having it up here will remind me to continue.