Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

n.risk at time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(*, surv.connect = TRUE) #229

Open
ThomasSoeiro opened this issue Mar 8, 2024 · 3 comments

Comments

@ThomasSoeiro
Copy link
Contributor

ThomasSoeiro commented Mar 8, 2024

System information

  • OS Platform and Distribution: Red Hat Enterprise Linux Server 7.8 (Maipo)
  • ggfortify installed from: CRAN
  • ggfortify version: ggfortify_0.4.14
  • Exact command to reproduce: see below

Describe the problem

n.risk at time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(*, surv.connect = TRUE).

Source code / logs / plots

library(survival)

fit <- survfit(Surv(time, status) ~ x, data = aml)
summary(fit)
# Call: survfit(formula = Surv(time, status) ~ x, data = aml)
# 
#                 x=Maintained 
#  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#     9     11       1    0.909  0.0867       0.7541        1.000
#    13     10       1    0.818  0.1163       0.6192        1.000
#    18      8       1    0.716  0.1397       0.4884        1.000
#    23      7       1    0.614  0.1526       0.3769        0.999
#    31      5       1    0.491  0.1642       0.2549        0.946
#    34      4       1    0.368  0.1627       0.1549        0.875
#    48      2       1    0.184  0.1535       0.0359        0.944
# 
#                 x=Nonmaintained 
#  time n.risk n.event survival std.err lower 95% CI upper 95% CI
#     5     12       2   0.8333  0.1076       0.6470        1.000
#     8     10       2   0.6667  0.1361       0.4468        0.995
#    12      8       1   0.5833  0.1423       0.3616        0.941
#    23      6       1   0.4861  0.1481       0.2675        0.883
#    27      5       1   0.3889  0.1470       0.1854        0.816
#    30      4       1   0.2917  0.1387       0.1148        0.741
#    33      3       1   0.1944  0.1219       0.0569        0.664
#    43      2       1   0.0972  0.0919       0.0153        0.620
#    45      1       1   0.0000     NaN           NA           NA

ggfortify:::fortify.survfit(fit, surv.connect = TRUE) |> head()
#   time n.risk n.event n.censor      surv    std.err     upper     lower        strata
# 1    0     11       0        0 1.0000000 0.00000000 1.0000000 1.0000000    Maintained
# 2    0     11       0        0 1.0000000 0.00000000 1.0000000 1.0000000 Nonmaintained  # <- n.risk should be 12
# 3    9     11       1        0 0.9090909 0.09534626 1.0000000 0.7541338    Maintained
# 4   13     10       1        1 0.8181818 0.14213381 1.0000000 0.6192490    Maintained
# 5   18      8       1        0 0.7159091 0.19508758 1.0000000 0.4884263    Maintained
# 6   23      7       1        0 0.6136364 0.24873417 0.9991576 0.3768671    Maintained

The issue is from:

base <- d[1, ]

Maybe it can be changed to d[d$time == ave(d$time, d$strata, FUN = min), ].

And then this will need to be simplified:

if ('strata' %in% colnames(d)) {
strata <- levels(d$strata)
base <- base[rep(seq_len(nrow(base)), length(strata)), ]
rownames(base) <- NULL
base$strata <- strata
base$strata <- factor(base$strata, levels = base$strata)
}

@ThomasSoeiro ThomasSoeiro changed the title n.risk value for time == 0 is not correct when length(strata) > 1 in fortify.survfit(). n.risk value for time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(). Mar 8, 2024
@ThomasSoeiro ThomasSoeiro changed the title n.risk value for time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(). n.risk at time == 0 is not correct when nlevels(strata) > 1 in fortify.survfit(*, surv.connect = TRUE) Mar 8, 2024
@ThomasSoeiro
Copy link
Contributor Author

Hi @terrytangyuan,
Do you confirm the issue? Would you consider a patch based on what I proposed in #229 (comment)?
Thanks!

@terrytangyuan
Copy link
Collaborator

Please submit a PR with additional test cases. Thanks!

@ThomasSoeiro
Copy link
Contributor Author

FYI, I've just realized that a much better fix would be to use summary(times = 0):

library(survival)
fit <- survfit(Surv(time, status) ~ x, aml)
fit |> summary(times = 0) |> str()

@ThomasSoeiro ThomasSoeiro reopened this Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants