create a vignette for Kaplan Meier #41
base: main
adding extra code to the KM ggsurvfit example
updated the example of variable creation
Thank you so much, Livia, this is really coming together 😁😁
vignettes/kaplan_meier.Rmd
death_eos = status,
date_progression = date_enrol + time*runif(n(), 0.6, 0.8),
date_progression = if_else(runif(n())>0.2, NA, date_progression),
date_first_visit = t0_v2 + abs(rnorm(n(), 500, 20)),
Are you sure about the variable renaming? Survival is usually calculated from enrollment or randomization, even if an earlier visit (for screening, clinical checks, etc.) would count as the "first visit".
I agree with you on `date_of_last_visit`, though; much better.
However, it would be better to stick to one naming convention, either `date_of_xxx` or `date_xxx`, as you wish.
As I understand it, the name is not the same for every project/dataset, so "first date" and "last date" might be easier to understand. However, when I present it to our team members at our next meeting, I'll ask them.
Wouldn't `follow_up_start` and `follow_up_end` be more correct, then? (Or something similar.)
A date and a follow-up are not really the same thing...
date_start & date_end
My friend says: "Both `date_end` and `end_date` are used, but `end_date` is generally more common and intuitive because it follows the natural English order (adjective + noun). Most coding styles prefer `end_date` since it improves readability.
However, some people prefer `date_end` to keep similar variable names grouped together when sorting alphabetically (e.g., `date_start`, `date_end`)."
Also, if it is mispronounced, it sounds more like "dead end" than "date end", which is super weird!
We have to choose between "good English wording" and "good R wording".
I'm sure you know what I prefer 😉.
However, this is your very vignette, so you should do as you wish.
Moreover, in the context of this vignette, autocompletion and variable naming guidelines are not the main points of interest.
vignettes/kaplan_meier.Rmd
print(km.model_PFS)
# there are a lot of NAs, not great
What do you mean?
If you are talking about the median being NA, that is expected: when the median is not reached (fewer events than half of the patients), only the lower bound of the confidence interval can be computed.
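A minimal sketch of this with the survival package, on hypothetical toy data (10 patients, only 3 events) made up for illustration: the KM estimate never falls to 0.5, so the median and its upper CI bound come out as `NA`, while the lower bound may still be finite.

```r
library(survival)

# Hypothetical toy data: 10 patients, only 3 events,
# so the KM estimate never drops to 0.5 (median not reached).
toy <- data.frame(
  time  = c(2, 4, 5, 7, 8, 10, 12, 12, 12, 12),
  event = c(1, 0, 1, 0, 1, 0, 0, 0, 0, 0)
)

fit <- survfit(Surv(time, event) ~ 1, data = toy)
min(fit$surv)  # lowest KM estimate, still above 0.5

# Median and 95% CI bounds, as shown by print(fit)
tab <- summary(fit)$table
tab[c("events", "median", "0.95LCL", "0.95UCL")]
```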
Can we go over it together?
vignettes/kaplan_meier.Rmd
print(km.model_PFS)
# there are a lot of NAs, not great
summary(km.model_PFS)
In `summary()`, the output is unreadable without the `times` argument, because it lists every time point in the dataset.
Also, I prefer using `tidy_survfit()` from `ggsurvfit`; it has the `broom::tidy()` vibe, although you have to use `select()` on the output to reduce the clutter.
What do you think about it?
I removed it
Don't you think it would be nice to explain the `times` argument? Clinicians often want to see the "x-year survival", and this is how I calculate it. Do you do it differently?
tidy_survfit(km.model_PFS, times=c(0.05, 0.10, 0.20)) %>%
select(strata, time, n.risk, n.event, estimate, conf.high, conf.low)
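For comparison, the same landmark estimates can be read directly from `summary()` in the survival package through its `times` argument. Here is a minimal sketch on hypothetical toy data (time in years, `event = 1` for progression or death). Without `times`, you get one row per event time; with it, you get one row per requested landmark.

```r
library(survival)

# Hypothetical toy data: time in years, event = 1 for progression/death
toy <- data.frame(
  time  = c(0.5, 1.2, 1.5, 2.3, 2.8, 3.1, 3.6, 4.0),
  event = c(1,   0,   1,   1,   0,   1,   0,   0)
)
fit <- survfit(Surv(time, event) ~ 1, data = toy)

# KM estimate at chosen landmarks: 1-, 2- and 3-year survival
s <- summary(fit, times = c(1, 2, 3))
landmarks <- data.frame(
  time   = s$time,
  n.risk = s$n.risk,
  surv   = s$surv,
  lower  = s$lower,
  upper  = s$upper
)
landmarks
```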
Totally agree, I added it. However, I also kept `km.model_PFS` to get the median, unless there is another way to present it.
Indeed, there is no way to get the median with `ggsurvfit`.
I know you can write `median(km.model_PFS)`, but I'm not sure how to get the CI.
This might be a mission for Alderic's functions!
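One option, sketched here with the `lung` dataset bundled with the survival package rather than our data: calling `quantile()` on a `survfit` object with `conf.int = TRUE` returns the median together with its lower and upper confidence limits. These are the same figures `print()` reports.

```r
library(survival)

# Example data bundled with survival (status: 1 = censored, 2 = dead)
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Median (probs = 0.5) with its confidence limits
q <- quantile(fit, probs = 0.5, conf.int = TRUE)
median_ci <- c(
  median = unname(q$quantile),
  lower  = unname(q$lower),
  upper  = unname(q$upper)
)
median_ci  # median 310, lower 285, upper 363 (days)
```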
vignettes/kaplan_meier.Rmd
status == 0 ~ pmax(date_of_last_visit, na.rm = TRUE)
)) %>%
mutate(time_days = as.numeric(status_date - date_first_visit)) %>%
mutate(time_years = time_days / 365.25) %>%
# average month length: 365.25 / 12 = 30.4375 days
mutate(time_months = time_days / (365.25 / 12))
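A side note on date arithmetic: subtracting two `Date`s gives a `difftime`, and dividing that by a number keeps the "days" unit label, which is misleading once the value is in years. Below is a minimal base-R sketch with hypothetical dates and column names. It converts to a plain number first and uses 365.25 / 12 days per month so that months and years stay consistent.

```r
# Hypothetical dates for three patients
df <- data.frame(
  date_enrol  = as.Date(c("2020-01-15", "2020-03-01", "2020-06-10")),
  status_date = as.Date(c("2021-01-15", "2022-03-01", "2020-12-10"))
)

# Plain numeric durations (difftime units stripped)
df$time_days   <- as.numeric(difftime(df$status_date, df$date_enrol, units = "days"))
df$time_years  <- df$time_days / 365.25
df$time_months <- df$time_days / (365.25 / 12)  # 30.4375 days per month
df
```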
Our times are very low, aren't they?
In `ggsurvfit::df_colon`, times range from 0.02 to 9 years, which is a rather natural scale.
I'm not sure where this comes from, but maybe we can cheat by rescaling when creating `data_surv`?
Also, don't you think creating both `time_os` and `time_pfs` (and both KM curves) would be educationally valuable? I think juniors can struggle with that.
Good point! Alexi should work on this!
Oh, so Alexis should work on this vignette before we publish it?
We could publish it for the time being and remove it once Alexi has written separate KM vignettes for DOR, OS, PFS and follow-up; at that point, `time_DOR`, `time_OS`, `time_PFS` and `time_follow_up` will be created.
Alexis will write the vignette on response variables: ORR, DOR...
This vignette pertains to OS and PFS.
I think this mutate should create 4 variables: time_os, time_pfs, event_os, event_pfs
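Here is a minimal base-R sketch of those four variables. It assumes hypothetical columns `date_enrol`, `date_progression`, `date_death` and `date_last_visit` (NA when the event did not occur) and the usual definitions. OS counts death from any cause. PFS counts progression or death, whichever comes first. Patients without an event are censored at their last visit. The real column names in our data may differ.

```r
# Hypothetical data: NA means the event did not occur
pts <- data.frame(
  date_enrol       = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01")),
  date_progression = as.Date(c("2020-07-01", NA,           NA)),
  date_death       = as.Date(c("2021-01-01", "2020-10-01", NA)),
  date_last_visit  = as.Date(c("2021-01-01", "2020-10-01", "2021-06-01"))
)

# Days from enrolment to each date (NA stays NA)
d_prog  <- as.numeric(pts$date_progression - pts$date_enrol)
d_death <- as.numeric(pts$date_death - pts$date_enrol)
d_last  <- as.numeric(pts$date_last_visit - pts$date_enrol)

# OS: event = death; otherwise censored at last visit
pts$event_os <- as.integer(!is.na(d_death))
pts$time_os  <- ifelse(is.na(d_death), d_last, d_death)

# PFS: event = progression or death, whichever comes first
first_pfs_event <- pmin(d_prog, d_death, na.rm = TRUE)  # NA if neither occurred
pts$event_pfs <- as.integer(!is.na(first_pfs_event))
pts$time_pfs  <- ifelse(is.na(first_pfs_event), d_last, first_pfs_event)

pts[, c("time_os", "event_os", "time_pfs", "event_pfs")]
```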
I can do it, but ideally we would have the response variable. This is why it would be good to update this vignette once we have the dataset,
e.g.: `mutate(PFS = ifelse(PD_RCRESP == "Progressive disease", 1, 0))`
Please have a look: I added the PFS/OS differentiation. But since I changed that, I cannot knit!!! Something weird is going on.
Guidelines on how to construct a KM at GR