create a vignette for Kaplan Meier #41

Open
wants to merge 22 commits into
base: main

Lpierott
Contributor

Guidelines on how to construct a Kaplan-Meier (KM) curve at GR

Use of the vignette
@Lpierott Lpierott linked an issue Nov 29, 2024 that may be closed by this pull request
Member

@DanChaltiel DanChaltiel left a comment

Thank you so much Livia, this is really coming together 😁😁

death_eos = status,
date_progression = date_enrol + time*runif(n(), 0.6, 0.8),
date_progression = if_else(runif(n())>0.2, NA, date_progression),
date_first_visit = t0_v2 + abs(rnorm(n(), 500, 20)),
Member

Are you sure about the variable renaming? Survival is usually calculated from enrollment or randomization, even if an earlier visit (for screening, clinical checks, etc.) would technically be the "first visit".
I agree with you on date_of_last_visit though, much better.
However, it would be better to stick to one naming convention: either date_of_xxx or date_xxx, as you wish.

Contributor Author

@Lpierott Lpierott Feb 7, 2025

As I understand it, the name is not always the same across projects and datasets, so "first date" and "last date" might be easier to understand. In any case, I'll ask our team members when I present this at our next meeting.

Member

@DanChaltiel DanChaltiel Feb 7, 2025

Wouldn't follow_up_start and follow_up_end be more correct then? (or something similar)

Contributor Author

A date and a follow-up are not really the same thing...

Member

date_start & date_end

Contributor Author

My friend says: "Both date_end and end_date are used, but end_date is generally more common and intuitive because it follows the natural English order (adjective + noun). Most coding styles prefer end_date since it improves readability.

However, some people prefer date_end to keep similar variable names grouped together when sorting alphabetically (e.g., date_start, date_end)."

Also, if it is pronounced badly, it sounds more like "dead end" than "date end", which is super weird!

Member

We have to choose between "good English wording" and "good R wording".
I'm sure you know what I prefer 😉.
However, this is your very vignette, so you should do as you wish.
Moreover, in the context of this vignette, autocompletion and variable naming guidelines are not the main points of interest.

print(km.model_PFS)
# there are a lot of NAs, that's not great
Member

What do you mean?
If you are talking about the median being NA, that is expected: when the median is not reached (the survival curve never drops to 50%, typically because too few patients have had an event), only the lower bound of the confidence interval can be calculated.
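
To illustrate, here is a minimal sketch with the survival package on made-up data. The `toy` data frame and its values are assumptions for illustration only, not the vignette's data. The curve never drops to 50%, so the median is reported as NA.

```r
library(survival)

# Hypothetical toy data: 10 patients, only 3 events (status == 1)
toy <- data.frame(
  time   = 1:10,
  status = c(1, 0, 0, 1, 0, 0, 1, 0, 0, 0)
)

km_toy <- survfit(Surv(time, status) ~ 1, data = toy)

# The lowest survival estimate stays above 0.5 ...
min(km_toy$surv)

# ... so the median survival time is not reached and comes back as NA
med <- quantile(km_toy, probs = 0.5)
med$quantile
```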

Contributor Author

Can we look at this together?

print(km.model_PFS)
# there are a lot of NAs, that's not great
summary(km.model_PFS)
Member

In summary(), the output is unreadable without the times argument, as it lists every time point in the dataset.
Also, I prefer using tidy_survfit() from ggsurvfit: it has the broom::tidy() vibe, although you have to use select() on the output to reduce the clutter.
What do you think about it?
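
For reference, a small sketch of the times argument with plain summary(). It uses the lung dataset shipped with survival rather than the vignette's data, and the 6-month and 1-year cut-offs are arbitrary choices for illustration.

```r
library(survival)

# lung$time is in days; status is coded 1 = censored, 2 = dead
km_lung <- survfit(Surv(time, status) ~ 1, data = lung)

# Restrict the output to two time points instead of every event time
s <- summary(km_lung, times = c(182.625, 365.25))

# Collect the useful columns in a compact table
surv_at_times <- data.frame(
  time      = s$time,
  n.risk    = s$n.risk,
  estimate  = s$surv,
  conf.low  = s$lower,
  conf.high = s$upper
)
surv_at_times
```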

Contributor Author

I removed it

Member

Don't you think it is nice to explain the times argument? Clinicians often want to see the "x-Year survival" and this is how I calculate it. Do you do it in a different way?

tidy_survfit(km.model_PFS, times=c(0.05, 0.10, 0.20)) %>% 
    select(strata, time, n.risk, n.event, estimate, conf.high, conf.low)


Contributor Author

Totally agree, I added it. However, I also kept "km.model_PFS" in order to get the median, unless there is another way to present it.

Member

Indeed, there is no way with ggsurvfit to get the median.
I know you can write median(km.model_PFS), but I'm not sure how to get the CI.
This might be a mission for Alderic's functions!
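
One possible route, as a hedged sketch: the survival package has a quantile() method for survfit objects that returns the estimate together with its confidence limits. The example uses the lung dataset from survival, not the vignette's km.model_PFS.

```r
library(survival)

km_lung <- survfit(Surv(time, status) ~ 1, data = lung)

# probs = 0.5 asks for the median; the result is a list with
# components quantile, lower and upper (the confidence limits)
med_ci <- quantile(km_lung, probs = 0.5, conf.int = TRUE)
med_ci
```

With a stratified model, the same call should return one row per stratum.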

status == 0 ~ pmax(date_of_last_visit, na.rm = TRUE)
)) %>%
mutate(time_days = status_date - date_first_visit) %>%
mutate(time_years = time_days / 365.25) %>%
mutate(time_months = time_days / 30.5)
Member

Our times are very low, aren't they?
In ggsurvfit::df_colon, times range from 0.02 to 9 years, which is a rather natural scale.
I'm not sure where this comes from, but maybe we can cheat by rescaling when creating data_surv?

Also, don't you think creating both time_os and time_pfs (and both KM curves) would be educationally valuable? I think juniors can struggle with that.

Contributor Author

Good point! Alexis should work on this!

Member

Oh, so Alexis should work on this vignette before we publish it?

Contributor Author

@Lpierott Lpierott Feb 19, 2025

We could publish it for the time being and remove it once Alexis has written separate KM vignettes for DOR, OS, PFS, and follow-up. At that point, time_DOR, time_OS, time_PFS, and time_follow_up will be created.

Member

@DanChaltiel DanChaltiel Feb 19, 2025

Alexis will write the vignette on response variables: ORR, DOR...
This vignette pertains to OS and PFS.
I think this mutate should create 4 variables: time_os, time_pfs, event_os, event_pfs
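
A minimal sketch of what such a mutate could look like, on a made-up three-patient tibble. The column names (date_enrol, date_progression, date_last_visit, death_eos) follow the snippets in this thread, but the data, the date_pfs helper, and the assumption that date_last_visit is the date of death when death_eos == 1 are mine.

```r
library(dplyr)

# Hypothetical data: patient 1 progresses then dies, patient 2 is
# censored, patient 3 dies without documented progression
data_surv <- tibble(
  date_enrol       = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")),
  date_progression = as.Date(c("2020-07-01", NA, NA)),
  date_last_visit  = as.Date(c("2021-01-01", "2021-02-01", "2020-09-01")),
  death_eos        = c(1L, 0L, 1L)
) %>%
  mutate(
    # OS: event is death, time runs until death or last visit
    event_os  = death_eos,
    time_os   = as.numeric(date_last_visit - date_enrol) / 365.25,
    # PFS: event is progression or death, whichever comes first
    event_pfs = as.integer(!is.na(date_progression) | death_eos == 1),
    date_pfs  = coalesce(date_progression, date_last_visit),
    time_pfs  = as.numeric(date_pfs - date_enrol) / 365.25
  )

select(data_surv, time_os, event_os, time_pfs, event_pfs)
```

Both times are expressed in years here, which keeps the scale close to the one in ggsurvfit::df_colon.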

Contributor Author

I can do it, but ideally we would have the response variable. This is why it would be good to update this vignette once we have the dataset.
e.g.: mutate(PFS = ifelse(PD_RCRESP == "Progressive disease", 1, 0))

Contributor Author

Please have a look: I added the PFS and OS differentiation. But since that change I cannot knit! Something weird is going on.


Successfully merging this pull request may close these issues.

Vignette: Kaplan Meier
2 participants