Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified pipeline to work with right censoring #42

Merged
merged 24 commits into from
Feb 20, 2024

Conversation

davidsantiagoquevedo
Copy link
Member

This PR refactorizes the package to work with right censoring. Main additions:

  • Updated example dataset cohortdata with simulated censoring dates death_other_causes and unique IDs id to match the data
  • Tests to control the correctness of updated data
  • Use of censoring_date_col in get_immunization_date and get_time_to_event. Set NULL by default
  • Tests for censoring_date_col when provided

davidsantiagoquevedo and others added 22 commits February 2, 2024 13:30
…osidered by hierarchically defining t0 and tf to calculate time_to_event = tf-t0
…ded coherence test between disjoint variables death_date and death_other_causes
Copy link
Member

@chartgerink chartgerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me from reading the changes. I am still trying to understand what censoring data means from a practical standpoint - why would I want to do this to begin with? It might be worthwhile to have a brief vignette on this to help users along as well (that does not mitigate the value of the current PR).

One question that remains for me is: Do we know whether the cases in cohortdata cover all the options for censoring that we are testing? That is, you're testing whether the functionality succeeds correctly but do we know whether it also fails correctly?

Another point I noticed while testing I would like to check in on (not related to censoring though): Is it normal that the immunization delay is added on top of the first vaccination, if two vaccination dates are provided? I want to check just to be sure.

@davidsantiagoquevedo
Copy link
Member Author

Hi @chartgerink, thanks for your review. The censoring impacts the estimation of the time-to-event for individuals whose event status is unknown at the end of the follow-up period. This occurs when they either drop out, are lost to follow-up, or experience a different event (e.g., death from other causes). Therefore, depending on the available information, a censoring date can be associated with the cohort. Despite not experiencing the event of interest, these individuals still contribute to the analysis with their exposure history. This information is used to estimate the hazard ratio and relative risks by providing details on exposure status and the duration of follow-up before censoring.

In addition, censoring will be also important in the matching routine because matched couples must be completely censored if one of the individuals is censored.

We are working on a vignette to explain this because it is one of the core ideas of the package.

@davidsantiagoquevedo
Copy link
Member Author

davidsantiagoquevedo commented Feb 19, 2024

@chartgerink can you please expand a bit on the other two points?

Regarding the last one, immunization_delay is added at the end of the function to data$imm_limit - data$delta_imm, where delta_imm represents the time distance to the vaccine selected for analysis. If take_first = TRUE, the function uses the first vaccine; if take_first = FALSE, it uses the one closest to imm_limit.

@chartgerink
Copy link
Member

Thanks for clarifying @davidsantiagoquevedo 👍

With respect to the testing approach - it seems like the tests only investigate that the output is in line with what you expect. I wonder whether there are any situations where you would expect the function to not work that you could test to strengthen the overall test suite for the censoring functionality.

@davidsantiagoquevedo
Copy link
Member Author

You're right, @chartgerink. I believe that to address this, we should simulate an example dataset for testing and to capture unexpected behaviors in all the functions. Because cohortdata was specifically simulated for demonstration purposes.

@chartgerink chartgerink self-requested a review February 20, 2024 15:55
Copy link
Member

@chartgerink chartgerink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All comments resolved - thanks @davidsantiagoquevedo 🙏

@davidsantiagoquevedo davidsantiagoquevedo merged commit 5910429 into main Feb 20, 2024
7 checks passed
@davidsantiagoquevedo davidsantiagoquevedo deleted the feat-censoring branch February 20, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants