Create bootstrapped datasets for regression problems? #187

Open
JohannesWiesner opened this issue Jan 8, 2024 · 1 comment

@JohannesWiesner

Hi @rishi-kulkarni, I would like to use your package to create a list of bootstrapped datasets (again referring to HCP data), but I noticed that hierarch.resampling.Bootstrapper.fit() expects a value for y that defines a treatment and a control group. However, the HCP dataset does not have treatment and control groups (in other words, all of my analyses are regression problems). Is it still possible to generate bootstrapped datasets with your functions when there are no groups?

Reminder: I would like to generate n bootstrapped datasets from the HCP dataset. In this dataset, subjects can belong to the same family or even be twins. I need a function that respects this structure so that the resampled datasets have a similar family structure.
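For illustration, here is a minimal sketch of a family-level (cluster) bootstrap in plain NumPy. It assumes each subject has a family ID; `family_bootstrap` is a hypothetical helper, not part of hierarch's API. Whole families are drawn with replacement so that twins and siblings always appear together, and each drawn copy of a family gets a fresh cluster label so duplicated families can be treated as separate clusters downstream.

```python
import numpy as np


def family_bootstrap(family_ids, n_resamples, seed=None):
    """Hypothetical helper: cluster bootstrap that resamples whole families.

    Returns a list of (row_indices, new_family_labels) tuples, one per
    bootstrapped dataset. Index your feature/target arrays with row_indices.
    """
    rng = np.random.default_rng(seed)
    family_ids = np.asarray(family_ids)
    families = np.unique(family_ids)
    # Precompute which rows (subjects) belong to each family.
    rows_by_family = {f: np.flatnonzero(family_ids == f) for f in families}

    resamples = []
    for _ in range(n_resamples):
        # Draw as many families as the original data has, with replacement.
        drawn = rng.choice(families, size=families.size, replace=True)
        idx = np.concatenate([rows_by_family[f] for f in drawn])
        # Give every drawn copy its own label, so a family drawn twice
        # counts as two independent clusters in later analyses.
        labels = np.concatenate(
            [np.full(rows_by_family[f].size, k) for k, f in enumerate(drawn)]
        )
        resamples.append((idx, labels))
    return resamples
```

Usage would look like `for idx, fam in family_bootstrap(family, 1000): X_boot, y_boot = X[idx], y[idx]`. Because families differ in size, the number of subjects varies slightly between resampled datasets; that is expected for a cluster bootstrap.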

@rishi-kulkarni
Owner

rishi-kulkarni commented Jan 12, 2024

Hi @JohannesWiesner - I'll take a closer look at what hierarch can do this weekend, but in principle, yes. I'd encourage you to take a look at these lecture notes as well: https://faculty.washington.edu/yenchic/17Sp_403/Lec6-bootstrap_reg.pdf

They discuss a couple of different approaches to bootstrapping regression problems. When I was putting together the hierarch paper, I found that bootstrapping residuals and permuting the values of the predictor whose coefficient is of interest was best at controlling the Type I error rate. The confidence_interval function actually does this out of the box.
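As a rough illustration of the residual bootstrap covered in those notes, here is a sketch for an ordinary least squares fit in NumPy. `residual_bootstrap_ci` is a hypothetical helper, not part of hierarch, and it does not reproduce what confidence_interval does internally (there is no permutation step here). It also treats residuals as exchangeable across subjects, which ignores family structure; for HCP data the residuals would presumably need to be resampled at the family level instead.

```python
import numpy as np


def residual_bootstrap_ci(X, y, coef_index, n_resamples=2000, alpha=0.05, seed=None):
    """Hypothetical helper: percentile CI for one OLS coefficient via the
    residual bootstrap. X is (n,) or (n, p); coef_index refers to a column of X."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(y.size), X])  # prepend an intercept column
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta_hat
    resid = y - fitted

    boot = np.empty(n_resamples)
    for b in range(n_resamples):
        # Keep the design fixed and resample residuals with replacement.
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        beta_star, *_ = np.linalg.lstsq(design, y_star, rcond=None)
        boot[b] = beta_star[coef_index + 1]  # +1 skips the intercept

    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return beta_hat[coef_index + 1], (lo, hi)
```

The residual bootstrap keeps the covariates fixed and only resamples noise, which matches a fixed-design view of regression; the pairs (case) bootstrap from the same notes would instead resample whole rows.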

If you have multiple covariates at the same level in a regression problem, you can produce the correct permutation test by treating the coefficient you're trying to measure as nested within the others on the same level. See this paper for a deeper discussion of that: https://www.tandfonline.com/doi/abs/10.1080/00949650215733
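One way to read the nesting idea, sketched under the assumption that the other covariate is categorical: permute the predictor of interest only within levels of that covariate. This preserves its association with the other covariate and breaks only the within-level relationship with y. `stratified_permutation_test` is a hypothetical helper for illustration, not hierarch's implementation or the exact procedure from the linked paper.

```python
import numpy as np


def stratified_permutation_test(x, y, strata, n_permutations=5000, seed=None):
    """Hypothetical helper: two-sided permutation p-value for the slope of y
    on x, permuting x only within levels of a categorical covariate."""
    rng = np.random.default_rng(seed)
    x, y, strata = map(np.asarray, (x, y, strata))
    levels = np.unique(strata)

    def slope(x_vals):
        # OLS slope after removing stratum means (i.e. stratum fixed effects).
        xc = x_vals.astype(float).copy()
        yc = y.astype(float).copy()
        for s in levels:
            m = strata == s
            xc[m] -= xc[m].mean()
            yc[m] -= yc[m].mean()
        return (xc @ yc) / (xc @ xc)

    observed = slope(x)
    count = 0
    for _ in range(n_permutations):
        x_perm = x.copy()
        for s in levels:
            m = np.flatnonzero(strata == s)
            x_perm[m] = rng.permutation(x[m])  # shuffle only inside this stratum
        if abs(slope(x_perm)) >= abs(observed):
            count += 1
    # Counting the observed statistic keeps the p-value strictly above zero.
    return observed, (count + 1) / (n_permutations + 1)
```

Permuting across strata instead would also destroy the covariate's correlation with the other predictor, so the null distribution would no longer match the hypothesis "no effect of x given the other covariate".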
