Improved notebook about bootstraps
cyberosa committed Jan 6, 2024
1 parent e553714 commit 9450252
Showing 1 changed file with 14 additions and 28 deletions.
42 changes: 14 additions & 28 deletions nbs/blog/posts/bootstraps/bootstraps.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Bootstrap Confidence Intervals\n",
"\n",
"> Explaination of the bootstrap method and its application in hypothesis testing using DABEST.\n",
"> Explanation of the bootstrap method and its application in hypothesis testing using **DABEST**.\n",
"\n",
"- order: 3"
]
@@ -17,7 +17,7 @@
"id": "6321ea6f",
"metadata": {},
"source": [
"## Sampling from Populations"
"## Sampling from populations"
]
},
{
@@ -27,7 +27,7 @@
"source": [
"In a typical scientific experiment, we are interested in two populations\n",
"(Control and Test), and whether there is a difference between their means\n",
"$(\\mu_{Test}-\\mu_{Control})$\n"
"$(\\mu_{Test}-\\mu_{Control})$.\n"
]
},
{
@@ -43,7 +43,7 @@
"id": "5573045c",
"metadata": {},
"source": [
"We go about this by collecting observations from the control population, and from the test population."
"We go about this by collecting observations from the control population and from the test population."
]
},
{
@@ -62,7 +62,7 @@
"We can easily compute the mean difference in our observed samples. This is our\n",
"estimate of the population effect size that we are interested in.\n",
"\n",
"**But how do we obtain a measure of precision and confidence about our estimate?\n",
"**But how do we obtain a measure of the precision and confidence about our estimate?\n",
"Can we get a sense of how it relates to the population mean difference?**\n"
]
},
@@ -79,11 +79,11 @@
"id": "fe977cc6",
"metadata": {},
"source": [
"We want to obtain a 95% confidence interval (95% CI) around the our estimate of the mean difference. The 95% indicates that any such confidence interval will capture the population mean difference 95% of the time.\n",
"We want to obtain a 95% confidence interval (95% CI) around our estimate of the mean difference. The 95% indicates that any such confidence interval will capture the population mean difference 95% of the time.\n",
"\n",
"In other words, if we repeated our experiment 100 times, gathering 100 independent sets of observations, and computing a 95% confidence interval for the mean difference each time, 95 of these intervals would capture the population mean difference. That is to say, we can be 95% confident the interval contains the true mean of the population.\n",
"In other words, if we were to repeat our experiment 100 times, gathering 100 independent sets of observations and computing a 95% confidence interval for the mean difference each time, we would expect about 95 of these intervals to capture the population mean difference. That is to say, we can be 95% confident the interval contains the true population mean difference.\n",
"\n",
"We can calculate the 95% CI of the mean difference with [bootstrap resampling](https://en.wikipedia.org/wiki/Bootstrapping_(statistics))\n"
"We can calculate the 95% CI of the mean difference with [bootstrap resampling](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)).\n"
]
},
{
@@ -99,7 +99,7 @@
"id": "0685adaf",
"metadata": {},
"source": [
"The [`bootstrap`](#1)[1] is a simple but powerful technique. It was [first described] (https://projecteuclid.org/euclid.aos/1176344552) by [Bradley Efron](https://statistics.stanford.edu/people/bradley-efron).\n",
"The [`bootstrap`](#1)[1] is a simple but powerful technique. It was [first described](https://projecteuclid.org/euclid.aos/1176344552) by [Bradley Efron](https://statistics.stanford.edu/people/bradley-efron).\n",
"\n",
"It creates multiple *resamples* (with replacement) from a single set of\n",
"observations, and computes the effect size of interest on each of these\n",
@@ -134,11 +134,7 @@
"the Central Limit Theorem, the resampling distribution of the effect size will\n",
"approach normality.\n",
"\n",
"2. *Easy construction of the 95% CI from the resampling distribution.* For 1000\n",
"bootstrap resamples of the mean difference, one can use the 25th value and the\n",
"975th value of the ranked differences as boundaries of the 95% confidence\n",
"interval. (This captures the central 95% of the distribution.) Such an interval\n",
"construction is known as a *percentile interval*."
"2. *Easy construction of the 95% CI from the resampling distribution.* In the context of bootstrap resampling or other non-parametric methods, the 2.5th and 97.5th percentiles are often used to define the lower and upper limits, respectively. The use of these percentiles ensures that the resulting interval contains the central 95% of the resampled distribution. Such an interval construction is known as a *percentile interval*."
]
},
{
@@ -156,12 +152,10 @@
"source": [
"While resampling distributions of the difference in means often have a normal\n",
"distribution, it is not uncommon to encounter a skewed distribution. Thus, Efron\n",
"developed the [bias-corrected and accelerated bootstrap]\n",
"(https://en.wikipedia.org/wiki/Bootstrapping_(statistics)#History) (BCa\n",
"bootstrap) to account for the skew, and still obtain the central 95% of the\n",
"developed the [bias-corrected and accelerated bootstrap](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)#History) (BCa bootstrap) to account for the skew, and still obtain the central 95% of the\n",
"distribution.\n",
"\n",
"DABEST applies the BCa correction to the resampling bootstrap distributions of\n",
"**DABEST** applies the BCa correction to the resampling bootstrap distributions of\n",
"the effect size."
]
},
@@ -186,7 +180,7 @@
"id": "fb1a8fa6",
"metadata": {},
"source": [
"The estimation plot produced by DABEST presents the rawdata and the bootstrap\n",
"The estimation plot produced by DABEST presents the raw data and the bootstrap\n",
"confidence interval of the effect size (the difference in means) side-by-side as\n",
"a single integrated plot."
]
@@ -204,7 +198,7 @@
"id": "eaad7dd5",
"metadata": {},
"source": [
"It thus tightly couples visual presentation of the raw data with an indication of the population mean difference, and its confidence interval."
"Thus, it tightly couples a visual presentation of the raw data with an indication of the population mean difference plus its confidence interval."
]
},
{
@@ -215,14 +209,6 @@
"<a id='1'></a>\n",
"`[1]`: The name is derived from the saying \"[pull oneself up by one's bootstraps](https://en.wiktionary.org/wiki/pull_oneself_up_by_one%27s_bootstraps)\", often used as an exhortation to achieve success without external help.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87e5611b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
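
Note on the percentile interval described in the notebook text above: the construction can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not DABEST's implementation. The function name `bootstrap_percentile_ci`, its parameters, the nearest-rank cut points, and the simulated observations are all assumptions made for the example, and it computes a plain percentile interval without the BCa correction that the notebook says DABEST applies.

```python
import random
import statistics


def bootstrap_percentile_ci(control, test, n_resamples=5000, ci=95, seed=12345):
    """Percentile bootstrap CI for mean(test) - mean(control) (hypothetical helper)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original group sizes.
        control_resample = rng.choices(control, k=len(control))
        test_resample = rng.choices(test, k=len(test))
        diffs.append(
            statistics.fmean(test_resample) - statistics.fmean(control_resample)
        )
    diffs.sort()
    alpha = (100 - ci) / 100
    # Nearest-rank cut points bounding the central `ci` percent of the
    # ranked resampled differences (an approximation, not an interpolation).
    lower_index = int(n_resamples * alpha / 2)
    upper_index = int(n_resamples * (1 - alpha / 2)) - 1
    return diffs[lower_index], diffs[upper_index]


# Made-up observations, for illustration only.
data_rng = random.Random(0)
control = [data_rng.gauss(3.0, 0.4) for _ in range(20)]
test = [data_rng.gauss(3.5, 0.4) for _ in range(20)]

observed_diff = statistics.fmean(test) - statistics.fmean(control)
ci_low, ci_high = bootstrap_percentile_ci(control, test)
print(f"mean difference: {observed_diff:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Fixing the seed makes the interval reproducible between runs; a different seed or number of resamples will shift the bounds slightly.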
