chi2 exercise
alexanderthclark committed Apr 19, 2024
1 parent 581096a commit 448dc91
Showing 21 changed files with 1,745 additions and 10 deletions.
Binary file modified .DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -55,3 +55,4 @@ book/.ipynb_checkpoints/hypothesistesting-checkpoint.ipynb
book/.ipynb_checkpoints/chi2-checkpoint.ipynb
helpers/.ipynb_checkpoints/chi2helper-checkpoint.ipynb
Data/.ipynb_checkpoints/TSwiftSongsAndLyrics-checkpoint.ipynb
book/.ipynb_checkpoints/power-checkpoint.ipynb
51 changes: 51 additions & 0 deletions book/.ipynb_checkpoints/moreAboutSig-checkpoint.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "68168b94",
"metadata": {},
"source": [
"(more)=\n",
"# More About Tests of Significance\n",
"\n",
"```{admonition} Important Readings\n",
":class: seealso\n",
"- {cite}`freedman2007statistics`, Chapters 29\n",
"```\n",
"\n",
"## \n",
"\n",
"\n",
"## Data Snooping\n",
"\n",
"\n",
"## Was the result important? \n",
"\n",
"\n",
"## Garbage In, Garbage Out\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Binary file modified book/_build/.doctrees/chi2.doctree
Binary file not shown.
Binary file modified book/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file added book/_build/.doctrees/moreAboutSig.doctree
Binary file not shown.
Binary file added book/_build/.doctrees/power.doctree
Binary file not shown.
11 changes: 10 additions & 1 deletion book/_build/html/_sources/chi2.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -816,7 +816,7 @@
{
"cell_type": "code",
"execution_count": 9,
"id": "7b04429d",
"id": "23c2b0fc",
"metadata": {},
"outputs": [
{
Expand Down Expand Up @@ -930,6 +930,15 @@
"5. Is a six-sided die fair? \n",
"\n",
"```{exercise-end}\n",
"```\n",
"\n",
"\n",
"```{exercise-start}\n",
":label: chiTest1 \n",
"```\n",
"Travis's playlist contains 4 songs by Taylor Swift, 2 by Pavarotti, and 4 by Little Richard. He listens to 100 songs on shuffle mode, resulting in Swift being played 30 times, Pavarotti being played 25 times, and Little Richard being played 45 times. Use a $\\chi^2$-test to determine if shuffle mode is randomizing over each song with equal probability. \n",
"\n",
"```{exercise-end}\n",
"```"
]
}
Expand Down
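
The shuffle-mode exercise added to chi2.ipynb above can be set up as a goodness-of-fit test. The following is a minimal sketch rather than the author's solution. It assumes the null hypothesis that every song is equally likely on each play, so the expected artist shares are 4/10, 2/10, and 4/10. The function name `chi2_gof` is hypothetical, only the standard library is used, and the p-value relies on the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math

def chi2_gof(observed, expected):
    """Return the chi-squared statistic for observed vs. expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Songs per artist on the playlist: Swift, Pavarotti, Little Richard.
songs = [4, 2, 4]
observed = [30, 25, 45]
n_plays = sum(observed)

# Under the null, each song is equally likely, so expected plays are
# proportional to the number of songs by each artist.
expected = [n_plays * s / sum(songs) for s in songs]

stat = chi2_gof(observed, expected)
df = len(observed) - 1

# For df = 2 the chi-squared survival function is exactly exp(-x / 2).
# (Assumption: this shortcut is specific to df = 2.)
p_value = math.exp(-stat / 2)

print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With expected counts of 40, 20, and 40, the statistic is 4.375 on 2 degrees of freedom and the p-value is about 0.11, so under these assumptions the data would not lead to rejecting uniform shuffling at the 5% level.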
51 changes: 51 additions & 0 deletions book/_build/html/_sources/moreAboutSig.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "68168b94",
"metadata": {},
"source": [
"(more)=\n",
"# More About Tests of Significance\n",
"\n",
"```{admonition} Important Readings\n",
":class: seealso\n",
"- {cite}`freedman2007statistics`, Chapters 29\n",
"```\n",
"\n",
"## \n",
"\n",
"\n",
"## Data Snooping\n",
"\n",
"\n",
"## Was the result important? \n",
"\n",
"\n",
"## Garbage In, Garbage Out\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
74 changes: 74 additions & 0 deletions book/_build/html/_sources/power.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fbdfc9d0",
"metadata": {},
"source": [
"(power)=\n",
"# Statistical Power\n",
"\n",
"\n",
"## Type I and II Errors\n",
"\n",
"A **Type I error** is made by rejecting the null hypothesis when the null hypothesis is true. A **Type II Error** is made when failing to reject the null hypothesis when the null hypothesis is false. While some mistakes, like snooping, are a matter of bad practice and can be avoided, Type I and II errors are unavoidable.\n",
"\n",
"Hypothesis testing involves a binary decision. If we compare this to the deliberations of a judge, it is the null hypothesis that is on trial. Rejecting the hypothesis is akin to judging it to be guilty. Failing to reject the null hypothesis is akin to acquitting it, and this might be considered a negative result. This corresponds to the taxonomy in the table below. \n",
"\n",
"Continuing the judicial analogy, a Type I error is convicting the true, innocent null hypothesis. A Type II error lets the false, crooked null hypothesis off the hook. \n",
"\n",
"Statisticians use $\\alpha$ and $\\beta$ to denote the Type I and II conditional error rates. I call them *conditional* error rates to emphasize that each is a conditional probability. \n",
"\n",
"$$\\alpha = \\mathbb{P}(\\text{reject }H_0 \\mid H_0\\text{ true})$$\n",
" \n",
"$$\\beta = \\mathbb{P}(\\text{fail to reject }H_0 \\mid H_0\\text{ false})$$\n",
"\n",
"The value $\\alpha$ is familiar, being directly related to the confidence level. A test has an associated **power** level, $1-\\beta$. \n",
"\n",
"## Power\n",
"\n",
"The **power** of a test is the probability it will reject the null hypothesis if the null is false. \n",
"\n",
"For a given statistical test, the power depends on the significance level $\\alpha$ and the sample size $n$. \n",
"\n",
"First, the significance $\\alpha$ influences the power because you are determining how liberal or conservative to be with rejecting the null. A high $\\alpha$ means you will reject the null more often.\n",
"\n",
"\n",
"TKTK TK\n",
"\n",
"Suppose you wanted 95% power in the illustration above ($\\beta = 0.05$). According to the null hypothesis, a $z$-statistic will be drawn from a standard normal distribution (the top panel). We simplify the world to consider a single alternative hypothesis, under which the $z$-statistic is actually drawn from a distribution centered at two. To force $\\beta = 0.05$, the vertical line must be 1.645 standard deviations away from the mean of the alternative distribution. Accordingly our critical value is $z^\\star = 2-1.645 = 0.355$. A $z$-table helps show this corresponds to $\\alpha \\approx 0.361$.\n",
"\n",
"Second, $n$ increases power by lowering the standard errors and thus making the sampling distribution for the sample mean more narrow. This is because the standard error for such a distribution is $\\text{SE} = \\frac{\\text{SD}}{\\sqrt{n}}$. With less overlapping area, greater power is achieved. \n",
"\n",
"\n",
"#### Is power relevant in the world of big data? \n",
"\n",
"From Ronny Kohavi on [LinkedIn](https://www.linkedin.com/posts/ronnyk_abtesting-abtesting-statisticalpower-activity-6950341450593161216-dpTV):\n",
"\n",
"> If you think large companies with a massive userbase (Amazon, Google) have an easy time detecting tiny changes in A/B tests, you’re wrong! ... The largest companies cannot power experiments with enough users to detect a \\$10M loss.\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
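
The critical-value arithmetic in the Power section of power.ipynb above can be checked numerically. The following is a minimal sketch, assuming a one-sided $z$-test and a single alternative centered at 2 with unit standard deviation, as in the text. It uses the standard-library `statistics.NormalDist`. The function names, and the effect size of 0.2 in the sample-size loop, are hypothetical choices for illustration only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # null distribution of the z-statistic

def critical_value_for_power(alt_mean, power):
    """Critical z that gives the requested power against N(alt_mean, 1)."""
    beta = 1 - power
    # Reject when z > z_star; beta is the alternative's mass below z_star.
    return NormalDist(alt_mean, 1).inv_cdf(beta)

def alpha_for_critical_value(z_star):
    """One-sided Type I error rate: null mass above z_star."""
    return 1 - STD_NORMAL.cdf(z_star)

def power_for_n(effect, sd, n, alpha):
    """Power of a one-sided z-test for a mean shift `effect` with n draws."""
    se = sd / n ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under the alternative the z-statistic is centered at effect / se.
    return 1 - STD_NORMAL.cdf(z_crit - effect / se)

# Reproduce the worked example: 95% power against an alternative at 2.
z_star = critical_value_for_power(alt_mean=2, power=0.95)
alpha = alpha_for_critical_value(z_star)
print(f"z* = {z_star:.3f}, alpha = {alpha:.3f}")

# Hypothetical effect size of 0.2 SD: power grows as n shrinks the SE.
for n in (25, 100, 400):
    print(n, round(power_for_n(effect=0.2, sd=1, n=n, alpha=0.05), 3))
```

The first line of output reproduces $z^\star \approx 0.355$ and $\alpha \approx 0.361$ from the text. The loop illustrates the second point: holding $\alpha$ fixed, power increases with $n$ because the standard error falls as $1/\sqrt{n}$.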
13 changes: 10 additions & 3 deletions book/_build/html/chi2.html

Large diffs are not rendered by default.
