chi2 exercise

alexanderthclark · Apr 19, 2024 · 448dc91
1 parent 581096a
commit 448dc91
Showing 21 changed files with 1,745 additions and 10 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/.gitignore b/.gitignore
@@ -55,3 +55,4 @@ book/.ipynb_checkpoints/hypothesistesting-checkpoint.ipynb
 book/.ipynb_checkpoints/chi2-checkpoint.ipynb
 helpers/.ipynb_checkpoints/chi2helper-checkpoint.ipynb
 Data/.ipynb_checkpoints/TSwiftSongsAndLyrics-checkpoint.ipynb
+book/.ipynb_checkpoints/power-checkpoint.ipynb
diff --git a/book/.ipynb_checkpoints/moreAboutSig-checkpoint.ipynb b/book/.ipynb_checkpoints/moreAboutSig-checkpoint.ipynb
@@ -0,0 +1,51 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "68168b94",
+   "metadata": {},
+   "source": [
+    "(more)=\n",
+    "# More About Tests of Significance\n",
+    "\n",
+    "```{admonition} Important Readings\n",
+    ":class: seealso\n",
+    "- {cite}`freedman2007statistics`, Chapter 29\n",
+    "```\n",
+    "\n",
+    "## \n",
+    "\n",
+    "\n",
+    "## Data Snooping\n",
+    "\n",
+    "\n",
+    "## Was the result important? \n",
+    "\n",
+    "\n",
+    "## Garbage In, Garbage Out\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/book/_build/.doctrees/chi2.doctree b/book/_build/.doctrees/chi2.doctree
diff --git a/book/_build/.doctrees/environment.pickle b/book/_build/.doctrees/environment.pickle
diff --git a/book/_build/.doctrees/moreAboutSig.doctree b/book/_build/.doctrees/moreAboutSig.doctree
diff --git a/book/_build/.doctrees/power.doctree b/book/_build/.doctrees/power.doctree
diff --git a/book/_build/html/_sources/chi2.ipynb b/book/_build/html/_sources/chi2.ipynb
@@ -816,7 +816,7 @@
   {
    "cell_type": "code",
    "execution_count": 9,
-   "id": "7b04429d",
+   "id": "23c2b0fc",
    "metadata": {},
    "outputs": [
     {
@@ -930,6 +930,15 @@
     "5. Is a six-sided die fair? \n",
     "\n",
     "```{exercise-end}\n",
+    "```\n",
+    "\n",
+    "\n",
+    "```{exercise-start}\n",
+    ":label: chiTest1 \n",
+    "```\n",
+    "Travis's playlist contains 4 songs by Taylor Swift, 2 by Pavarotti, and 4 by Little Richard. He listens to 100 songs on shuffle mode, resulting in Swift being played 30 times, Pavarotti being played 25 times, and Little Richard being played 45 times. Use a $\\chi^2$-test to determine if shuffle mode is randomizing over each song with equal probability. \n",
+    "\n",
+    "```{exercise-end}\n",
     "```"
    ]
   }

diff --git a/book/_build/html/_sources/moreAboutSig.ipynb b/book/_build/html/_sources/moreAboutSig.ipynb
@@ -0,0 +1,51 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "68168b94",
+   "metadata": {},
+   "source": [
+    "(more)=\n",
+    "# More About Tests of Significance\n",
+    "\n",
+    "```{admonition} Important Readings\n",
+    ":class: seealso\n",
+    "- {cite}`freedman2007statistics`, Chapter 29\n",
+    "```\n",
+    "\n",
+    "## \n",
+    "\n",
+    "\n",
+    "## Data Snooping\n",
+    "\n",
+    "\n",
+    "## Was the result important? \n",
+    "\n",
+    "\n",
+    "## Garbage In, Garbage Out\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/book/_build/html/_sources/power.ipynb b/book/_build/html/_sources/power.ipynb
@@ -0,0 +1,74 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "fbdfc9d0",
+   "metadata": {},
+   "source": [
+    "(power)=\n",
+    "# Statistical Power\n",
+    "\n",
+    "\n",
+    "## Type I and II Errors\n",
+    "\n",
+    "A **Type I error** is made by rejecting the null hypothesis when it is true. A **Type II error** is made by failing to reject the null hypothesis when it is false. While some mistakes, like snooping, are a matter of bad practice and can be avoided, Type I and II errors are unavoidable.\n",
+    "\n",
+    "Hypothesis testing involves a binary decision. If we compare this to the deliberations of a judge, it is the null hypothesis that is on trial. Rejecting the null hypothesis is akin to finding it guilty. Failing to reject the null hypothesis is akin to acquitting it, and this might be considered a negative result. This corresponds to the taxonomy in the table below. \n",
+    "\n",
+    "Continuing the judicial analogy, a Type I error is convicting the true, innocent null hypothesis. A Type II error lets the false, crooked null hypothesis off the hook. \n",
+    "\n",
+    "Statisticians use $\\alpha$ and $\\beta$ to denote the Type I and II conditional error rates. I call them *conditional* error rates to emphasize that each is a conditional probability. \n",
+    "\n",
+    "$$\\alpha = \\mathbb{P}(\\text{reject }H_0 \\mid H_0\\text{ true})$$\n",
+    "    \n",
+    "$$\\beta = \\mathbb{P}(\\text{fail to reject }H_0 \\mid H_0\\text{ false})$$\n",
+    "\n",
+    "The value $\\alpha$ is familiar, being directly related to the confidence level. A test has an associated **power** level, $1-\\beta$. \n",
+    "\n",
+    "## Power\n",
+    "\n",
+    "The **power** of a test is the probability it will reject the null hypothesis if the null is false. \n",
+    "\n",
+    "For a given statistical test and a fixed alternative, the power depends on the significance level $\\alpha$ and the sample size $n$. \n",
+    "\n",
+    "First, the significance level $\\alpha$ influences the power because it determines how liberal or conservative you are in rejecting the null. A high $\\alpha$ means you will reject the null more often, whether or not it is true, so the Type I error rate and the power rise together.\n",
+    "\n",
+    "\n",
+    "TKTK TK\n",
+    "\n",
+    "Suppose you wanted 95% power in the illustration above ($\\beta = 0.05$). According to the null hypothesis, a $z$-statistic will be drawn from a standard normal distribution (the top panel). We simplify the world to consider a single alternative hypothesis, under which the $z$-statistic is actually drawn from a distribution centered at two. To force $\\beta = 0.05$, the vertical line must be 1.645 standard deviations below the mean of the alternative distribution. Accordingly, our critical value is $z^\\star = 2-1.645 = 0.355$. A $z$-table shows that this corresponds to a one-sided $\\alpha \\approx 0.361$.\n",
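+    "\n",
+    "As a rough check of that arithmetic, assume a one-sided test that rejects for large $z$, an alternative distribution with unit standard deviation, and write $\\Phi$ for the standard normal CDF (notation assumed here for illustration):\n",
+    "\n",
+    "$$\\beta = \\mathbb{P}(z \\le z^\\star \\mid H_0\\text{ false}) = \\Phi(z^\\star - 2) = \\Phi(-1.645) \\approx 0.05$$\n",
+    "\n",
+    "$$\\alpha = \\mathbb{P}(z > z^\\star \\mid H_0\\text{ true}) = 1 - \\Phi(0.355) \\approx 0.361$$\n",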
+    "\n",
+    "Second, $n$ increases power by lowering the standard error, which narrows the sampling distribution of the sample mean. This is because the standard error for such a distribution is $\\text{SE} = \\frac{\\text{SD}}{\\sqrt{n}}$. With less overlapping area, greater power is achieved. \n",
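+    "\n",
+    "One way to make the role of $n$ concrete is the following sketch. It assumes a one-sided $z$-test with known SD, and it introduces symbols not used elsewhere in this chapter: $\\Phi$ for the standard normal CDF, $\\delta$ for the true shift in the mean under the alternative, and $z_{1-\\alpha}$ for the critical value.\n",
+    "\n",
+    "$$1 - \\beta = 1 - \\Phi\\left(z_{1-\\alpha} - \\frac{\\delta}{\\text{SE}}\\right) = 1 - \\Phi\\left(z_{1-\\alpha} - \\frac{\\delta\\sqrt{n}}{\\text{SD}}\\right)$$\n",
+    "\n",
+    "Under these assumptions, a larger $n$ raises $\\delta\\sqrt{n}/\\text{SD}$ for any fixed $\\delta > 0$, so the power $1-\\beta$ increases toward one.\n",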
+    "\n",
+    "\n",
+    "#### Is power relevant in the world of big data? \n",
+    "\n",
+    "From Ronny Kohavi on [LinkedIn](https://www.linkedin.com/posts/ronnyk_abtesting-abtesting-statisticalpower-activity-6950341450593161216-dpTV):\n",
+    "\n",
+    "> If you think large companies with a massive userbase (Amazon, Google) have an easy time detecting tiny changes in A/B tests, you’re wrong! ... The largest companies cannot power experiments with enough users to detect a \\$10M loss.\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/book/_build/html/chi2.html b/book/_build/html/chi2.html