
Commit

bayes sol
alexanderthclark committed Mar 18, 2024
1 parent 7a2274a commit 776225b
Showing 48 changed files with 1,887 additions and 456 deletions.
Binary file modified .DS_Store
Binary file not shown.
Binary file modified book/.DS_Store
Binary file not shown.
Binary file modified book/_build/.doctrees/chancevar.doctree
Binary file not shown.
Binary file modified book/_build/.doctrees/chancevary.doctree
Binary file not shown.
Binary file modified book/_build/.doctrees/correlation.doctree
Binary file not shown.
Binary file modified book/_build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified book/_build/.doctrees/intro.doctree
Binary file not shown.
Binary file modified book/_build/.doctrees/normal.doctree
Binary file not shown.
Binary file modified book/_build/.doctrees/solutions.doctree
Binary file not shown.
233 changes: 233 additions & 0 deletions book/_build/html/_images/HTrandomboxTree.svg
271 changes: 271 additions & 0 deletions book/_build/html/_images/HTrandomboxTree2Draws.svg
342 changes: 171 additions & 171 deletions book/_build/html/_images/chanceErrorHist.svg
2 changes: 2 additions & 0 deletions book/_build/html/_sources/chancevar.md
Original file line number Diff line number Diff line change
@@ -125,3 +125,5 @@ The range 470 to 530 is therefore the expected sum $\pm$ SE. The sum falls in th
```

The normal approximation is justified because of the **central limit theorem**. The central limit theorem states that, when random draws are made with replacement from a box, the distribution for the sum of these draws will approximate a normal distribution. This is true even if the contents of the box do not follow a normal curve, as long as the number of draws is large.

The central limit theorem is a big deal.
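To see the central limit theorem at work, here is a minimal simulation sketch with NumPy. The specific numbers are illustrative assumptions, not from the text: a lopsided box of nine 0s and a single 1, 2,500 draws with replacement per sum, and 2,000 repetitions. The names `box`, `sums`, and `within_1se` are hypothetical.

```python
import numpy as np

# Illustrative choices (not from the text): a lopsided box with nine 0s and
# one 1, 2,500 draws per sum, and 2,000 repetitions of the experiment.
box = np.array([0] * 9 + [1])
n_draws = 2_500
n_reps = 2_000

rng = np.random.default_rng(seed=0)

# Each row is one experiment: n_draws draws made with replacement.
draws = rng.choice(box, size=(n_reps, n_draws), replace=True)
sums = draws.sum(axis=1)

# Expected sum and SE of the sum (square root law, box SD with ddof=0).
expected_sum = n_draws * box.mean()        # 2500 * 0.1 = 250
se_sum = np.sqrt(n_draws) * box.std()      # 50 * 0.3 = 15

# Share of simulated sums within one SE of the expected sum.
within_1se = np.mean(np.abs(sums - expected_sum) <= se_sum)
print(expected_sum, se_sum, within_1se)
```

The box's own histogram looks nothing like a bell, yet `within_1se` should come out near 68%, a little above it because the sums are whole numbers and both endpoints are counted. A histogram of `(sums - expected_sum) / se_sum` would show the standard-normal shape.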
15 changes: 10 additions & 5 deletions book/_build/html/_sources/chancevary.ipynb
@@ -2,17 +2,21 @@
"cells": [
{
"cell_type": "markdown",
"id": "d0b7dbc4",
"id": "70b05281",
"metadata": {},
"source": [
"(chance)=\n",
"# Chance Error\n",
"# Chance Variability\n",
"\n",
"```{admonition} Important Readings\n",
":class: seealso\n",
"- {cite}`freedman2007statistics`, Chapter 16, 17, 18\n",
"```\n",
"\n",
"In this section, we begin to marry data and probability. This is exciting because it provides the foundation for statistical inference. \n",
"\n",
"## Chance Errors\n",
"\n",
"A fair coin is expected to land heads 50% of the time. This is expressed by the probability as a long-run frequency,\n",
"\n",
"$$\\mathbb{P}(H) = \\frac{ \\text{number of heads}} {\\text{number of tosses}}.$$\n",
@@ -148,7 +152,7 @@
{
"cell_type": "code",
"execution_count": 7,
"id": "eb53806e",
"id": "39c75ecb",
"metadata": {
"tags": [
"remove-input"
@@ -582,7 +586,7 @@
},
{
"cell_type": "markdown",
"id": "74372a1c",
"id": "d91040a8",
"metadata": {},
"source": [
"Above, for a sufficiently large number of draws, we see a familiar bell-shaped curve. If the value of the sum is converted to standard units, this will approximate the standard normal curve. \n",
@@ -592,13 +596,14 @@
"width: 90%\n",
"name: standardizedprobhistogram\n",
"---\n",
"Probability histogram for the sum of 100 draws from a box with 9 zeros and a single one.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e26dbcfe",
"id": "b0a97a52",
"metadata": {},
"outputs": [],
"source": []
86 changes: 85 additions & 1 deletion book/_build/html/_sources/solutions.md
@@ -185,7 +185,7 @@ A single row will contain the incomes for two partnered individuals, but an $x,y
```{solution-end}
```

## [Probability](probability)
## [Probability I](probability)

```{solution-start} conditional
:class: dropdown
@@ -203,5 +203,89 @@ $\mathbb{P}(A \mid B) + \mathbb{P}(\text{not }A \mid B)=1$ for any $A,B$.
Binomial coefficients tell you how many ways you can select $k$ items from a list of $n$, but the $k$ items are not ordered.

The president/VP pair is ordered and the co-president pair is not. Only the co-president pairings are counted by $\binom{3}{2}$. There will be twice as many president/VP pairings because (Alice, Bob) and (Bob, Alice) are considered distinct.
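The ordered/unordered distinction can be checked by brute force with Python's standard library: `itertools.combinations` lists unordered pairs and `itertools.permutations` lists ordered pairs. This sketch assumes three candidates; "Carol" is a hypothetical third name added alongside Alice and Bob purely for illustration.

```python
from itertools import combinations, permutations
from math import comb, perm

# Three candidates; "Carol" is a hypothetical name used only for illustration.
candidates = ["Alice", "Bob", "Carol"]

# Co-presidents: unordered pairs, so (Alice, Bob) and (Bob, Alice) coincide.
co_presidents = list(combinations(candidates, 2))

# President/VP: ordered pairs, so (Alice, Bob) and (Bob, Alice) are distinct.
president_vp = list(permutations(candidates, 2))

print(len(co_presidents), comb(3, 2))   # 3 3
print(len(president_vp), perm(3, 2))    # 6 6
```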
```{solution-end}
```

## [Probability II](bayes)


```{solution-start} boxes
:class: dropdown
```

1. $\mathbb{P}(H) = 0.5$
2. $\mathbb{P}(\text{box with two Ts} \mid H) = 0$ because the box must have an $H$ to draw an $H$.
3. $$\mathbb{P}(\text{box with two Hs} \mid H) = \frac{ \mathbb{P}(H \mid \text{box with two Hs}) \mathbb{P}(\text{box with two Hs})}{\mathbb{P}(H)} $$

$$= \frac{1\times 0.25}{0.5} = 0.5$$

4. To find $\mathbb{P}(\text{H on second draw} \mid \text{H on first draw})$, the subtlety is that the two draws are not independent. The unconditional probability is $\mathbb{P}(\text{H on second draw}) = 0.5$. This allows for the possibility of a box with two $T$s. An $H$ on the first draw reveals that the box does not have two $T$s, which will push our probability up. The intuitive answer of 0.5 is therefore *wrong*. For a similar problem, see also the famously difficult [boy girl paradox](https://en.wikipedia.org/wiki/Boy_or_girl_paradox).
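Parts 1 to 3 can be checked by listing the three possible boxes. Below is a minimal exact sketch using Python's `fractions` module, assuming (as the solutions state) that each of the two tickets is marked $H$ or $T$ independently with equal chance. The names `boxes`, `p_h_given_box`, and `posterior` are illustrative.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Prior probability of each box type: two tickets, each H or T with chance 1/2.
boxes = {}
for tickets in product("HT", repeat=2):
    key = "".join(sorted(tickets))          # "HH", "HT", or "TT"
    boxes[key] = boxes.get(key, 0) + half * half

# P(H | box) for a single draw: the fraction of tickets marked H.
p_h_given_box = {b: Fraction(b.count("H"), 2) for b in boxes}

# Part 1: law of total probability.
p_h = sum(p_h_given_box[b] * boxes[b] for b in boxes)

# Parts 2 and 3: Bayes' rule, P(box | H) = P(H | box) P(box) / P(H).
posterior = {b: p_h_given_box[b] * boxes[b] / p_h for b in boxes}

print(p_h)                                  # 1/2
print(posterior["TT"], posterior["HH"])     # 0 1/2
```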


**4 - Solution 1**

The first draw reveals there is an $H$ in the box. By part 3, we know there is a 0.5 probability the box contains two $H$s. The probability of an $H$ on the second draw is



$$ \underbrace{\frac{1}{2}}_{a} \cdot \overbrace{1}^{b} + \underbrace{\frac{1}{2}}_{c} \cdot \overbrace{\frac{1}{2}}^{d} = \frac{3}{4}.$$


| Term | Probability of ... given $H$ on first draw |
|------|----------------------------------------------------------|
| $a$ | box with two $H$s |
| $b$ | $H$ on the second draw, given the two-$H$ box |
| $c$ | box with one $H$ and one $T$ |
| $d$ | $H$ on the second draw, given the one-$H$-one-$T$ box |

The first $H$ rules out the box with two $T$s, so $c = 1 - a = \frac{1}{2}$. A box with one $H$ and one $T$ yields an $H$ on any single draw half the time, so $d$ is $\frac{1}{2}$.
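Solution 1 is the law of total probability over the two boxes still possible after the first $H$. Here is a tiny exact check with Python's `fractions` module; the variable names mirror the table's terms $a$ through $d$, and the values are the ones derived in the text (draws assumed to be made with replacement).

```python
from fractions import Fraction

# Posterior over boxes after an H on the first draw (part 3).
a = Fraction(1, 2)   # P(two-H box | H on first draw)
c = 1 - a            # P(one-H-one-T box | H on first draw); two-T box ruled out

# Chance of H on the next draw from each box (draws with replacement).
b = Fraction(1)      # the two-H box always gives H
d = Fraction(1, 2)   # the one-H-one-T box gives H half the time

p_h2_given_h1 = a * b + c * d
print(p_h2_given_h1)  # 3/4
```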


**4 - Solution 2 (my favorite)**

This solution is remarkably similar to the previous one, but with a different interpretation. The first draw reveals there is an $H$ in the box. Because draws are made with replacement, half of the time you will draw the same ticket on your second draw. The other half of the time you draw the other ticket.

The conditional probability of an $H$ on the second draw given an $H$ on the first draw can be expanded as the sum of the probability of *the same ticket and $H$* and the probability of *a different ticket and $H$*, all conditional on $H$ on the first draw. This is

$$ \underbrace{\frac{1}{2}}_{\alpha} \cdot \overbrace{1}^{\beta} + \underbrace{\frac{1}{2}}_{\gamma} \cdot \overbrace{\frac{1}{2}}^{\delta} = \frac{3}{4}.$$

| Term | Probability of ... given $H$ on first draw |
|----------|--------------------------------------------------------|
| $\alpha$ | drawing the same ticket (an $H$) |
| $\beta$ | $H$ given the same ticket |
| $\gamma$ | drawing the other ticket (which can be $H$ or $T$) |
| $\delta$ | $H$ given the other ticket |


Each ticket is marked $H$ or $T$ independently and with equal chance, so $\delta$ is $\frac{1}{2}$.
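Solution 2's split into same-ticket and other-ticket draws can be verified by enumerating all sixteen equally likely cases (four ticket markings times four choices of which ticket each draw picks). This is a minimal sketch under the solution's assumptions: two tickets marked $H$ or $T$ independently with chance 1/2, and two draws with replacement, each picking ticket 0 or 1 with chance 1/2. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# 4 markings x 4 (first, second) ticket picks = 16 equally likely cases.
weight = Fraction(1, 16)

p_h1 = Fraction(0)         # P(H on first draw)
joint_same = Fraction(0)   # P(H1, same ticket drawn again, H2)
joint_other = Fraction(0)  # P(H1, other ticket drawn, H2)

for tickets in product("HT", repeat=2):
    for first, second in product(range(2), repeat=2):
        if tickets[first] != "H":
            continue  # keep only cases with an H on the first draw
        p_h1 += weight
        if tickets[second] == "H":
            if first == second:
                joint_same += weight
            else:
                joint_other += weight

print(joint_same / p_h1)                   # alpha * beta  = 1/2
print(joint_other / p_h1)                  # gamma * delta = 1/4
print((joint_same + joint_other) / p_h1)   # 3/4
```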

**4 - Solution 3 (the worst)**

Now we attempt to solve this by applying Bayes' Theorem directly. It's important to get the notation clear. Let $H_1$ denote getting an $H$ on the first draw and $H_2$ denote getting an $H$ on the second draw. We want to find $\mathbb{P}(H_2 \mid H_1)$. Bayes' Theorem tells us

$$\mathbb{P}(H_2 \mid H_1) = \frac{\mathbb{P}(H_1 \mid H_2) \mathbb{P}(H_2)}{\mathbb{P}(H_1)}.$$

This doesn't do much to simplify the problem because $\mathbb{P}(H_1 \mid H_2)$ is no easier to solve for and we'll end up repeating calculations from Solution 1 or Solution 2 to find $\mathbb{P}(H_1 \mid H_2) \mathbb{P}(H_2) = \mathbb{P}(H_2 \text{ and } H_1).$ It's tempting to write this as $\mathbb{P}(H_1) \times \mathbb{P}(H_2)$, but these are dependent events so we can't. A tree is helpful.

For a single draw.
```{figure} images/tikz/HTrandomboxTree.svg
:width: 70%
:name: HTrandomboxTree2
```

For two draws.
```{figure} images/tikz/HTrandomboxTree2Draws.svg
:width: 70%
:name: HTrandomboxTree2Draws
```

Therefore the probability of $H_1$ and $H_2$ is $0.5\times 0.25 + 0.25 \times 1 = \frac{3}{8}$. Plugging this into the formula for $\mathbb{P}(H_2 \mid H_1)$, we get

$$\dfrac{\frac{3}{8}}{\frac{1}{2}} = \frac{3}{4}.$$

It follows that $\mathbb{P}(H_1 \mid H_2)$ is also $\frac{3}{4}$ because the unconditional probabilities of $H_1$ and $H_2$ are both $\frac{1}{2}$.
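The tree calculation amounts to summing, over the three possible boxes, the prior probability of the box times the chance of an $H$ on each of the two draws from it. Here is a minimal exact sketch with Python's `fractions` module, under the same assumptions as the solution (two tickets marked $H$ or $T$ independently with chance 1/2, draws with replacement). The dictionary names are illustrative.

```python
from fractions import Fraction

# Prior over boxes and chance of H on a single draw from each box.
prior = {"HH": Fraction(1, 4), "HT": Fraction(1, 2), "TT": Fraction(1, 4)}
p_h = {"HH": Fraction(1), "HT": Fraction(1, 2), "TT": Fraction(0)}

# Given the box, the two draws are independent, so multiply along each branch.
p_h1_and_h2 = sum(prior[b] * p_h[b] * p_h[b] for b in prior)
p_h1 = sum(prior[b] * p_h[b] for b in prior)
p_h2 = p_h1                        # same unconditional chance on either draw

print(p_h1_and_h2)                 # 3/8
print(p_h1_and_h2 / p_h1)          # P(H2 | H1) = 3/4
print(p_h1_and_h2 / p_h2)          # P(H1 | H2) = 3/4
print(p_h1 * p_h2)                 # 1/4, not 3/8: H1 and H2 are dependent
```

The last line shows why the tempting product $\mathbb{P}(H_1) \times \mathbb{P}(H_2)$ fails here.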

[Here is a Google Sheets simulation](https://docs.google.com/spreadsheets/d/1xlryzoPWZ05K4SeHVzZHiCC4yQ-XhLVaZChy-gYoWRc/edit?usp=sharing).
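For readers who prefer Python, a rough Monte Carlo counterpart can be written with NumPy. It is not a translation of the linked sheet, whose layout isn't described here; it simply simulates the same assumed setup (two tickets marked $H$ or $T$ with chance 1/2 each, two draws with replacement) and estimates $\mathbb{P}(H_2 \mid H_1)$ from the trials that begin with an $H$. The trial count and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_trials = 200_000

# Each trial: mark two tickets H (True) or T (False) with chance 1/2 each,
# then make two draws with replacement by picking ticket index 0 or 1.
tickets = rng.random((n_trials, 2)) < 0.5
first = rng.integers(0, 2, size=n_trials)
second = rng.integers(0, 2, size=n_trials)
rows = np.arange(n_trials)

h1 = tickets[rows, first]
h2 = tickets[rows, second]

# Estimate P(H2 | H1) using only the trials with an H on the first draw.
estimate = h2[h1].mean()
print(estimate)   # should be close to 3/4
```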


```{solution-end}
```
8 changes: 4 additions & 4 deletions book/_build/html/bayes.html
@@ -77,7 +77,7 @@
<link rel="shortcut icon" href="_static/norm_favicon.ico"/>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Chance Error" href="chancevar.html" />
<link rel="next" title="Chance Variability" href="chancevary.html" />
<link rel="prev" title="Probability" href="probability.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
@@ -195,7 +195,7 @@
<li class="toctree-l1"><a class="reference internal" href="regression.html">Regression</a></li>
<li class="toctree-l1"><a class="reference internal" href="probability.html">Probability</a></li>
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Probability II</a></li>
<li class="toctree-l1"><a class="reference internal" href="chancevar.html">Chance Error</a></li>
<li class="toctree-l1"><a class="reference internal" href="chancevary.html">Chance Variability</a></li>
<li class="toctree-l1"><a class="reference internal" href="bibliography.html">Bibliography</a></li>
</ul>
<p aria-level="2" class="caption" role="heading"><span class="caption-text">Google Sheets (optional)</span></p>
@@ -692,11 +692,11 @@ <h2>Exercises<a class="headerlink" href="#exercises" title="Permalink to this he
</div>
</a>
<a class="right-next"
href="chancevar.html"
href="chancevary.html"
title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
<p class="prev-next-title">Chance Error</p>
<p class="prev-next-title">Chance Variability</p>
</div>
<i class="fa-solid fa-angle-right"></i>
</a>
2 changes: 1 addition & 1 deletion book/_build/html/centerspread.html
@@ -195,7 +195,7 @@
<li class="toctree-l1"><a class="reference internal" href="regression.html">Regression</a></li>
<li class="toctree-l1"><a class="reference internal" href="probability.html">Probability</a></li>
<li class="toctree-l1"><a class="reference internal" href="bayes.html">Probability II</a></li>
<li class="toctree-l1"><a class="reference internal" href="chancevar.html">Chance Error</a></li>
<li class="toctree-l1"><a class="reference internal" href="chancevary.html">Chance Variability</a></li>
<li class="toctree-l1"><a class="reference internal" href="bibliography.html">Bibliography</a></li>
</ul>
<p aria-level="2" class="caption" role="heading"><span class="caption-text">Google Sheets (optional)</span></p>
29 changes: 5 additions & 24 deletions book/_build/html/chancevar.html
@@ -77,8 +77,6 @@
<link rel="shortcut icon" href="_static/norm_favicon.ico"/>
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Bibliography" href="bibliography.html" />
<link rel="prev" title="Probability II" href="bayes.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
</head>
@@ -181,7 +179,7 @@
</li>
</ul>
<p aria-level="2" class="caption" role="heading"><span class="caption-text">Main Text</span></p>
<ul class="current nav bd-sidenav">
<ul class="nav bd-sidenav">
<li class="toctree-l1"><a class="reference internal" href="gettingstarted.html">Getting Started</a></li>
<li class="toctree-l1"><a class="reference internal" href="experiments.html">Experiments</a></li>

@@ -195,7 +193,7 @@
<li class="toctree-l1"><a class="reference internal" href="regression.html">Regression</a></li>
<li class="toctree-l1"><a class="reference internal" href="probability.html">Probability</a></li>
<li class="toctree-l1"><a class="reference internal" href="bayes.html">Probability II</a></li>
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Chance Error</a></li>
<li class="toctree-l1"><a class="reference internal" href="chancevary.html">Chance Variability</a></li>
<li class="toctree-l1"><a class="reference internal" href="bibliography.html">Bibliography</a></li>
</ul>
<p aria-level="2" class="caption" role="heading"><span class="caption-text">Google Sheets (optional)</span></p>
@@ -465,7 +463,7 @@ <h2> Contents </h2>
<figure class="align-default" id="chanceerrorhist">
<a class="reference internal image-reference" href="_images/chanceErrorHist.svg"><img alt="_images/chanceErrorHist.svg" src="_images/chanceErrorHist.svg" width="90%" /></a>
<figcaption>
<p><span class="caption-number">Fig. 44 </span><span class="caption-text">This shows the results from 10 and 100 tosses of a coin, repeated 1000 times each. The histogram for 100 tosses is more spread out in absolute terms. But, in percentage terms, more of the 1000 trials yield close to 50% heads.</span><a class="headerlink" href="#chanceerrorhist" title="Permalink to this image">#</a></p>
<p><span class="caption-text">This shows the results from 10 and 100 tosses of a coin, repeated 1000 times each. The histogram for 100 tosses is more spread out in absolute terms. But, in percentage terms, more of the 1000 trials yield close to 50% heads.</span><a class="headerlink" href="#chanceerrorhist" title="Permalink to this image">#</a></p>
</figcaption>
</figure>
<p>In absolute terms, the chance error will increase, but more slowly than the number of tosses. In fact, if the number of tosses increases 2x, the chance error increases only by a factor of <span class="math notranslate nohighlight">\(\sqrt{2}\)</span>. If the number of tosses increases 100x, the chance error increases only by a factor of 10.</p>
@@ -475,7 +473,7 @@ <h2>Sums from Boxes<a class="headerlink" href="#sums-from-boxes" title="Permalin
<figure class="align-default" id="box01">
<a class="reference internal image-reference" href="_images/box01.svg"><img alt="_images/box01.svg" src="_images/box01.svg" width="22%" /></a>
<figcaption>
<p><span class="caption-number">Fig. 45 </span><span class="caption-text">This is a 0-1 box representing a fair coin flip.</span><a class="headerlink" href="#box01" title="Permalink to this image">#</a></p>
<p><span class="caption-text">This is a 0-1 box representing a fair coin flip.</span><a class="headerlink" href="#box01" title="Permalink to this image">#</a></p>
</figcaption>
</figure>
<p>There is chance variability based on the <em>number of draws</em> and the actual chance inherent in a process. Suppose 100 draws are made with replacement from one of the boxes below. You will win $1 if you can guess the sum within 10.</p>
@@ -544,6 +542,7 @@ <h2>Normal Approximation<a class="headerlink" href="#normal-approximation" title
<p class="sd-card-text">The range 470 to 530 is therefore the expected sum <span class="math notranslate nohighlight">\(\pm\)</span> SE. The sum falls in this range 68% of the time.</p>
</div>
</details><p>The normal approximation is justified because of the <strong>central limit theorem</strong>. The central limit theorem states that, when random draws are made with replacement from a box, the distribution for the sum of these draws will approximate a normal distribution. This is true even if the contents of the box do not follow a normal curve, as long as the number of draws is large.</p>
<p>The central limit theorem is a big deal.</p>
<hr class="footnotes docutils" />
<aside class="footnote brackets" id="id3" role="note">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id2">1</a><span class="fn-bracket">]</span></span>
@@ -582,24 +581,6 @@ <h2>Normal Approximation<a class="headerlink" href="#normal-approximation" title
<footer class="prev-next-footer">

<div class="prev-next-area">
<a class="left-prev"
href="bayes.html"
title="previous page">
<i class="fa-solid fa-angle-left"></i>
<div class="prev-next-info">
<p class="prev-next-subtitle">previous</p>
<p class="prev-next-title">Probability II</p>
</div>
</a>
<a class="right-next"
href="bibliography.html"
title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
<p class="prev-next-title">Bibliography</p>
</div>
<i class="fa-solid fa-angle-right"></i>
</a>
</div>
</footer>
