Deploying to gh-pages from @ e2bc27a 🚀
ismayc committed Dec 6, 2024
1 parent 201c185 commit f8e2106
Showing 65 changed files with 1,463 additions and 1,552 deletions.
2 changes: 1 addition & 1 deletion v2/404.html
Expand Up @@ -139,7 +139,7 @@ <h1>Page not found<a class="anchor" aria-label="anchor" href="#page-not-found"><
<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

<div class="col-12 col-md-6 mt-3">
<p>"<strong>Statistical Inference via Data Science</strong>: A ModernDive into R and the Tidyverse <br> (Second Edition)" was written by Chester Ismay, Albert Y. Kim, and Arturo Valdivia <br> Foreword by Kelly S. McConville. It was last built on December 03, 2024.</p>
<p>"<strong>Statistical Inference via Data Science</strong>: A ModernDive into R and the Tidyverse <br> (Second Edition)" was written by Chester Ismay, Albert Y. Kim, and Arturo Valdivia <br> Foreword by Kelly S. McConville. It was last built on December 06, 2024.</p>
</div>

<div class="col-12 col-md-6 mt-3">
Binary file modified v2/ModernDive.pdf
Binary file not shown.
1,262 changes: 616 additions & 646 deletions v2/ModernDive.tex

Large diffs are not rendered by default.

Binary file modified v2/ModernDive_files/figure-html/boot-distn-slopes-1.png
Binary file modified v2/ModernDive_files/figure-html/t-curve-hypo-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-212-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-674-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-675-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-679-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-684-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-693-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-704-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-707-1.png
Binary file modified v2/ModernDive_files/figure-html/unnamed-chunk-708-1.png
2 changes: 1 addition & 1 deletion v2/about-the-authors.html
Expand Up @@ -191,7 +191,7 @@ <h1>About the authors<a class="anchor" aria-label="anchor" href="#about-the-auth
<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

<div class="col-12 col-md-6 mt-3">
<p>"<strong>Statistical Inference via Data Science</strong>: A ModernDive into R and the Tidyverse <br> (Second Edition)" was written by Chester Ismay, Albert Y. Kim, and Arturo Valdivia <br> Foreword by Kelly S. McConville. It was last built on December 03, 2024.</p>
<p>"<strong>Statistical Inference via Data Science</strong>: A ModernDive into R and the Tidyverse <br> (Second Edition)" was written by Chester Ismay, Albert Y. Kim, and Arturo Valdivia <br> Foreword by Kelly S. McConville. It was last built on December 06, 2024.</p>
</div>

<div class="col-12 col-md-6 mt-3">
22 changes: 11 additions & 11 deletions v2/appendixA.html
Expand Up @@ -233,25 +233,25 @@ <h3>
<span class="header-section-number">A.2.1</span> Additional normal calculations<a class="anchor" aria-label="anchor" href="#additional-normal-calculations"><i class="fas fa-link"></i></a>
</h3>
<p>For a normal density curve, the probability, or area, for any given interval can be obtained using the R function <code><a href="https://rdrr.io/r/stats/Normal.html">pnorm()</a></code>. Think of the <code>p</code> in the name as <strong>p</strong>robability or <strong>p</strong>ercentage: this function finds the area under the curve to the left of a given value, which is the probability of observing any number less than or equal to that value. The expected value and standard deviation can be supplied as arguments, but the defaults are the standard normal values, <span class="math inline">\(\mu = 0\)</span> and <span class="math inline">\(\sigma = 1\)</span>. For example, the probability of observing a value that is less than or equal to 1 in the standard normal curve is given by:</p>
<div class="sourceCode" id="cb575"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb573"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">pnorm</a></span><span class="op">(</span><span class="fl">1</span><span class="op">)</span></span></code></pre></div>
<pre><code>[1] 0.841</code></pre>
<p>or 84%. This is the probability of observing a value that is less than or equal to one standard deviation above the mean.</p>
<p>Similarly, the probability of observing a standard value between -1 and 1 is given by subtracting the area to the left of -1 from the area to the left of 1. In R, we obtain this probability as follows:</p>
<div class="sourceCode" id="cb577"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb575"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">pnorm</a></span><span class="op">(</span><span class="fl">1</span><span class="op">)</span> <span class="op">-</span> <span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">pnorm</a></span><span class="op">(</span><span class="op">-</span><span class="fl">1</span><span class="op">)</span></span></code></pre></div>
<pre><code>[1] 0.683</code></pre>
<p>The probability of getting a standard value between -1 and 1, or equivalently, the probability of observing a value within one standard deviation from the mean is about 68%. Similarly, the probability of getting a value within 2 standard deviations from the mean is given by</p>
<div class="sourceCode" id="cb579"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb577"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">pnorm</a></span><span class="op">(</span><span class="fl">2</span><span class="op">)</span> <span class="op">-</span> <span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">pnorm</a></span><span class="op">(</span><span class="op">-</span><span class="fl">2</span><span class="op">)</span></span></code></pre></div>
<pre><code>[1] 0.954</code></pre>
<p>or about 95%.</p>
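<p>The same areas can be computed on a non-standard normal curve by passing the <code>mean</code> and <code>sd</code> arguments to <code>pnorm()</code>. The following is a minimal sketch; the values 100 and 15 are hypothetical and chosen only for illustration, not taken from the book's data.</p>

```r
# Hypothetical normal variable (illustrative values, not from the book)
mu <- 100
sigma <- 15

# Area to the left of one standard deviation above the mean,
# computed directly on the original scale
p_direct <- pnorm(mu + sigma, mean = mu, sd = sigma)

# The same area after converting to a standard value (z-score)
z <- ((mu + sigma) - mu) / sigma
p_standard <- pnorm(z)

# Area within two standard deviations of the mean on the original scale
p_within_2 <- pnorm(mu + 2 * sigma, mean = mu, sd = sigma) -
  pnorm(mu - 2 * sigma, mean = mu, sd = sigma)

p_direct
p_standard
p_within_2
```

<p>Both approaches should give the same area as <code>pnorm(1)</code>, and the last value should match <code>pnorm(2) - pnorm(-2)</code>, because standardizing does not change the area under the curve.</p>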
<p>Moreover, we do not need to restrict our study to areas within one or two standard deviations from the mean. We can find the number of standard deviations needed for any desired percentage around the mean using the R function <code><a href="https://rdrr.io/r/stats/Normal.html">qnorm()</a></code>. The <code>q</code> in the name stands for <span class="math inline">\(quantile\)</span>, and this function is the inverse of <code><a href="https://rdrr.io/r/stats/Normal.html">pnorm()</a></code>: it finds the value of the random variable for a given area under the curve to the left of that value. When using the standard normal, the quantile also represents the number of standard deviations. For example, we learned that the area under the standard normal curve to the left of a standard value of 1 was approximately 84%. If, instead, we want to find the standard value that corresponds to an area of exactly 84% under the curve to the left of this value, we can use the following syntax:</p>
<div class="sourceCode" id="cb581"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb579"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">qnorm</a></span><span class="op">(</span><span class="fl">0.84</span><span class="op">)</span></span></code></pre></div>
<pre><code>[1] 0.994</code></pre>
<p>In other words, there is exactly an 84% chance that the observed standard value is less than or equal to 0.994. Similarly, to have exactly a 95% chance of obtaining a value within <code>q</code> standard deviations from the mean, we need to pass the appropriate area to <code><a href="https://rdrr.io/r/stats/Normal.html">qnorm()</a></code>.</p>
<div class="sourceCode" id="cb583"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb581"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://ggplot2.tidyverse.org/reference/ggplot.html">ggplot</a></span><span class="op">(</span><span class="cn">NULL</span>, <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/aes.html">aes</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="op">-</span><span class="fl">4</span>,<span class="fl">4</span><span class="op">)</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span></span>
<span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/geom_ribbon.html">geom_area</a></span><span class="op">(</span>stat <span class="op">=</span> <span class="st">"function"</span>, fun <span class="op">=</span> <span class="va">dnorm</span>, fill <span class="op">=</span> <span class="st">"grey100"</span>, xlim <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="op">-</span><span class="fl">4</span>, <span class="op">-</span><span class="fl">2</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span></span>
<span> <span class="fu"><a href="https://ggplot2.tidyverse.org/reference/geom_ribbon.html">geom_area</a></span><span class="op">(</span>stat <span class="op">=</span> <span class="st">"function"</span>, fun <span class="op">=</span> <span class="va">dnorm</span>, fill <span class="op">=</span> <span class="st">"grey80"</span>, xlim <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="op">-</span><span class="fl">2</span>, <span class="fl">2</span><span class="op">)</span><span class="op">)</span> <span class="op">+</span></span>
Expand All @@ -263,13 +263,13 @@ <h3>
<span> color<span class="op">=</span><span class="st">"blue"</span><span class="op">)</span></span></code></pre></div>
<div class="inline-figure"><img src="ModernDive_files/figure-html/normal-curve-shaded-3-1.png" width="\textwidth" style="display: block; margin: auto;"></div>
<p>We want to find the standard value <code>q</code> such that the area in the middle is exactly 0.95 (or 95%). Before using <code><a href="https://rdrr.io/r/stats/Normal.html">qnorm()</a></code> we need to provide the total area under the curve to the left of <code>q</code>. Since the total area under the normal density curve is 1, the curve is symmetric, and the area in the middle is 0.95, the total area on the tails is 1 - 0.95 = 0.05 (or 5%), and the area on each tail is 0.05/2 = 0.025 (or 2.5%). The total area under the curve to the left of <code>q</code> will be the area in the middle and the area on the left tail or 0.95 + 0.025 = 0.975. We can now obtain the standard value <code>q</code> by using <code><a href="https://rdrr.io/r/stats/Normal.html">qnorm()</a></code>:</p>
<div class="sourceCode" id="cb584"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb582"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">q</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">qnorm</a></span><span class="op">(</span><span class="fl">0.975</span><span class="op">)</span></span>
<span><span class="va">q</span></span></code></pre></div>
<pre><code>[1] 1.96</code></pre>
<p>The probability of observing a value within 1.96 standard deviations from the mean is exactly 95%.</p>
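<p>As a quick sanity check, the result can be fed back into <code>pnorm()</code> to confirm that the middle area is 0.95. This is a sketch of that round trip; the variable names <code>middle_area</code> and <code>rounded_area</code> are introduced here for illustration only.</p>

```r
# Standard value with area 0.975 to its left
q <- qnorm(0.975)

# Area between -q and q: pnorm() undoes qnorm(), so this recovers 0.95
middle_area <- pnorm(q) - pnorm(-q)

# Using the rounded value 1.96 gives an area very close to, but not exactly, 0.95
rounded_area <- pnorm(1.96) - pnorm(-1.96)

middle_area
rounded_area
```

<p>The unrounded <code>q</code> reproduces 0.95 up to floating-point precision, while the rounded 1.96 is off only in the later decimal places.</p>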
<p>We can follow this method to obtain the number of standard deviations needed for any area, or probability, around the mean. For example, if we want an area of 98% around the mean, the area in the tails is 1 - 0.98 = 0.02, or 0.02/2 = 0.01 in each tail. The area under the curve to the left of the desired <code>q</code> value is then 0.98 + 0.01 = 0.99, so</p>
<div class="sourceCode" id="cb586"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb584"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/stats/Normal.html">qnorm</a></span><span class="op">(</span><span class="fl">0.99</span><span class="op">)</span></span></code></pre></div>
<pre><code>[1] 2.33</code></pre>
<p>The area within 2.33 standard deviations from the mean is 98%, or there is a 98% chance of choosing a value within 2.33 standard deviations from the mean. This information will be very useful to us.</p>
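<p>The tail-area arithmetic above can be wrapped in a small function so it does not have to be repeated for each level. The helper below, <code>critical_z()</code>, is a hypothetical name introduced for this sketch and is not part of the book or of any package.</p>

```r
# Hypothetical helper (not from the book): number of standard deviations
# around the mean of a standard normal that captures a given middle area
critical_z <- function(level) {
  stopifnot(level > 0, level < 1)
  tail_area <- (1 - level) / 2   # area in each tail
  qnorm(level + tail_area)       # area to the left of the cutoff
}

critical_z(0.95)  # about 1.96, as obtained with qnorm(0.975)
critical_z(0.98)  # about 2.33, as obtained with qnorm(0.99)
```

<p>Calling <code>critical_z(0.95)</code> is equivalent to <code>qnorm(0.975)</code>, and <code>critical_z(0.98)</code> to <code>qnorm(0.99)</code>.</p>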
Expand All @@ -283,7 +283,7 @@ <h2>
In addition, the <span class="math inline">\(t\)</span> distribution requires one additional parameter, the degrees of freedom. For the sample mean problems, the degrees of freedom needed are exactly <span class="math inline">\(n-1\)</span>, the sample size minus one.</p>
<p>We construct again a 95% confidence interval for the population mean, but this time using the sample standard deviation to estimate the standard error and the <span class="math inline">\(t\)</span> distribution to determine how wide the confidence interval should be.</p>
<p>We start by obtaining the sample statistics:</p>
<div class="sourceCode" id="cb588"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb586"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">almonds_sample_100</span> <span class="op">|&gt;</span> </span>
<span><span class="fu"><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize</a></span><span class="op">(</span>mean_weight <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/mean.html">mean</a></span><span class="op">(</span><span class="va">weight</span><span class="op">)</span>,</span>
<span>sd_weight <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/stats/sd.html">sd</a></span><span class="op">(</span><span class="va">weight</span><span class="op">)</span>,</span>
Expand All @@ -293,12 +293,12 @@ <h2>
&lt;dbl&gt; &lt;dbl&gt; &lt;int&gt;
1 3.682 0.362199 100</code></pre>
<p>To obtain the number of standard deviations on the <span class="math inline">\(t\)</span> distribution to account for 95% of the values, we proceed as we did in the normal case: the area in the middle is 0.95, so the area on the tails is 1 - 0.95 = 0.05. Since the <span class="math inline">\(t\)</span> distribution is also symmetric, the area on each tail is 0.05/2 = 0.025. The number of standard deviations around the center is given by the value <span class="math inline">\(q\)</span> such that the area under the <span class="math inline">\(t\)</span> curve to the left of <span class="math inline">\(q\)</span> is exactly <span class="math inline">\(0.95 + 0.025 = 0.975\)</span>. Using R we get:</p>
<div class="sourceCode" id="cb590"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb588"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/stats/TDist.html">qt</a></span><span class="op">(</span><span class="fl">0.975</span>, df <span class="op">=</span> <span class="fl">100</span> <span class="op">-</span> <span class="fl">1</span><span class="op">)</span></span></code></pre></div>
<pre><code>[1] 1.98</code></pre>
<p>So, in order to account for 95% of the observations around the mean, we need to take into account all the values within 1.98 standard deviations from the mean. Compare this number with the 1.96 obtained for the standard normal; the difference is due to the fact that the <span class="math inline">\(t\)</span> curve has thicker tails than the standard normal.
We can now construct the 95% confidence interval:</p>
<div class="sourceCode" id="cb592"><pre class="downlit sourceCode r">
<div class="sourceCode" id="cb590"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="va">xbar</span> <span class="op">&lt;-</span> <span class="fl">3.682</span> </span>
<span><span class="va">se_xbar</span> <span class="op">&lt;-</span> <span class="fl">0.362</span><span class="op">/</span><span class="fu"><a href="https://rdrr.io/r/base/MathFun.html">sqrt</a></span><span class="op">(</span><span class="fl">100</span><span class="op">)</span></span>
<span><span class="va">lower_bound</span> <span class="op">&lt;-</span> <span class="va">xbar</span> <span class="op">-</span> <span class="fl">1.98</span> <span class="op">*</span> <span class="va">se_xbar</span></span>
Expand Down Expand Up @@ -485,7 +485,7 @@ <h2>
<footer class="bg-primary text-light mt-5"><div class="container"><div class="row">

<div class="col-12 col-md-6 mt-3">
<p>"<strong>Statistical Inference via Data Science</strong>: A ModernDive into R and the Tidyverse <br> (Second Edition)" was written by Chester Ismay, Albert Y. Kim, and Arturo Valdivia <br> Foreword by Kelly S. McConville. It was last built on December 03, 2024.</p>
<p>"<strong>Statistical Inference via Data Science</strong>: A ModernDive into R and the Tidyverse <br> (Second Edition)" was written by Chester Ismay, Albert Y. Kim, and Arturo Valdivia <br> Foreword by Kelly S. McConville. It was last built on December 06, 2024.</p>
</div>

<div class="col-12 col-md-6 mt-3">