Built site for gh-pages
Quarto GHA Workflow Runner committed Sep 6, 2024
1 parent 32537a0 commit 7548a90
Showing 4 changed files with 225 additions and 16 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
017cf547
87b8fe4a
211 changes: 210 additions & 1 deletion assignments/assignment1.html
@@ -341,7 +341,216 @@ <h1 data-number="1"><span class="header-section-number">1</span> General informa
<section id="assignment-questions" class="level2" data-number="1.1">
<h2 data-number="1.1" class="anchored" data-anchor-id="assignment-questions"><span class="header-section-number">1.1</span> Assignment questions</h2>
<p>For convenience the assignment questions are copied below. Answer the questions in MyCourses.</p>

<div>
<div>
<h3 class="anchored">Lecture 1/Chapter 1 of BDA Quiz (80% of grade)</h3>
<h3 class="anchored"><br></h3>
<h3 class="anchored">1. Terminology</h3>
</div>
<div><strong><br></strong></div>
</div>
<div><strong>
Match the following terms with the correct definition:</strong></div>
<div>Note that the order and the set of possible answers are the same for questions 1.1 - 1.8. Check BDA chapter 1, the lecture slides, and Wikipedia if you are uncertain about the terms below.</div>
<div><br>
</div>
<div>
<ul>
<li><u>1.1 Probability</u>:
<br></li>
<li><u>1.2 Probability mass (function)</u>: <br></li>
<li><u>1.3 Probability density (function)</u>:<br></li>
<li><u>1.4 Probability distribution:</u> <br></li>
<li><u>1.5 Discrete probability distribution:</u> <br></li>
<li><u>1.6 Continuous probability distribution:</u> <br></li>
<li><u>1.7 Cumulative distribution function (cdf):</u> <br></li>
<li><u>1.8 Likelihood:</u>
</li>
</ul>
<p><br></p>
<p></p>
<div>
<div>
<h3 class="anchored">2. Notation</h3>
</div>
<div><strong><br></strong></div>
</div>
<div><strong>
Match the following notation with the correct definition: </strong><br></div>
<div><br></div>
<ul>
<li>2.1 \( \sim \): <br></li>
<li>2.2 \( \propto \): <br></li>
<li>2.3 \( \text{E}[\cdot] \): <br></li>
<li>2.4 \( p(y | \theta) \): </li>
</ul>

<p></p>
<h3 class="anchored"><br></h3>
<h3 class="anchored">3. Bayes' Theorem 1</h3>
<p>A group of researchers has designed a new inexpensive and painless test
for detecting lung cancer. The test is intended to be an initial
screening test for the population in general. A positive result
(presence of lung cancer) from the test would be followed up immediately
with medication, surgery, or a more extensive and expensive test.&nbsp;</p>
<p>The
researchers know from their studies the following facts:</p>
<ul>
<li>The test gives a positive result 98% of the time when the test subject has lung cancer.</li>
<li>The test gives a negative result 96% of the time when the test subject does not have lung cancer.</li>
<li>In the general population, approximately one person in 1000 has lung cancer.</li>
</ul>
<p>Here are some probability values that can help you figure out if you
copied the right conditional probabilities from the question:</p>
<ul>
<li>P(Test gives positive | Subject does not have lung cancer) = 4%</li>
<li>P(Test gives positive <strong>and</strong> Subject has lung cancer) = 0.098%
<ul>
<li>
this is also referred to as the <strong>joint probability</strong> of test being positive and the subject having lung cancer
</li>
</ul>
</li>
</ul>
<p>Your goal is to calculate the probability of having cancer given a positive test result: \( P(\text{cancer} | \text{positive}) \)<br></p>
<ul>
<li>3.1 Which quantity in Bayes' Theorem does this represent? <br></li>
<li>3.2 What is the probability of the test having a positive result, given that the test subject has cancer (P(B|A))? <br></li>
<li>3.3 What is the probability of having cancer (P(A))? <br></li>
<li>3.4 What is the probability of having a positive test (P(B))? <br></li>
<li>3.5 Using your previous answers, what is the probability of having cancer given a positive test?<br></li>
</ul>
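<p>As a minimal sketch of how the quantities in questions 3.1 - 3.5 fit together, the R snippet below applies Bayes' theorem to a generic screening test. The values of <code>sensitivity</code>, <code>specificity</code> and <code>prevalence</code> are hypothetical placeholders, not the numbers from this assignment, and the function name <code>posterior_positive</code> is made up for illustration.</p>
<pre class="r"><code class="hljs"># Hypothetical inputs (NOT the assignment values)
sensitivity &lt;- 0.90   # P(B | A): positive test given disease
specificity &lt;- 0.95   # P(not B | not A): negative test given no disease
prevalence  &lt;- 0.01   # P(A): prior probability of disease

posterior_positive &lt;- function(sensitivity, specificity, prevalence) {
  # Law of total probability: P(B) = P(B | A) P(A) + P(B | not A) P(not A)
  p_positive &lt;- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
  # Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B)
  sensitivity * prevalence / p_positive
}

posterior_positive(sensitivity, specificity, prevalence)</code></pre>
<p>With these placeholder inputs the posterior is roughly 0.15, which illustrates how a low prior probability can keep the posterior small even for a fairly accurate test.</p>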
<h3 class="anchored"><br></h3>
<h3 class="anchored">4. Bayes' Theorem 2</h3>
<p></p>
<p>We have three boxes, A, B, and C. There are</p>
<ul>
<li>2 red balls and 5 white balls in box A</li>
<li>4 red balls and 1 white ball in box B</li>
<li>1 red ball and 3 white balls in box C.</li>
</ul>
<p>Consider a random experiment in which one of the boxes is randomly
selected and from that box, one ball is randomly picked up. After
observing the color of the ball it is replaced in the box it came from.
Suppose also that on average box A is selected 40% of the time and box B
10% of the time (i.e. P(A) = 0.4).
</p>
<ul>
<li>4.1 What is the probability of picking a red ball from box A? <br></li>
<li>4.2 What is the probability of picking a red ball from box B? <br></li>
<li>4.3 What is the probability of picking a red ball from box C? <br></li>
<li>4.4 Considering the probabilities of selecting each box, what is the probability of picking a red ball (enter as a number between 0 and 1 with 2 decimal digit accuracy)?<br><br><br></li>
<li>
4.5 If a red ball was picked, calculate the probability that it was picked from (enter as a number between 0 and 1 with 2 decimal digit accuracy):
<ul>
<li>Box A: <br></li>
<li>Box B: <br></li>
<li>Box C: <br></li>
</ul>
</li>
</ul>
<p></p>
<h3 class="anchored"><br></h3>
<h3 class="anchored">5. Bayes' Theorem 3</h3>
<p>Assume that on average fraternal twins (two fertilized eggs, so the twins
can be of different sexes) occur once in 150 births and identical twins
(a single egg divides into two separate embryos, so both have the same
sex) once in 400 births (<strong>Note!</strong> This is not the true
value, see Exercise 1.6, page 28, in BDA3).&nbsp;Assume
that an equal number of boys and girls are born on average.</p>
<p>American singer and actor
Elvis Presley (1935 – 1977) had a twin brother who died at birth. Your
goal is to compute the probability that Elvis was an identical twin.<br></p>
<p></p>
<p>5.1 What is the probability of having a twin brother, given identical twins (enter as a number between 0 and 1; no decimal digits needed)?<br><br></p>
<p></p>
<p>5.2 What is the total probability of having a twin brother (either fraternal or identical)? (enter as a number between 0 and 1 with 2 decimal digit accuracy)<br></p>
<p>5.3 What is the probability that Elvis was an identical twin, given that he had a twin brother? (enter as a number between 0 and 1 with 2 decimal digit accuracy)<br><br></p>
<p></p>
<h3 class="anchored"><br></h3>
<h3 class="anchored">6. Three Steps of Bayesian Data Analysis<br></h3><br>6.1 Select the three steps of Bayesian data analysis (see BDA3 p. 3):
</div>
<div><br></div>
<div>
<h3 class="anchored">7. A Binomial Model for the Roulette Table</h3>
<p>In this course, models are used to explain social and physical data, and we can generate data from our models to check how well they perform. In this example, we show how to generate outcomes from a binomial model to explain the outcomes of a roulette game (there is a connection to the history of statistics). Consider a roulette table with only red and black colours.&nbsp;Roulette tables are not perfect, and the probability of red vs. black is likely not exactly 0.5 (the tables can have adjustments that are randomized each day to avoid long-term bias).&nbsp;<br></p>
<p>Suppose your model for the table's red/black ratio is a binomial distribution that takes as inputs the number of trials and a probability parameter, theta. Set theta to 0.6 (this is much larger than we would expect in real roulette, but it makes for an easier teaching example) and generate a series of red/black ratios for a sequence of 100 equally spaced trial values between 10 and 1000. Generate 1000 random draws from your model for each trial value and save the data in a data frame with columns <code>Ratios</code>, <code>Nsims</code> and <code>Trials</code>. Incomplete code can be found below.</p>
<p></p>
<pre class="r"><code class="hljs"><span class="hljs-comment"># Load the tidyverse package for data manipulation and plotting</span>
library(tidyverse)

<span class="hljs-comment"># Ratio of red/black</span>
theta &lt;- <span class="hljs-number"># declare probability parameter for the binomial model</span>

<span class="hljs-comment"># Sequence of trials</span>
trials &lt;- seq(<span class="hljs-number">#start value of sequence</span>,<span class="hljs-number">#end value of sequence</span>,#value for spacing)

<span class="hljs-comment"># Number of simulation draws from the model</span>
nsims &lt;- <span class="hljs-number"># number of simulations from the binomial model</span>

<span class="hljs-comment"># Helper function for getting the ratios</span>
binom_gen &lt;- <span class="hljs-keyword">function</span>(trials,theta,nsims){
df &lt;- as.data.frame(rbinom(nsims,trials,theta)/trials) |&gt; mutate(nsims = nsims,trials = trials)
colnames(df) &lt;- c(<span class="hljs-string">"Ratios"</span>,<span class="hljs-string">"Nsims"</span>,<span class="hljs-string">"Trials"</span>)
<span class="hljs-keyword">return</span>(df)
}

<span class="hljs-comment"># Create a data frame containing the draws for each number of trials</span>
ratio_60 &lt;- do.call(rbind, lapply(trials, binom_gen, <span class="hljs-number">theta</span>, <span class="hljs-number">nsims</span>)) # lapply calls binom_gen once per value in trials; do.call(rbind, ...) then stacks the resulting data frames row-wise</code></pre>
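<p>To see the shape of the object that the <code>do.call(rbind, lapply(...))</code> pattern returns, here is a small self-contained sketch in base R. The inputs <code>theta_demo</code>, <code>trials_demo</code> and <code>nsims_demo</code>, and the helper <code>binom_gen_demo</code>, are hypothetical stand-ins chosen to keep the output tiny; they are not the values asked for in the assignment.</p>
<pre class="r"><code class="hljs"># Hypothetical toy inputs (NOT the assignment values)
set.seed(1)
theta_demo  &lt;- 0.5
trials_demo &lt;- c(10, 20, 30)
nsims_demo  &lt;- 5

# One data frame per trial value: nsims rows of simulated ratios
binom_gen_demo &lt;- function(trials, theta, nsims) {
  data.frame(
    Ratios = rbinom(nsims, trials, theta) / trials,
    Nsims  = nsims,
    Trials = trials
  )
}

# lapply returns a list of data frames; do.call(rbind, ...) stacks them by row
ratio_demo &lt;- do.call(rbind, lapply(trials_demo, binom_gen_demo, theta_demo, nsims_demo))
dim(ratio_demo)   # 15 rows (3 trial values x 5 draws) and 3 columns</code></pre>
<p>Each simulated ratio lies between 0 and 1 because it is a success count divided by the number of trials.</p>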
<p></p>
<ul>
<li>7.1&nbsp;Suppose you are unsure whether the code to create the data frame worked. Which of the following functions should you use in order to check on the structure of the dataframe object (assuming <code>df</code> below stands for a generic dataframe object)?<br><br><br></li>
<li><span style=""><span style="">7.2 The structure checks out, but now you want to print the first 5 rows of the dataframe to check whether the values are as expected. Which of the following functions should you use?<br></span><br></span><br></li>
<li><span style=""><span>7.3&nbsp;The quick peek also checks out, but you would be more at ease scrolling through all the data; perhaps you'll find some interesting patterns. Which of the following actions allows you to scroll through the data in a separate window (for the below, we assume that you have the code loaded in an RStudio session)?<br></span></span></li>
</ul>
<p></p>
<p>
</p>
<p>Now, plot a histogram of the computed ratios for 10, 50 and 1000 trials, using the code below.</p>
<pre class="r"><code class="hljs"><span class="hljs-comment"># Plot the Distributions</span>
subset_df &lt;- ratio_60[ratio_60$Trials %<span class="hljs-keyword">in</span>% c(#trial values), ] # Subset the rows for the chosen trial values

subset_df |&gt; ggplot(aes(Ratios)) +
geom_histogram(position = <span class="hljs-string">"identity"</span> ,bins = <span class="hljs-number">40</span>) +
facet_grid(cols = vars(Trials)) +
ggtitle(<span class="hljs-string">"Ratios for specific trials"</span>)</code></pre>
<p></p>
<p></p>
<ul>
<li>7.4&nbsp;Which histogram below is the correct one for theta = 0.6?<br><br><br></li>
<li><span><span style=""><span style="font-size: 0.9375rem;">7.5&nbsp;What do these distributions refer to?</span><br><br><br></span></span></li>
<li>7.6 Given these histograms, which number of trials gives you the most certainty about the likely red/black ratio for that table?<br><br><br></li>
<li>7.7 Given the draws from the model, estimate the probability p(Ratio &lt;= 0.5) for the model with 1000 trials (enter as a number between 0 and 1 with 2 decimal digit accuracy).&nbsp;<br></li>
</ul>
<p>Suppose you are now certain that theta = 0.6. Plot the probability mass function for 1000 trials using the code below.<br></p>
</div>
<pre class="r"><code class="hljs">size = <span class="hljs-comment"># number of trials</span>
prob = <span class="hljs-comment"># probability of success</span>

binom_data &lt;- data.frame(
Success = <span class="hljs-number">0</span>:size,
Probability = dbinom(<span class="hljs-number">0</span>:size, size = size, prob = prob)
)

ggplot(binom_data, aes(x = Success, y = Probability)) +
geom_point() +
geom_line() +
labs(title = <span class="hljs-string">"PMF of Binomial Distribution"</span>, x = <span class="hljs-string">"Number of Successes"</span>, y = <span class="hljs-string">"Probability"</span>)</code></pre>

<p></p>

<div>
<ul>
<li>7.8 Which plot of the PMF is the correct one?<br><br><br></li>
<li><span><span>7.9 How does the PMF plot relate to the histogram of ratios plotted earlier?<br><br><br></span></span></li>
<li><span><span>7.10 Given the PMF for your model, calculate the probability of observing at most 500 red outcomes in 1000 trials using theta = 0.6. Use the <code>pbinom</code> function in R.<br>&nbsp;<br></span></span></li>
</ul>
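<p>As a sketch of how <code>pbinom</code> relates to the PMF, the snippet below checks that the cumulative probability equals the sum of the PMF values up to the cutoff. The inputs <code>size_demo</code>, <code>prob_demo</code> and <code>k_demo</code> are hypothetical toy values, not the ones from question 7.10.</p>
<pre class="r"><code class="hljs"># Hypothetical toy inputs (NOT the assignment values)
size_demo &lt;- 10    # number of trials
prob_demo &lt;- 0.5   # probability of success
k_demo    &lt;- 3     # cutoff: at most k_demo successes

# P(X &lt;= k) as the sum of PMF values at 0, 1, ..., k
cdf_from_pmf &lt;- sum(dbinom(0:k_demo, size = size_demo, prob = prob_demo))

# The same probability computed directly from the CDF
cdf_direct &lt;- pbinom(k_demo, size = size_demo, prob = prob_demo)

all.equal(cdf_from_pmf, cdf_direct)</code></pre>
<p>Both routes give 176/1024 for these toy inputs, since the CDF of a discrete distribution is the running sum of its PMF.</p>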
</div>


