
Merge branch 'exercise-solutions'
* exercise-solutions:
  Simplify some print statements.
  Attempt to fix links.
  Finalish tweaks to exercises.
  Remove warning header, show some more values.
  Finish up porting exercises.
  Remove remnant of old confidence solution
  More solutions
  Working through exercise notebooks
  Start on exercise notebooks.
  Move Hamilton scores to own csv
  Working out what works for exercise / solution links
  Fix some links
matthew-brett committed Jun 28, 2024
2 parents bb23aeb + 707d8e0 commit 7fa3b4c
Showing 9 changed files with 929 additions and 288 deletions.
2 changes: 1 addition & 1 deletion notes/general.md
@@ -172,7 +172,7 @@ See <https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html>
> *Series*.
* More about histograms, bins, bin-edges.
* Consider f-strings by `testing_counts_1` (not currently used).
* underscores in integers by `testing_counts_1`.
* `paste`, `paste0`, `seq` by `testing_counts_1`.
* `or` / `|` by testing_counts_2.
21 changes: 10 additions & 11 deletions source/confidence_1.Rmd
@@ -122,17 +122,16 @@ I hope the discussion below resolves much of the confusion of the topic.

## The logic of confidence intervals

To preview the treatment of confidence intervals presented below: We do not
learn about the reliability of sample estimates of the mean (and other
parameters) by logical inference from any one particular sample to any one
particular universe, because this cannot be done *in principle*. Instead, we
investigate the behavior of various universes in the neighborhood of the
sample, universes whose characteristics are chosen on the basis of their
similarity to the sample. In this way the estimation of confidence intervals is
like all other statistical inference: One investigates the probabilistic
behavior of one or more hypothesized universes that are implicitly suggested by
the sample evidence but are not logically implied by that evidence.

The examples worked in the following chapter help explain why statistics
is a difficult subject. The procedure required to transit successfully
92 changes: 78 additions & 14 deletions source/confidence_2.Rmd
@@ -1192,29 +1192,93 @@ helps one's intuitive understanding.

## Exercises

You will find solutions for problems in @sec-exercise-solutions.

### Exercise: unemployment percentage {#sec-exr-unemployment-percent}

::: {.notebook name="unemployment_percent_exercise" title="Unemployment percent exercise"}

In a sample of 200 people, 7 percent are found to be unemployed. Determine a 95
percent confidence interval for the true population proportion.

```{python}
import numpy as np
import matplotlib.pyplot as plt
rnd = np.random.default_rng()
# Your code here.
```

```{r}
# Your code here.
```

:::
<!---
End of notebook.
-->

See @sec-soln-unemployment-percent.
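One possible resampling sketch for this exercise (an illustration, not the book's solution; the number of trials is an arbitrary choice):

```python
import numpy as np

rnd = np.random.default_rng()

# Rebuild a sample matching the observed result: 7 percent of 200
# is 14 unemployed (1) and 186 employed (0).
sample = np.repeat([1, 0], [14, 186])

n_trials = 10_000
props = np.zeros(n_trials)
for i in range(n_trials):
    # Resample 200 observations with replacement.
    resample = rnd.choice(sample, size=200, replace=True)
    props[i] = np.mean(resample)

# The middle 95% of the resampled proportions gives the interval.
lo, hi = np.percentile(props, [2.5, 97.5])
print('95% interval:', lo, hi)
```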

### Exercise: battery lifetime {#sec-exr-battery-lifetime}

::: {.notebook name="battery_lifetime_exercise" title="Battery lifetime exercise"}

A sample of 20 batteries is tested, and the average lifetime is 28.85 months.
Establish a 95 percent confidence interval for the true average value. The
sample values (lifetimes in months) are listed below.

```{python}
import numpy as np
import matplotlib.pyplot as plt
rnd = np.random.default_rng()
lifetimes = np.array([30, 32, 31, 28, 31, 29, 29, 24, 30, 31,
                      28, 28, 32, 31, 24, 23, 31, 27, 27, 31])
print('Mean is:', np.mean(lifetimes))
```

```{r}
lifetimes <- c(30, 32, 31, 28, 31, 29, 29, 24, 30, 31,
               28, 28, 32, 31, 24, 23, 31, 27, 27, 31)
message('Mean is: ', mean(lifetimes))
```
:::
<!---
End of notebook.
-->

See @sec-soln-battery-lifetime.
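A minimal bootstrap sketch for this exercise (one way to do it, not necessarily the book's solution):

```python
import numpy as np

rnd = np.random.default_rng()

lifetimes = np.array([30, 32, 31, 28, 31, 29, 29, 24, 30, 31,
                      28, 28, 32, 31, 24, 23, 31, 27, 27, 31])

n_trials = 10_000
means = np.zeros(n_trials)
for i in range(n_trials):
    # Draw 20 lifetimes with replacement from the observed sample.
    resample = rnd.choice(lifetimes, size=20, replace=True)
    means[i] = np.mean(resample)

# The middle 95% of the resampled means gives the interval.
lo, hi = np.percentile(means, [2.5, 97.5])
print('95% interval:', lo, hi)
```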

### Exercise: optical density {#sec-exr-optical-density}

::: {.notebook name="optical_density_exercise" title="Optical density exercise"}

```{python}
import numpy as np
import matplotlib.pyplot as plt
rnd = np.random.default_rng()
```

Suppose we have 10 measurements of Optical Density on a batch of HIV
negative control samples:

```{python}
density = np.array(
    [.02, .026, .023, .017, .022, .019, .018, .018, .017, .022])
```

```{r}
density <- c(.02, .026, .023, .017, .022, .019, .018, .018, .017, .022)
```

Derive a 95 percent confidence interval for the sample mean. Are there
enough measurements to produce a satisfactory answer?

:::
<!---
End of notebook.
-->

See: @sec-soln-optical-density.
47 changes: 28 additions & 19 deletions source/correlation_causation.Rmd
@@ -2432,9 +2432,9 @@ a random-number table are independent?

## Exercises

You can find solutions for problems at @sec-exercise-solutions.

### Exercise: voter participation {#sec-exr-voter-participation}

@tbl-voter-participation shows voter participation rates in the various states
in the 1844 presidential election. Should we conclude that there was a negative
@@ -2457,7 +2457,7 @@ exists?

Here's a notebook to get you started.

::: {.notebook name="voter_participation" title="Voter participation in 1844 election"}
::: {.notebook name="voter_participation_exercise" title="Voter participation in 1844 election"}

::: nb-only
Notebook for voter participation exercise.
@@ -2485,21 +2485,23 @@ spread <- voter_df$Spread
End of notebook
-->

See: @sec-soln-voter-participation.

### Exercise: association of runs and strikeouts {#sec-exr-runs-strikeouts}

We would like to know whether, among major-league baseball players, home runs
(per 500 at-bats) and strikeouts (per 500 at-bats) are correlated. For this
exercise, you should use the sum-of-products procedure as used above for I.Q.
and athletic ability — multiplying the elements within each pair. The next
exercise uses the more "sophisticated" measure, the correlation coefficient.

The data for 18 randomly-selected players in the 1989 season are as
follows, as they would appear in the first lines of the notebook.

::: {.notebook name="homerun_correlation" title="Homeruns and strikeout correlation"}
::: {.notebook name="homerun_sop_exercise" title="Homeruns and strikeout sum of products."}

::: nb-only
Exercise on relationship of home runs and strikeouts, using sum of products.
:::

```{python}
@@ -2529,22 +2531,27 @@ strikeout <- c(135, 153, 120, 161, 138, 175, 126, 200, 205,
End of notebook.
-->

See: @sec-soln-runs-strikeouts.
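One way to set up the shuffle test for this exercise can be sketched as follows. The arrays below are hypothetical stand-in values, not the real 18-player data, which the exercise notebook supplies:

```python
import numpy as np

rnd = np.random.default_rng()

# Hypothetical stand-in data (NOT the real 1989 player values).
homeruns = np.array([10, 25, 5, 30, 18, 2, 22, 15])
strikeout = np.array([120, 180, 100, 190, 150, 90, 170, 140])

# Sum of products of the observed pairs.
observed = np.sum(homeruns * strikeout)

n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    # Shuffle one array to break any pairing between the two.
    shuffled = rnd.permutation(strikeout)
    results[i] = np.sum(homeruns * shuffled)

# How often does random pairing give a sum of products at least
# as large as the observed one?
p = np.count_nonzero(results >= observed) / n_trials
print('Proportion >= observed:', p)
```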

### Exercise: runs, strikeouts, correlation coefficient {#sec-exr-runs-strikeouts-r}

In the previous example relating strikeouts and home runs, we used the
procedure of multiplying the elements within each pair. Now we use a
more "sophisticated" measure, the correlation coefficient, which is
simply a standardized form of the multiplicands, but sufficiently well
known that we calculate it with a pre-set command.

Exercise: Write a program that uses the correlation coefficient to test the
significance of the association between home runs and strikeouts. You can use
the starting notebook for the previous exercise.

See: @sec-soln-runs-strikeouts-r.

### Exercise: money and exchange rate {#sec-exr-money-exchange}

All the other things equal, an increase in a country's money supply is
inflationary and should have a negative impact on the exchange rate for the
country's currency. The data in the following table (@tbl-exchange) were
computed using data from tables in the 1983/1984 *Statistical Yearbook of the
United Nations*. The table shows the first 15 rows.

@@ -2578,7 +2585,7 @@ expect to get the same result.

Here's a notebook to get you started on part 2:

::: {.notebook name="exchange_rates" title="Exchange rates and money supply"}
::: {.notebook name="exchange_rates_exercise" title="Exchange rates and money supply"}

::: nb-only
Notebook for exercise on exchange rates and money supply.
@@ -2605,3 +2612,5 @@ money_supply <- exchange_df$money_supply
<!---
End of notebook.
-->

See: @sec-soln-money-exchange.
10 changes: 10 additions & 0 deletions source/data/hamilton.csv
@@ -0,0 +1,10 @@
patient_no,score_before,score_after
1,1.83,.878
2,.50,.647
3,1.62,.598
4,2.48,2.05
5,1.68,1.06
6,1.88,1.29
7,1.55,1.06
8,3.06,3.14
9,1.3,1.29
