
Merge branch 'exercise-solutions'
* exercise-solutions:
  Simplify some print statements.
  Attempt to fix links.
  Finalish tweaks to exercises.
  Remove warning header, show some more values.
  Finish up porting exercises.
  Remove remnant of old confidence solution
  More solutions
  Working through exercise notebooks
  Start on exercise notebooks.
  Move Hamilton scores to own csv
  Working out what works for exercise / solution links
  Fix some links
matthew-brett committed Jun 28, 2024
2 parents bb23aeb + 707d8e0 commit 7fa3b4c
Showing 9 changed files with 929 additions and 288 deletions.
2 changes: 1 addition & 1 deletion notes/general.md
@@ -172,7 +172,7 @@ See <https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html>
> *Series*.
* More about histograms, bins, bin-edges.
* Consider f-strings by `testing_counts_1` (not currently used).
* underscores in integers by `testing_counts_1`.
* `paste`, `paste0`, `seq` by `testing_counts_1`.
* `or` / `|` by testing_counts_2.
21 changes: 10 additions & 11 deletions source/confidence_1.Rmd
@@ -122,17 +122,16 @@ I hope the discussion below resolves much of the confusion of the topic.

## The logic of confidence intervals

To preview the treatment of confidence intervals presented below: We do not
learn about the reliability of sample estimates of the mean (and other
parameters) by logical inference from any one particular sample to any one
particular universe, because this cannot be done *in principle*. Instead, we
investigate the behavior of various universes in the neighborhood of the
sample, universes whose characteristics are chosen on the basis of their
similarity to the sample. In this way the estimation of confidence intervals is
like all other statistical inference: One investigates the probabilistic
behavior of one or more hypothesized universes that are implicitly suggested by
the sample evidence but are not logically implied by that evidence.

The examples worked in the following chapter help explain why statistics
is a difficult subject. The procedure required to transit successfully
92 changes: 78 additions & 14 deletions source/confidence_2.Rmd
@@ -1192,29 +1192,93 @@ helps one's intuitive understanding.

## Exercises

You will find solutions for problems in @sec-exercise-solutions.

### Exercise: unemployment percentage {#sec-exr-unemployment-percent}

::: {.notebook name="unemployment_percent_exercise" title="Unemployment percent exercise"}

In a sample of 200 people, 7 percent are found to be unemployed. Determine a 95
percent confidence interval for the true population proportion.

```{python}
import numpy as np
import matplotlib.pyplot as plt
rnd = np.random.default_rng()
# Your code here.
```

```{r}
# Your code here.
```

:::
<!---
End of notebook.
-->

See @sec-soln-unemployment-percent.
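One possible resampling sketch for this exercise (an illustration, not the book's solution; the number of trials is an arbitrary choice):

```python
import numpy as np

rnd = np.random.default_rng()

# Rebuild a sample matching the observed result: 7 percent of 200
# is 14 unemployed (1) and 186 employed (0).
sample = np.repeat([1, 0], [14, 186])

n_trials = 10_000
props = np.zeros(n_trials)
for i in range(n_trials):
    # Resample 200 observations with replacement.
    resample = rnd.choice(sample, size=200, replace=True)
    props[i] = np.mean(resample)

# The middle 95% of the resampled proportions gives the interval.
lo, hi = np.percentile(props, [2.5, 97.5])
print('95% interval:', lo, hi)
```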

### Exercise: battery lifetime {#sec-exr-battery-lifetime}

::: {.notebook name="battery_lifetime_exercise" title="Battery lifetime exercise"}

A sample of 20 batteries is tested, and the average lifetime is 28.85 months.
Establish a 95 percent confidence interval for the true average value. The
sample values (lifetimes in months) are listed below.

```{python}
import numpy as np
import matplotlib.pyplot as plt
rnd = np.random.default_rng()
lifetimes = np.array([30, 32, 31, 28, 31, 29, 29, 24, 30, 31,
                      28, 28, 32, 31, 24, 23, 31, 27, 27, 31])
print('Mean is:', np.mean(lifetimes))
```

```{r}
lifetimes <- c(30, 32, 31, 28, 31, 29, 29, 24, 30, 31,
               28, 28, 32, 31, 24, 23, 31, 27, 27, 31)
message('Mean is: ', mean(lifetimes))
```
:::
<!---
End of notebook.
-->

See @sec-soln-battery-lifetime.
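A minimal bootstrap sketch for this exercise (one way to do it, not necessarily the book's solution):

```python
import numpy as np

rnd = np.random.default_rng()

lifetimes = np.array([30, 32, 31, 28, 31, 29, 29, 24, 30, 31,
                      28, 28, 32, 31, 24, 23, 31, 27, 27, 31])

n_trials = 10_000
means = np.zeros(n_trials)
for i in range(n_trials):
    # Draw 20 lifetimes with replacement from the observed sample.
    resample = rnd.choice(lifetimes, size=20, replace=True)
    means[i] = np.mean(resample)

# The middle 95% of the resampled means gives the interval.
lo, hi = np.percentile(means, [2.5, 97.5])
print('95% interval:', lo, hi)
```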

### Exercise: optical density {#sec-exr-optical-density}

::: {.notebook name="optical_density_exercise" title="Optical density exercise"}

```{python}
import numpy as np
import matplotlib.pyplot as plt
rnd = np.random.default_rng()
```

Suppose we have 10 measurements of Optical Density on a batch of HIV
negative control samples:

```{python}
density = np.array(
    [.02, .026, .023, .017, .022, .019, .018, .018, .017, .022])
```

```{r}
density <- c(.02, .026, .023, .017, .022, .019, .018, .018, .017, .022)
```

Derive a 95 percent confidence interval for the sample mean. Are there
enough measurements to produce a satisfactory answer?

:::
<!---
End of notebook.
-->

See: @sec-soln-optical-density.
47 changes: 28 additions & 19 deletions source/correlation_causation.Rmd
@@ -2432,9 +2432,9 @@ a random-number table are independent?

## Exercises

You can find solutions for problems at @sec-exercise-solutions.

### Exercise: voter participation {#sec-exr-voter-participation}

@tbl-voter-participation shows voter participation rates in the various states
in the 1844 presidential election. Should we conclude that there was a negative
@@ -2457,7 +2457,7 @@ exists?

Here's a notebook to get you started.

::: {.notebook name="voter_participation" title="Voter participation in 1844 election"}
::: {.notebook name="voter_participation_exercise" title="Voter participation in 1844 election"}

::: nb-only
Notebook for voter participation exercise.
@@ -2485,21 +2485,23 @@ spread <- voter_df$Spread
End of notebook
-->

See: @sec-soln-voter-participation.

### Exercise: association of runs and strikeouts {#sec-exr-runs-strikeouts}

We would like to know whether, among major-league baseball players, home runs
(per 500 at-bats) and strikeouts (per 500 at-bats) are correlated. For this
exercise, you should use the sum-of-products procedure as used above for I.Q.
and athletic ability — multiplying the elements within each pair. The next
exercise uses the more "sophisticated" measure, the correlation coefficient.

The data for 18 randomly-selected players in the 1989 season are as
follows, as they would appear in the first lines of the notebook.

::: {.notebook name="homerun_correlation" title="Homeruns and strikeout correlation"}
::: {.notebook name="homerun_sop_exercise" title="Homeruns and strikeout sum of products."}

::: nb-only
Exercise on relationship of home runs and strikeouts, using sum of products.
:::

```{python}
@@ -2529,22 +2531,27 @@ strikeout <- c(135, 153, 120, 161, 138, 175, 126, 200, 205,
End of notebook.
-->

See: @sec-soln-runs-strikeouts.
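One way to set up the shuffle test for this exercise can be sketched as follows. The arrays below are hypothetical stand-in values, not the real 18-player data, which the exercise notebook supplies:

```python
import numpy as np

rnd = np.random.default_rng()

# Hypothetical stand-in data (NOT the real 1989 player values).
homeruns = np.array([10, 25, 5, 30, 18, 2, 22, 15])
strikeout = np.array([120, 180, 100, 190, 150, 90, 170, 140])

# Sum of products of the observed pairs.
observed = np.sum(homeruns * strikeout)

n_trials = 10_000
results = np.zeros(n_trials)
for i in range(n_trials):
    # Shuffle one array to break any pairing between the two.
    shuffled = rnd.permutation(strikeout)
    results[i] = np.sum(homeruns * shuffled)

# How often does random pairing give a sum of products at least
# as large as the observed one?
p = np.count_nonzero(results >= observed) / n_trials
print('Proportion >= observed:', p)
```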

### Exercise: runs, strikeouts, correlation coefficient {#sec-exr-runs-strikeouts-r}

In the previous example relating strikeouts and home runs, we used the
procedure of multiplying the elements within each pair. Now we use a
more "sophisticated" measure, the correlation coefficient, which is
simply a standardized form of the multiplicands, but sufficiently well
known that we calculate it with a pre-set command.

Exercise: Write a program that uses the correlation coefficient to test the
significance of the association between home runs and strikeouts. You can use
the starting notebook for the previous exercise.

See: @sec-soln-runs-strikeouts-r.

### Exercise: money and exchange rate {#sec-exr-money-exchange}

All the other things equal, an increase in a country's money supply is
inflationary and should have a negative impact on the exchange rate for the
country's currency. The data in the following table (@tbl-exchange) were
computed using data from tables in the 1983/1984 *Statistical Yearbook of the
United Nations*. The table shows the first 15 rows.

@@ -2578,7 +2585,7 @@ expect to get the same result.

Here's a notebook to get you started on part 2:

::: {.notebook name="exchange_rates" title="Exchange rates and money supply"}
::: {.notebook name="exchange_rates_exercise" title="Exchange rates and money supply"}

::: nb-only
Notebook for exercise on exchange rates and money supply.
@@ -2605,3 +2612,5 @@ money_supply <- exchange_df$money_supply
<!---
End of notebook.
-->

See: @sec-soln-money-exchange.
10 changes: 10 additions & 0 deletions source/data/hamilton.csv
@@ -0,0 +1,10 @@
patient_no,score_before,score_after
1,1.83,.878
2,.50,.647
3,1.62,.598
4,2.48,2.05
5,1.68,1.06
6,1.88,1.29
7,1.55,1.06
8,3.06,3.14
9,1.3,1.29
