jgscott · jaredsmurray · Sep 5, 2017 · Sep 5, 2017 · Dec 28, 2019 · Dec 28, 2019
diff --git a/citytemps/citytemps.md b/citytemps/citytemps.md
@@ -1,7 +1,11 @@
+
+### Learning Objectives
+
 In this walk-through, you'll learn how to measure and visualize
 dispersion of a single quantitative variable. You will also learn how to
 change some of the default plot settings in R, like changing the axis
 labels or the number of breaks in a histogram.
+----------
 
 Data files:  
 \*
@@ -99,11 +103,19 @@ distribution: the standard deviation.
 
     ## [1] 5.698457
 
-Another measure of dispersion is the coverage interval: that is, an
+Another measure of dispersion is the *coverage* or *prediction* interval: that is, an
 interval covering a specified fraction of the observations. For example,
 to get a central 50% coverage interval, we'd need the 25th and 75
 percentiles of the distribution. By definition, 50% of the observations
-are between these two numbers. You can get these from the `qdata`
+are between these two numbers. So if we were to repeatedly sample single observations
+from this dataset completely at random, they would fall into this interval about 50% of
+the time, by construction. This usually isn't so useful by itself. But if we are trying to predict
+the temperature on some random day in the future, and future days resemble past ones, we might
+expect that temperature to lie in the same interval with probability of about 0.50. That's why
+these kinds of intervals are most commonly called *prediction* rather than *coverage* intervals:
+they're trying to bracket the value of a future data point.
+
+You can get these from the `qdata`
 function.
 
     qdata(citytemps$Temp.SanDiego)
@@ -184,7 +196,7 @@ San Diego is actually more extreme than a 10-degree day in Rapid City!
 As this example suggests, z-scores are useful for comparing numbers that
 come from different distributions, with different statistical
 properties. It tells you how extreme a number is, relative to other
-numbers from that some distribution.
+numbers from that same distribution.
 
 ### Fancier histograms
 
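Reviewer sketch for the citytemps.md changes above: the claim that randomly drawn observations land inside the central 50% interval about half the time can be checked directly. This uses base R's `quantile` rather than mosaic's `qdata` so that it runs without extra packages. `temps` is simulated, hypothetical data standing in for `citytemps$Temp.SanDiego`; the mean of 63 and standard deviation of 5.7 are illustrative assumptions, not values taken from the data set. The `z_score` helper is likewise hypothetical and mirrors the z-score discussion in the third hunk.

```r
set.seed(1)

# Hypothetical stand-in for citytemps$Temp.SanDiego (simulated, not the real data)
temps <- rnorm(1000, mean = 63, sd = 5.7)

# Central 50% coverage interval: the 25th and 75th percentiles
interval <- quantile(temps, probs = c(0.25, 0.75))

# Fraction of the observed data inside the interval (about 0.5 by construction)
inside <- mean(temps >= interval[1] & temps <= interval[2])

# Repeatedly sample single observations at random and see how often they land inside
draws <- sample(temps, size = 10000, replace = TRUE)
hit_rate <- mean(draws >= interval[1] & draws <= interval[2])

# z-score: how many standard deviations a value sits from the mean of its own distribution
z_score <- function(x, data) (x - mean(data)) / sd(data)
z_50 <- z_score(50, temps)  # a hypothetical 50-degree day

print(interval)
print(c(inside = inside, hit_rate = hit_rate, z_50 = z_50))
```

Whether the same 50% figure carries over to a future day depends on future temperatures resembling the historical ones, which this simulation does not test.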

diff --git a/gonefishing/gonefishing.md b/gonefishing/gonefishing.md
@@ -8,7 +8,7 @@ Sampling distributions
 In this walk-through, you'll learn about sampling distributions.
 
 Data files:  
-\* [gonefishing.csv](gonefishing.csv): fictional data on fictional fish
+\* [gonefishing.csv](https://raw.githubusercontent.com/jaredsmurray/learnR/master/gonefishing/gonefishing.csv): fictional data on fictional fish
 in a fictional lake.
 
 As usual, load the mosaic library.

diff --git a/heights/files/import_options_new.png b/heights/files/import_options_new.png
diff --git a/heights/heights.md b/heights/heights.md
@@ -57,17 +57,17 @@ Read in the heights.csv data set by clicking the Import Dataset button in RStudi
 
 ![](files/import_dataset.png)
 
-When you click Import Dataset, choose the "From Text File..." option, and in the window that pops up, surf to wherever you've downloaded the heights.csv file.
+When you click Import Dataset, choose the "From CSV File..." option, and in the window that pops up, surf to wherever you've downloaded the heights.csv file.
 
 ![](files/import_file_window.png)
 
 Select the heights.csv file and open it from this window.  Now you should see a new window pop up, like this:
 
-![](files/import_options.png)
+![](files/import_options_new.png)
 
 Three common things that you'll want to double-check in this window:  
 - What do you want the data set to be called within the R environment?  By default, RStudio will name the data set after the file, so in this case the imported data frame will be stored as ``heights'' unless you provide an alternative in the "Name" field.  
-- Does the data file have a header row (i.e. is the first row the names of the variables)?  If so, make sure the "Yes" button next to "Heading" is selection.  In this case, we do have a header row providing the variable names (SHGT, MHGT, and FHGT).
+- Does the data file have a header row (i.e. is the first row the names of the variables)?  If so, make sure the "First Row as Names" option is checked. In this case, we do have a header row providing the variable names (SHGT, MHGT, and FHGT).
 - What separates the data fields?  Comma-separated files (like this one) are common; so are tab-separated files.  
 
 Usually RStudio does a good job at auto-detecting these features of the file.  But sometimes it can get tripped up, so it's good to verify what the program thinks it is seeing in this window.  

diff --git a/sat/sat.md b/sat/sat.md
@@ -5,6 +5,7 @@ layout: page
 Test scores and GPA for UT graduates
 ------------------------------------
 
+### Learning Objectives
 In this walk-through, you'll learn how to summarize and visualize the
 following kinds of relationships:  
 - between a numerical variable and a categorical variable, via
@@ -16,6 +17,8 @@ coefficients.
 You will also learn how to change more of the default plot settings in R
 plots.
 
+------------------------------------
+
 You'll need this data file:  
 \* [ut2000.csv](http://jgscott.github.io/teaching/data/ut2000.csv): data
 on SAT scores and graduating GPA for every student who entered the

diff --git a/titanic/titanic_permtest.md b/titanic/titanic_permtest.md
@@ -5,7 +5,7 @@ permutation tests in the context of a 2x2 contingency table.
 
 Data files:  
 \*
-[TitanicSurvival.csv](http://jgscott.github.io/teaching/data/TitanicSurvival.csv)
+[TitanicSurvival.csv](https://github.com/jgscott/ECO394D/raw/master/data/TitanicSurvival.csv) (right click the link and use "Save As")
 
 First download the TitanicSurvival.csv file and read it in. You can use
 RStudio's Import Dataset button, or the read.csv command:
@@ -28,8 +28,8 @@ they survived, along with their age, sex, and cabin class.
 
 ### Relative risk in 2x2 tables
 
-One of the very first contingency tables we made looked at survival
-status stratified by sex:
+We can use the `xtabs` (short for cross-tabulations) and `prop.table` (which computes proportions from a
+table of counts) functions to compute the survival probability for men and women:
 
     t1 = xtabs(~sex + survived, data=TitanicSurvival)
     prop.table(t1, margin=1)
@@ -38,8 +38,12 @@ status stratified by sex:
     ## sex             no       yes
     ##   female 0.2725322 0.7274678
     ##   male   0.8090154 0.1909846
+
+Without the `margin=1` argument, `prop.table` would compute a table
+of joint probabilities. Adding that argument conditions on the row variable (sex), giving
+conditional probabilities of survival for men and women.
 
-This seems to suggest a strong association between survival status and
+The data seem to suggest a strong association between survival status and
 sex. A natural *test statistic* to quantify this association between the
 rows and columns of this table is the [relative
 risk](http://en.wikipedia.org/wiki/Relative_risk) of dying: that is, the
@@ -242,6 +246,6 @@ Again, it is zero, up to Monte Carlo accuracy.
 There are advantages and disadvantages to chi-square as a test
 statistic. The relative risk is certainly a lot easier to understand and
 interpret, especially for non-experts. On the other hand, relative risk
-only makes sense 2x2 tables, while the chi-squared statistic generalizes
+only makes sense in 2x2 tables, while the chi-squared statistic generalizes
 quite readily to tables with more than two rows or more than two
 columns.
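
Reviewer sketch for the titanic_permtest.md changes: the difference between `prop.table` with and without `margin=1`, and the relative risk built from the conditional table, can be illustrated on a hard-coded 2x2 table. The counts below are an assumption, chosen so that the row proportions match the output shown in the diff; they are not read from TitanicSurvival.csv. The direction of the ratio (male risk over female risk) is also an assumption, since the hunk cuts off before the definition finishes.

```r
# Counts assumed for illustration; chosen to match the row proportions printed in the diff
t1 <- matrix(c(127, 682, 339, 161), nrow = 2,
             dimnames = list(sex = c("female", "male"),
                             survived = c("no", "yes")))

# No margin: joint proportions, so all four cells together sum to 1
joint <- prop.table(t1)

# margin = 1: condition on the row variable (sex), so each row sums to 1
cond <- prop.table(t1, margin = 1)

# Relative risk of dying, taken here as male risk divided by female risk
rel_risk <- cond["male", "no"] / cond["female", "no"]

print(round(joint, 3))
print(round(cond, 3))
print(rel_risk)
```

With the real data, `t1` would come from `xtabs(~sex + survived, data=TitanicSurvival)` as in the walk-through, and the rest of the calculation would be unchanged.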