Reordering (everything) and typos
bhindle committed Jul 26, 2018
1 parent 136c8ae commit 7902ed2
Showing 20 changed files with 13 additions and 13 deletions.
2 changes: 1 addition & 1 deletion 1_00.Rmd
@@ -1,2 +1,2 @@
-# (PART) Collecting and using data {-}
+# (PART) Collecting and Using Data {-}

2 changes: 1 addition & 1 deletion 3_00.Rmd
@@ -1,2 +1,2 @@
-# (PART) Simple Parametric Statistics {-}
+# (PART) Simple Statistics {-}

2 changes: 1 addition & 1 deletion 4_00.Rmd
@@ -1,3 +1,3 @@
-# (PART) Categorical Data {-}
+# (PART) Regression and ANOVA {-}


File renamed without changes.
File renamed without changes.
@@ -2,7 +2,7 @@

## Introduction {#intro}

-The two-sample *t*-tests evaluate whether or not the mean of a numeric variable changes among two groups or experimental conditions. At the beginning of the [Relationships and regression] chapter we pointed out that the different groups/conditions can be encoded by a categorical variable. We pointed out that we could conceptualise these *t*-tests as evaluating a relationship between the numeric and categorical variable. The obvious question is, what happens if we need to evaluate differences among means of more than two groups? The 'obvious' thing to do might seem to be to test each pair of means using a *t*-test. However, this procedure is tedious and, most importantly, statistically flawed.
+The two-sample *t*-tests evaluate whether or not the mean of a numeric variable changes among two groups or experimental conditions, which can be encoded by a categorical variable. We pointed out that we could conceptualise these *t*-tests as evaluating a relationship between the numeric and categorical variable. The obvious question is, what happens if we need to evaluate differences among means of more than two groups? The 'obvious' thing to do might seem to be to test each pair of means using a *t*-test. However, this procedure is tedious and, most importantly, statistically flawed.
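
As an illustrative sketch (not part of this commit or the book's own text; the `growth` data frame, its columns, and the treatment names are invented), here is roughly how the two approaches compare in R: testing each pair of group means in turn versus fitting a single ANOVA with `aov()`, the method introduced just below.

```r
# Hypothetical data: plant biomass measured under three treatments
set.seed(42)
growth <- data.frame(
  treatment = rep(c("control", "low", "high"), each = 10),
  biomass   = c(rnorm(10, mean = 5), rnorm(10, mean = 6), rnorm(10, mean = 7))
)

# Testing each pair of group means in turn (no multiple-testing correction)
pairwise.t.test(growth$biomass, growth$treatment, p.adjust.method = "none")

# A single ANOVA evaluates differences among all three group means at once
summary(aov(biomass ~ treatment, data = growth))
```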

In this chapter we will introduce an alternative method that allows us to assess the statistical significance of differences among several means at the same time. This method is called **Analysis of Variance** (abbreviated to ANOVA). ANOVA is one of those statistical terms that unfortunately has two slightly different meanings:

File renamed without changes.
2 changes: 1 addition & 1 deletion 5_00.Rmd
@@ -1,3 +1,3 @@
-# (PART) Associations and Relationships {-}
+# (PART) Doing More with Models {-}


File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion 6_00.Rmd
@@ -1,3 +1,3 @@
-# (PART) Experimental Design and ANOVA (I) {-}
+# (PART) Experimental Design {-}


File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion 7_00.Rmd
@@ -1,2 +1,2 @@
-# (PART) Experimental Design and ANOVA (II) {-}
+# (PART) Beyond Simple Models {-}

2 changes: 1 addition & 1 deletion 8_00.Rmd
@@ -1,2 +1,2 @@
-# (PART) Fixing Problems {-}
+# (PART) Frequency Data and Non-parametric Tests {-}

@@ -8,7 +8,7 @@ However, we sometimes find a situation in which the ‘measurement’ we are int

## A new kind of distribution

-There are a quite a few options for dealing with categorical data^[e.g. the 'log-linear model', 'Fisher's exact test', and the 'G-test'.]. We're just going to look at one option in this book: $\chi^2$ tests. This is pronounced, and sometimes written, 'chi-square'. The 'ch' is a hard 'ch', as in 'character'. This isn't necessarily the best approach for every problem, but $\chi^2$ tests are widely used in biology so they are a good place to start.
+There are quite a few options for dealing with categorical data^[e.g. the 'log-linear model', 'Fisher's exact test', and the 'G-test'.]. We're just going to look at one option in this book: $\chi^2$ tests. This is pronounced, and sometimes written, 'chi-square'. The 'ch' is a hard 'ch', as in 'character'. This isn't necessarily the best approach for every problem, but $\chi^2$ tests are widely used in biology so they are a good place to start.

```{block, type='do-something'}
It is not critical that you understand everything in this section. This material is here to help those who like to have a sense of how statistical tests work. You won't be assessed on it.
@@ -18,7 +18,7 @@ The $\chi^2$ tests that we're going to study borrow their name from a particular

1. The $\chi^2$ distribution pops up a lot in statistics. However, in contrast to the normal distribution, it isn't often used to model the distribution of a variable we've sampled (i.e. 'the data'). Instead, the $\chi^2$ distribution is often associated with a test statistic of some kind.

-2. The standard $\chi^2$ distribution is completely described by only one parameter, called the degrees of freedom. This is closely related to the degrees of freedom idea introduced in the last few chapters on *t*-tests.
+2. The standard $\chi^2$ distribution is completely described by only one parameter, called the degrees of freedom. This is closely related to the degrees of freedom idea introduced in the chapters on *t*-tests.

3. The $\chi^2$ distribution is appropriate for positive-valued numeric variables. Negative values can't be accommodated. This is because the distribution arises whenever we take one or more normally distributed variables, square these, and then add them up.
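
A small R simulation (an illustrative sketch, not taken from the book) makes this last point concrete: squaring and summing three standard normal variables produces values that follow a $\chi^2$ distribution with three degrees of freedom.

```r
set.seed(1)

# Draw many sets of three standard normal values, square them, and sum each set
z <- matrix(rnorm(3 * 10000), ncol = 3)
simulated <- rowSums(z^2)

# The simulated values match the theoretical chi-squared density with df = 3
hist(simulated, breaks = 50, freq = FALSE,
     main = "Sum of three squared normals", xlab = "Value")
curve(dchisq(x, df = 3), add = TRUE, col = "red", lwd = 2)
```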

@@ -89,7 +89,7 @@ Notice that we are not interested in judging whether the proportion of males, or

### The assumptions and requirements of $\chi^{2}$ tests

-It's important to realise that in terms of their assumptions, analysis of a contingency tables and goodness-of-fit tests aren't fundamentally different from one another. The difference between the two types lies in the type of hypothesis evaluated. When we carry out a goodness-of-fit test we have to supply the expected values, whereas the calculation of expected values is embedded in the formula used to carry out a contingency table test. That will make more sense once we've seen the two tests in action.
+It's important to realise that in terms of their assumptions, contingency tables and goodness-of-fit tests aren't fundamentally different from one another. The difference between the two types lies in the type of hypothesis evaluated. When we carry out a goodness-of-fit test we have to supply the expected values, whereas the calculation of expected values is embedded in the formula used to carry out a contingency table test. That will make more sense once we've seen the two tests in action.

$\chi^{2}$ tests are often characterised as **non-parametric** tests because they do not assume any particular form for the distribution of the data. In fact, as with any statistical test, there are some assumptions in play, but these are relatively mild:

2 changes: 1 addition & 1 deletion 4_02_chi_sqr_gof.Rmd → 8_02_chi_sqr_gof.Rmd
@@ -34,7 +34,7 @@ We want to test whether the ratio of male to female flowers differs significantl

**Step 3.** Compare the $\chi^{2}$ statistic to the theoretical predictions of the $\chi^{2}$ distribution to assess the statistical significance of the difference between observed and expected counts.

-The interpretation of this *p*-value in this test is the same as for any other kind of statistical test: it is probability we would see the observed frequencies, or more extreme values, under the null hypothesis.
+The interpretation of this *p*-value in this test is the same as for any other kind of statistical test: it is the probability we would see the observed frequencies, or more extreme values, under the null hypothesis.
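
As a rough sketch of how these steps look in R (not part of this commit; the flower counts below are made up and the 1:1 expected ratio is only an assumed example), the whole procedure can be run with `chisq.test()`:

```r
# Hypothetical observed counts of male and female flowers
observed <- c(male = 105, female = 87)

# Chi-squared goodness-of-fit test against an expected 1:1 sex ratio
gof_test <- chisq.test(observed, p = c(0.5, 0.5))

gof_test$expected   # expected counts under the null hypothesis
gof_test$statistic  # the chi-squared test statistic
gof_test$p.value    # probability of a discrepancy at least this large under the null
```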

### Assumptions of the chi-square goodness of fit test

@@ -17,7 +17,7 @@ Let's think about what these kinds of data look like. Here are the biology stude

This is called a two-way contingency table. It is a *two-way* contingency table because it summarises the frequency distribution of two categorical variables at the same time^[This is called their 'joint distribution', in case you were wondering.]. If we had measured three variables we would have ended up with a *three-way* contingency table (e.g. 2 x 2 x 2).
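
A short R sketch of how such a table can be built from raw data with `table()` (the `students` data frame and its variables are invented here, not the book's biology-student data):

```r
# Hypothetical raw data: categorical variables recorded for 120 students
set.seed(7)
students <- data.frame(
  sex       = sample(c("Female", "Male"), size = 120, replace = TRUE),
  eats_meat = sample(c("Yes", "No"),      size = 120, replace = TRUE),
  glasses   = sample(c("Yes", "No"),      size = 120, replace = TRUE)
)

# Two-way contingency table: joint frequency distribution of two variables
table(students$sex, students$eats_meat)

# Adding a third categorical variable gives a three-way (2 x 2 x 2) table
table(students$sex, students$eats_meat, students$glasses)
```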

-A contingency table takes its name from the fact that it captures the 'contingencies' among the categorical variables: it summarises how the frequencies of one categorical variable are associated with the categories of another. The term association is use here to describe the non-independence of categories among categorical variables. Other terms used to refer to the same idea include 'linkage', 'non-independence', and 'interaction'.
+A contingency table takes its name from the fact that it captures the 'contingencies' among the categorical variables: it summarises how the frequencies of one categorical variable are associated with the categories of another. The term association is used here to describe the non-independence of categories among categorical variables. Other terms used to refer to the same idea include 'linkage', 'non-independence', and 'interaction'.

Associations are evident when the proportions of objects in one set of categories (e.g. R1 and R2) depend on a second set of categories (e.g. C1 and C2). Here are two possibilities:

File renamed without changes.
