Skip to content

Commit

Permalink
fixing models
Browse files Browse the repository at this point in the history
  • Loading branch information
dzchilds committed Nov 19, 2017
1 parent f575c33 commit fd9238b
Show file tree
Hide file tree
Showing 4 changed files with 478 additions and 2 deletions.
121 changes: 121 additions & 0 deletions _exercises/fixing_problems.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
---
title: "Transformations and non-parametric tests"
output:
html_document:
css: ../extras.css
theme: cerulean
highlight: tango
---

```{r, include = FALSE}
library(dplyr)
library(ggplot2)
```

You should work through the exercises step-by-step, following the instructions carefully. At various points we will interrupt the flow of instructions with a question. Make a note of your answers so that you can complete the MOLE quiz for this week.

### Fungal pathogen infection on leaves

No data are provided for this exercise. Spores of a particular fungal pathogen infect leaves of a tree wherever the spores happen to land and the subsequent development of the fungus causes a single distinct 'pustule' on the leaf at each infection site (typically <20 pustules are found on each leaf). Imagine you have data from a study comparing the intensity of infection between canopy and sub-canopy leaves.

```{block, type='do-something'}
**MOLE question**
What sort of transformation might be appropriate for these data?
```

### Pollution sensitive stoneflies - what’s going into the river?

The data for this exercise are in STONEFLY.CSV. Counts of the abundances of stonefly nymphs (which are generally intolerant of organic pollution) at three sites are stored in the `Stonefly` variable. The `Site` variable has three values ('Above', 'Adjacent' and 'Downstream') which index the three study site: immediately above ('Above'), adjacent to ('Adjacent'), and 0.5 km downstream ('Downstream') of a discharge point for a storm drain.

Read these data into R and examine them to evaluate whether they are suitable for using one way ANOVA to test for differences in abundance at the three sites. Hint: fit the appropriate model with `lm` and then construct the regression diagnostic plots using `plot`.

Suggest a transformation that may help. Carry out the transformation to see whether it has the desired effect.

```{block, type='do-something'}
**MOLE question**
What do you recommend and why?
What do you learn from the diagnostics derived from the ANOVA with the transformed data?
```

### Ants again

The data for this exercise are in ANTS2.CSV. These data describe ant foraging on sycamores and oaks. The number of lepidopteran caterpillars observed as prey items in ants foraging is recorded in the `Caterpillars` variable. The total number of prey items being carried during the observation period (1h) are in the `Total` variable. The `Tree` variable has two values ('Sycamore' and 'Oak') that index the tree type.

Calculate the number of lepidopteran larvae taken as a proportion of all prey (do this with `mutate`). Carry out a parametric test to determine whether caterpillars constitute a significantly higher proportion of the diet in oak than sycamore.

```{block, type='do-something'}
**MOLE question**
Is a transformation appropriate? If so, which transformation should you use?
```

```{block, type='do-something'}
**MOLE question**
What test is required?
```

```{block, type='do-something'}
**MOLE question**
What do you conclude from the test?
```

### Bryophyte diversity in a woodland

The data for this exercise are in BRYOPHYTE.CSV. As part of a survey of bryophyte communities in two areas of woodland with differing canopy species data of species diversity was recorded by randomly positioning quadrats (1m by 1m) and recording the species found in each quadrat. As part of the analysis, of the data, the surveyor wants to determine whether the species diversity (no. of spp. per quadrat) is different between the two sites. The `Site` variable indexes the site (1 or 2) and the `Bryophyte` variable contains the species diversity.

Examine these data using the `View` function and prepare a plot to visualise the distribution of Bryophyte diversity in each area of the woodland. The data are not suitable for analysis with a *t*-test.

```{block, type='do-something'}
**MOLE question**
Is there a transformation that would help?
```

### Reporting the results of non-parametric tests

```{block, type='do-something'}
**MOLE question**
You might sometimes see a statements such as:
> The means were significantly different (Mann-Whitney *U*-test: U=43, n~1~=14, n~2~=14, *p*<0.05).
What is wrong with this?
```

### Copper tolerance in *Agrostis*

A study was carried out to examine how quickly copper tolerance is acquired in the grass *Agrostis stolonifera* growing on copper contaminated soil. Plants from two lawns, planted 8 years and 14 years ago, around the buildings of a copper refinery, were tested for copper tolerance by growing them in a standard liquid culture medium with elevated levels of copper. Root extension (in mm) was measured for each plant over a 14 day period.

The data are in the file LAWNS.CSV. There are two variables: `Roots` contains the measured root extension and `Lawn` identifies the two groups (years of exposure). Read these data into R, calling the data frame `copperlawn`. Examine the data with `View`.

Have a look at the distributions of the data (using histograms, dot plots, or whatever method you think best).

With 10 and 15 values it is, as always, hard to tell whether or not the data are drawn from a normally distributed population, although they don’t look particularly normal. However, consideration of the nature of the data might also lead us to be cautious. Copper contamination may be patchy in the lawn, so there may be a mixture of more and less tolerant individuals, and depending on the nature of the genetic control of tolerance, it may have a distribution that is not clearly unimodal. In this case, it doesn’t look as though a transformation is obviously going to help, and although we might be prepared to risk a parametric test, a non-parametric test is safer.

Use an appropriate non-parametric test to evaluate whether root growth, in culture solution, differs between plants from the two lawns.

```{block, type='do-something'}
**MOLE question**
Summarise the conclusion from the test and think about what the results suggest.
```

### Measuring seed dispersal

An investigator was interested in the dispersal abilities of a number of plant species which reinvade disturbed ground by means of windborne seed. To try and measure the seed influx they put out a tray of sterilised potting soil at each of 10 locations around a newly disturbed site. Each week for 11 weeks they remove the trays and replace them with new ones. The collected trays are covered and brought into a glasshouse where any seeds they contain are allowed to germinate. From this procedure they know for each plant species the week (1-11) when it first appeared at each location - a value of 12 is given to any species that didn’t arrive at a location by the end of the experiment. You can use these data to test whether, for the four plant species studied, there is any significant difference in dispersal rates between species.

The data are in the file DISPERSAL.CSV. There are two variables: `Week` contains the arrival week and `Species` identifies the four species ('A' - 'D'). Read these data into R, examine them with `View`, and make an informative plot.

Once you understand the data, use an appropriate non-parametric test to evaluate whether the four species differ significantly in dispersal ability (at least as measured by speed of colonisation).

```{block, type='do-something'}
**MOLE question**
Write a statistically supported conclusion from the test:
```
260 changes: 260 additions & 0 deletions _exercises/fixing_problems.html

Large diffs are not rendered by default.

55 changes: 54 additions & 1 deletion data_csv/ANTS2.CSV
Original file line number Diff line number Diff line change
@@ -1 +1,54 @@
Tree,Caterpillars,TotalSycamore,3,30Sycamore,16,56Sycamore,6,45Sycamore,12,61Sycamore,7,54Sycamore,7,43Sycamore,11,78Sycamore,10,68Sycamore,4,34Sycamore,5,40Sycamore,9,39Sycamore,3,25Sycamore,3,41Sycamore,5,32Sycamore,8,46Sycamore,7,37Sycamore,3,30Sycamore,8,59Sycamore,13,56Sycamore,6,42Sycamore,5,34Sycamore,6,28Sycamore,6,57Sycamore,6,47Sycamore,9,70Sycamore,9,36Oak,5,34Oak,5,27Oak,5,48Oak,7,29Oak,6,25Oak,8,35Oak,13,60Oak,7,47Oak,13,56Oak,12,73Oak,6,35Oak,6,24Oak,15,37Oak,8,39Oak,6,32Oak,7,45Oak,23,71Oak,4,21Oak,14,57Oak,7,40Oak,12,63Oak,5,30Oak,9,45Oak,11,34Oak,7,56Oak,21,68Oak,10,41
Tree,Caterpillars,Total
Sycamore,1,30
Sycamore,4,56
Sycamore,4,45
Sycamore,5,61
Sycamore,7,54
Sycamore,7,43
Sycamore,8,78
Sycamore,4,68
Sycamore,3,34
Sycamore,2,40
Sycamore,3,39
Sycamore,1,25
Sycamore,2,41
Sycamore,1,32
Sycamore,4,46
Sycamore,3,37
Sycamore,6,30
Sycamore,8,59
Sycamore,2,56
Sycamore,2,42
Sycamore,7,34
Sycamore,3,28
Sycamore,5,57
Sycamore,1,47
Sycamore,5,70
Sycamore,1,36
Oak,14,34
Oak,5,27
Oak,11,48
Oak,8,29
Oak,8,25
Oak,11,35
Oak,12,60
Oak,14,47
Oak,14,56
Oak,14,73
Oak,9,35
Oak,2,24
Oak,11,37
Oak,8,39
Oak,7,32
Oak,12,45
Oak,17,71
Oak,8,21
Oak,16,57
Oak,8,40
Oak,16,63
Oak,6,30
Oak,8,45
Oak,7,34
Oak,17,56
Oak,18,68
Oak,6,41
44 changes: 43 additions & 1 deletion data_csv/STONEFLY.CSV
Original file line number Diff line number Diff line change
@@ -1 +1,43 @@
Stonefly,Site47,Above115,Above15,Above18,Above58,Above72,Above12,Above101,Above66,Above25,Above36,Above14,Above47,Above33,Above3,Adjacent25,Adjacent82,Adjacent10,Adjacent21,Adjacent3,Adjacent36,Adjacent47,Adjacent12,Adjacent79,Adjacent11,Adjacent31,Adjacent67,Adjacent58,Adjacent14,Downstream8,Downstream0,Downstream0,Downstream3,Downstream12,Downstream5,Downstream0,Downstream10,Downstream17,Downstream2,Downstream24,Downstream7,Downstream1,Downstream
Stonefly,Site
165,Above
151,Above
25,Above
44,Above
58,Above
99,Above
8,Above
109,Above
66,Above
25,Above
36,Above
19,Above
47,Above
33,Above
5,Adjacent
25,Adjacent
82,Adjacent
10,Adjacent
21,Adjacent
2,Adjacent
36,Adjacent
47,Adjacent
12,Adjacent
79,Adjacent
11,Adjacent
31,Adjacent
67,Adjacent
58,Adjacent
18,Downstream
8,Downstream
0,Downstream
0,Downstream
3,Downstream
12,Downstream
5,Downstream
0,Downstream
10,Downstream
17,Downstream
2,Downstream
26,Downstream
7,Downstream
1,Downstream

0 comments on commit fd9238b

Please sign in to comment.