forked from dzchilds/stats-for-bio
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 4 changed files with 478 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,121 @@ | ||
--- | ||
title: "Transformations and non-parametric tests" | ||
output: | ||
html_document: | ||
css: ../extras.css | ||
theme: cerulean | ||
highlight: tango | ||
--- | ||
|
||
```{r, include = FALSE} | ||
library(dplyr) | ||
library(ggplot2) | ||
``` | ||
|
||
You should work through the exercises step-by-step, following the instructions carefully. At various points we will interrupt the flow of instructions with a question. Make a note of your answers so that you can complete the MOLE quiz for this week. | ||
|
||
### Fungal pathogen infection on leaves | ||
|
||
No data are provided for this exercise. Spores of a particular fungal pathogen infect leaves of a tree wherever the spores happen to land and the subsequent development of the fungus causes a single distinct 'pustule' on the leaf at each infection site (typically <20 pustules are found on each leaf). Imagine you have data from a study comparing the intensity of infection between canopy and sub-canopy leaves. | ||
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
What sort of transformation might be appropriate for these data? | ||
``` | ||
|
||
### Pollution sensitive stoneflies - what’s going into the river? | ||
|
||
The data for this exercise are in STONEFLY.CSV. Counts of stonefly nymphs (which are generally intolerant of organic pollution) at three sites are stored in the `Stonefly` variable. The `Site` variable has three values ('Above', 'Adjacent' and 'Downstream') which index the three study sites: immediately above ('Above'), adjacent to ('Adjacent'), and 0.5 km downstream ('Downstream') of a discharge point for a storm drain. | ||
|
||
Read these data into R and examine them to evaluate whether they are suitable for using one-way ANOVA to test for differences in abundance among the three sites. Hint: fit the appropriate model with `lm` and then construct the regression diagnostic plots using `plot`. | ||
|
||
Suggest a transformation that may help. Carry out the transformation to see whether it has the desired effect. | ||
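
The mechanics of these steps might look something like the sketch below. So that it runs on its own, the counts are typed in from the STONEFLY.CSV rows added in this commit; in practice you would read the file with `read.csv`. The object names (`stonefly`, `mod_raw`, `mod_log`) are illustrative, and `log(x + 1)` is shown only as one candidate transformation (the `+ 1` is an assumption made here because some counts are zero). Whether it is the best choice is for you to judge from the diagnostics.

```r
# Stonefly counts copied from the STONEFLY.CSV rows in this commit
# (in practice: stonefly <- read.csv("STONEFLY.CSV"))
stonefly <- data.frame(
  Stonefly = c(
    165, 151, 25, 44, 58, 99, 8, 109, 66, 25, 36, 19, 47, 33,  # Above
    5, 25, 82, 10, 21, 2, 36, 47, 12, 79, 11, 31, 67, 58,      # Adjacent
    18, 8, 0, 0, 3, 12, 5, 0, 10, 17, 2, 26, 7, 1              # Downstream
  ),
  Site = rep(c("Above", "Adjacent", "Downstream"), each = 14)
)

# One-way ANOVA model fitted to the raw counts
mod_raw <- lm(Stonefly ~ Site, data = stonefly)

# Diagnostics: residuals vs fitted, normal Q-Q, scale-location
plot(mod_raw, which = 1:3)

# Candidate transformation (an assumption, not the only option): log(x + 1)
stonefly$logStonefly <- log(stonefly$Stonefly + 1)
mod_log <- lm(logStonefly ~ Site, data = stonefly)
plot(mod_log, which = 1:3)

# Compare the spread within each site before and after transforming
tapply(stonefly$Stonefly, stonefly$Site, sd)
tapply(stonefly$logStonefly, stonefly$Site, sd)

# Global F-test on the transformed scale
anova(mod_log)
```

Comparing the two sets of diagnostic plots side by side is the point of the exercise: look at whether the spread of the residuals still changes with the fitted values after transforming.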
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
What do you recommend and why? | ||
What do you learn from the diagnostics derived from the ANOVA with the transformed data? | ||
``` | ||
|
||
### Ants again | ||
|
||
The data for this exercise are in ANTS2.CSV. These data describe ant foraging on sycamores and oaks. The number of lepidopteran caterpillars observed as prey items carried by foraging ants is recorded in the `Caterpillars` variable. The total number of prey items carried during the observation period (1 h) is in the `Total` variable. The `Tree` variable has two values ('Sycamore' and 'Oak') that index the tree type. | ||
|
||
Calculate the number of lepidopteran larvae taken as a proportion of all prey (do this with `mutate`). Carry out a parametric test to determine whether caterpillars constitute a significantly higher proportion of the diet in oak than sycamore. | ||
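
A minimal sketch of the calculation follows, with the values typed in from the ANTS2.CSV rows added in this commit so that it runs without the file (normally you would use `read.csv("ANTS2.CSV")`). The column names `Prop` and `AsinProp` are made up for illustration. The arcsine square-root transformation and the two-sample *t*-test are shown as one possible route, not as the required answer; check the distributions yourself before settling on an approach.

```r
library(dplyr)

# Values copied from the ANTS2.CSV rows in this commit:
# 26 sycamore observations followed by 27 oak observations
ants <- data.frame(
  Tree = rep(c("Sycamore", "Oak"), times = c(26, 27)),
  Caterpillars = c(
    1, 4, 4, 5, 7, 7, 8, 4, 3, 2, 3, 1, 2, 1, 4, 3, 6, 8, 2, 2, 7, 3, 5, 1,
    5, 1,
    14, 5, 11, 8, 8, 11, 12, 14, 14, 14, 9, 2, 11, 8, 7, 12, 17, 8, 16, 8,
    16, 6, 8, 7, 17, 18, 6
  ),
  Total = c(
    30, 56, 45, 61, 54, 43, 78, 68, 34, 40, 39, 25, 41, 32, 46, 37, 30, 59,
    56, 42, 34, 28, 57, 47, 70, 36,
    34, 27, 48, 29, 25, 35, 60, 47, 56, 73, 35, 24, 37, 39, 32, 45, 71, 21,
    57, 40, 63, 30, 45, 34, 56, 68, 41
  )
)

ants <- ants %>%
  mutate(
    Prop     = Caterpillars / Total,  # caterpillars as a proportion of all prey
    AsinProp = asin(sqrt(Prop))       # candidate transformation (assumption)
  )

# Group means of the raw proportions, for orientation
ants %>% group_by(Tree) %>% summarise(mean_prop = mean(Prop))

# Two-sample t-test on the transformed proportions (two-sided by default)
tt <- t.test(AsinProp ~ Tree, data = ants, var.equal = TRUE)
tt
```

Repeating the test on the untransformed `Prop` column is a quick way to see how much the transformation matters for these data.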
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
Is a transformation appropriate? If so, which transformation should you use? | ||
``` | ||
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
What test is required? | ||
``` | ||
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
What do you conclude from the test? | ||
``` | ||
|
||
### Bryophyte diversity in a woodland | ||
|
||
The data for this exercise are in BRYOPHYTE.CSV. As part of a survey of bryophyte communities in two areas of woodland with differing canopy species, data on species diversity were recorded by randomly positioning quadrats (1 m by 1 m) and recording the species found in each quadrat. As part of the analysis of the data, the surveyor wants to determine whether the species diversity (no. of spp. per quadrat) differs between the two sites. The `Site` variable indexes the site (1 or 2) and the `Bryophyte` variable contains the species diversity. | ||
|
||
Examine these data using the `View` function and prepare a plot to visualise the distribution of Bryophyte diversity in each area of the woodland. The data are not suitable for analysis with a *t*-test. | ||
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
Is there a transformation that would help? | ||
``` | ||
|
||
### Reporting the results of non-parametric tests | ||
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
You might sometimes see a statement such as: | ||
> The means were significantly different (Mann-Whitney *U*-test: U=43, n~1~=14, n~2~=14, *p*<0.05). | ||
What is wrong with this? | ||
``` | ||
|
||
### Copper tolerance in *Agrostis* | ||
|
||
A study was carried out to examine how quickly copper tolerance is acquired in the grass *Agrostis stolonifera* growing on copper-contaminated soil. Plants from two lawns, planted 8 years and 14 years ago around the buildings of a copper refinery, were tested for copper tolerance by growing them in a standard liquid culture medium with elevated levels of copper. Root extension (in mm) was measured for each plant over a 14-day period. | ||
|
||
The data are in the file LAWNS.CSV. There are two variables: `Roots` contains the measured root extension and `Lawn` identifies the two groups (years of exposure). Read these data into R, calling the data frame `copperlawn`. Examine the data with `View`. | ||
|
||
Have a look at the distributions of the data (using histograms, dot plots, or whatever method you think best). | ||
|
||
With 10 and 15 values it is, as always, hard to tell whether or not the data are drawn from a normally distributed population, although they don’t look particularly normal. However, consideration of the nature of the data might also lead us to be cautious. Copper contamination may be patchy in the lawn, so there may be a mixture of more and less tolerant individuals, and depending on the nature of the genetic control of tolerance, it may have a distribution that is not clearly unimodal. In this case, it doesn’t look as though a transformation is obviously going to help, and although we might be prepared to risk a parametric test, a non-parametric test is safer. | ||
|
||
Use an appropriate non-parametric test to evaluate whether root growth, in culture solution, differs between plants from the two lawns. | ||
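
As a rough sketch of the mechanics, a two-sample rank-based test can be run with `wilcox.test` (the Wilcoxon rank-sum test, equivalent to the Mann-Whitney *U*-test). LAWNS.CSV is not shown here, so the block below uses simulated placeholder values that are NOT the real measurements; the `Lawn` labels and which lawn has 10 rather than 15 plants are also assumptions. Swap in `read.csv("LAWNS.CSV")` to work with the actual data.

```r
# PLACEHOLDER values only, simulated so the sketch runs on its own.
# These are not the LAWNS.CSV measurements; with the real file use:
#   copperlawn <- read.csv("LAWNS.CSV")
set.seed(1)
copperlawn <- data.frame(
  Roots = c(rnorm(10, mean = 10, sd = 3), rnorm(15, mean = 14, sd = 3)),
  Lawn  = rep(c("8 years", "14 years"), times = c(10, 15))  # labels assumed
)

# Look at the distribution in each group before testing
boxplot(Roots ~ Lawn, data = copperlawn)

# Wilcoxon rank-sum (Mann-Whitney U) test for a difference between lawns
wt <- wilcox.test(Roots ~ Lawn, data = copperlawn)
wt

# Medians are the natural summary to report alongside a rank-based test
tapply(copperlawn$Roots, copperlawn$Lawn, median)
```

R labels the test statistic `W`; it corresponds to the Mann-Whitney *U* for the first group.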
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
Summarise the conclusion from the test and think about what the results suggest. | ||
``` | ||
|
||
### Measuring seed dispersal | ||
|
||
An investigator was interested in the dispersal abilities of a number of plant species which reinvade disturbed ground by means of windborne seed. To measure the seed influx, they put out a tray of sterilised potting soil at each of 10 locations around a newly disturbed site. Each week for 11 weeks they removed the trays and replaced them with new ones. The collected trays were covered and brought into a glasshouse, where any seeds they contained were allowed to germinate. From this procedure they know, for each plant species, the week (1-11) when it first appeared at each location; a value of 12 is given to any species that didn’t arrive at a location by the end of the experiment. You can use these data to test whether there is any significant difference in dispersal rates among the four plant species studied. | ||
|
||
The data are in the file DISPERSAL.CSV. There are two variables: `Week` contains the arrival week and `Species` identifies the four species ('A' - 'D'). Read these data into R, examine them with `View`, and make an informative plot. | ||
|
||
Once you understand the data, use an appropriate non-parametric test to evaluate whether the four species differ significantly in dispersal ability (at least as measured by speed of colonisation). | ||
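
With more than two groups, one commonly used rank-based option is the Kruskal-Wallis test, available as `kruskal.test`. The sketch below only illustrates the call. DISPERSAL.CSV is not shown here, so the `Week` values are random placeholders and NOT the real data, and the layout of one row per species per location (40 rows) is an assumption.

```r
# PLACEHOLDER values only, generated so the sketch runs on its own.
# These are not the DISPERSAL.CSV records; with the real file use:
#   dispersal <- read.csv("DISPERSAL.CSV")
set.seed(1)
dispersal <- data.frame(
  Week    = sample(1:12, size = 40, replace = TRUE),  # 12 = never arrived
  Species = rep(c("A", "B", "C", "D"), each = 10)     # layout assumed
)

# Visualise arrival week by species
boxplot(Week ~ Species, data = dispersal)

# Kruskal-Wallis rank sum test across the four species
kw <- kruskal.test(Week ~ Species, data = dispersal)
kw

# Medians per species help with interpreting the result
tapply(dispersal$Week, dispersal$Species, median)
```

The output reports a chi-squared statistic with degrees of freedom equal to the number of groups minus one, which is what you would quote alongside the *p*-value.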
|
||
```{block, type='do-something'} | ||
**MOLE question** | ||
Write a statistically supported conclusion from the test: | ||
``` |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,54 @@ | ||
Tree,Caterpillars,TotalSycamore,3,30Sycamore,16,56Sycamore,6,45Sycamore,12,61Sycamore,7,54Sycamore,7,43Sycamore,11,78Sycamore,10,68Sycamore,4,34Sycamore,5,40Sycamore,9,39Sycamore,3,25Sycamore,3,41Sycamore,5,32Sycamore,8,46Sycamore,7,37Sycamore,3,30Sycamore,8,59Sycamore,13,56Sycamore,6,42Sycamore,5,34Sycamore,6,28Sycamore,6,57Sycamore,6,47Sycamore,9,70Sycamore,9,36Oak,5,34Oak,5,27Oak,5,48Oak,7,29Oak,6,25Oak,8,35Oak,13,60Oak,7,47Oak,13,56Oak,12,73Oak,6,35Oak,6,24Oak,15,37Oak,8,39Oak,6,32Oak,7,45Oak,23,71Oak,4,21Oak,14,57Oak,7,40Oak,12,63Oak,5,30Oak,9,45Oak,11,34Oak,7,56Oak,21,68Oak,10,41 | ||
Tree,Caterpillars,Total | ||
Sycamore,1,30 | ||
Sycamore,4,56 | ||
Sycamore,4,45 | ||
Sycamore,5,61 | ||
Sycamore,7,54 | ||
Sycamore,7,43 | ||
Sycamore,8,78 | ||
Sycamore,4,68 | ||
Sycamore,3,34 | ||
Sycamore,2,40 | ||
Sycamore,3,39 | ||
Sycamore,1,25 | ||
Sycamore,2,41 | ||
Sycamore,1,32 | ||
Sycamore,4,46 | ||
Sycamore,3,37 | ||
Sycamore,6,30 | ||
Sycamore,8,59 | ||
Sycamore,2,56 | ||
Sycamore,2,42 | ||
Sycamore,7,34 | ||
Sycamore,3,28 | ||
Sycamore,5,57 | ||
Sycamore,1,47 | ||
Sycamore,5,70 | ||
Sycamore,1,36 | ||
Oak,14,34 | ||
Oak,5,27 | ||
Oak,11,48 | ||
Oak,8,29 | ||
Oak,8,25 | ||
Oak,11,35 | ||
Oak,12,60 | ||
Oak,14,47 | ||
Oak,14,56 | ||
Oak,14,73 | ||
Oak,9,35 | ||
Oak,2,24 | ||
Oak,11,37 | ||
Oak,8,39 | ||
Oak,7,32 | ||
Oak,12,45 | ||
Oak,17,71 | ||
Oak,8,21 | ||
Oak,16,57 | ||
Oak,8,40 | ||
Oak,16,63 | ||
Oak,6,30 | ||
Oak,8,45 | ||
Oak,7,34 | ||
Oak,17,56 | ||
Oak,18,68 | ||
Oak,6,41 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,43 @@ | ||
Stonefly,Site47,Above115,Above15,Above18,Above58,Above72,Above12,Above101,Above66,Above25,Above36,Above14,Above47,Above33,Above3,Adjacent25,Adjacent82,Adjacent10,Adjacent21,Adjacent3,Adjacent36,Adjacent47,Adjacent12,Adjacent79,Adjacent11,Adjacent31,Adjacent67,Adjacent58,Adjacent14,Downstream8,Downstream0,Downstream0,Downstream3,Downstream12,Downstream5,Downstream0,Downstream10,Downstream17,Downstream2,Downstream24,Downstream7,Downstream1,Downstream | ||
Stonefly,Site | ||
165,Above | ||
151,Above | ||
25,Above | ||
44,Above | ||
58,Above | ||
99,Above | ||
8,Above | ||
109,Above | ||
66,Above | ||
25,Above | ||
36,Above | ||
19,Above | ||
47,Above | ||
33,Above | ||
5,Adjacent | ||
25,Adjacent | ||
82,Adjacent | ||
10,Adjacent | ||
21,Adjacent | ||
2,Adjacent | ||
36,Adjacent | ||
47,Adjacent | ||
12,Adjacent | ||
79,Adjacent | ||
11,Adjacent | ||
31,Adjacent | ||
67,Adjacent | ||
58,Adjacent | ||
18,Downstream | ||
8,Downstream | ||
0,Downstream | ||
0,Downstream | ||
3,Downstream | ||
12,Downstream | ||
5,Downstream | ||
0,Downstream | ||
10,Downstream | ||
17,Downstream | ||
2,Downstream | ||
26,Downstream | ||
7,Downstream | ||
1,Downstream |