Writing download functions

Will Pearse edited this page Jun 23, 2017 · 2 revisions

Are you ready to write some functions?!1?

After writing a few of these download functions (in the downloads.R file), I've compiled a list of some notes and helpful tips to make the experience all the more pleasant. Any additional tips or tricks are welcome!

Getting started

  • After opening up R, there are a few things you need to load in before running any of the downloads.R code:

    • library(reshape2)
    • library(devtools)
    • install_github("willpearse/fulltext")
    • library(fulltext)
    • source('/path/to/nacdb/R/utility.R')
  • Once you get settled, find a paper to get data from and download it to your computer. Then you can open the file in R to have a look at it:

      data <- read.delim("~/Desktop/PanTHERIA_1-0_WR05_Aug2008.txt")
    
  • Look through the metadata and begin to figure out which columns are useful, what the units are, etc.

  • You can use names(data) to pull out just the names for each column, which can make it easier to extract just the ones you'd like to keep.

  • Make sure that any meaningless info is removed, and that NAs are in place where data is absent.
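The inspection and clean-up steps above can be sketched as follows. This is only an illustration on a made-up data.frame: the column names and the -999 missing-value code are assumptions for the example, so check your dataset's own metadata for the real ones.

```r
# Toy stand-in for a downloaded table (hypothetical columns and values)
data <- data.frame(
    species = c("sp_a", "sp_b", "sp_c"),
    mass_g  = c(12.5, -999, 30.1),   # -999 is an assumed missing-value code
    notes   = c("", "", ""),         # a column carrying no information
    stringsAsFactors = FALSE
)

# List the column names to decide which ones to keep
names(data)

# Keep only the useful columns
data <- data[, c("species", "mass_g")]

# Replace the assumed missing-value code with NA
data$mass_g[data$mass_g == -999] <- NA
```

If the file uses a single missing-value code throughout, the na.strings argument of read.delim can do the same job at read time.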

There are two kinds of wrapper functions for you to use

The first is .matrix.melt, and the second is .df.melt. You can use either; what matters is that you use the one that is simplest for the kind of data you have when you download it from the website.

You cannot write a download function without using one of these options

How to use .df.melt

.df.melt turns your downloaded data into a format that nacdb can work with. .df.melt takes five arguments, only three of which are required:

  • species - a vector with all the species that were found in the study
  • sites - a vector with all the sites that were found in the study
  • value - the abundances or presence/absence information for all the observations in the study
  • species.metadata - (optional, but recommended!) a data.frame containing the meta-data for all the species in the study
  • site.metadata - (optional, but recommended!) a data.frame containing the meta-data for all the sites in the study

An example:

.adler.2007 <- function(...){
    data <- read.csv(ft_get_si("E088-161", "allrecords.csv", from = "esa_archives"))
    site <- sapply(strsplit(data$plotyear, "-"), function(x) x[1])
    year <- sapply(strsplit(data$plotyear, "-"), function(x) x[2])
    return(.df.melt(data$species, site, data$area, site.metadata=data.frame(year=year)))
}

Here we've written a function that downloads data from a paper whose first author was Adler (it was published in 2007). We grab the data from the ESA Archives paper associated with it (whose ID is E088-161), and then we split out the site IDs from the year in which each plot was surveyed. We then give this information to .df.melt, making sure that our site.metadata is stored in a data.frame. The tricky part about this is finding the data and figuring out the format it's in: using .df.melt is, depressingly, the (comparatively) easy part.
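To see what the strsplit step is doing, here is a small sketch on made-up plot-year labels. The values are invented for illustration, and the real file's labels may look different. It produces the site and year vectors and checks that they line up one-to-one with the observations.

```r
# Hypothetical "site-year" labels, one per observation
plotyear <- c("A1-1932", "A1-1933", "B2-1932")

# strsplit returns a list with one character vector per label
parts <- strsplit(plotyear, "-")
site <- sapply(parts, function(x) x[1])   # text before the "-"
year <- sapply(parts, function(x) x[2])   # text after the "-"

# Build the metadata data.frame the same way the example above does
site.metadata <- data.frame(year = year)
stopifnot(length(site) == nrow(site.metadata))
```

If a site ID could itself contain a "-", this simple split would cut it in the wrong place, so it is worth looking at a few labels before relying on it.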

How to use .matrix.melt

Sometimes, your data will come in a different format: a matrix where sites are rows, and species are columns. In that case, you can use .matrix.melt, which takes three arguments, only one of which is needed:

  • x - a matrix where species are in columns and sites are in rows, and the elements of the matrix are the abundances or presence/absences of species at each site
  • site.metadata - (optional, but recommended!) a data.frame containing the meta-data for all the sites in the study. Note that this should have as many rows as there are sites in the dataset
  • species.metadata - (optional, but recommended!) a data.frame containing the meta-data for all the species in the study. Note that this should have as many rows as there are species in the dataset

An example:

.adler.2007 <- function(...){
    data <- read.csv(ft_get_si("E088-161", "allrecords.csv", from = "esa_archives"))
    # Sites (plot-years) go in rows and species in columns, as .matrix.melt expects
    comm <- with(data, tapply(area, list(plotyear, species), sum, na.rm=TRUE))
    return(.matrix.melt(comm))
}

This is a slightly contrived example, because I wanted to use the same dataset as we had above, but you can see the general pattern. We grab the data and, purely in order to showcase .matrix.melt, we turn it into a matrix; of course we wouldn't do this for a real study. We can then use .matrix.melt to convert this data and return it to the user.
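A minimal sketch of the tapply step on made-up data shows the shape .matrix.melt expects, with sites in rows and species in columns. The data.frame here is invented for illustration. Note that tapply leaves NA in any site-by-species cell with no records; whether to turn those into zeros depends on your dataset, so treat that last step as an assumption rather than a rule.

```r
# Hypothetical long-format records: one row per observation
records <- data.frame(
    site    = c("A", "A", "A", "B"),
    species = c("sp1", "sp1", "sp2", "sp1"),
    area    = c(1, 2, 5, 4)
)

# Sum area for each site (rows) and species (columns)
comm <- with(records, tapply(area, list(site, species), sum, na.rm = TRUE))

# Site B has no sp2 records, so that cell is NA; here we assume absence means 0
comm[is.na(comm)] <- 0
```

Checking dim(comm) and rownames(comm) before melting is a quick way to confirm that sites really did end up in the rows.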