Skip to content

Data preprocessing

miturbide edited this page Oct 28, 2016 · 8 revisions

###Data formats and available data in mopa

####Presence data

In Iturbide et al., 2015, we indicate the adequacy of using different ecotypes for modelling species distributions. Thus, functions in the mopa package are intended to deal with more than one group of presences simultaneously. In this example, we use a list of 2 different Oak haplotypes (H11 and H5, see Table 1 in the manuscript), but the same steps may be followed in a joint analysis of multiple species. Of course, the most typical case (i.e., dealing with with a single group of species) can be also easily accomplished by providing a data.frame. The data set of species in this example is Oak_phylo2 and is provided with the mopa package. This is a modified subset of the Quercus sp Europe Petit 2002 database (Petit et al., 2002b), which is available in the Georeferenced Database of Genetic Diversity or (GD)2 . To aid in map representation, a dataset called wrld containing a World map is also included in the package.

# Load map and Oak data
data(wrld)
data(Oak_phylo2)

# Map
plot(wrld, asp = 1, xlim= c(-10,50), ylim=c(40,60))
for (i in 1:length(Oak_phylo2)) {
        points(Oak_phylo2[[i]], pch = "*", cex = 0.5, col = colors()[i*50])
}

####Environmental variables

Predictor variables are typically stored in raster files. The different raster layers can be efficiently handled in R using the utilities of the raster package. mopa uses as input this type of raster objects. In particular, multiple layers can be arranged in a collection of RasterLayers objects called a RasterStack (see ?raster::raster for more information on raster objects).

# RatserStack of environmental variables
data(biostack)
plot(biostack)

####Background grid

The regularly distributed grid points covering the continental area can be created from the environmental rasters with functions from the raster and sp packages as follows:

# Extract raster values at grid coordinates
ac <- xyFromCell(biostack[[1]], 1:ncell(biostack[[1]]))
ex <- extract(biostack[[1]], ac)
# Convert to a Spatial object and define projection
sp_grid_world <- SpatialPoints(ac[-which(is.na(ex)), ])
projection(sp_grid_world) <- CRS("+proj=longlat +init=epsg:4326")

For ease of use, the previous steps can be skipped, as the sp_grid_world object is already included in the mopa package, covering the whole world at 10 km resolution.

data(sp_grid_world)

There is another SpatialPoints object available, this is sp_grid_europe, and covers Europe at 30km resolution.

data(sp_grid_europe)

###Delimit the study area (modelling background)

Function boundingCoords creates a matrix of bounding coordinates around a set of localities.

Function delimit creates a rectangular polygon shape from the bounding coordinates and does the intersection of the background points (e.g. sp_grid_world). A list with two objects is obtained: (1)bbs: polygon shape of the bounding boxes and (2)bbs.grid: a list of data frames of the background point grid limited by the bounding coordinates.

####Bounded study area around presences

If we want to bound the modelling area to the species distribution domain, the object to be passed to function boundingCoords is the known presence localities of the target species. In this case, since the Oak_phylo2 list contains two groups of points (one for each haplotype considered), a list of two matrices is created.

oak.extension <- boundingCoords(xy = Oak_phylo2)
box.grid <- delimit(bounding.coords = oak.extension, grid = sp_grid_europe)
# Plot presences and bounding boxes
plot(box.grid$bbs, asp = 1)
for (i in 1:length(Oak_phylo2)){
     points(Oak_phylo2[[i]], col = colors()[i*50])
}

####Considering the whole study area

Alternatively, if we prefer to use the whole study area in the modelling phase, the object to be passed to function boundingCoords are the coordinates of the background grid covering our study area, e.g. object sp_grid_europe.

oak.extension <- boundingCoords(xy =  rep(list(coordinates(sp_grid_europe)), length(Oak_phylo2)))
box.grid <- delimit(bounding.coords = oak.extension, grid = sp_grid_europe, names = "europe")
# Plot presences and bounding boxes
plot(box.grid$bbs, asp = 1)
for (i in 1:length(Oak_phylo2)){
     points(Oak_phylo2[[i]], col = colors()[i*50])
}