-
Notifications
You must be signed in to change notification settings - Fork 6
Pseudo absence data generation and modelling
###Three-step pseudo-absences generation
In this section we illustrate the steps followed to generate pseudo-absences following the RSEP, TS and TSKM procedures described in Iturbide et al., 2015.
####STEP1: environmental profiling
The first step is the selection of the environmentally unsuitable areas using a presence-only algorithm. In mopa this is done using a support vector machine based algorithm that performs a preliminary binary classification of the study
region (suitable/unsuitable) using as input the environmental conditions of the presence localities. This is done by function OCSVMprofiling
which runs the one-class support vector machine algorithm (OCSVM) for each Oak group of the example:
unsuitable.bg <-OCSVMprofiling(xy = Oak_phylo2, varstack = biostack, bbs.grid = box.grid$bbs.grid)
# Plot areas predicted as suitable (presence) and unsuitable
# (absence) for group H11
plot(unsuitable.bg$absence$H11, pch = "*", asp = 1)
points(unsuitable.bg$presence$H11, pch = "*", col = "pink2")
If the RSEP method (as described in Iturbide et al., 2015) is selected for pseudo-absence data generation, only the first step is needed, thus, at this point we can create random pseudo-absences in the unsuitable background and perform SDM. The same functions as in the TS and TSKM methods might be used for this purpose, these are PseudoAbsences
and allModeling
, which are detailed in the section below (STEP2).
# Pseudo-absences are generated at random, in equal number to presences (prevalence) and keeping a 10 km distance to presences (exclusion buffer)
rsep_random <-PseudoAbsences(xy = Oak_phylo2, bg.grids = unsuitable.bg$absence, exclusion.buffer = 0.083, prevalence = 0.5, kmeans = FALSE)
# Bind presences and absences before modelling
presausRSEP <- list()
for (i in 1:length(Oak_phylo2)){
xyp <- cbind(Oak_phylo2[[i]], rep(1, nrow(Oak_phylo2[[i]])))
xya <- cbind(rsep_random[[i]], rep(0, nrow(rsep_random[[i]])))
colnames(xyp) <- c("x", "y", "p")
colnames(xya) <- colnames(xyp)
presausRSEP[[i]]<- rbind(xyp, xya)
}
names(presausRSEP) <- names(Oak_phylo2)
# Modelling
modirsRSEP <- allModeling(data = presausRSEP, varstack = biostack, k = 10, algorithm = "mars", destdir = getwd(), projection = CRS("+proj=longlat +init=epsg:4326"))
####STEP 2: SDM performing with pseudo-absences generated into different extents of the unsuitable background
In the second step, SDMs are performed with pseudo-absences generated into different extents of the unsuitable background. Several functions are involved in this step. Function bgRadio
performs the partition of the background space considering multiple distance thresholds. In other words, it creates backgrounds of different spatial extent for each species/population. In the example below, extents are created for a sequence of 10 km between distances, from 20 km to half the length of the diagonal of the bounding box, as described in Sec. 2.4 of the manuscript. A list of matrices containing xy coordinates is returned, each matrix corresponding to a different background extent tested.
ext <- bgRadio(xy = Oak_phylo2, bounding.coords = oak.extension,
bg.absence = unsuitable.bg$absence, start = 0.166, by = 0.083, unit = "decimal degrees")
# Plot presences for group H11 and background extents of 20, 120 and 520 km
plot(ext$H11$km520, col = "green4", pch = "*", asp = 1)
points(ext$H11$km120, pch = "*")
points(ext$H11$km20, pch = "*", col = "blue")
points(Oak_phylo2$H11, col = "red", pch = ".", cex = 1.5)
Function PseudoAbsences
creates pseudo-absences either at random (TS method) or using the k-means clustering approach (TSKM method) for all the background extents considered. Prevalence (proportion of presences against pseudo-absences) and the exclusion buffer (minimum distance to be kept to presences without pseudo-absences) can also be set in this function using the arguments prevalence
and exclusion.buffer
.
######At random (TS method) In the example below, pseudo-absences are generated at random, in equal number to presences (prevalence) and keeping a 10 km distance to presences (exclusion buffer).
pa_random <-PseudoAbsences(xy = Oak_phylo2, bg.grids = ext, exclusion.buffer = 0.083, prevalence = 0.5, kmeans = FALSE)
# Plot presences/pseudo-absences for group H11 considering the background extent of 120 km
plot(ext$H11$km120, pch="*", col= "grey", cex=.5, asp=1)
points(pa_random$H11$km120, col= "red", pch=".", cex=3)
points(Oak_phylo2$H11, col= "blue", pch=".", cex=3)
######With k-means clustering (TSKM method)
In the example below, pseudo-absences are generated with k-means clustering, in equal number to presences (prevalence) and keeping a 10 km distance to presences (exclusion buffer).
pa_kmeans <-PseudoAbsences(xy = Oak_phylo2, bg.grids = ext, exclusion.buffer = 0.083, prevalence = 0.5, kmeans = TRUE,
varstack = biostack)
# Plot presences/pseudo-absences for group H11 considering the background extent of 120 km
plot(ext$H11$km120, pch = "*", col = "grey", cex = .5, asp = 1)
points(pa_kmeans$H11$km120, col = "red", pch = ".", cex = 3)
points(Oak_phylo2$H11, col = "blue", pch = ".", cex = 3)
Function bindPresAbs
binds presence and absence data for each background extension.
presausTS <- bindPresAbs(presences = Oak_phylo2, absences = pa_random)
The allModeling
function performs the species distribution modelling and the k-fold cross-validation for a set of presence/absence data per species, corresponding to different background extents. Algorithms supported are ”glm”, ”svm”, ”maxent”, ”mars”, ”randomForest”, ”cart.rpart” and ”cart.tree”. In the example below, we perform a 10-fold cross validation using the ”mars” modelling algorithm.
modirsTS <- allModeling(data = presausTS, varstack = biostack, k = 10, algorithm = "mars", destdir = getwd(), projection = CRS("+proj=longlat +init=epsg:4326"))
Named .Rdata
objects are stored in the specified directory in destdir
. Each file is automatically named, indicating the algorithm, background extent, and species/population in this order (if a single species is provided, no name is given for the species). For instance, the file mars_bgkm20_hgH11.Rdata stores the R object (a list) containing the SDM results for the MARS algorithm, considering a 20 km background extent and the Oak group H11. In particular, the output list contains the following components:
(1)allmod: fitted model with all data for training (2)auc: AUC statistic in the cross validation (3)kappa: kappa statistic in the cross validation (4)tss: true skill statistic in the cross validation (5)mod: fitted model with partitioned data (6)p: cross model prediction.
####STEP3: selection of the optimum background extent and corresponding fitted model
In the third step, AUCs obtained and corresponding extents are fitted to three non linear models (Michaelis-Menten, exponential2 and exponential3). The model that scores the lowest error is automatically selected to extract the Vm coefficient (equation 1 in the manuscript). Then, the minimum extent at which the AUC surpasses the Vm value is selected as the threshold extent (see Figure 3 in the manuscript), being the corresponding fitted SDM the definitive to predict suitability probabilities in the study area.
We next indicate how to plot the results of the optimal spatial extent selection using the lattice package. First we load the data generated with the allModeling
function and extract the corresponding AUC values. Function loadTestValues
loads and stores AUC data in a matrix:
auc_mars <- loadTestValues(models = modirsTS, test = "auc")
library(lattice)
levelplot(auc_mars, aspect = 5,
scales = list(y = list(cex = 0.8, at = c(1, 49, ncol(auc_mars)),
labels = c(colnames(auc_mars)[1],
colnames(auc_mars)[49],
colnames(auc_mars)[ncol(auc_mars)]))),
at = seq(0.6, 1, 0.01),
col.regions = bpy.colors,
xlab = "Haplogroups",
ylab = "Background extent (km)",
main = "AUC")
Model fitting is done by function indextent
, that internally uses the nls
function of R package stats
. An index of the threshold extents is obtained. A fitted model plot (as in Fig. 3 in Iturbide et al., 2015) is also returned if
argument diagrams
is set to TRUE
.
ind <- indextent(testmat = auc_mars, diagrams = TRUE)
ind
##km380 km190
##37 18
The ind
object in the example above gives the index to extract the best model components and associated data, by means of function loadDefinitiveModel
.
def <- loadDefinitiveModel(ind, slot = "allmod")
Once the optimal SDMs are chosen, we can generate the resulting suitability maps. In the example below we use function biomat
for preparing a matrix with the variables for prediction in the study area. Then, the predictions are converted to a raster format with functions SpatialPixelsDataFrame
(from the sp package) and raster
(from the raster package).
# Suitability map for the Oak group H11
# Function ✬biomat✬ prepares matrix with variables for projection
projectionland <- biomat(cbind(box.grid$bbs.grid$H11, rep(1, nrow(box.grid$bbs.grid$H11))), biostack)
# Prediction
p <- predict(def$H11, projectionland[ ,-1])
p[which(p < 0)] <- 0
p[which(p > 1)] <- 1
# Convert prediction to a raster object
spp <- SpatialPixelsDataFrame(box.grid$bbs.grid$H11, as.data.frame(p))
ras <- raster(spp)
plot(ras)
We can combine functions in mopa to apply alternative methods of pseudo-absence data generation. Functions performing each step in RSEP, TS and TSKM are indicated in the conceptual diagram of the manuscript (Fig. 2). Functions involved in the TS and TSKM methods are:
boundingCoords + delimit + OCSVMprofiling + bgRadio + pseudoAbsences + bindPresAbs + allModeling + loadTestValues + indextent + loadDefinitiveModel,
while the RSEP method only applies the first step of the Three-step methods, being the involved functions:
boundingCoords + delimit + OCSVMprofiling + pseudoAbsences + allModeling.
If we want to establish a threshold distance of the background but are not interested in doing an environmental profiling of the background in the previous step, we can combine functions this way:
boundingCoords + delimit + bgRadio + pseudoAbsences + bindPresAbs + allModeling + loadTestValues + indextent + loadDefinitiveModel.