
Pseudoabsence data generation

miturbide edited this page Mar 21, 2017 · 17 revisions

In this section we illustrate the steps for generating pseudo-absences with the RSEP, TS and TSKM procedures described in Iturbide et al., 2015.

RSEP: environmental profiling + pseudo-absence generation (2 steps)

The first step is to perform a preliminary binary classification of the background (suitable/unsuitable) using as input the environmental conditions of the presence localities. In mopa, this is done with function OCSVMprofiling, which runs the one-class support vector machine (OCSVM) algorithm for each Oak group in the example:

bg.profiled <- OCSVMprofiling(xy = Oak_phylo2, varstack = biostack$baseline, background = bg$xy)

# Plot areas predicted as suitable (presence) and unsuitable
# (absence) for group H11
plot(bg.profiled$absence$H11, pch = "*", asp = 1, cex = 0.5)
points(bg.profiled$presence$H11, pch = "*", col = "pink2", cex = 0.5)
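
To make the profiling idea concrete outside of mopa, here is a minimal sketch of a one-class SVM classification of background points. It assumes the e1071 package is installed and uses made-up environmental values; the object names (env_presence, env_background) and the nu value are illustrative assumptions, not the internals or defaults of OCSVMprofiling.

```r
# Sketch only: one-class SVM profiling with e1071 (assumed to be installed).
# All data and object names below are hypothetical.
library(e1071)

set.seed(1)
# Environmental values (two variables) at 50 presence localities
env_presence <- cbind(temp = rnorm(50, mean = 12, sd = 1),
                      prec = rnorm(50, mean = 800, sd = 50))
# Environmental values at 500 background grid points
env_background <- cbind(temp = runif(500, min = 0, max = 25),
                        prec = runif(500, min = 200, max = 1500))

# Fit the one-class model on the presence conditions only
model <- svm(env_presence, type = "one-classification",
             kernel = "radial", nu = 0.5)

# TRUE = environmentally similar to presences (suitable), FALSE = unsuitable
suitable <- predict(model, env_background)

bg_suitable   <- env_background[suitable, , drop = FALSE]
bg_unsuitable <- env_background[!suitable, , drop = FALSE]
```

In this sketch bg_unsuitable plays the role of bg.profiled$absence, that is, the part of the background where pseudo-absences would later be sampled.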

If the RSEP method (as described in Iturbide et al., 2015) is selected for pseudo-absence data generation, we can now create random pseudo-absences in the unsuitable background with function pseudoAbsences. This function creates pseudo-absences either at random or using the k-means clustering approach. The prevalence (proportion of presences against pseudo-absences) and the exclusion buffer (minimum distance around presences within which no pseudo-absences are placed) are set with the arguments prevalence and exclusion.buffer. In the next example, pseudo-absences are generated at random, in equal number to presences (prevalence = 0.5) and at least 50 km away from presences (exclusion.buffer = 0.083*5, in decimal degrees).

RSEP_random <- pseudoAbsences(xy = Oak_phylo2, background = bg.profiled$absence, exclusion.buffer = 0.083*5, prevalence = 0.5)
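
The combination of an exclusion buffer and random sampling can be illustrated with a short base R sketch. It uses simulated coordinates and a planar distance in decimal degrees; the object names are hypothetical and the logic is a simplified assumption about the idea, not the implementation of pseudoAbsences.

```r
# Sketch only: random pseudo-absences outside an exclusion buffer (base R).
# Coordinates are simulated; names are hypothetical.
set.seed(42)

# Presences and candidate background points (lon/lat, decimal degrees)
presences  <- cbind(x = runif(20, -5, 5),     y = runif(20, 40, 45))
background <- cbind(x = runif(2000, -10, 10), y = runif(2000, 35, 50))

buffer <- 0.083 * 5  # exclusion buffer in decimal degrees

# Planar distance (in degrees) from each background point to its nearest presence
nearest <- apply(background, 1, function(p) {
  min(sqrt((presences[, "x"] - p["x"])^2 + (presences[, "y"] - p["y"])^2))
})

# Keep only background points farther than the buffer from every presence
candidates <- background[nearest > buffer, , drop = FALSE]

# Draw as many pseudo-absences as there are presences
n_abs <- nrow(presences)
pseudo_abs <- candidates[sample(nrow(candidates), n_abs), , drop = FALSE]
```

A planar distance is used here for brevity; a real analysis over large areas would normally use a geodesic distance instead.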

TS: environmental profiling + limit background to different distances + pseudo-absence generation (3 steps)

With the TS method, SDMs are fitted with pseudo-absences generated within different extents of the unsuitable background. Function backgroundRadios partitions the background space according to multiple distance thresholds; in other words, it creates backgrounds of different spatial extent for each species/population. In the example below, extents are created at 100 km intervals, from 20 km to half the length of the diagonal of the bounding box, as described in Sec. 2.4 of the manuscript. A list of matrices containing xy coordinates is returned, each matrix corresponding to a different background extent.

bg.extents <- backgroundRadios(xy = Oak_phylo2, background = bg.profiled$absence, start = 0.166, by = 0.083*10, unit = "decimal degrees")

# Plot presences for group H11 and background extents of 120, 520 and 1020 km
plot(bg.extents$H11$km1020, col = "green4", pch = "*", asp = 1)
points(bg.extents$H11$km520, pch = "*", col = "yellow")
points(bg.extents$H11$km120, pch = "*", col = "pink")
points(Oak_phylo2$H11, pch = ".", cex = 1.5)
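
The partition into nested extents can be sketched in base R as follows. The data are simulated, the bounding box is assumed to be that of the background points, and the list names (deg...) are hypothetical; this is an illustration of the distance-threshold idea, not the code behind backgroundRadios.

```r
# Sketch only: nested background extents by distance threshold (base R).
# Coordinates are simulated; names are hypothetical.
set.seed(7)
presences  <- cbind(x = runif(15, -2, 2),     y = runif(15, 41, 43))
background <- cbind(x = runif(3000, -15, 15), y = runif(3000, 33, 51))

# Planar distance (in degrees) from each background point to its nearest presence
nearest <- apply(background, 1, function(p) {
  min(sqrt((presences[, "x"] - p["x"])^2 + (presences[, "y"] - p["y"])^2))
})

# Half the diagonal of the bounding box (assumed here: box of the background)
half_diag <- sqrt(diff(range(background[, "x"]))^2 +
                  diff(range(background[, "y"]))^2) / 2

# Thresholds from 0.166 degrees in steps of 0.83 degrees, as in the call above
thresholds <- seq(from = 0.166, to = half_diag, by = 0.083 * 10)

# One matrix of xy coordinates per threshold; each extent contains the previous one
extents <- lapply(thresholds, function(d) background[nearest <= d, , drop = FALSE])
names(extents) <- paste0("deg", round(thresholds, 3))

sapply(extents, nrow)  # number of background points in each extent
```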

In the example below, pseudo-absences are generated for each background extent, at random and keeping a 50 km distance to presences (exclusion buffer). In this case, 3 times more pseudo-absences than the number of presences (prevalence) are generated in each background extent.

TS_random <- pseudoAbsences(xy = Oak_phylo2, background = bg.extents, exclusion.buffer = 0.083*5, prevalence = -0.5, kmeans = FALSE)


# Plot presences/pseudo-absences for group H11 considering the background extent of 1020 km (green)
plot(bg.extents$H11$km2920, pch = "*", col = "grey", cex = .5, asp = 1)
points(bg.extents$H11$km1020, pch = 18, col = "green4", cex =.5)
points(TS_random$realization01$H11$km1020, col= "red", pch = ".", cex = 1.5)
points(Oak_phylo2$H11, col = "blue", pch = ".", cex = 1.5)

TSKM: environmental profiling + limit background to different distances + pseudo-absence generation with k-means clustering (3 steps)

In the example below, pseudo-absences are generated with k-means clustering, for which parameters kmeans and varstack are specified.

TSKM <- pseudoAbsences(xy = Oak_phylo2, background = bg.extents, exclusion.buffer = 0.083*5, prevalence = -0.5, kmeans = TRUE, varstack = biostack$baseline)

# Plot presences/pseudo-absences for group H11 considering the background extent of 1020 km (green)
plot(bg.extents$H11$km2920, pch="*", col= "grey", cex=.5, asp=1)
points(bg.extents$H11$km1020, pch=18, col= "green4", cex=.5)
points(TSKM$realization01$H11$km1020, col= "red", pch=".", cex=1.5)
points(Oak_phylo2$H11, col= "blue", pch=".", cex=1.5)
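
As a rough illustration of the k-means approach, the base R sketch below clusters simulated background points in environmental space and keeps, for each cluster, the point closest to the centroid. Clustering on scaled environmental variables and the one-point-per-cluster rule are assumptions made for this example; they are not taken from the pseudoAbsences source code.

```r
# Sketch only: k-means-based pseudo-absence selection (base R, stats::kmeans).
# Data are simulated; names are hypothetical.
set.seed(3)

# Background points: coordinates plus two environmental variables
bg <- data.frame(x = runif(1000, -10, 10), y = runif(1000, 35, 50))
bg$temp <- 25 - 0.8 * (bg$y - 35) + rnorm(1000, sd = 1)
bg$prec <- 600 + 30 * (bg$x + 10) + rnorm(1000, sd = 40)

n_abs <- 30  # number of pseudo-absences wanted = number of clusters

# Cluster the background in (scaled) environmental space
env <- scale(bg[, c("temp", "prec")])
km  <- kmeans(env, centers = n_abs, nstart = 5, iter.max = 50)

# For each cluster, pick the background point closest to the centroid
picked <- sapply(seq_len(n_abs), function(k) {
  members <- which(km$cluster == k)
  centre  <- matrix(km$centers[k, ], nrow = length(members), ncol = 2, byrow = TRUE)
  d <- rowSums((env[members, , drop = FALSE] - centre)^2)
  members[which.min(d)]
})

pseudo_abs <- bg[picked, c("x", "y")]
```

Compared with purely random sampling, this kind of selection spreads pseudo-absences across the range of environmental conditions found in the background.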

Other methods for pseudo-absence data generation

We can combine functions in mopa to apply alternative methods of pseudo-absence data generation. The functions shown for each step of RSEP, TS and TSKM in the conceptual diagram of the manuscript (Fig. 2) are deprecated in the present package version; the functions that replace them are indicated here.

Functions involved in the TS and TSKM methods are:

backgroundGrid + OCSVMprofiling + backgroundRadios + pseudoAbsences

The RSEP method applies only the first step of the three-step methods (environmental profiling) before generating pseudo-absences, so the functions involved are:

backgroundGrid + OCSVMprofiling + pseudoAbsences

If we want to limit the background to a threshold distance but are not interested in the environmental profiling step, we can combine the functions this way:

backgroundGrid + backgroundRadios + pseudoAbsences

If we want to apply the RS method (random sampling of the whole study domain) we just need to use:

backgroundGrid + pseudoAbsences

The rest of the functions in mopa (e.g. mopaFitting) are common to all pseudo-absence generation methods (see section Model fitting and prediction).

