luismurao edited this page Mar 24, 2016 · 11 revisions

NicheToolBox: from getting biodiversity data to evaluating species distribution models in a friendly GUI environment.

Background

Species Distribution Modeling (SDM), also known as Ecological Niche Modeling (ENM), is a growing field of ecology that aims to estimate the geographical distribution of species. ENM uses a set of mathematical and statistical tools to study the relationship between environmental variables and species occurrences, in order to estimate species niches and predict potential areas where a species can survive. These models have had a large impact on ecology and conservation planning because they are used to find localities where endangered species can be relocated, to identify biodiversity hotspots and, in other contexts, to identify localities that are vulnerable to invasive species and pathogens (Peterson 2003; Peterson & Vieglais 2001). Although software is available to estimate species niches and distributions (MaxEnt, GARP, dismo, biomod2, hSDM), it can be hard to approach because: a) in the worst case the code is closed, and b) the open source alternatives are a big challenge for people not familiar with coding. To address this we propose the NicheToolBox project, an R package with a friendly Graphical User Interface (GUI) developed with the shiny framework. The package aims to facilitate the process of building niche models and estimating species distributions. To do so, it will incorporate functions to curate species occurrence data (clean duplicated records) and to build models that estimate species niches (Bioclim, MaxEnt, Ellipsoid model) and distributions. After building a model, the user will be able to evaluate its performance using partial ROC and confusion matrices with their associated metrics (Fielding & Bell 1997).
Finally, to make the niche modeling process transparent, the application will have an option to download a workflow (in HTML, PDF and .doc formats) containing the code that reproduces every analysis the user ran inside the application; this workflow can be shared with users interested in learning how to build a niche model in R.

Related work

Although the above packages offer many functionalities for the species distribution modeling process, their potential has not been fully exploited because they do not provide a graphical user interface (GUI) that would let people unfamiliar with R coding use them directly. NicheToolBox will be a shiny application with a friendly user interface that allows users to get and curate biodiversity data, build niche models and evaluate them in a transparent and reproducible way, because it will produce a workflow with the code that reproduces the analyses made inside the GUI. This will help users with no coding experience to get familiar with R and to exploit the functionalities of the other R packages that NicheToolBox will import for building models. It should be noted that the above packages lack some functionalities, among them the ability to build niche models with the ellipsoid model framework, the option to curate biodiversity data on leaflet maps, and an easy way to explore and visualize the environmental niche space.
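
To make the ellipsoid idea concrete, here is a minimal sketch of one common formulation, which is an assumption and not necessarily the algorithm the project will implement: the niche is summarized by the centroid and covariance matrix of the environmental values at the occurrence points, and suitability decays with the Mahalanobis distance to the centroid. It is written in Python purely for illustration (the package itself is planned in R), is limited to two environmental variables, and the function names and example values are hypothetical.

```python
import math

def fit_ellipsoid(points):
    """Estimate the centroid and inverse covariance matrix from
    (var1, var2) environmental values at occurrence points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is not invertible")
    # Closed-form inverse of a 2x2 symmetric matrix.
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv

def mahalanobis_sq(point, centroid, inv):
    """Squared Mahalanobis distance from point to the centroid."""
    dx = point[0] - centroid[0]
    dy = point[1] - centroid[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

def suitability(point, centroid, inv):
    """Assumed suitability score: 1 at the centroid, decaying outwards."""
    return math.exp(-0.5 * mahalanobis_sq(point, centroid, inv))

# Hypothetical (temperature, precipitation) values at five occurrences.
occurrences = [(20, 1000), (22, 1200), (21, 900), (19, 1100), (23, 1300)]
centroid, inv_cov = fit_ellipsoid(occurrences)
near = suitability((21, 1150), centroid, inv_cov)
far = suitability((30, 2000), centroid, inv_cov)
```

In this sketch a site whose climate is close to the centroid (`near`) scores higher than one far from it (`far`); projecting the score over climatic rasters would give a map of potential distribution.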

Details of your coding project

The project will be divided into 4 stages; in every stage the student will develop the GUI using the shiny package:

  1. In the first stage the student will develop a GUI to search, curate and download biodiversity data. For this purpose the student should be familiar with the functions of the spocc package and with the spatial R packages (raster, sp, rgdal, rgeos) used to remove duplicated records.

  2. The second stage corresponds to data exploration in both geographic and niche spaces. The exploration in geographic space should let users see their georeferenced occurrence data on leaflet maps and give them options to curate badly georeferenced records there. For niche exploration the student will develop functions to extract the values of climatic rasters at the georeferenced records and to show this information in tables and in 3-dimensional plots.

  3. In the third stage the student will develop functions to build niche models and evaluate them using both threshold-independent and threshold-dependent evaluation. The models that the application needs are Bioclim and MaxEnt, which are implemented in the dismo package. The student will also program a version of the ellipsoid algorithm, which is not in dismo. For model evaluation the student will program a function to compute confusion matrices and their associated metrics, and a shiny version of the function in the ENMGadgets package that performs partial ROC (Peterson et al. 2008).

  4. In the final stage the student will develop the methods to generate and download a workflow (in HTML, PDF or Word) of what has been done inside the application. The workflow should show the code that was run to generate the analysis.
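
The threshold-dependent evaluation in stage 3 can be illustrated with a small sketch. It is written in Python purely for illustration (the package itself is planned in R), and the function names, the example scores and the 0.5 threshold are assumptions, not part of the proposal. It assumes presence/absence observations with at least one record of each class.

```python
def confusion_matrix(scores, observed, threshold):
    """Count true/false positives and negatives after converting
    continuous model scores to presence/absence at `threshold`."""
    tp = fp = fn = tn = 0
    for score, present in zip(scores, observed):
        predicted_present = score >= threshold
        if predicted_present and present:
            tp += 1
        elif predicted_present and not present:
            fp += 1
        elif not predicted_present and present:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluation_metrics(tp, fp, fn, tn):
    """Metrics derived from the four cells of the confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),    # presences correctly predicted
        "specificity": tn / (tn + fp),    # absences correctly predicted
        "omission_rate": fn / (tp + fn),  # presences predicted absent
        "accuracy": accuracy,             # correct classification rate
        "kappa": (accuracy - expected) / (1 - expected),
    }

# Hypothetical model scores and observed presences (1) / absences (0).
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.6, 0.3]
observed = [1, 1, 1, 1, 0, 0, 0, 0]
cells = confusion_matrix(scores, observed, threshold=0.5)
metrics = evaluation_metrics(*cells)
```

Changing the threshold changes every cell of the matrix, which is why threshold-dependent metrics are usually reported together with a threshold-independent measure such as partial ROC.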

Expected impact

Because knowing where a species lives has a wide range of potential applications (such as conservation decision making, studying the impacts of human activities on biodiversity, and assessing the effects of climate change on species distributions), the use of species distribution models has grown at an almost exponential rate; each year more scientific papers that use this kind of model are published. As a result, a couple of specialized R packages have been released (see the Related work section), but they have not been fully exploited because many distribution modelers are used to working in GUI environments. NicheToolBox aims to give users with no coding experience an easy way to use the functionalities of those R packages, by providing a friendly graphical user interface where they can do many of the analyses needed to build a species distribution model. To help users get familiar with R, the application will let them see the code that generates the analyses they ran inside the application. With this we hope to gain more R users.

Mentors

Tests

Write a shiny script to display species occurrence data obtained from GBIF or VertNet.

Solutions of tests

Students, please post a link to your test results here.

References

  • Fielding AH, Bell JF (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24(1):38–49
  • Peterson AT (2003) Predicting the geography of species' invasions via ecological niche modeling. Quarterly Review of Biology 78:419–433
  • Peterson AT, Papeş M, Soberón J (2008) Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecological Modelling 213:63–72
  • Peterson AT, Vieglais DA (2001) Predicting species invasions using ecological niche modeling: New approaches from bioinformatics attack a pressing problem. BioScience 51:363–371