Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review code #90

Open
richelbilderbeek opened this issue Feb 28, 2020 · 6 comments
Open

Review code #90

richelbilderbeek opened this issue Feb 28, 2020 · 6 comments
Assignees

Comments

@richelbilderbeek
Copy link
Collaborator

To quote @joshwlambert:

If you are happy with this as an ongoing plan please let us know. The
code to be reviewed for our work is:
DAISIE_sim_constant_rate
DAISIE_sim_constant_rate_shift
DAISIE_sim_time_dependent
DAISIE_sim_core_constant_rate
DAISIE_sim_core_constant_rate_shift
DAISIE_sim_core_time_dependent
DAISIE_rates
DAISIE_max_rates
DAISIE_format_CS_full_stt
DAISIE_sample_event_constant_rate
DAISIE_sample_event_time_dependent
DAISIE_update_state_constant_rate
DAISIE_update_state_time_dependent

@richelbilderbeek
Copy link
Collaborator Author

To complete: at the geomdynamics branch.

@richelbilderbeek
Copy link
Collaborator Author

I'll have a look 👀

@richelbilderbeek richelbilderbeek self-assigned this Feb 28, 2020
@richelbilderbeek
Copy link
Collaborator Author

Hi DAISIE team,

I've reviewed the first function only, as most of this feedback will apply
to other functions as well. My feedback is extensive, but feel free to
ignore some points.
I enjoy seeing the DAISIE code improve: well done DAISIE team!

I will review the other functions if the feedback given
here is applied to those other functions as well. Just give me the thumbs up!

Keep up the good work!

Cheers, Richel

#' @title Simulate islands with given parameters.
#' @description This function simulates islands with given cladogenesis,
#' extinction, Kprime, immigration and anagenesis parameters. If a single
#' parameter set is provided (5 parameters) it simulates islands where all
#' species have the same macro-evolutionary process. If two paramater sets
#' (10 parameters) are provided, it simulates islands where two different
#' macro-evolutionary processes operate, one applying to type 1 species and
#' other to type 2 species. If two parameter sets and a time shift (11
#' parameters) are provided, it simulates islands where at the given time
#' a shift between the parameter sets will occur.

I enjoy the extensive amount of documentation. It is recommended however,
to do the following:

#' [thing that will the title]
#' [an empty line]
#' [thing that will the description]
#' [thing that will the description]
#' [thing that will the description]
#' Returns R list object that contains the simulated islands

Superior option: use @return. Which is done below.
So, this line can be removed 👍

#' @inheritParams default_params_doc

Brilliant, could not have done any better 😁

#' @return Each simulated dataset is an element of the list, which can be
#' called using [[x]]. For example if the object is called island_replicates,
#' the last replicates is a list in itself. The first (e.g.
#' island_replicates[[x]][[1]]) element of that list has the following
#' components: \cr \code{$island_age} - the island age \cr Then, depending on
#' whether a distinction between types is made, we have:\cr \code{$not_present}
#' - the number of mainland lineages that are not present on the island \cr
#' or:\cr \code{$not_present_type1} - the number of mainland lineages of type 1
#' that are not present on the island \cr \code{$not_present_type2} - the
#' number of mainland lineages of type 2 that are not present on the island \cr
#' \code{$stt_all} - STT table for all species on the island (nI - number of
#' non-endemic species; nA - number of anagenetic species, nC - number of
#' cladogenetic species, present - number of independent colonisations present
#' )\cr \code{$stt_stt_type1} - STT table for type 1 species on the island -
#' only if 2 types of species were simulated (nI - number of non-endemic
#' species; nA - number of anagenetic species, nC - number of cladogenetic
#' species, present - number of independent colonisations present )\cr
#' \code{$stt_stt_type2} - STT table for type 2 species on the island - only if
#' 2 types of species were simulated (nI - number of non-endemic species; nA -
#' number of anagenetic species, nC - number of cladogenetic species, present -
#' number of independent colonisations present )\cr \code{$brts_table} - Only
#' for simulations under 'IW'. Table containing information on order of events
#' in the data, for use in maximum likelihood optimization.)\cr
#'
#' The subsequent elements of the list each contain information on a single
#' colonist lineage on the island and has 4 components:\cr
#' \code{$branching_times} - island age and stem age of the population/species
#' in the case of Non-endemic, Non-endemic_MaxAge and Endemic anagenetic
#' species. For cladogenetic species these should be island age and branching
#' times of the radiation including the stem age of the radiation.\cr
#' \code{$stac} - the status of the colonist \cr * Non_endemic_MaxAge: 1 \cr *
#' ndemic: 2 \cr * Endemic&Non_Endemic: 3 \cr * Non_endemic: 4 \cr
#' \code{$missing_species} - number of island species that were not sampled for
#' particular clade (only applicable for endemic clades) \cr \code{$type_1or2}
#' - whether the colonist belongs to type 1 or type 2 \cr

This is a terrible return type to read. The return type is a list,
so I'd prefer the elements of the list to be itemized (using itemize).

#' @author Luis Valente and Albert Phillimore

Taking a look at the blame/praise
I see that Pedro and Josh worked on this. I suggest to add these to the list of
names.

#' @seealso \code{\link{DAISIE_format_CS}} \code{\link{DAISIE_plot_sims}}
#' @references Valente, L.M., A.B. Phillimore and R.S. Etienne (2015).
#' Equilibrium and non-equilibrium dynamics simultaneously operate in the
#' Galapagos islands. Ecology Letters 18: 844-852.
#' Hauffe, T., D. Delicado, R.S. Etienne and L. Valente (submitted).
#' Lake expansion increases equilibrium diversity via the target effect of
#' island biogeography.
#' @keywords models

👍

#' @examples
#'
#' cat("
#' ## Simulate 40 islands for 4 million years, where all species have equal
#' ## rates, and plot the species-through-time plot. Pool size 1000.
#'
#' pars_equal = c(2.550687345,2.683454548,Inf,0.00933207,1.010073119)
#' island_replicates_equal = DAISIE_sim_constant_rate(
#'    time = 4,
#'    M = 1000,
#'    pars = pars_equal,
#'    replicates = 40
#'    )
#'
#' ## Simulate 15 islands for 4 million years with two types of species (type1
#' ## and type 2), and plot the species-through-time plot. Pool size 1000.
#' ## Fraction of type 2 species in source pool is 0.163. Function will
#' ## simulate until number of islands where type 2 species has colonised is
#' ## equal to number specified in replicates.
#'
#' pars_type1 = c(0.195442017,0.087959583,Inf,0.002247364,0.873605049)
#' pars_type2 = c(3755.202241,8.909285094,14.99999923,0.002247364,0.873605049)
#' island_replicates_2types = DAISIE_sim_constant_rate(
#'    time = 4,
#'    M = 1000,
#'    pars = c(pars_type1,pars_type2),
#'    replicates = 15,
#'    prop_type2_pool = 0.163
#'    )
#' ## Simulate a non-oceanic island for 4 million years, and plot the
#' ## species-through-time plot. Pool size 1000. Island area as a proportion
#' ## of mainland is 0.1, proportion of native species is 0.9.
#'  pars = c(2.550687345,2.683454548,Inf,0.00933207,1.010073119)
#'  island_replicates = DAISIE_sim_constant_rate(
#'    time = 4,
#'    M = 1000,
#'    pars = pars,
#'    replicates = 40,
#'    nonoceanic_pars = c(0.1, 0.9)
#'  )
#' ## Simulate 15 islands for 4 million years with a shift in immigration rate
#' ## at 0.195 Ma, and plot the species-through-time plot. Pool size 296.
#'
#' pars_before_shift = c(0.079, 0.973, Inf, 0.136, 0.413)
#' pars_after_shift = c(0.079, 0.973, Inf, 0.652, 0.413)
#' tshift = 0.195
#' island_shift_replicates = DAISIE_sim_constant_rate_shift(
#'    time = 4,
#'    M = 296,
#'    pars = c(pars_before_shift, pars_after_shift),
#'    replicates = 15,
#'    shift_times = tshift
#'  )
#' ")

These examples can definitely be improved.

Example code should be runnable, simple and clean.
Instead, it is complex and does not even run!
As a user, I want example code that is guaranteed to be correct,
that is, checked by R CMD check. Please make the code do something simple.

Also, make the example simple. I am unsure what those numbers
in pars_equal are. Add extra variables like
immigration_rate <- 3.14 and put those in pars_equal.

The style is non-Tidyverse (an equal sign for assignment is so 1970s).
This gives a bad impression.

Final point: these simulations in the examples are, as far as I remember,
published simulations. Just make functions for those, like
run_sim_mee_2012 (and user superior naming). Refer to those function
here, in e.g. @seealso. Within this example, do something simple instead.

#'
#' @export
DAISIE_sim_constant_rate <- function(
  time,
  M,

I wish I knew what M was before reading the doc. Actually, I learned
that M is -I guess- the number of mainland species, thanks to the
more descriptive interface of DAISIE_sim_core_constant_rate: kudo's
to its author(s). The name n_mainland_species would be more appropriate.

  pars,

pars is kind of general. I think these are the DAISIE simulation parameters (unlike
the DAISIE ML estimation parameters). I'd prefer a more descriptive name
like sim_pars instead.

  replicates,

I predict this is the number of replicates, instead of some other DAISIE sim
output. I suggest to call this n_replicates instead: it's clearer and
n_something is a common idiom.

  divdepmodel = "CS",

OK enough.

  nonoceanic_pars = c(0, 0),

I see why this value is chosen: I predict there are two nonoceanic_pars,
that have no effect for values of zero. I would prefer an NA here,
because it does look needlessly complex

  num_guilds = NULL,

Would an num_guilds of 1 be equivalent and thus a more natural default
value? If not, NULL is fine.

  prop_type2_pool = NA,
  replicates_apply_type2 = TRUE,
  sample_freq = 25,

No idea why a sample_freq of 25 should be the default. Wouldn't 1 be
more natural?

  plot_sims = TRUE,

This argument should be removed: it is needlessly 'convenient', and complicates
the code needlessly instead. In the doc, use DAISIE_plot_sims in the
example code and redirect the user using @seealso to DAISIE_plot_sims

  hyper_pars = NULL,
  area_pars = NULL,
  dist_pars = NULL,

OK

  verbose = TRUE,

Verbose hould be false by default: Duck for 'Unix Golden Rule of Silence'.
Also, you may have already noticed, you need to silence the output
of such a function every test. Just silence it by default.

  ...

The use of ... is almost always evil. As is in this case: it is
not even used. Please remove that troublesome operator.

) {
  testit::assert(
    "length(pars) is not five and/or shift_times is not null, set
    five parameters with no shift_times or ten parameters with
    non-null shift_times",
    length(pars) == 5 || (length(pars) == 10 && !is.na(prop_type2_pool))
  )
  testit::assert(
    "2 type islands cannot have species on the island initially",
    is.na(prop_type2_pool) || !is.na(prop_type2_pool) && nonoceanic_pars[1] == 0
  )
  testit::assert(
    "prop_type2_pool should either be NA for no type 2 species or value between
    0 and 1",
    is.na(prop_type2_pool) || (prop_type2_pool >= 0 && prop_type2_pool <= 1)
  )

I enjoy this error checking!

But first, this error checking is already quite complex. Use a function
for each check instead: this will be easier to debug, easier to read,
easier to write.

Second, it is getting clear that the function
is actually doing two things: check for 5 or 10 parameters. I suggest that
this function quickly redirects to functions specialized on 5 or 10 parameters.
Of course, there should be little code duplication, so things done for both
5 and 10 parameters should be put into little functions as well.

I do see the function is mostly a redirection function already, depending
on the diversity dependence model. You'll see that below I will request that
this function does even more so. Do add example code for each possible flow
through this function, which are ?four models (CS 5 pars, CS 10 pars, IW and GW).

Not all input parameters are checked. For example, for the diversity
dependence model: there is no check if a
valid model is used (e.g. testthat::expect_true(divdepmodel %in% c("IW", "CW", "GW"))).
If the XXX (yes, XXX indeed) diversity dependence model is used, there
is no helpful error message. Add some functions
like assert_div_dep_model/check_div_dep_model to check a
divdepmodel for validity or use get_all_div_dep_models to get all valid
models.

I see it is hard to check all input parameters: the function has tons of
arguments. This can be reduced by using more structured data, but that is
quite some work.

  totaltime <- time
  island_replicates <- list()
  if (divdepmodel == "IW") {
    for (rep in 1:replicates) {
      island_replicates[[rep]] <- DAISIE_sim_core_constant_rate(
        time = totaltime,
        mainland_n = M,
        pars = pars,
        nonoceanic_pars = nonoceanic_pars,
        hyper_pars = hyper_pars,
        area_pars = area_pars,
        dist_pars = dist_pars
      )
      if (verbose == TRUE) {
        print(paste("Island replicate ", rep, sep = ""))

Use if (verbose) ... instead of checking for TRUE explicitly.
Do not use print, use message instead, it's the superior
Tidyverse way. Use paste0 instead of paste(..., sep = ""): it's shorter.

      }
    }
    island_replicates <- DAISIE_format_IW(island_replicates = island_replicates,
                                          time = totaltime,
                                          M = M,
                                          sample_freq = sample_freq,
                                          verbose = verbose)

Would the 'convenient' plotting be removed, one can write
return(DAISIE_format_IW(...)) here.

  }
  if (divdepmodel == "CS") {
    if (length(pars) == 5) {
      for (rep in 1:replicates) {
        island_replicates[[rep]] <- list()
        full_list <- list()
        for (m_spec in 1:M) {
          full_list[[m_spec]] <- DAISIE_sim_core_constant_rate(
            time = totaltime,
            mainland_n = 1,
            pars = pars,
            nonoceanic_pars = nonoceanic_pars,
            hyper_pars = hyper_pars,
            area_pars = area_pars,
            dist_pars = dist_pars
          )
        }
        island_replicates[[rep]] <- full_list
        if (verbose == TRUE) {
          print(paste("Island replicate ", rep, sep = ""))
        }
      }
    } else if (length(pars) == 10) {
      if (replicates_apply_type2 == TRUE) {
        island_replicates <- DAISIE_sim_min_type2(
          time = totaltime,
          M = M,
          pars = pars,
          replicates = replicates,
          prop_type2_pool = prop_type2_pool,
          verbose = verbose)
      } else {
        for (rep in 1:replicates) {
          pool2 <- DDD::roundn(M * prop_type2_pool)
          pool1 <- M - pool2
          lac_1 <- pars[1]
          mu_1 <- pars[2]
          K_1 <- pars[3]
          gam_1 <- pars[4]
          laa_1 <- pars[5]
          lac_2 <- pars[6]
          mu_2 <- pars[7]
          K_2 <- pars[8]
          gam_2 <- pars[9]
          laa_2 <- pars[10]
          full_list <- list()
          #### species of pool1
          for (m_spec in 1:pool1) {
            full_list[[m_spec]] <- DAISIE_sim_core_constant_rate(
              time = totaltime,
              mainland_n = 1,
              pars = c(lac_1,
                       mu_1,
                       K_1,
                       gam_1,
                       laa_1)
            )
            full_list[[m_spec]]$type1or2  <- 1
          }
          #### species of pool2
          for (m_spec in (pool1 + 1):(pool1 + pool2)) {
            full_list[[m_spec]] <- DAISIE_sim_core_constant_rate(
              time = totaltime,
              mainland_n = 1,
              pars = c(lac_2,
                       mu_2,
                       K_2,
                       gam_2,
                       laa_2)
            )
            full_list[[m_spec]]$type1or2 <- 2
          }
          island_replicates[[rep]] <- full_list
          if (verbose == TRUE) {
            print(paste("Island replicate ", rep, sep = ""))
          }
        }

This part is definitly not a function read, with lac_1 <- pars[1],
mu_1 <- pars[2] and so on. Put this in a seperate function instead.

      }
    }
    island_replicates <- DAISIE_format_CS(
      island_replicates = island_replicates,
      time = totaltime,
      M = M,
      sample_freq = sample_freq,
      verbose = verbose
    )

Would the 'convenient' plotting be removed, one can write
return(DAISIE_format_CS(...)) here.

  }

  if (divdepmodel == "GW") {
    if (!is.numeric(num_guilds)) {
      stop("num_guilds must be numeric")
    }
    guild_size <- M / num_guilds
    testit::assert(num_guilds < M)
    testit::assert(M %% num_guilds == 0)
    for (rep in 1:replicates) {
      island_replicates[[rep]] <- list()
      full_list <- list()
      for (m_spec in 1:num_guilds) {
        full_list[[m_spec]]  <- DAISIE_sim_core_constant_rate(
          time = totaltime,
          mainland_n = guild_size,
          pars = pars,
          nonoceanic_pars = nonoceanic_pars,
          hyper_pars = hyper_pars,
          area_pars = area_pars,
          dist_pars = dist_pars
        )
      }
      island_replicates[[rep]] <- full_list
      if (verbose == TRUE) {
        print(paste("Island replicate ", rep, sep = ""))
      }
    }
    island_replicates <- DAISIE_format_GW(island_replicates = island_replicates,
                                          time = totaltime,
                                          M = M,
                                          sample_freq = sample_freq,
                                          num_guilds = num_guilds,
                                          verbose = verbose)

Would the 'convenient' plotting be removed, one can write
return(DAISIE_format_GW(...)) here.

  }
  if (plot_sims == TRUE) {
    DAISIE_plot_sims(
      island_replicates = island_replicates,
      sample_freq = sample_freq
    )
  }

Remove this 'convenient' plotting. It has nothing to do with simulating.

  return(island_replicates)
}

@richelbilderbeek
Copy link
Collaborator Author

I will unassign me now, up until I am given the green light for another round of feedback 😇

@richelbilderbeek richelbilderbeek removed their assignment Mar 2, 2020
@Neves-P
Copy link
Collaborator

Neves-P commented Mar 9, 2020

@richelbilderbeek, thank you so much for such great comments on the code! I will go over each topic with the rest of the DAISIE team to discuss/implement changes, we'll probably break this down into individual issues to go step by. Thanks again, we'll keep everyone updated! 👍

@Neves-P Neves-P self-assigned this Mar 9, 2020
@richelbilderbeek
Copy link
Collaborator Author

Thanks! I am happy to hear that the feedback is appreciated! Good luck 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants