Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BPA Ch.13: Incorrect occ_fs #99

Open
mikemeredith opened this issue Mar 17, 2017 · 8 comments
Open

BPA Ch.13: Incorrect occ_fs #99

mikemeredith opened this issue Mar 17, 2017 · 8 comments

Comments

@mikemeredith
Copy link
Contributor

(I'm a Stan newbie, but I have played with occupancy models a lot, including the WinBUGS and JAGS code in BPA.)

These are occupancy studies, and occ_fs is the number of sites in the sample which are occupied. As such, it cannot be less than the number of sites observed to be occupied (occ_obs) or greater than the total number of sites in the sample (R). The example code allows these constraints to be violated; it's just (bad!) luck that this doesn't happen with the example data from BPA.

A key idea is the conditional probability of occupancy, psi_con, conditional on the data. So if the species is not detected, we know the site is in the yellow area in the diagram below.
venn_bpa_ch13

Model site_occ.stan

The simple model has one value for psi and one for p, so all sites with no detections have the same value for psi_con, called psi_nd in the code below:

generated quantities {
  int<lower=occ_obs, upper=R> occ_fs;
  real psi_nd;            // prob present | not detected
  psi_nd = (psi * (1 - p)^T) / (psi * (1 - p)^T + (1 - psi));
  occ_fs = occ_obs + binomial_rng(R - occ_obs, psi_nd);
}

Model site_occ_cov.stan

Here each site has a different psi and p, and a good strategy is to calculate psi_con for each, which is in any case a result you may want to monitor.

generated quantities {
  int occ_fs;       // Number of occupied sites
  real psi_con[R];  // prob occupied | data
  int z[R];         // occupancy indicator, 0/1

  for (i in 1:R) {
    if (sum_y[i] == 0) {  // species not detected
      real      psi = inv_logit(logit_psi[i]);
      vector[T] q = inv_logit(-logit_p[i])';  // q = 1 - p
      real      qT = prod(q[]);
      psi_con[i] = (psi * qT) / (psi * qT + (1 - psi));
      z[i] = bernoulli_rng(psi_con[i]);
    } else {             // species detected at least once
      psi_con[i] = 1;
      z[i] = 1;
    }
  }
  occ_fs = sum(z);
}

Model bluebug.stan

The same strategy can be applied here, this time allowing for the variable number of visits to each site. So we replace

vector[T] q = inv_logit(-logit_p[i])';

with

vector[last[i]] q = inv_logit(-logit_p[i, 1:last[i]])';

Other models

I haven't looked closely at the dynamic models, but it appears that the reported z and n_occ values are also based on unconditional psi values. Of more concern are the closed-capture models in Ch.6, where the analogous value, the population size N, appears to have the same problem; this has a smaller numerical impact, as the denominator of the conditional inclusion is always large (ie, close to 1), but should be fixed.

@bob-carpenter
Copy link
Contributor

@ito4303 Would you mind looking at this?

@mikemeredith Thanks for the carefully constructed issue. Is this a problem with the BUGS code in the original book or just an issue with the translation?

I have a complete replication of Dorazio and Royle's occupancy model here:

http://mc-stan.org/documentation/case-studies/dorazio-royle-occupancy.html

I hope that's a faithful translation of their cited paper, but if not, please let me know, as I understand that model better than the BPA models. In the Dorazio and Royle approach, the result gets bounded between observed and max (pseudo-sites) because you start with the occupied ones (they're observed) then add in expected occupance of other ones, which has to be between 0 and 1. So we don't need to declare constraints to enforce the constraint.

@mikemeredith
Copy link
Contributor Author

mikemeredith commented Mar 18, 2017

Exceeding the constraints is a symptom, not the problem; enforcing constraints isn't the solution but is a check. The problem is that, in the call to binomial_rng, N is too big and theta is too small. That will often give a plausible result, but is still wrong.

In the BUGS code, they use a latent binary variable, z[i], for occupancy for each site, and then occ_fs = sum(z). They don't need to do the kind of posterior predictive thing. So it's an issue with the translation.

If I were really using Stan (instead of translating), I'd get the conditional (on the data) probability of occupancy for each site and sum those to get an expectation for occ_fs.

I'll look at the Dorazio et al stuff later, but what you describe is I think the right way to do it and what's needed for the BPA Ch.13 examples and the closed-capture examples in Ch.6.

@ito4303
Copy link
Contributor

ito4303 commented Mar 18, 2017

@mikemeredith I greatly thank you for pointing out the problems. As you pointed out, they are incorrect translations.

I will contribute corrected models as soon as possible. Or, perhaps you can create pull requests with the corrected code.

@mikemeredith
Copy link
Contributor Author

@ito4303 I'll do pull requests for the models I've looked at so far. Give me a day or 2.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Mar 19, 2017 via email

@mikemeredith
Copy link
Contributor Author

@bob-carpenter There are issues with the write-up of the Dorazio & Royle analysis, but I think the math is all ok. Where would be a good forum to discuss this?

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Mar 20, 2017 via email

@mikemeredith
Copy link
Contributor Author

@bob-carpenter I've done a pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants